POPFile remove_ignorewords Utility

This utility is deprecated and no longer supported. It will not function properly with POPFile versions later than 0.20.x.

The remove_ignorewords utility is a tool for POPFile that synchronizes your corpus with recently added ignore words by removing those words from every one of your corpus buckets.

Additionally, it accepts on the command line the name of a file containing words that you want removed from all corpus buckets. This is helpful when corrupt 'words' have crept into your corpus and you would like to clean them out without the hassle of manually editing corpus files.
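The core operation, removing every listed word from a bucket's table, can be sketched in Python. This is a minimal illustration, not the utility's actual Perl code. It assumes that each corpus table line has the form `word count` and that the word list holds one word per line; `load_words` and `remove_words` are hypothetical names.

```python
from pathlib import Path


def load_words(path):
    """Read a word list, one word per line, skipping blank lines.

    ASSUMPTION: the list is a UTF-8 text file.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_words(table_lines, words):
    """Return (kept_lines, removed_count).

    ASSUMPTION: each table line looks like 'word count'. Any line whose
    first field is in `words` is dropped; every other line is kept as-is.
    """
    kept = []
    removed = 0
    for line in table_lines:
        fields = line.split()
        if fields and fields[0] in words:
            removed += 1
        else:
            kept.append(line)
    return kept, removed


if __name__ == "__main__":
    table = ["the 40", "mortgage 12", "refinance 7"]
    kept, removed = remove_words(table, {"the"})
    print(kept, removed)  # ['mortgage 12', 'refinance 7'] 1
```

Building a set of words first keeps each lookup constant-time, so the cost grows with the size of the table rather than with table size times list size.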

This version has been tested in a Windows environment with versions 0.18.1 and 0.19.0 of POPFile. The author believes that the utility is platform independent and will work properly on non-Windows POPFile installs, but has not tested it on those platforms.

POPFile is an automatic email classification tool written by John Graham-Cumming and available from SourceForge.

Instructions for use

Use this utility at your own risk! To remove words from your corpus, it must read in your corpus table and, most importantly, replace it by overwriting it with a new corpus table that does not include the words you are removing. Always make a full backup of your POPFile folder and all its sub-folders first to protect yourself from data loss.
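One way to make that backup is sketched below in Python. `backup_popfile` is a hypothetical helper, not part of remove_ignorewords. It copies the whole folder tree with `shutil.copytree` into a new timestamped directory, so an earlier backup is never overwritten.

```python
import shutil
import time
from pathlib import Path


def backup_popfile(install_dir, backup_root):
    """Copy install_dir and all its sub-folders into a new timestamped folder.

    Returns the path of the backup. copytree creates missing parent folders
    and raises FileExistsError if the destination already exists, so an
    existing backup is never overwritten.
    """
    src = Path(install_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_popfile(r"C:\Program Files\Popfile", r"C:\popfile-backups")` would copy the default install directory. Both paths are assumptions, so adjust them for your system. Run the backup while POPFile is shut down so that no files change during the copy.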
  1. Download the script to your POPFile installation directory, normally c:\Program Files\Popfile.

  2. Shut down POPFile. POPFile should not be running when you run this utility, because it modifies the corpus table files that POPFile uses.

  3. Open a DOS command box (click the DOS icon on your desktop, or choose Start/Run, type command in the Open box and click OK).

  4. Change to your POPFile installation directory, e.g.,

    cd  "\program files\popfile"
    

  5. Run remove_ignorewords.pl using Perl.

    perl remove_ignorewords.pl > report.txt
    

  6. The resulting diagnostic report will be in the file named 'report.txt'. Open it with a text editor such as Notepad:

    start notepad.exe report.txt
    

  7. Restart POPFile and visit the buckets page in the UI. Your word counts and the number of words per bucket will have changed to reflect the removed ignore words.

Note: To remove words that are not on your ignore list, create a text file containing those words, one word per line, and pass it to remove_ignorewords by putting the filename on the command line as shown below.

    perl remove_ignorewords.pl filename > report.txt

The command above removes all words listed in the file 'filename'.
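If the words to remove come from another script or a list you have collected, a small Python helper can produce the one-word-per-line file that the utility expects. `write_word_list` is a hypothetical convenience function, not part of remove_ignorewords. It assumes that a UTF-8 text file is acceptable to the utility.

```python
from pathlib import Path


def write_word_list(path, words):
    """Write words to `path`, one per line, and return how many were written.

    Surrounding whitespace is stripped, blank entries are skipped, and
    duplicates are removed while keeping first-seen order.
    ASSUMPTION: the utility accepts a UTF-8 text file.
    """
    stripped = (w.strip() for w in words)
    unique = list(dict.fromkeys(w for w in stripped if w))
    Path(path).write_text("".join(w + "\n" for w in unique), encoding="utf-8")
    return len(unique)
```

For example, `write_word_list("filename", ["foo", "bar"])` creates a file that can then be given to the script on the command line.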

Copying

Copyright (C) 2003 Scott W. Leighton

Licensed under the terms of the GNU General Public License.

Contributed to the POPFile project under the terms of the POPFile License Agreement.