This utility is deprecated and no longer supported. It will not function properly with versions higher than 0.20.x.
The corpus_diff utility is a POPFile tool intended for developers or those advanced POPFile users who wish to analyze the difference between a reference corpus and the current corpus. The diff will show all deleted, changed and added entries.
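To make the three kinds of difference concrete, here is a minimal Python sketch of the idea. It is not the corpus_diff script itself. It assumes each corpus can be reduced to a simple word-to-count mapping, and the function name diff_corpus and the sample words are hypothetical, chosen only for illustration.

```python
def diff_corpus(reference, current):
    """Classify the differences between two word -> count mappings."""
    # Words present in the reference corpus but gone from the current one.
    deleted = {w: reference[w] for w in reference if w not in current}
    # Words that appear only in the current corpus.
    added = {w: current[w] for w in current if w not in reference}
    # Words present in both corpora whose counts differ: (old, new).
    changed = {
        w: (reference[w], current[w])
        for w in reference
        if w in current and reference[w] != current[w]
    }
    return deleted, changed, added


# Hypothetical data for a single bucket (an assumption, not real corpus data).
reference = {"offer": 12, "meeting": 3, "invoice": 5}
current = {"offer": 15, "invoice": 5, "newsletter": 2}

deleted, changed, added = diff_corpus(reference, current)
print("deleted:", deleted)
print("changed:", changed)
print("added:", added)
```

In this sketch an unchanged word such as "invoice" appears in none of the three results, which matches the description above: only deleted, changed and added entries are reported.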
The utility is not intended for the casual POPFile user. It requires a good understanding of command-line programs, of the POPFile directory structure, and of how to copy files from one directory to another.
Corpus_diff is compatible with POPFile versions 0.19.0 through 0.20.x.
POPFile is an automatic email classification tool written by John Graham-Cumming and available from SourceForge.
Download the script.
Create a reference corpus by copying your existing corpus folder to a folder named corpus.bak (replace the word corpus with the name of your corpus folder if you changed it from the default). This reference corpus is the corpus that corpus_diff compares against.
After reclassifying mail, you can run the corpus_diff utility at any time to see the changes since your reference corpus was created.
Open a DOS box and change to your POPFile directory.
Run corpus_diff:
perl corpus_diff.pl > diff.htm
View the results in your browser: either browse to your POPFile directory and open diff.htm, or type
start diff.htm
at the DOS prompt to start the browser and display diff.htm.
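For those who prefer to script the steps above rather than type them, the following Python sketch mirrors them. It is an illustration under stated assumptions, not part of corpus_diff: the function names are hypothetical, Python and perl are assumed to be installed and on the path, and corpus_diff.pl is assumed to sit in the POPFile directory.

```python
import shutil
import subprocess
import webbrowser
from pathlib import Path


def make_reference_corpus(popfile_dir, corpus_name="corpus"):
    """Copy the corpus folder to <corpus_name>.bak inside popfile_dir."""
    source = Path(popfile_dir) / corpus_name
    backup = Path(popfile_dir) / (corpus_name + ".bak")
    if backup.exists():
        # Refuse to overwrite an existing reference corpus.
        raise FileExistsError(f"{backup} already exists; remove it first")
    shutil.copytree(source, backup)
    return backup


def run_corpus_diff(popfile_dir, report_name="diff.htm"):
    """Run 'perl corpus_diff.pl' in popfile_dir and save its output."""
    report = Path(popfile_dir) / report_name
    with open(report, "wb") as out:
        subprocess.run(
            ["perl", "corpus_diff.pl"],
            cwd=popfile_dir,
            stdout=out,
            check=True,  # raise an error if perl exits with a failure code
        )
    return report


def view_report(report):
    """Open the saved report in the default browser."""
    webbrowser.open(Path(report).resolve().as_uri())
```

A typical session would call make_reference_corpus once, then, after reclassifying mail, call run_corpus_diff followed by view_report on the path it returns. Pass a different corpus_name if your corpus folder was renamed from the default.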
The following is a sample of the output from corpus_diff run against the author's corpus on June 25, 2003.
Copyright (C) 2003 Scott W. Leighton
Licensed under the terms of the GNU General Public License.
Contributed to the POPFile project under the terms of the POPFile License Agreement.