POPFile is an automatic email classification tool written by John Graham-Cumming and available from SourceForge.
The utilities here are helpers intended primarily for advanced POPFile users. Although they were developed on Windows, the author believes they are platform independent and will work properly on non-Windows POPFile installs, but has not tested them on those platforms. UPDATE: June 19, 2003: A GNU/Linux user, Martin Geisler, reports that these utilities seem to work fine under GNU/Linux; his report is here. Thanks, Martin!
topten enhanced - A reporting tool that lists the top ten words in each bucket, ranked by both probability and word count, with HTML output. This version is compatible with POPFile version 0.19.x or higher. Updated: January 25, 2004
snapshot_stats - A script that takes a "snapshot" of your POPFile accuracy statistics and places that data in an Excel-compatible CSV file. Intended for advanced users. This utility requires POPFile version 0.19.0 or higher. Updated: January 25, 2004
dump_corpus_2csv - A utility that dumps your entire POPFile corpus to an Excel-compatible CSV file, where it can be analyzed and manipulated with Excel. Compatible with POPFile versions 0.19.x or higher. Updated: January 25, 2004
corpus_diff - A utility intended for developers or advanced POPFile users who wish to analyze the differences between a reference corpus and the current corpus. The diff shows all deleted, changed, and added entries. Compatible with POPFile versions 0.19.x and 0.20.x only. Deprecated, no longer supported. New: June 25, 2003
skeleton - An example program that uses the POPFile API, intended for programmers interested in writing command-line programs for POPFile. Deprecated, no longer supported. Updated: June 24, 2003
mail_bucket_stats - This HOWTO explains how to use two widely used utilities to regularly grab a copy of your buckets page by email so you can retain a history of your POPFile statistics over time. Updated: May 24, 2003
remove_ignorewords - A utility for advanced users. Synchronizes your corpus files with your ignore words list by removing the ignored words from the corpus. Handy after you have added a batch of new ignore words. Deprecated, no longer supported. Updated: May 23, 2003
topten - A diagnostic tool that lists the top ten words in each bucket, ranked by word count. Deprecated, no longer supported. Updated: June 22, 2003
pfdiagnose - Looks for problems with POPFile's configuration parameters and installed files. Deprecated, no longer supported. Updated: June 21, 2003
ck_corpus - Checks for bad entries in the corpus word files. Deprecated, no longer supported. Updated: May 24, 2003
classify_eudora_mbx - Takes an input Eudora mbx file, runs it through POPFile's Bayes classifier, and emits Eudora mbx files corresponding to the buckets into which POPFile classified the input messages. Deprecated, no longer supported. Updated: May 25, 2003
test_arch_msg - Intended for developers who have turned on archiving in POPFile. This utility passes the archived messages back through the Bayes classifier and reports any message that now classifies to a bucket different from the one in which the archived message was found. Deprecated, no longer supported. Updated: May 26, 2003
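As a rough illustration of the kind of comparison corpus_diff describes (deleted, changed, and added entries), here is a minimal Python sketch. It is not the utility's code and does not read POPFile's corpus files: each corpus is simply modeled as a dictionary mapping a word to its count, and the function name diff_corpus and the sample data are hypothetical.

```python
def diff_corpus(reference, current):
    """Compare two word -> count mappings (an assumed, simplified corpus model).

    Returns a dict with three entries:
      deleted: words present only in the reference corpus
      changed: words present in both, with (old_count, new_count)
      added:   words present only in the current corpus
    """
    deleted = {w: c for w, c in reference.items() if w not in current}
    added = {w: c for w, c in current.items() if w not in reference}
    changed = {
        w: (reference[w], current[w])
        for w in reference
        if w in current and reference[w] != current[w]
    }
    return {"deleted": deleted, "changed": changed, "added": added}


# Hypothetical sample data, for illustration only.
reference = {"viagra": 12, "meeting": 4, "invoice": 7}
current = {"viagra": 15, "invoice": 7, "lottery": 3}

result = diff_corpus(reference, current)
for kind in ("deleted", "changed", "added"):
    print(kind, result[kind])
```

Words whose counts are identical in both corpora (here, "invoice") are left out of the report, so only actual differences are listed.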
Unless otherwise noted, these utilities have been tested on and are compatible with POPFile versions 0.18.1, 0.19.x and 0.20.x. They have not been tested on earlier versions, and their use with earlier versions is not recommended. Unless noted, they are not tested with, and may not be compatible with, Bleeding Edge CVS versions of POPFile; use at your own risk.
Copyright (C) 2003 - 2007 Scott W. Leighton
POPFile Utilities are licensed under the terms of the GNU General Public License.
Contributed to the POPFile project under the terms of the POPFile License Agreement.