POPFile classify_eudora_mbx Utility

This utility is deprecated and no longer supported. It will not function properly with POPFile versions later than 0.20.x.

classify_eudora_mbx.pl --- Process Eudora MBX files using POPFile's Classifier::Bayes and emit pf_[bucketname].mbx files for use by Eudora.

Audience

Windows users of POPFile and Eudora who have a large number of historical emails in their Eudora mailboxes and wish to reclassify those emails using the POPFile Bayesian classifier.

It is expected that this is a small number of users.

It is also expected that this would be a one-time event, e.g., a new POPFile user wants to get their Eudora mailboxes under control by classifying the historical mail into a number of buckets.

Usage

       perl classify_eudora_mbx.pl "[path to mailbox file].mbx"

Description

This utility is for Eudora users of POPFile who want to run the mail that was already in their Eudora mailboxes before they began using POPFile through POPFile's Bayesian classifier, separating that mail into the buckets defined in POPFile.

This utility is intended for ADVANCED USERS who are familiar with the DOS shell, DOS commands, and the concepts of copying and moving files, and who understand directory structures. If that's not you, do not use this utility.

What it does not do

This utility does NOT send your historical mail through POPFile. Instead, it simply uses POPFile's Bayesian classifier to, in essence, rearrange the mail in your Eudora mailboxes.

There are NO OPTIONS TO RECLASSIFY. For this reason, do NOT run this utility until your POPFile corpus is mature enough that you can comfortably rely on its classification.

Most users should not use this utility until their POPFile installation has classified at least 100 emails. That initial training is required to mature your corpus to the point where you can rely on it. Users who have complex setups, e.g., many buckets, will want to wait even longer to get a mature corpus.

Installation

Download the program here.

classify_eudora_mbx.pl must be installed in and run from the POPFile installation directory, normally c:\program files\popfile\

Recommended Usage Instructions

** Before starting, make sure you have a backup of both your Eudora installation and your POPFile installation. **
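One way to take that backup is sketched below in Python. It is only an illustration: the function name backup_directory and the c:\backups destination are assumptions made for this example, and any backup method you already trust works just as well.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source, backup_root):
    """Copy `source` into a new timestamped folder under `backup_root`."""
    source = Path(source)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-backup-{stamp}"
    # copytree raises an error if the target already exists, so an
    # earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target


# Example calls, using the default locations quoted in this document
# and a hypothetical c:\backups destination:
#   backup_directory(r"c:\program files\popfile", r"c:\backups")
#   backup_directory(r"c:\program files\qualcomm\eudora mail", r"c:\backups")
```

Run it with Eudora and POPFile both closed so that no file changes while it is being copied.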

  1. Prepare your Eudora client by choosing a mailbox to hold all of the mail you want to run through the utility. This can be an existing mailbox (we do not recommend using "in"), but the preferred method is to create a new one; see Technical Notes below regarding the loss of TOC flags for the reason why. These instructions assume you are using the preferred method and have created a new mailbox named '4popfile'.

  2. Transfer all of the historical mail you want to classify to the '4popfile' mailbox (either by drag and drop or by selecting the messages and clicking Transfer on the menu).

  3. The utility will create mailboxes in the form of:

             pf_[bucketname]
    
    where [bucketname] is the name you defined in POPFile on the Buckets screen. Make sure that you do NOT have any pre-existing Eudora mailboxes with these names. If you do, rename them before proceeding.

  4. Close Eudora. This will ensure that Eudora does not modify any of the mailbox files while you are running the utility.

  5. Open a DOS command box and change to the POPFile installation directory, usually c:\program files\popfile

             cd "c:\program files\popfile"
    

  6. Run the utility, passing it the full path to the mailbox file: the location of your Eudora installation (usually c:\program files\qualcomm\eudora mail\) followed by the name of the mailbox file (the mailbox name in Eudora plus the extension .mbx).

             perl classify_eudora_mbx.pl "c:\program files\qualcomm\eudora mail\4popfile.mbx"
    
    The utility will parse the mbx file and pass each email it finds to POPFile's Bayesian classifier, which assigns it a bucket classification. The utility then writes the classified email to a new Eudora-compatible mbx file named after the bucket. For example, for a mail classified to the POPFile spam bucket, the output mbx will be named:

             pf_spam.mbx
    
    Depending on the number of emails, the utility could take some time to complete.

  7. When the utility finishes, you will have .mbx files in the POPFile directory, one for each bucket that emails were classified into. Move those .mbx files to your Eudora directory. Using the defaults mentioned earlier, the DOS command would be:

             move "c:\program files\popfile\*.mbx" "c:\program files\qualcomm\eudora mail\"
    

  8. Exit the DOS box and start Eudora. Open the pf_[bucketname] mailboxes to see the results. NOTE: Eudora TOC files are not created by this utility, so the first time you open each newly created mailbox in Eudora, you'll see a message that Eudora is creating the TOC.

  9. Assuming you are happy with the results, close Eudora, then delete the original 4popfile.mbx file and its TOC file, 4popfile.toc, in the Eudora installation directory. IMPORTANT: DO NOT DELETE THE MAILBOX FROM WITHIN EUDORA. Doing so risks deleting any attachment files associated with emails in that mailbox, since Eudora rightfully assumes that when a message is deleted its attachment can be deleted too. Eudora has no idea that a second copy of the message exists in the pf_ mailbox.

  10. Reopen Eudora. You're done.
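The parse, classify, and write flow described in step 6 can be sketched in Python as follows. This is not the utility's actual Perl code. It assumes that each message in a Eudora .mbx file begins with a separator line starting with "From ???@??? ", and toy_classify is a hypothetical stand-in for POPFile's Bayesian classifier, as are all of the function names.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: every message starts with a separator line such as
# "From ???@??? Mon Jan 06 10:15:00 2003".
SEPARATOR = re.compile(rb"^From \?\?\?@\?\?\? ")


def split_messages(mbx_bytes):
    """Return the raw messages, each beginning with its separator line."""
    messages, current = [], []
    for line in mbx_bytes.splitlines(keepends=True):
        # A separator line closes the message collected so far.
        if SEPARATOR.match(line) and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def toy_classify(message):
    """Hypothetical stand-in for POPFile's classifier; returns a bucket name."""
    return "spam" if b"lottery" in message.lower() else "personal"


def classify_mailbox(mbx_path, output_dir, classify=toy_classify):
    """Append each message to output_dir/pf_<bucket>.mbx; return per-bucket counts."""
    counts = defaultdict(int)
    output_dir = Path(output_dir)
    for message in split_messages(Path(mbx_path).read_bytes()):
        bucket = classify(message)
        # Append mode keeps all messages for one bucket in the same file.
        with open(output_dir / f"pf_{bucket}.mbx", "ab") as out:
            out.write(message)
        counts[bucket] += 1
    return dict(counts)
```

Because the sketch appends to the output files, running it twice against the same output directory would duplicate messages, which is one reason to clear out or rename any existing pf_ mailboxes first.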

Technical Notes

     Eudora TOC Files and Message Status Flags
     -----------------------------------------

     Eudora stores mailboxes in two separate files, the .mbx file
     and the .toc file. Each mailbox will have those two files
     and will be named with the mailbox name you see in Eudora,
     e.g., the 'in' box has two files on disk:

     in.mbx
     in.toc

     The .toc file contains pointers and flags used by Eudora
     to determine such things as whether or not the email
     message was 'sent' or 'read' or 'unread', etc. THIS UTILITY
     DOES NOT TOUCH THE .TOC FILES. In particular, when the
     utility creates the output pf_[bucketname].mbx file, it
     does NOT create a .toc file to go along with it. For that
     reason, when the pf_[bucketname].mbx is opened by Eudora,
     Eudora will detect the lack of a .toc and it will create
     one that DOES NOT HAVE ANY FLAG SETTINGS FOR THE MESSAGES.

     To make that clear, if you run the 'in' box through the utility
     and messages in the 'in' box were flagged 'unread', the
     output pf_[bucketname].mbx will lose those flags on
     those messages. If retaining the flag settings is important
     to you, DO NOT USE THIS UTILITY.
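To see which mailboxes currently lack a .toc file (and so will have one rebuilt, without flags, the next time Eudora opens them), a check along the following lines could be used. It is a minimal Python sketch that assumes the .mbx and .toc files sit side by side in a single directory; the function name mailboxes_missing_toc is made up for this example.

```python
from pathlib import Path


def mailboxes_missing_toc(eudora_dir):
    """Return the names of mailboxes that have an .mbx file but no .toc file."""
    missing = []
    for mbx in sorted(Path(eudora_dir).glob("*.mbx")):
        # in.mbx pairs with in.toc, pf_spam.mbx with pf_spam.toc, and so on.
        if not mbx.with_suffix(".toc").exists():
            missing.append(mbx.stem)
    return missing


# Example call, using the default Eudora location quoted in this document:
#   mailboxes_missing_toc(r"c:\program files\qualcomm\eudora mail")
```

Right after the utility's output has been moved into the Eudora directory, every pf_[bucketname] mailbox should appear in the returned list.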

Copying

Copyright (C) 2003 Scott W. Leighton

Licensed under the terms of the GNU General Public License.

Contributed to the POPFile project under the terms of the POPFile License Agreement.

