How to create a cronjob to use sa-learn to teach spamassassin.

This guide describes how to teach spamassassin what is and is not spam. It is written for the mbox format, but can be adapted to Maildir with a few alterations.

This assumes that you've already installed spamassassin and have it running in its default state. The examples use the domain “domain.com”, the system account “username”, and the email user “bob”. The folders are called teach-isspam and teach-isnotspam, but you can use whatever names you want.

1) Create a new IMAP folder to hold your spam, and a second folder to hold false positives (so that spamassassin doesn't tag them again).

cd /home/username/imap/domain.com/bob/mail
touch teach-isspam teach-isnotspam
chown username:username teach*
chmod 600 teach*
cd ..
echo "teach-isspam" >> .mailboxlist
echo "teach-isnotspam" >> .mailboxlist

Now you have 2 new mailboxes under user bob when accessed via IMAP (squirrelmail or roundcube). Test them by moving a few messages into each one to make sure they work correctly.
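Running the two echo commands from step 1 a second time would add duplicate lines to .mailboxlist. A minimal sketch of a repeatable version follows. The helper name add_mailbox is hypothetical (it is not part of any mail software), and the sketch assumes .mailboxlist holds one folder name per line, as in the commands above.

```shell
#!/bin/sh
# add_mailbox NAME LISTFILE (hypothetical helper)
# Appends NAME to LISTFILE only if no line already matches it exactly.
add_mailbox() {
    name=$1
    list=$2
    # -q: quiet, -x: match the whole line, -F: treat NAME as a fixed string
    if ! grep -qxF -- "$name" "$list" 2>/dev/null; then
        printf '%s\n' "$name" >> "$list"
    fi
}

# Example usage (paths follow the guide's example account):
# cd /home/username/imap/domain.com/bob
# add_mailbox teach-isspam .mailboxlist
# add_mailbox teach-isnotspam .mailboxlist
```

Calling the helper twice with the same name leaves a single entry in the list.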

2) Now that you can put your spam and non-spam messages into their correct places, you'll need to set up a cronjob to check these locations with sa-learn. Create a shell script at /home/username/.spamassassin/teach.sh. In it, put:

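#!/bin/sh
# Mailbox files (mbox format) that hold the messages to learn from.
FILESPAM=/home/username/imap/domain.com/bob/mail/teach-isspam
FILEHAM=/home/username/imap/domain.com/bob/mail/teach-isnotspam

# Learn spam; --no-sync defers the database sync until the end.
echo "learning spam via $FILESPAM..."
sa-learn --no-sync --spam --mbox "$FILESPAM"

echo ""
echo "learning ham via $FILEHAM..."
sa-learn --no-sync --ham --mbox "$FILEHAM"

# Sync the journal into the Bayes database once, after both runs.
echo ""
echo "syncing..."
sa-learn --sync

# Show the Bayes database counters (nspam, nham, and so on).
echo ""
echo "current status:"
sa-learn --dump magic

exit 0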

Save the file and chmod teach.sh to 700.
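For Maildir, each folder is a directory with one file per message rather than a single mbox file, so the --mbox flag is dropped and sa-learn is pointed at the folder's cur and new subdirectories. A rough sketch under stated assumptions: the learn_maildir helper and the SA_LEARN variable are hypothetical names, and the folder paths in the usage comment (Maildir/.teach-isspam) assume a Dovecot-style layout that may differ on your server.

```shell
#!/bin/sh
# SA_LEARN is a hypothetical override so the command can be swapped for testing.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_maildir KIND DIR (hypothetical helper)
# KIND is "spam" or "ham"; DIR is a Maildir folder containing cur/ and new/.
learn_maildir() {
    kind=$1
    dir=$2
    for sub in cur new; do
        # Skip subdirectories that do not exist.
        if [ -d "$dir/$sub" ]; then
            "$SA_LEARN" --no-sync "--$kind" "$dir/$sub"
        fi
    done
}

# Example usage (assumed paths, adjust to your layout):
# learn_maildir spam /home/username/imap/domain.com/bob/Maildir/.teach-isspam
# learn_maildir ham  /home/username/imap/domain.com/bob/Maildir/.teach-isnotspam
# sa-learn --sync
```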

At this point, you should be able to run teach.sh manually to see if it works. When you test it, run it as username so that any files it writes are owned by username, and not root.
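When testing by hand, it can help to know how many messages are sitting in a folder, to compare against the number of messages sa-learn says it examined. In mbox format each message starts with a line beginning with "From " (with a trailing space), so a rough count is possible with grep. The count_mbox helper below is a hypothetical name, and the result is only an approximation: it assumes the mail server escapes body lines that start with "From ".

```shell
#!/bin/sh
# count_mbox FILE (hypothetical helper)
# Prints the number of lines starting with "From ", which approximates
# the number of messages in an mbox file. Prints 0 for an empty file.
count_mbox() {
    # grep exits non-zero when the count is 0, so its status is ignored.
    grep -c '^From ' "$1" || true
}

# Example usage:
# count_mbox /home/username/imap/domain.com/bob/mail/teach-isspam
```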

3) Now automate teach.sh so you don't have to run it manually every time. Log into DirectAdmin as username and go to the cronjobs section. Enter the command

/home/username/.spamassassin/teach.sh

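and for the times, put 0 in the “minute” field, */12 in the “hour” field, and a * character in all remaining fields, so that it runs twice per day (every 12 hours). Leaving “minute” as * would instead run the script every minute during the matching hours.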
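For reference, the same schedule written as a plain crontab line could look like the sketch below. It assumes the minute field is fixed at 0, so the job runs once per matching hour rather than every minute of it. The CRON_ENTRY variable is a hypothetical name used only to display the line; DirectAdmin builds the real entry from the form fields.

```shell
#!/bin/sh
# Fields: minute hour day-of-month month day-of-week command
# 0 */12 * * *  ->  at 00:00 and 12:00 every day
CRON_ENTRY='0 */12 * * * /home/username/.spamassassin/teach.sh'
printf '%s\n' "$CRON_ENTRY"
```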

That's it. To use it: if you get email that is spam but wasn't tagged as spam, move it into your teach-isspam folder. If you get email that was tagged as spam but should not have been, move it to your teach-isnotspam folder. You can delete the email you've placed there after a day or so, once the cronjob has had a chance to run sa-learn on it. Note that sa-learn can process the same email twice and it won't hurt anything.

If you want to empty these 2 folders after each run, add:

: > "$FILESPAM"
: > "$FILEHAM"

just before the final exit line. This truncates both folders to 0 bytes, so that you don't have to delete the messages after they're processed.
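Emptying the folders unconditionally would discard messages even if sa-learn failed on them. One possible refinement, sketched below, is to truncate a folder only when the learning command for it exited successfully. The learn_and_empty helper and the SA_LEARN variable are hypothetical names introduced for illustration, and the sketch assumes sa-learn returns a non-zero exit status on failure. Messages that arrive in a folder while the script is running could still be lost with this approach.

```shell
#!/bin/sh
# SA_LEARN is a hypothetical override so the command can be swapped for testing.
SA_LEARN=${SA_LEARN:-sa-learn}

# learn_and_empty KIND FILE (hypothetical helper)
# KIND is "spam" or "ham"; FILE is an mbox file.
learn_and_empty() {
    kind=$1
    file=$2
    if "$SA_LEARN" --no-sync "--$kind" --mbox "$file"; then
        # ": >" truncates the file in place, keeping owner and permissions.
        : > "$file"
    else
        echo "sa-learn failed for $file, leaving it untouched" >&2
        return 1
    fi
}

# Example usage inside teach.sh, replacing the separate learn and empty steps:
# learn_and_empty spam "$FILESPAM"
# learn_and_empty ham "$FILEHAM"
# sa-learn --sync
```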

 
email/cronjob.txt · Last modified: 2010/02/22 03:13 by muscardin
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported