Spam Filter
by Roedy Green ©1996-2008 Canadian Mind Products
This essay is about a suggested student project in Java programming. It gives a rough overview of how the program might work; it does not describe an actual, complete program. I have no source, object code, specifications, file layouts or anything else useful for implementing this project. Everything I have to say to help you with this project is written below.
I am not prepared to help you implement it; I have too many other projects of my own.
I do contract work for a living, which could include writing a program such as this. However, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project any way you please.
Introduction
There are many spam filtering programs; however, I have not found a single one that works satisfactorily. What makes this suggested student project different?
- It is user-extensible.
- It is written in pure Java, so it runs almost anywhere.
- You can use it in four modes.
The Four Modes
- As a parallel email client. It fetches the mail before your email program does, deletes spam from the server, and marks mail on the server that is likely spam. The advantage of this is simplicity: you don’t have to reconfigure your mail clients. The disadvantage is that spam can leak through in the interval between when you run the filter and when you run your email program. Another disadvantage is that you have to turn off automatic mail fetching in your email program to ensure the spam filter has finished first.
- As a POP3 proxy. Your email program fetches mail from the spam filter, and the spam filter fetches it from the server. The disadvantage of this approach is that you have to configure your email program to use the proxy filter for receiving and the ordinary mail server for sending. You can’t do that with some email programs, such as Eudora 6.
- As a POP3/SMTP proxy. Your email program fetches mail from the spam filter, and the spam filter fetches it from the server; outgoing mail passes through the filter too. The disadvantage of this approach is that you have to configure your email program to use the filter, and sending outgoing mail through the filter is inefficient.
- As a server application. The program runs on the server and deletes spam without your even having to download it. The disadvantage is that you need permission from your ISP or web server administrator to run the filter this way.
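In the parallel-client and server modes, the filter eventually has to turn a rating into an action: delete the message, mark it as likely spam, or leave it alone. Here is a minimal sketch of that decision, assuming ratings are percentages (0 to 100) and two configurable thresholds. The names SpamAction and SpamPolicy and the example threshold values are hypothetical; how a message is actually deleted or marked on the server is left out.

```java
/** Hypothetical: what the filter does with one message on the server. */
enum SpamAction { DELETE, MARK, KEEP }

/**
 * Hypothetical two-threshold policy over a 0..100 spam rating:
 * rating >= deleteAt gives DELETE, rating >= markAt gives MARK, otherwise KEEP.
 */
final class SpamPolicy {
    private final int markAt;
    private final int deleteAt;

    SpamPolicy(int markAt, int deleteAt) {
        // Thresholds must be ordered and lie within the percentage range.
        if (markAt < 0 || deleteAt > 100 || markAt > deleteAt) {
            throw new IllegalArgumentException("need 0 <= markAt <= deleteAt <= 100");
        }
        this.markAt = markAt;
        this.deleteAt = deleteAt;
    }

    /** Chooses an action for a message the filters rated at spamPercent. */
    SpamAction decide(int spamPercent) {
        if (spamPercent >= deleteAt) {
            return SpamAction.DELETE;
        }
        if (spamPercent >= markAt) {
            return SpamAction.MARK;
        }
        return SpamAction.KEEP;
    }
}
```

With new SpamPolicy(50, 90), a message rated 95 is deleted, one rated 60 is marked, and one rated 30 is kept. Keeping the delete threshold well above the mark threshold means only the most confident verdicts destroy mail.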
User Extensible
You can write your own spam filters, which are applied just like the built-in ones.
All you need to do is write Java code that implements the standard filter interface.
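A hypothetical sketch of such an interface, assuming (as the Standard Interface section describes) that a filter is handed one message and returns the percentage likelihood that it is spam. The names MailItem, SpamFilter, spamPercent and ShoutingSubjectFilter are invented for illustration. A real build would pass a JavaMail MimeMessage; the plain MailItem holder used here just keeps the sketch compilable with only the JDK.

```java
import java.util.Locale;

/** Hypothetical stand-in for a JavaMail MimeMessage, so the sketch needs only the JDK. */
final class MailItem {
    final String from;
    final String subject;
    final String body;

    MailItem(String from, String subject, String body) {
        this.from = from;
        this.subject = subject;
        this.body = body;
    }
}

/** Hypothetical plug-in contract that built-in and user-written filters both implement. */
interface SpamFilter {
    /** Short name used in configuration and logs. */
    String name();

    /** Returns 0..100, this filter's estimate of how likely the message is spam. */
    int spamPercent(MailItem message);
}

/** Example user-written filter: an all-capitals subject line is a strong spam sign. */
final class ShoutingSubjectFilter implements SpamFilter {
    @Override
    public String name() {
        return "shouting-subject";
    }

    @Override
    public int spamPercent(MailItem message) {
        String s = message.subject == null ? "" : message.subject;
        // Require at least one letter so subjects like "1234" don't count as shouting.
        boolean hasLetter = s.chars().anyMatch(Character::isLetter);
        return hasLetter && s.equals(s.toUpperCase(Locale.ROOT)) ? 80 : 0;
    }
}
```

A user-written filter like ShoutingSubjectFilter is registered and called exactly like a built-in one, so the rest of the program never needs to know which is which.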
Possible Filters
Some of the filters you could write using this interface include:
- A hook into Vipul’s Razor.
- A list of spam words.
- A list of spam phrases, perhaps with weights.
- A filter that treats everyone in your address book as a friend.
- A filter that treats everyone you have recently sent mail to as a friend.
- Bayesian filtering.
- A neural net.
- A filter that screens out mail in languages you don’t speak.
- Something to deal with a particular virus’s junk mail.
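The weighted-phrase idea from the list above can be sketched as a scorer that adds up the weights of the phrases it finds. This is a minimal sketch under stated assumptions: matching is a case-insensitive substring search, each phrase counts at most once, negative weights are allowed for phrases that suggest legitimate mail, and the total is clamped to 0 to 100 so it reads as a percentage. WeightedPhraseScorer is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical weighted-phrase scorer producing a 0..100 spam percentage. */
final class WeightedPhraseScorer {
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Registers a phrase (matched case-insensitively) with a weight in points. */
    WeightedPhraseScorer add(String phrase, int weight) {
        weights.put(phrase.toLowerCase(Locale.ROOT), weight);
        return this;
    }

    /** Sums the weights of phrases that occur at least once, clamped to 0..100. */
    int score(String text) {
        String haystack = text.toLowerCase(Locale.ROOT);
        int total = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (haystack.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return Math.max(0, Math.min(100, total));
    }
}
```

For example, with "act now" weighted 40 and "free money" weighted 50, the text "FREE MONEY if you ACT NOW!" scores 90. Plain substring matching also hits word fragments ("act now" occurs inside "exact nowhere"), so a real filter would probably match on word boundaries.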
How Implemented
A simple filter that acts as a parallel email client can be implemented fairly easily using JavaMail. Each filter can read its settings from its own configuration file or from -D system properties.
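One way to handle per-filter settings is to let a -D system property override the filter’s own configuration file, which in turn overrides a built-in default. This sketch assumes that precedence order and a hypothetical naming scheme in which system properties carry a per-filter prefix (such as spamfilter.phrases.) while the filter’s own file uses bare keys. FilterConfig is a hypothetical name; java.util.Properties and System.getProperty are standard JDK APIs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical per-filter settings. Lookup order (an assumption):
 * -D system property (prefixed key), then the filter's own file (bare key), then the default.
 */
final class FilterConfig {
    private final String prefix;
    private final Properties fileProps = new Properties();

    /** configFile may be null if the filter has no file of its own. */
    FilterConfig(String prefix, Reader configFile) {
        this.prefix = prefix;
        if (configFile != null) {
            try {
                fileProps.load(configFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    String get(String key, String defaultValue) {
        // A -D override uses a namespaced key so different filters don't collide.
        String fromSystem = System.getProperty(prefix + key);
        if (fromSystem != null) {
            return fromSystem;
        }
        // The filter's own file uses bare keys.
        return fileProps.getProperty(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        return Integer.parseInt(get(key, Integer.toString(defaultValue)).trim());
    }
}
```

For example, if the phrase filter’s file says threshold=70, adding -Dspamfilter.phrases.threshold=85 to the java command line makes getInt("threshold", 50) return 85 for that run without editing the file.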
It is simpler than most such applications, since it has no user interface, just a configuration file. It is intended for Java programmers, who can then configure it and install it for their technopeasant friends and customers.
Standard Interface
What we need are two interfaces for the spam filter.
One is a Java interface that is given a JavaMail MimeMessage and returns the percentage likelihood that it is spam.
The other is a socket protocol: the message is sent to a socket, and a rating comes back. A filter can then be written in any language.
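One possible wire format for such a socket protocol, purely an assumption: the client sends a 4-byte big-endian length followed by the raw message in UTF-8, and the filter replies with a 4-byte big-endian integer rating from 0 to 100, one request per connection. Because it uses only fixed-size integers and raw bytes, a filter written in any language can implement the server side. SocketRatingProtocol, rate and serveOne are hypothetical names; the sketch uses only standard java.net and java.io classes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

/**
 * Hypothetical wire protocol for out-of-process filters:
 *   request:  4-byte big-endian length, then that many bytes of raw message (UTF-8)
 *   response: 4-byte big-endian int, the spam rating 0..100
 * One request per connection.
 */
final class SocketRatingProtocol {
    private SocketRatingProtocol() {}

    /** Client side: send one message to the filter at host:port and return its rating. */
    static int rate(String host, int port, String rawMessage) throws IOException {
        byte[] bytes = rawMessage.getBytes(StandardCharsets.UTF_8);
        try (Socket socket = new Socket(host, port)) {
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeInt(bytes.length); // DataOutputStream writes high byte first
            out.write(bytes, 0, bytes.length);
            out.flush();
            DataInputStream in = new DataInputStream(socket.getInputStream());
            return in.readInt();
        }
    }

    /** Server side: accept one connection, score the message, reply with the rating. */
    static void serveOne(ServerSocket server, ToIntFunction<String> scorer) throws IOException {
        try (Socket socket = server.accept()) {
            DataInputStream in = new DataInputStream(socket.getInputStream());
            int length = in.readInt();
            if (length < 0) {
                throw new IOException("negative message length: " + length);
            }
            byte[] bytes = new byte[length];
            in.readFully(bytes);
            int rating = scorer.applyAsInt(new String(bytes, StandardCharsets.UTF_8));
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeInt(Math.max(0, Math.min(100, rating))); // clamp to a percentage
            out.flush();
        }
    }
}
```

A mail program would call rate once per message; serveOne shows the matching server step, which a non-Java filter would reimplement in its own language. A production version would also want socket timeouts and a maximum message size.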
You also need some way to register the existence of multiple spam filters.
Then any email program can plug into any combination of spam filters.
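A registry can be as simple as a name-to-filter map that the mail program asks to rate each message. This sketch assumes each filter is a ToIntFunction<String> taking the raw message text and returning a 0 to 100 rating, and it assumes the simplest combining rule: the overall rating is the highest individual rating. FilterRegistry, register, rateEach and rate are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical registry of named filters; combines their ratings by taking the maximum. */
final class FilterRegistry {
    private final Map<String, ToIntFunction<String>> filters = new LinkedHashMap<>();

    /** Adds a filter under a unique name. */
    void register(String name, ToIntFunction<String> filter) {
        if (filters.putIfAbsent(name, filter) != null) {
            throw new IllegalArgumentException("filter already registered: " + name);
        }
    }

    /** Rating from each registered filter, in registration order. */
    Map<String, Integer> rateEach(String rawMessage) {
        Map<String, Integer> ratings = new LinkedHashMap<>();
        for (Map.Entry<String, ToIntFunction<String>> e : filters.entrySet()) {
            ratings.put(e.getKey(), e.getValue().applyAsInt(rawMessage));
        }
        return Collections.unmodifiableMap(ratings);
    }

    /** Overall rating: the highest individual rating, or 0 if no filters are registered. */
    int rate(String rawMessage) {
        int best = 0;
        for (int r : rateEach(rawMessage).values()) {
            best = Math.max(best, r);
        }
        return best;
    }
}
```

Because the maximum can only go up, a "friend" filter cannot veto a high rating from another filter under this rule; a real combiner might give whitelist filters the final say or use a weighted average instead.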