Spam Filter
by Roedy Green ©1996-2009 Canadian Mind Products
This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested
student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object,
specifications, file layouts or anything else useful to implementing this project. Everything I have to say to help you
with this project is written below. I am not prepared to help you implement it; I have too many other projects of my own.
I do contract work for a living, which could include writing a program such as
this. However, I don’t do people’s homework
for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and
to keep all the profits from your endeavor.
Introduction
There are many spam filtering programs, but I have not found a
single one that worked satisfactorily. What makes this suggested student project different?
- It is user-extensible.
- It is written in pure Java, so runs almost anywhere.
- You can use it in four modes.
The Four Modes
- As a parallel email client. It fetches the mail before your email program does, deletes mail on the server
that is spam, and marks mail on the server that is likely spam. The advantage of this is simplicity. You don’t
have to reconfigure your mail clients. The disadvantage is that spam can leak through if it arrives between the time
you run the filter and the time you run your email program. The other disadvantage is that you have to turn off
automatic mail fetch in your email program to ensure the spam filter has completed first.
- As a POP3 proxy. Your email program fetches mail from the spam filter and the spam filter fetches it from the server.
The disadvantage of this approach is you have to configure your email program to use the proxy filter for receiving and
the ordinary mail server for sending. You can’t do that with some email programs such as Eudora 6.
- As a POP3/SMTP proxy. Your email program both fetches and sends mail through the spam filter, and the spam filter
talks to the server. The disadvantage of this approach is you have to configure your email program to use the filter.
It is also inefficient to send outgoing mail through the filter.
- As a server application. The program runs on the server and deletes spam without even having to download it. The
disadvantage is you need permission from your ISP or web server admin to run the filter this way.
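In any of these modes the filter has to decide whether to keep, mark or delete each message. Here is a minimal sketch
of that decision, assuming each message has already been given a spam rating from 0 to 100 as described under
Standard Interface. The names Disposition and DispositionPolicy and the two thresholds are hypothetical; the essay
does not specify them.

```java
// Hypothetical sketch: names and thresholds are illustrative assumptions.
enum Disposition { KEEP, MARK_LIKELY_SPAM, DELETE }

final class DispositionPolicy {
    private final int markAt;   // rating at or above which mail is marked
    private final int deleteAt; // rating at or above which mail is deleted

    DispositionPolicy(int markAt, int deleteAt) {
        if (markAt < 0 || deleteAt > 100 || markAt > deleteAt) {
            throw new IllegalArgumentException("need 0 <= markAt <= deleteAt <= 100");
        }
        this.markAt = markAt;
        this.deleteAt = deleteAt;
    }

    /** Maps a spam rating (0 to 100) to what should happen to the message. */
    Disposition decide(int spamPercent) {
        if (spamPercent >= deleteAt) {
            return Disposition.DELETE;
        }
        if (spamPercent >= markAt) {
            return Disposition.MARK_LIKELY_SPAM;
        }
        return Disposition.KEEP;
    }
}
```

With new DispositionPolicy(50, 90), a message rated 95 would be deleted, one rated 60 would be marked, and one rated 10
would be left alone.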
User Extensible
You can write your own spam filters that are applied just like built-in ones. All you need do is write Java code that
implements this interface:
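The interface itself is not reproduced in this essay. What follows is only a guess at its shape, based on the
description under Standard Interface: a message goes in and a percentage comes out. The names SpamFilter, rateSpam
and MissingSubjectFilter are hypothetical. A real version would presumably take a JavaMail MimeMessage; a raw String
stands in here so the sketch compiles without JavaMail on the classpath.

```java
// Hypothetical sketch, not the author's actual interface.
interface SpamFilter {
    /**
     * @param rawMessage the complete message text, headers and body
     * @return estimated likelihood that the message is spam, 0 to 100
     */
    int rateSpam(String rawMessage);
}

// Trivial example filter: treats a message with no Subject header as suspicious.
final class MissingSubjectFilter implements SpamFilter {
    @Override
    public int rateSpam(String rawMessage) {
        String normalized = rawMessage.replace("\r\n", "\n");
        // Headers end at the first blank line.
        int end = normalized.indexOf("\n\n");
        String headers = end < 0 ? normalized : normalized.substring(0, end);
        for (String line : headers.split("\n")) {
            if (line.regionMatches(true, 0, "Subject:", 0, 8)) {
                return 0; // Subject header present: no opinion
            }
        }
        return 60; // arbitrary illustrative rating
    }
}
```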
Possible Filters
Some of the filters you could write using this interface include:
- Hook into Vipul’s Razor.
- List of spam words.
- List of spam phrases, perhaps with weights.
- Filter that extracts everyone in your address book as a friend.
- Filter that extracts everyone you have sent mail to recently as a friend.
- Bayesian filtering.
- Neural net.
- Avoid languages you don’t speak.
- Something to deal with a particular virus’s junk mail.
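As an illustration of one item on this list, here is a minimal sketch of a filter driven by a list of spam phrases
with weights. The SpamFilter interface is the same hypothetical one-method shape assumed elsewhere (raw message text
in, percentage out), and WeightedPhraseFilter, the phrases and the weights are all made up for the example. Summing
the weights and capping at 100 is just one possible scoring rule.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: raw message text in, spam percentage (0 to 100) out.
interface SpamFilter {
    int rateSpam(String rawMessage);
}

final class WeightedPhraseFilter implements SpamFilter {
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Registers a phrase and the points it adds when found. */
    WeightedPhraseFilter add(String phrase, int weight) {
        weights.put(phrase.toLowerCase(Locale.ROOT), weight);
        return this;
    }

    @Override
    public int rateSpam(String rawMessage) {
        String text = rawMessage.toLowerCase(Locale.ROOT);
        int score = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (text.contains(entry.getKey())) {
                score += entry.getValue();
            }
        }
        // Clamp the sum so the result is always a valid percentage.
        return Math.max(0, Math.min(100, score));
    }
}
```

A filter built with new WeightedPhraseFilter().add("free money", 70).add("act now", 40) would rate a message
containing only the first phrase at 70, and one containing both at the cap of 100.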
How Implemented
A simple filter that acts as a parallel email client can be implemented using JavaMail fairly easily. Each filter can
extract information from its own configuration file, or from -D system properties.
It is simpler than most such applications, since it has no user interface, just a configuration file. It is intended to
be used by Java programmers, who then may configure it and install it for their technopeasant friends and customers.
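Here is a minimal sketch of how a filter might read its settings, assuming a standard java.util.Properties file and
assuming that a -D system property should override the file. The class name FilterConfig and the key
spamfilter.threshold are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: the precedence rule (system property, then file,
// then default) is an assumption, not something the essay specifies.
final class FilterConfig {
    private final Properties fileProps = new Properties();

    /** Loads key=value pairs from a configuration source. */
    FilterConfig(Reader configSource) {
        try {
            fileProps.load(configSource);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A -D system property wins over the file, which wins over the default. */
    String get(String key, String defaultValue) {
        String fromSystem = System.getProperty(key);
        if (fromSystem != null) {
            return fromSystem;
        }
        return fileProps.getProperty(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        return Integer.parseInt(get(key, String.valueOf(defaultValue)).trim());
    }
}
```

A filter could then be tuned either by editing its file or by launching with something like
java -Dspamfilter.threshold=95 followed by the usual class name.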
Standard Interface
What we need are two interfaces for the spam filter.
One is a Java interface that is given a JavaMail MimeMessage and returns the percentage
likelihood that the message is spam.
Another is a socket protocol where the message is sent to a socket and a rating comes back. The filter can then be
written in any language.
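The essay does not define the socket protocol, so the following is only one possible shape for it, sketched in Java
on both ends for brevity. The assumed wire format: the client writes the whole message, half-closes the connection to
signal the end, and the server replies with the rating as a decimal number on one line. RatingServer and RatingClient
are hypothetical names, and treating the message as UTF-8 text is a simplification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

// Hypothetical server: listens on a free loopback port and rates each message.
final class RatingServer implements AutoCloseable {
    private final ServerSocket listener;
    private final ToIntFunction<String> rater;

    RatingServer(ToIntFunction<String> rater) {
        try {
            this.listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.rater = rater;
        Thread worker = new Thread(this::serve, "rating-server");
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    private void serve() {
        while (!listener.isClosed()) {
            try (Socket s = listener.accept()) {
                // Read until the client half-closes its side.
                String message = new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
                int rating = Math.max(0, Math.min(100, rater.applyAsInt(message)));
                s.getOutputStream().write((rating + "\n").getBytes(StandardCharsets.US_ASCII));
            } catch (IOException e) {
                // Listener closed or client failed; the loop re-checks isClosed().
            }
        }
    }

    @Override
    public void close() {
        try {
            listener.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical client: sends one message and returns the rating it gets back.
final class RatingClient {
    static int rate(String host, int port, String message) {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput(); // half-close marks the end of the message
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            return Integer.parseInt(in.readLine().trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the format is just bytes in and a line of digits out, a server in any other language could stand in for
RatingServer without the client noticing.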
You also need some way to register the existence of multiple spam filters.
Then any email program can plug into any combination of spam filters.
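Here is a minimal sketch of such a registry, assuming a one-method filter interface that takes the raw message text
and returns a percentage. The names SpamFilter and FilterRegistry are hypothetical, and so is the rule for combining
results: this version simply reports the highest rating any registered filter gives. A scheme that lets "friend"
filters lower the score would need a different rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: raw message text in, spam percentage (0 to 100) out.
interface SpamFilter {
    int rateSpam(String rawMessage);
}

final class FilterRegistry {
    // Keyed by name so a filter can be replaced or removed later.
    private final Map<String, SpamFilter> filters = new LinkedHashMap<>();

    void register(String name, SpamFilter filter) {
        filters.put(name, filter);
    }

    void unregister(String name) {
        filters.remove(name);
    }

    /** Highest rating among the registered filters, or 0 if there are none. */
    int rate(String rawMessage) {
        int highest = 0;
        for (SpamFilter filter : filters.values()) {
            highest = Math.max(highest, filter.rateSpam(rawMessage));
        }
        return highest;
    }
}
```

An email program would hold one FilterRegistry, register whichever filters the user has installed, and call rate on
each incoming message.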