
PAD Spam filter


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to define the end point, or a series of ever more difficult versions of the project, and to research the information you need to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

If you run a website such as the ones in this list, you will be bombarded by requests from people to list their programs. The requests arrive in the form of URLs (Uniform Resource Locators) pointing to PAD (Portable Application Description) files. Unfortunately, most of these submissions are junk, a form of spam. Some are advertisements disguised as programs. Some are harmful trojan programs. Some are porn videos. Some are ebooks. Some are simply useless. It takes an inordinate amount of time to sift through them to find suitable programs to list. What the world needs is a filtering mechanism specialised for PAD files. What mechanisms might it use?
  1. A Bayesian filter on the various PAD fields.
  2. A blacklist/whitelist of hosting websites.
  3. A collaborative list of judgements on various PAD URLs, websites and other identifying indicators. Participants see a histogram of how other sites adjudicated the PAD.
  4. A PAD verifier to make sure all fields are present and correctly filled in.
  5. A list of certified URLs that you personally research and guarantee to be spam free, virus free, trojan free. You might select PADs (Portable Application Descriptions) for this treatment that others have rated highly.
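
As an illustration of the first mechanism, here is a minimal sketch of a naive Bayes scorer for the words in one PAD text field, such as the program description. It is not a finished filter. The class name PadBayesFilter, the method names and the add-one smoothing are all assumptions made for this example. A real filter would probably keep one such scorer per PAD field and combine the results.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical naive Bayes scorer over the words of one PAD text field. */
final class PadBayesFilter {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamTokens = 0;
    private int hamTokens = 0;
    private int spamDocs = 0;
    private int hamDocs = 0;

    /** Lower-case the text and split on anything that is not a letter or digit. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Record one hand-judged field value as spam or as acceptable. */
    void train(String text, boolean isSpam) {
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            if (isSpam) {
                spamCounts.merge(token, 1, Integer::sum);
                spamTokens++;
            } else {
                hamCounts.merge(token, 1, Integer::sum);
                hamTokens++;
            }
        }
        if (isSpam) {
            spamDocs++;
        } else {
            hamDocs++;
        }
    }

    /** Estimated probability, from 0 to 1, that the text is spam. */
    double spamProbability(String text) {
        Set<String> vocabulary = new HashSet<>(spamCounts.keySet());
        vocabulary.addAll(hamCounts.keySet());
        int v = Math.max(1, vocabulary.size());
        // Work in logarithms so that long texts do not underflow.
        double logSpam = Math.log((spamDocs + 1.0) / (spamDocs + hamDocs + 2.0));
        double logHam = Math.log((hamDocs + 1.0) / (spamDocs + hamDocs + 2.0));
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            // Add-one smoothing keeps unseen words from zeroing the product.
            logSpam += Math.log((spamCounts.getOrDefault(token, 0) + 1.0) / (spamTokens + v));
            logHam += Math.log((hamCounts.getOrDefault(token, 0) + 1.0) / (hamTokens + v));
        }
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }
}
```

You would train it on PADs you have already accepted or rejected by hand, then treat a high spamProbability as a reason to reject or to look more closely. Where to set the cut-off is a judgement call you would tune on your own data.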

Rules

Cheating

If we discover someone cheating, because their ratings are consistently much higher or lower than average, we ban them and effectively withdraw all their adjudications, but keep the URLs they submitted (most likely with bad ratings). We also catch cheaters who rate PADs highly but don’t list them themselves, or who rate them as stinkers and list them anyway.
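
The two cheating tests above might be sketched as follows. This is only one possible reading of the rules. The class CheaterDetector, the Judgement fields, the 1 to 5 rating scale and every threshold are assumptions invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical detector for the two kinds of cheating described above. */
final class CheaterDetector {
    /** One participant's judgement of one PAD. */
    static final class Judgement {
        final String rater;
        final String padUrl;
        final int rating;      // assumed scale: 1 (stinker) to 5 (excellent)
        final boolean listed;  // did the rater's own site list this PAD?

        Judgement(String rater, String padUrl, int rating, boolean listed) {
            this.rater = rater;
            this.padUrl = padUrl;
            this.rating = rating;
            this.listed = listed;
        }
    }

    /** Raters whose ratings sit, on average, more than maxBias from each PAD's mean. */
    static Set<String> biasedRaters(List<Judgement> all, double maxBias) {
        Map<String, double[]> padTotals = new HashMap<>(); // padUrl -> {sum, count}
        for (Judgement j : all) {
            double[] t = padTotals.computeIfAbsent(j.padUrl, k -> new double[2]);
            t[0] += j.rating;
            t[1]++;
        }
        Map<String, double[]> raterTotals = new HashMap<>(); // rater -> {deviation sum, count}
        for (Judgement j : all) {
            double[] pad = padTotals.get(j.padUrl);
            double[] t = raterTotals.computeIfAbsent(j.rater, k -> new double[2]);
            t[0] += j.rating - pad[0] / pad[1];
            t[1]++;
        }
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, double[]> e : raterTotals.entrySet()) {
            double[] t = e.getValue();
            if (Math.abs(t[0] / t[1]) > maxBias) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    /** Raters who too often praise a PAD without listing it, or pan it and list it anyway. */
    static Set<String> inconsistentRaters(List<Judgement> all, int high, int low,
                                          double maxFraction) {
        Map<String, int[]> totals = new HashMap<>(); // rater -> {contradictions, count}
        for (Judgement j : all) {
            int[] t = totals.computeIfAbsent(j.rater, k -> new int[2]);
            if ((j.rating >= high && !j.listed) || (j.rating <= low && j.listed)) {
                t[0]++;
            }
            t[1]++;
        }
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, int[]> e : totals.entrySet()) {
            int[] t = e.getValue();
            if ((double) t[0] / t[1] > maxFraction) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }
}
```

Note that each PAD's mean here includes the suspect's own rating, which damps the measured bias. A stricter version would leave the rater being judged out of the mean, and would require a minimum number of judgements before banning anyone.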

Implementation

This program would most easily be implemented with a Java Web Start interface running on each client and an SQL (Structured Query Language) database running on a server, exchanging binary messages. The PAD site manager would feed it lists of PADs or PAD URLs. For tighter integration, it might have a Servlet interface and decide whether the PAD is acceptable the instant a new PAD arrives, or even before it is submitted, based on the submitter's IP (Internet Protocol) address. PAD sites tend to be written in PHP (PHP: Hypertext Preprocessor). To integrate, PAD sites would have to host a Java server as well as a PHP server. This could easily be too technically challenging to be acceptable.
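
The instant decision on arrival could start with the cheapest check, the blacklist/whitelist of hosting websites. Here is a minimal sketch, assuming the lists are held in memory as plain domain names. The class HostListFilter, its methods and the three verdicts are hypothetical. In the design above, the lists would more likely live in the SQL database.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical first-pass check of a PAD URL against host lists. */
final class HostListFilter {
    enum Verdict { ACCEPT, REJECT, UNDECIDED }

    private final Set<String> whitelist = new HashSet<>();
    private final Set<String> blacklist = new HashSet<>();

    void allow(String domain) {
        whitelist.add(domain.toLowerCase(Locale.ROOT));
    }

    void block(String domain) {
        blacklist.add(domain.toLowerCase(Locale.ROOT));
    }

    /** Judge a PAD URL by its host name alone, without fetching it. */
    Verdict judge(String padUrl) {
        String host;
        try {
            host = new URI(padUrl.trim()).getHost();
        } catch (URISyntaxException e) {
            return Verdict.REJECT; // not even a well-formed URL
        }
        if (host == null) {
            return Verdict.REJECT; // no host part, e.g. a relative path
        }
        // Try the full host first, then each parent domain, so that the
        // most specific list entry wins: www.example.com, example.com, com.
        for (String d = host.toLowerCase(Locale.ROOT); d != null; d = parent(d)) {
            if (blacklist.contains(d)) {
                return Verdict.REJECT;
            }
            if (whitelist.contains(d)) {
                return Verdict.ACCEPT;
            }
        }
        return Verdict.UNDECIDED; // hand over to the slower filters
    }

    /** Strip the leftmost label, or return null when none remains. */
    private static String parent(String domain) {
        int dot = domain.indexOf('.');
        return dot < 0 ? null : domain.substring(dot + 1);
    }
}
```

An UNDECIDED verdict is the common case. It means only that the host is unknown, so the PAD still has to go through the other filters.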

Likely you would not write a Bayesian filter or PAD verifier from scratch. Generally merging existing software requires more skill than writing from scratch.
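
Even if you adopt an existing verifier, it helps to see how small the core check is. The sketch below parses a PAD file as XML and reports which required fields are absent or blank. The class PadVerifier is hypothetical, and the short list of required element paths is an assumption that you should check against the PAD specification. A real verifier would also validate field lengths, URLs and allowed values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical presence check for a handful of PAD fields. */
final class PadVerifier {
    // Assumed subset of required fields; confirm against the PAD specification.
    private static final String[] REQUIRED = {
        "/XML_DIZ_INFO/Company_Info/Company_Name",
        "/XML_DIZ_INFO/Program_Info/Program_Name",
        "/XML_DIZ_INFO/Program_Info/Program_Version",
        "/XML_DIZ_INFO/Web_Info/Download_URLs/Primary_Download_URL"
    };

    /** Returns the paths of required fields that are missing or blank. */
    static List<String> missingFields(String padXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // PAD files come from strangers, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(padXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> missing = new ArrayList<>();
            for (String path : REQUIRED) {
                // evaluate returns an empty string when no element matches
                String value = xpath.evaluate(path, doc);
                if (value == null || value.trim().isEmpty()) {
                    missing.add(path);
                }
            }
            return missing;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("PAD is not usable XML", e);
        }
    }
}
```

An empty result means only that the fields are present, not that their contents are honest. That is the job of the other filters.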

However, the biggest hurdle is political. How do you get websites to use it? How do you get them to share any information with the competition? I have seen this problem before with the Phoenix project. Its job was to help NGOs in Africa coordinate their development efforts by letting everyone know what everyone else was doing in any given area. Everyone said it was a wonderful idea. But it turned out, everyone wanted the information about what others were doing but were completely unwilling to share what they were doing. I was shocked at the pettiness of the Red Cross and similar organisations. Web site owners might be equally reluctant to share any information. So you might start the project without any collaboration features and add them gradually.

Before embarking on this, you might write to all the PAD vendors in the hassle-free list or the minor-hassle list to see if they would be interested in the service, what they might be willing to pay and what else they need.

Other than offering the system free, you might offer it in return for a plug on the submissions page. Here is a great place to tout your company’s expertise to software developers.

Bayesian spam filtering
PAD
PAD submission
PadLibrary blacklist: domains that send spam, ads and other objectionable PADs
spam
Student Project to set up your own PADSite

This page is posted
on the web at:

http://mindprod.com/project/padfilter.html

Canadian Mind Products
Please read the feedback from other visitors, or send your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.
