This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.


How It Works

Conceptually, the utility just glues all the source files together, then starts removing duplicate text. It works on UTF-8 plain text files. To find the duplicates, the utility puts the sentences (with whitespace normalised) into a HashMap. When you find a duplicate, you remove it. Then you move the text that followed the duplicate, up to the next duplicate, to just after the original copy of the sentence. The problem is that many sentences will be, for all practical purposes, the same, differing by only a word or two. So I suggest that before you put the sentences into the HashMap, you strip them of common words. You might invent your own logic to find still more duplicates.
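Here is a minimal sketch of the duplicate-detection step described above, not a finished design. The class name SentenceDeduper, the tiny COMMON_WORDS list and the naive split on sentence-ending punctuation are all assumptions made for illustration. The sketch only drops later duplicates; it does not attempt the step of moving the text that followed a duplicate up beside the original.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class SentenceDeduper {
    // Hypothetical stop-word list; a real one would be much longer.
    private static final Set<String> COMMON_WORDS =
            Set.of("a", "an", "the", "of", "to", "and", "is", "in", "on", "it");

    /** Reduces a sentence to a lookup key: lower case, no punctuation, no common words. */
    static String key(String sentence) {
        // Replace every run of non-letter, non-digit characters with one space.
        String cleaned = sentence.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
        StringBuilder sb = new StringBuilder();
        for (String word : cleaned.split(" ")) {
            if (word.isEmpty() || COMMON_WORDS.contains(word)) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word);
        }
        return sb.toString();
    }

    /** Returns the sentences of text in order, keeping only the first copy of each key. */
    static List<String> dedupe(String text) {
        // Maps each key to the first sentence that produced it.
        Map<String, String> firstSeen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        // Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            if (sentence.isEmpty()) {
                continue;
            }
            String k = key(sentence);
            // Sentences made up only of common words have an empty key; keep them all.
            if (k.isEmpty() || firstSeen.putIfAbsent(k, sentence) == null) {
                kept.add(sentence);
            }
        }
        return kept;
    }
}
```

For example, given "The cat sat on the mat. A cat sat on a mat! Dogs bark.", the first two sentences both reduce to the key "cat sat mat", so only the first of them and "Dogs bark." are kept. Storing the first sentence as the map value, rather than using a plain set, leaves room to find the original copy later when you move text up beside it.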

If you had some grammar processor, it could tell you if two sentences said the same thing. If you can’t remove a substantial amount of duplicate text, there is not much point to this utility.

The Result

The result is far from perfect. The result will still have a lot of duplication you will have to remove manually. Further, you will have quite a bit of manual reordering to do. The good news is that all your ideas will be there somewhere. You will also have to add headings and formatting.

