Delta Creator

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, when all the facts you need are included, nothing extraneous is mentioned, the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems where it is up to you to fully the define the end point, or a series of ever more difficult versions of this project and research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it; or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

I suggest inventing a generic way to do this that works with files other than just text files. Current schemes are not very bright and end up creating delta files almost as big as the new file. They are not good at noticing that text has simply been reordered.

First you have to determine if the file really changed, or just suffered an Oscar Wilde change, e.g. he put in a comma one day into one of his poems, then the next took it out again. So here are the basic tools you have to work with to tell:

For each file type you write a chunker. It divides the file into chunks. For a text file, it may divide at each newline character. For a MSWord file it may divide at each paragraph. For a FORTH BLK file, it might divide at each 1024 bytes. For a Btree file it chunks out each record…

Then you compute a hash on each chunk of the original file and index it in a Hashtable along with a reference offset and length into the old file. You don’t put the text itself in the Hashtable.

To compose the delta file, you look at each chunk in the new file. You compute its hash and look it up in your Hashtable. If you find a match, you make a note that the chunk can be had from offset/length of the old file. If you don’t find a match, you have to insert the chunk from the new file into your delta file. Very often you will get a sequential run of matching chunks from the old file. In your notes, you can collapse these entries into one big offset/length reference.

Creating delta files of *.exe and *.com files requires some special handling. Experiment by adding a single instruction to a program and notice how that tiny change ripples and cause changes throughout the rest of the file. The idea is to send only the fact that one extra instruction was added and somehow generate all the rippling side effects instead of transmitting them.

Once you have these low level tools in place, set up a scheme so that customers are automatically kept up to date from a master copy, using the Internet. The scheme should recover even if customer should accidentally delete files, corrupt files, or restore out of date files.

This project is a subset of a more general problem of distributing file and program updates. See the Automatic Updater project.

	This page is posted on the web at:	http://mindprod.com/project/deltacreator.html
	Optional Replicator mirror of mindprod.com on local hard disk J:	J:\mindprod\project\deltacreator.html
	Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.
	Canadian Mind Products IP:[65.110.21.43] Your face IP:[216.73.216.126]
Feedback	You are visitor number

Delta Creator

Disclaimer