I do contract work for a living, which could include writing a program such as this. However, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project any way you please.
I suggest inventing a generic scheme that works with files other than just text files. Many current schemes are not very bright. They often produce delta files almost as big as the new file, and they are poor at noticing that text has simply been reordered.
Here’s how you might tackle it.
First you have to determine whether the file really changed, or just suffered an Oscar Wilde change, the kind Wilde made when he put a comma into one of his poems one day and took it out again the next. Here are the basic tools you have to work with to tell:
Once you have decided that the file actually did change, you can send the changes.
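To make that decision, one option is to compare content rather than timestamps, since an Oscar Wilde change touches the file twice yet leaves it exactly as it was. The Java sketch below assumes the master keeps a SHA-256 digest of the version it last shipped. The class and method names (`ChangeDetector`, `sha256`, `reallyChanged`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: did the file's content really change since it was last shipped? */
class ChangeDetector {
    /** SHA-256 digest of the given bytes. */
    static byte[] sha256(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Compares content only; timestamps are deliberately ignored. */
    static boolean reallyChanged(byte[] lastShippedDigest, byte[] currentContent) {
        return !MessageDigest.isEqual(lastShippedDigest, sha256(currentContent));
    }

    public static void main(String[] args) {
        byte[] shipped = "Roses are red violets are blue".getBytes(StandardCharsets.UTF_8);
        byte[] shippedDigest = sha256(shipped);

        // Day one: a comma goes in. Day two: it comes out again.
        byte[] dayOne = "Roses are red, violets are blue".getBytes(StandardCharsets.UTF_8);
        byte[] dayTwo = "Roses are red violets are blue".getBytes(StandardCharsets.UTF_8);

        System.out.println(reallyChanged(shippedDigest, dayOne)); // true
        System.out.println(reallyChanged(shippedDigest, dayTwo)); // false: back where it started
    }
}
```

Keeping only the digest of the last shipped version means the sender does not need to hold on to a full copy just to answer "did anything happen?".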
For each file type you write a chunker that divides the file into chunks. For a text file, it may divide at each newline character. For an MSWord file, it may divide at each paragraph. For a FORTH BLK file, it might divide every 1024 bytes. For a B-tree file, it might chunk out each record…
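One way to model chunkers is as a small strategy interface with one implementation per file type. This is a minimal sketch, not a prescribed design: the names `Chunkers`, `Chunk`, `Chunker`, `LineChunker` and `FixedSizeChunker` are hypothetical, and a chunk is represented only by its offset and length so the bytes stay in the file. MSWord and B-tree chunkers would need format-specific parsing and are left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one chunker per file type. */
class Chunkers {
    /** A chunk is just a window into the file: where it starts and how long it is. */
    record Chunk(int offset, int length) {}

    /** Strategy for dividing one kind of file into chunks. */
    interface Chunker {
        List<Chunk> chunk(byte[] file);
    }

    /** Text files: each chunk ends just after a newline; the last chunk may lack one. */
    static class LineChunker implements Chunker {
        @Override
        public List<Chunk> chunk(byte[] file) {
            List<Chunk> chunks = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < file.length; i++) {
                if (file[i] == '\n') {
                    chunks.add(new Chunk(start, i + 1 - start));
                    start = i + 1;
                }
            }
            if (start < file.length) {
                chunks.add(new Chunk(start, file.length - start));
            }
            return chunks;
        }
    }

    /** Block-structured files such as FORTH BLK: fixed-size chunks, the last possibly short. */
    static class FixedSizeChunker implements Chunker {
        private final int size;

        FixedSizeChunker(int size) {
            this.size = size;
        }

        @Override
        public List<Chunk> chunk(byte[] file) {
            List<Chunk> chunks = new ArrayList<>();
            for (int off = 0; off < file.length; off += size) {
                chunks.add(new Chunk(off, Math.min(size, file.length - off)));
            }
            return chunks;
        }
    }

    public static void main(String[] args) {
        byte[] text = "alpha\nbeta\ngamma".getBytes(StandardCharsets.UTF_8);
        System.out.println(new LineChunker().chunk(text));
        // [Chunk[offset=0, length=6], Chunk[offset=6, length=5], Chunk[offset=11, length=5]]
        System.out.println(new FixedSizeChunker(1024).chunk(new byte[2500]).size()); // 3
    }
}
```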
Then you compute a hash of each chunk of the original file and index it in a Hashtable along with the chunk's offset and length in the old file. You don't put the text itself in the Hashtable.
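Indexing the old file might look like the following sketch. It swaps in `HashMap`, the unsynchronized counterpart of `Hashtable`, and a 64-bit FNV-1a hash; both choices, along with the names `OldFileIndex`, `Ref` and `buildIndex`, are assumptions for illustration. Chunks are taken at newlines, and when the same chunk occurs more than once only its first location is kept.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: index the chunks of the OLD file by hash. */
class OldFileIndex {
    /** Where a chunk lives in the old file; the chunk's bytes are not stored. */
    record Ref(int offset, int length) {}

    /** 64-bit FNV-1a hash of file[offset, offset + length). */
    static long fnv1a(byte[] file, int offset, int length) {
        long h = 0xcbf29ce484222325L;          // FNV offset basis
        for (int i = offset; i < offset + length; i++) {
            h ^= file[i] & 0xff;
            h *= 0x100000001b3L;                // FNV prime
        }
        return h;
    }

    /** Text-file chunker: each chunk ends just after a newline. */
    static List<Ref> lines(byte[] file) {
        List<Ref> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] == '\n') {
                out.add(new Ref(start, i + 1 - start));
                start = i + 1;
            }
        }
        if (start < file.length) {
            out.add(new Ref(start, file.length - start));
        }
        return out;
    }

    /** Maps each chunk hash to the first place that chunk occurs in the old file. */
    static Map<Long, Ref> buildIndex(byte[] oldFile) {
        Map<Long, Ref> index = new HashMap<>();
        for (Ref r : lines(oldFile)) {
            index.putIfAbsent(fnv1a(oldFile, r.offset(), r.length()), r);
        }
        return index;
    }

    public static void main(String[] args) {
        byte[] old = "one\ntwo\none\nthree\n".getBytes(StandardCharsets.UTF_8);
        Map<Long, Ref> index = buildIndex(old);
        System.out.println(index.size());                   // 3 distinct chunks
        System.out.println(index.get(fnv1a(old, 8, 4)));    // Ref[offset=0, length=4]
    }
}
```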
To compose the delta file, you look at each chunk in the new file. You compute its hash, and look it up in your Hashtable. If you find a match, you make a note that the chunk can be had from offset/length of the old file. If you don’t find a match, you have to insert the chunk from the new file into your delta file. Very often you will get a sequential run of matching chunks from the old file. In your notes, you can collapse these entries into one big offset/length reference.
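The lookup and run-collapsing steps can be sketched together. Here a delta is a list of two kinds of entry: copy a range of the old file, or insert literal bytes from the new one. The sketch assumes newline chunks and a 64-bit FNV-1a hash; since two different chunks can occasionally share a hash, it double-checks the bytes before trusting a match. An `apply` method is included only to show that the delta rebuilds the new file. All names (`DeltaComposer`, `Op`, `Copy`, `Insert`, `compose`, `apply`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of composing and applying a chunk-based delta. */
class DeltaComposer {
    record Ref(int offset, int length) {}
    interface Op {}
    record Copy(int offset, int length) implements Op {}   // take bytes from the old file
    record Insert(byte[] bytes) implements Op {}            // literal bytes from the new file

    static long fnv1a(byte[] b, int off, int len) {
        long h = 0xcbf29ce484222325L;
        for (int i = off; i < off + len; i++) {
            h ^= b[i] & 0xff;
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Text-file chunker: each chunk ends just after a newline. */
    static List<Ref> lines(byte[] file) {
        List<Ref> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] == '\n') {
                out.add(new Ref(start, i + 1 - start));
                start = i + 1;
            }
        }
        if (start < file.length) out.add(new Ref(start, file.length - start));
        return out;
    }

    static List<Op> compose(byte[] oldFile, byte[] newFile) {
        Map<Long, Ref> index = new HashMap<>();
        for (Ref r : lines(oldFile)) index.putIfAbsent(fnv1a(oldFile, r.offset(), r.length()), r);

        List<Op> delta = new ArrayList<>();
        for (Ref c : lines(newFile)) {
            Ref hit = index.get(fnv1a(newFile, c.offset(), c.length()));
            // A hash match is only a hint; confirm the bytes really are identical.
            boolean same = hit != null && Arrays.equals(
                    oldFile, hit.offset(), hit.offset() + hit.length(),
                    newFile, c.offset(), c.offset() + c.length());
            if (!same) {
                delta.add(new Insert(Arrays.copyOfRange(newFile, c.offset(), c.offset() + c.length())));
                continue;
            }
            Op last = delta.isEmpty() ? null : delta.get(delta.size() - 1);
            if (last instanceof Copy p && p.offset() + p.length() == hit.offset()) {
                // Sequential run in the old file: grow the previous copy instead of adding one.
                delta.set(delta.size() - 1, new Copy(p.offset(), p.length() + hit.length()));
            } else {
                delta.add(new Copy(hit.offset(), hit.length()));
            }
        }
        return delta;
    }

    static byte[] apply(byte[] oldFile, List<Op> delta) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : delta) {
            if (op instanceof Copy c) out.write(oldFile, c.offset(), c.length());
            else if (op instanceof Insert ins) out.writeBytes(ins.bytes());
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] oldFile = "a\nb\nc\nd\n".getBytes(StandardCharsets.UTF_8);
        byte[] newFile = "c\nd\nX\na\nb\n".getBytes(StandardCharsets.UTF_8);
        List<Op> delta = compose(oldFile, newFile);
        System.out.println(delta.size());                                  // 3
        System.out.println(Arrays.equals(newFile, apply(oldFile, delta))); // true
    }
}
```

In the example, the old lines are reordered and one new line is added. The delta comes out as three entries: copy `c\nd\n` from offset 4, insert `X\n`, and copy `a\nb\n` from offset 0. The moved text costs two short references rather than being resent.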
Creating delta files of *.exe and *.com files requires some special handling. Experiment by adding a single instruction to a program and notice how that tiny change ripples and causes changes throughout the rest of the file. The idea is to send only the fact that one extra instruction was added, and somehow regenerate all the rippling side effects instead of transmitting them.
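Real machine code is far messier, but a toy model shows the idea. In this hypothetical simplification, a program is a list of instructions whose jumps hold absolute instruction indexes. The entire delta is "insert this instruction at position p", and the receiver regenerates the ripple by bumping every jump target at or after p. Byte offsets, relative branches and relocation tables in real executables are not modelled; `RippleDemo`, `Instr` and `InsertInstr` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical): jumps hold absolute instruction indexes. */
class RippleDemo {
    /** target < 0 means "not a jump". */
    record Instr(String op, int target) {
        /** Jumps to an instruction at or after pos now point one slot further on. */
        Instr shiftedIfAtOrAfter(int pos) {
            return target >= pos ? new Instr(op, target + 1) : this;
        }
    }

    /** The whole delta: "insert this instruction at this index". */
    record InsertInstr(int pos, Instr instr) {}

    /** Receiver side: regenerate the ripple instead of having it transmitted. */
    static List<Instr> apply(List<Instr> old, InsertInstr d) {
        List<Instr> out = new ArrayList<>();
        for (Instr i : old) {
            out.add(i.shiftedIfAtOrAfter(d.pos()));
        }
        // The new instruction is already expressed in new-program indexes, so it is not shifted.
        out.add(d.pos(), d.instr());
        return out;
    }

    public static void main(String[] args) {
        List<Instr> old = List.of(
                new Instr("LOAD", -1),
                new Instr("JMP", 3),   // jumps to JZ
                new Instr("ADD", -1),
                new Instr("JZ", 0),    // jumps back to LOAD
                new Instr("RET", -1));
        List<Instr> updated = apply(old, new InsertInstr(2, new Instr("NOP", -1)));
        updated.forEach(System.out::println);
        // JMP now targets index 4 (still JZ); JZ still targets index 0 (still LOAD).
    }
}
```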
Once you have these low-level tools in place, set up a scheme so that customers are automatically kept up to date from a master copy, using the Internet. The scheme should recover even if a customer accidentally deletes files, corrupts files, or restores out-of-date files.
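A core piece of such a scheme is deciding, file by file, what to send. The sketch below assumes the customer's updater reports a digest of each local file (or nothing if the file is missing) and that the master still holds deltas from some earlier releases. `UpdatePlanner`, `plan` and the short placeholder digest strings are hypothetical. Because the decision looks only at content digests, a deleted, corrupted or restored-old file is simply treated as whatever version its content says it is.

```java
import java.util.Set;

/** Hypothetical sketch of the per-file decision a master server might make. */
class UpdatePlanner {
    enum Action { NOTHING, SEND_DELTA, SEND_FULL }

    /**
     * @param customerDigest digest of the customer's copy, or null if the file is missing
     * @param currentDigest  digest of the master copy
     * @param deltasFrom     digests of earlier releases the master still has deltas for
     */
    static Action plan(String customerDigest, String currentDigest, Set<String> deltasFrom) {
        if (customerDigest == null) return Action.SEND_FULL;               // deleted
        if (customerDigest.equals(currentDigest)) return Action.NOTHING;   // up to date
        if (deltasFrom.contains(customerDigest)) return Action.SEND_DELTA; // known older version
        return Action.SEND_FULL; // corrupted, or too old to have a delta on hand
    }

    public static void main(String[] args) {
        String current = "d3";                  // placeholder digests, not real hashes
        Set<String> known = Set.of("d1", "d2"); // earlier releases with deltas available
        System.out.println(plan("d3", current, known)); // NOTHING
        System.out.println(plan("d2", current, known)); // SEND_DELTA
        System.out.println(plan("zz", current, known)); // SEND_FULL (corrupted)
        System.out.println(plan(null, current, known)); // SEND_FULL (deleted)
    }
}
```

After applying a delta, the customer side could re-check the resulting digest and ask for the full file on a mismatch, so a bad patch cannot leave it stuck.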
This project is a subset of a more general problem of distributing file and program updates. See the Automatic Updater project.
Suggestions to improve this page may be sent to Roedy Green, Canadian Mind Products (mindprod.com).
The information on this page is for non-military use only. Military use includes use by defence contractors.
You can get a fresh copy of this page from http://mindprod.com/project/deltacreator.html or possibly from your local J: drive (Java virtual drive/Mindprod website mirror) at J:\mindprod\project\deltacreator.html.