
Automatic File Updates


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

The intent of this project is to keep any files (including *.jar and *.zip files) up-to-date on the client site, automatically. It is designed to dovetail with the Delta File Creator that makes the process even more efficient by sending just the parts of files that have changed. This project sends entire files, or entire zip members, if so much as a comma changes in them.

There are already some push tools that update files on the client sites. These include Marímba, Java Web Start, DMP, Funduc Patch and Symantec LiveUpdate. However, they are not suitable for two simple applications I have:

  1. Keeping people’s expanded copies of cmp*.zip up-to-date. These are giant files containing all the *.html, *.gif, *.jpg and *.mp3 files on my website in downloadable zipped format.
  2. Keeping a JavaHelp.jar file up-to-date on a client site.
What are the problems?

How are Automatic Update Files Stored on the Server?

Instead of storing entire jars or zips on the server, the members of the jars are stored as separate *.upd files on the server. The files are given sequentially numbered names, e.g. 00000042.upd. The server is a perfectly standard HTTP (Hypertext Transfer Protocol) server. The only thing you need to configure on your server is a new MIME type for the *.upd extension for your component files. When a component of a jar is updated, the new contents are assigned a new sequence number. That way there is no problem uploading a file to the server while others are downloading the old version. Any change gets a new number. You retire the file numbers that have been superseded. All the *.upd files are stored in GZIP compressed format. Content files in jars and zips are compressed only once. *.upd files representing stand-alone files are compressed too, so a stand-alone file that is already compressed is effectively compressed twice. Fanatics might want to invent a way to avoid that tiny extra overhead.
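
As a rough sketch only, the naming and compression scheme just described might look like this in Java. The class UpdFiles and its methods nameFor, pack and unpack are names invented for this illustration; the eight-digit width is inferred from the sample name 00000042.upd and is otherwise an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: the essay prescribes no code, so every name here is an assumption. */
final class UpdFiles {
    private UpdFiles() {}

    /** Turns a sequence number into a server file name, e.g. 42 becomes 00000042.upd. */
    static String nameFor(int sequence) {
        if (sequence < 0) {
            throw new IllegalArgumentException("negative sequence number");
        }
        return String.format(Locale.ROOT, "%08d.upd", sequence);
    }

    /** GZIP-compresses the raw content of one member or stand-alone file. */
    static byte[] pack(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw, 0, raw.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed by now, so the trailer has been written.
        return bytes.toByteArray();
    }

    /** Reverses pack, giving back the original uncompressed bytes. */
    static byte[] unpack(byte[] packed) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                raw.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }
}
```

Because a changed member always gets a fresh number, a file produced by nameFor is written once and never overwritten, which is what makes uploading safe while clients are downloading.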

How are Automatic Update Files Stored on the Client?

The files are conventional jar, zip, program or stand-alone data files. They can have any name or extension. They can live anywhere on disk. They may be compressed or uncompressed in any conceivable format. Jars can even be digitally signed. They are not permanently stored as *.upd files.

The master copies of the files created by the programmers are also maintained in this conventional way. No one is aware of the *.upd files. They are a transport mechanism only. The conversion to and from the *.upd files is fully automatic.

How Does Automatic Update Work?

There is a tiny root file on the server. It contains the sequence number of the current state-of-the-union file. The state-of-the-union file is also stored on the server in GZIP compressed form. It has entries that contain the following data for each file/member managed by the file updating system.

State-Of-The-Union File Fields

  Field: Purpose
  Status: A=Active, R=Replaced, D=Deleted. Active means this file is necessary for the client. Replaced means this file has been replaced by some other version. Deleted means this file is no longer used. To start a client from scratch, all you need do is examine the Active entries. Usually all the Replaced entries will be filtered out. They are there mainly for debugging. Similarly, very old Deleted entries might be filtered out.
  Sequence Number: Appending .upd will give you the name of the corresponding file on the server. All the files for a project are stored in the same directory on the server, even if they are stored in many different directories on the client.
  Install Root Code: Usually this will be 1 to mean that files are installed relative to the client’s installation directory. For a complex project, you may have multiple installation directories. 0 means file names are absolute.
  File Name: The fully qualified filename of where this content eventually ends up on the client.
  Member Name: The fully qualified filename of the jar or zip entry. If this is blank, this entry represents a stand-alone file.
  Date/Time Updated: Milliseconds since 1970, using GMT (Greenwich Mean Time), i.e. a Unix-epoch timestamp in the form Java uses. This is used to set or check the file’s system date.
  Checksum: 32-bit Adler-32 checksum of the data. It is computed on the uncompressed form of the file. Adler-32 checksums are faster to compute and verify than CRC-32 checksums.

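
One possible in-memory shape for an entry with the fields above is sketched here. The class UnionEntry, its field names and its methods are assumptions made for illustration; the essay defines no file layout. The checksum uses java.util.zip.Adler32, which is a real JDK class.

```java
import java.util.zip.Adler32;

/** Hypothetical in-memory form of one state-of-the-union entry. */
final class UnionEntry {
    final char status;        // 'A' = Active, 'R' = Replaced, 'D' = Deleted
    final int sequence;       // 42 means the content lives in 00000042.upd
    final int installRoot;    // 0 = absolute names, 1 = relative to the install directory
    final String fileName;    // where the content ends up on the client
    final String memberName;  // jar/zip entry name, empty for a stand-alone file
    final long updatedMillis; // milliseconds since 1970 GMT
    final long checksum;      // Adler-32 of the uncompressed data

    UnionEntry(char status, int sequence, int installRoot, String fileName,
               String memberName, long updatedMillis, long checksum) {
        this.status = status;
        this.sequence = sequence;
        this.installRoot = installRoot;
        this.fileName = fileName;
        this.memberName = memberName;
        this.updatedMillis = updatedMillis;
        this.checksum = checksum;
    }

    /** A blank member name marks a stand-alone file rather than a jar/zip member. */
    boolean isStandalone() {
        return memberName.isEmpty();
    }

    /** Computes the Adler-32 checksum of uncompressed content. */
    static long adler32(byte[] uncompressed) {
        Adler32 adler = new Adler32();
        adler.update(uncompressed, 0, uncompressed.length);
        return adler.getValue();
    }

    /** True when the given uncompressed content matches the recorded checksum. */
    boolean matches(byte[] uncompressed) {
        return adler32(uncompressed) == checksum;
    }
}
```

Keeping the checksum as a long avoids sign trouble, since Adler32.getValue() returns the 32-bit value as an unsigned quantity in a long.
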
When the client wants to refresh its files, it first downloads the tiny root file. From there it can download and decompress the current state-of-the-union file. It knows it already has current files, up to and including sequence number N. It knows this even if it had to restore its data files from backup. It then looks in the state-of-the-union file and processes the entries. If it sees a Deleted entry, it deletes the corresponding file or member. If the zip or jar has no more members, it deletes the file itself. If it sees a Replaced entry, it ignores it. If it sees an Active entry, it inserts/replaces that file or member. It may optionally verify the checksum of the newly updated members, or of all active members. If there are failures, it can automatically redownload any failed entries and, optionally, even totally recreate any jar/zip files from scratch. This makes your applications and files self-healing.
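
The decision part of that refresh cycle can be sketched as below, leaving out all downloading and file handling. RefreshPlanner, Entry, Step and plan are invented names. The sketch also makes an assumption the essay does not spell out: Active entries at or below N are skipped, while Deleted entries are always acted on, since deleting something already gone is harmless.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner: decides what to do, but performs no I/O. */
final class RefreshPlanner {
    enum Status { ACTIVE, REPLACED, DELETED }
    enum Action { INSTALL, DELETE }

    /** Cut-down entry holding only what the planner needs. */
    static final class Entry {
        final Status status;
        final int sequence;
        final String fileName;
        final String memberName; // empty for a stand-alone file

        Entry(Status status, int sequence, String fileName, String memberName) {
            this.status = status;
            this.sequence = sequence;
            this.fileName = fileName;
            this.memberName = memberName;
        }
    }

    /** One unit of work for the client to carry out. */
    static final class Step {
        final Action action;
        final Entry entry;

        Step(Action action, Entry entry) {
            this.action = action;
            this.entry = entry;
        }
    }

    /** lastApplied is N: the client is current up to and including it. */
    static List<Step> plan(List<Entry> entries, int lastApplied) {
        List<Step> steps = new ArrayList<>();
        for (Entry e : entries) {
            switch (e.status) {
                case DELETED:
                    // Assumption: always delete, because a repeat delete is a no-op.
                    steps.add(new Step(Action.DELETE, e));
                    break;
                case ACTIVE:
                    if (e.sequence > lastApplied) {
                        steps.add(new Step(Action.INSTALL, e));
                    }
                    break;
                case REPLACED:
                default:
                    break; // Replaced entries are ignored
            }
        }
        return steps;
    }
}
```

Separating the plan from its execution makes the logic easy to test, and the same list of steps could drive the optional checksum verification and redownload pass.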

You need a tool to help you prepare your *.upd files for uploading to the server. It starts with a list of directories and files to process. It detects file and member changes via file and member dates, or possibly with the checksums, or even by comparison with your current set of *.upd files. It is probably easiest to use file dates exclusively in determining which *.upd files to create and to have a checksum verify routine you run periodically. If you get a failure, you manually redate the affected files with a touch utility to force a correction.
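
The two tests that tool relies on, the cheap date comparison and the periodic checksum verify, might be sketched like this for stand-alone files. ChangeDetector, dateDiffers and checksumDiffers are hypothetical names, and the recorded values are assumed to come from the previous state-of-the-union file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;

/** Hypothetical change tests for the preparation tool. */
final class ChangeDetector {
    private ChangeDetector() {}

    /**
     * Cheap everyday test: has the file's date moved since it was recorded?
     * Some file systems store coarser timestamps than milliseconds, so a real
     * tool might need to allow a small tolerance.
     */
    static boolean dateDiffers(Path file, long recordedMillis) {
        try {
            return Files.getLastModifiedTime(file).toMillis() != recordedMillis;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Slower periodic test: does the content still match the recorded Adler-32? */
    static boolean checksumDiffers(Path file, long recordedAdler32) {
        try {
            Adler32 adler = new Adler32();
            byte[] content = Files.readAllBytes(file); // fine for a sketch, not for giant files
            adler.update(content, 0, content.length);
            return adler.getValue() != recordedAdler32;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file for which checksumDiffers is true while dateDiffers is false is the failure case mentioned above, the one you correct by redating the file.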

How do you handle a file whose date has changed, but whose contents have not? You could:

What if the client refreshes so infrequently that the necessary Delete entries are no longer present?

Extending Automatic Update

There are eight directions you could take this project once you get these basics handled:
  1. You get some eager beavers who check every ten minutes if there have been updates. The way the scheme works now, they would download the entire rather fat state-of-the-union *.upd file just to discover nothing had changed. You get around this by maintaining several state-of-the-union files. You might have a yearly, monthly, weekly, daily and hourly version. The hourly version just has changes made in the last hour. The root file points to all of them. In addition the root file tells you the low and high sequence number each state-of-the-union file covers. The yearly version may be completely up-to-date, or it may not. By looking at the ranges, the client can figure out which of these files it needs to download and process, if any. It may need to process more than one or none. You could have as many of them or as few of them as you wanted, spanning any range of sequence numbers.
  2. Getting an install started from scratch is rather inefficient since the client downloads a zillion tiny files. It is also rather inefficient to do a massive update, since a large number of individual files/members would have to be downloaded. Therefore each upd file might live also in one or more lump files, where a number of upd files are consolidated. The client downloader can then decide the most efficient way to get the individual files it needs. The lump files can be retired just like upd files. They can be updated, to agglutinate groups of upd files in different ways, or to drop retired/replaced upd files. The restriction is, the client must download the entire lump file if it wants even one upd file in it. As a last resort, all upd files are always available individually. Ironically, each lump file is also an upd file, using the same sequence number naming scheme. The client would look at the lump files available and decide which ones have the most stuff they need and the least stuff they don’t. If there is too much unwanted stuff, then it would pay to download files individually. In practice you might have a lump for updates up to the first of this year, one for updates from first of the year to the first of this month, one for the first of the month until yesterday and one for today’s updates. Note that ideally you rebuild all the lumps each day to prune them of deadwood and add any new files to them. A client coming in cold would need to download all four lump files. A client who updated daily would need to download only one. A client who updated hourly would download individually. The server is free to update the lump files at any time even when clients are in the middle of downloads because of the way updates are done by always creating new upd and lump files.
  3. You need to do your updates to the server’s copy of the *.upd files in batches. You don’t update the root file until all the upd files in the batch are complete. The client does not start using his system again until all the upd files mentioned in the root file are downloaded and installed. You don’t want the client using his system when only a few of the files of the update have been installed.
  4. Some files can’t just simply be plopped on the client site. They need to be installed, e.g. inserted into the registry or specially processed, e.g. to set special attribute bits, reboot to replace a DLL (Dynamic Link Library), etc. the way Java Web Start does. You need a way to specify custom installers. See Installer and the Installer Project.
  5. This scheme still redownloads large non-jar files even if so much as a comma in them changes. See the Delta Creator project for how to tackle that problem. The same problem applies to large members where only a tiny part of them has actually changed.
  6. Automatically notify clients when there are changes. This could be done by email, or by a tiny UDP (User Datagram Protocol) or TCP/IP (Transmission Control Protocol/Internet Protocol) probe to the running application. This just prods them to consider doing a refresh cycle. It does not actually send them any update data. If there are very frequent updates to the master files, you have to avoid pestering your clients more frequently than they want to be pestered. You also have to consider they may have many independent applications using this scheme. The probe had better identify the application and how to unregister email probes. You eventually have to give up on notifying clients that never bother to update or respond to probes. The prober is not necessarily the server where the client gets updates.
  7. If you have a great many clients, you need a way to clone your server files and have clients use all the mirrored servers TuCows-style, picking one close, functioning, up-to-date and not too busy. Ideally you want mirror site selection automatic. Further, you want propagation to the various mirror sites automatic. You want seriously out of date clients to first use the less-up-to-date servers to avoid overloading the up-to-date ones.
  8. Other projects that could be based on such automatic update include Bulk file distributor, HTML Glossary Presenter, On-Line Books, Sanity Checker, Infinite Disk, Prebranded Software rental with auto updates.
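
Direction 1 above leaves open exactly how the client chooses among several state-of-the-union files. One possible policy is sketched here: starting at N+1, repeatedly take a file whose range covers the next missing sequence number, preferring the one that reaches furthest and, on a tie, the narrowest. UnionFilePicker, Range and pick are invented names, and the policy itself is an assumption, not something the scheme requires.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chooser of state-of-the-union files listed in the root file. */
final class UnionFilePicker {
    /** One root-file line: a named file covering sequence numbers low..high inclusive. */
    static final class Range {
        final String name;
        final int low;
        final int high;

        Range(String name, int low, int high) {
            this.name = name;
            this.low = low;
            this.high = high;
        }
    }

    /** Returns the files to download, in order; empty when the client is current. */
    static List<Range> pick(List<Range> available, int lastApplied) {
        int latest = lastApplied;
        for (Range r : available) {
            latest = Math.max(latest, r.high);
        }
        List<Range> chosen = new ArrayList<>();
        int need = lastApplied + 1; // first sequence number the client lacks
        while (need <= latest) {
            Range best = null;
            for (Range r : available) {
                boolean covers = r.low <= need && need <= r.high;
                if (!covers) {
                    continue;
                }
                if (best == null || r.high > best.high
                        || (r.high == best.high && r.low > best.low)) {
                    best = r; // reaches further, or is narrower on a tie
                }
            }
            if (best == null) {
                throw new IllegalStateException("no file covers sequence " + need);
            }
            chosen.add(best);
            need = best.high + 1;
        }
        return chosen;
    }
}
```

With yearly, monthly, daily and hourly files all ending at the latest number, this picks the single smallest file that still covers everything the client is missing. A fancier version would weigh actual file sizes rather than range widths.
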
You can use the File Transfer classes to transfer the files around locally and remotely. You can use the File I/O Amanuensis to teach you how to compress and decompress files. You will need to study the ZipFile, ZipEntry, ZipInputStream and GZIPInputStream classes for taking apart jar files and compressing/decompressing.
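
To give a feel for those classes, here is a small sketch that takes a jar or zip apart into its members and puts one back together, entirely in memory. JarMembers, explode and assemble are invented names. The sketch ignores digital signatures, entry timestamps and per-entry compression settings, all of which a real updater would have to preserve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for splitting and rebuilding jar/zip files. */
final class JarMembers {
    private JarMembers() {}

    /** Maps each member name to its uncompressed content, in archive order. */
    static Map<String, byte[]> explode(byte[] zipBytes) {
        Map<String, byte[]> members = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                members.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return members;
    }

    /** Builds a fresh zip from member names and contents. */
    static byte[] assemble(Map<String, byte[]> members) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> m : members.entrySet()) {
                zout.putNextEntry(new ZipEntry(m.getKey()));
                zout.write(m.getValue(), 0, m.getValue().length);
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Each value returned by explode is what would be GZIP-compressed into its own *.upd file on the server, and assemble is the sort of routine the client would use to recreate a damaged jar from scratch.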

Baby Steps Toward Automatic Update Nirvana

The evolution of automatic update goes like this:
  1. Download entire applications as a lump, e.g. with an installer, or with a giant zip file.
  2. Download just the files that have changed, using Java Web Start.
  3. Download just the members of jars and zips that have changed using the Automatic File Updater described here.
  4. Use lumping to avoid downloading separate upd files most of the time.
  5. Download just the chunks of files or members that have changed using the Delta Creator.
  6. Use the bulk file distributor project so that you can efficiently use multiple client-based distributed servers. This lets you distribute to millions of customers using only a small server.
  7. Instead of using simple HTTP file transfer protocols, use custom server software to let the client grab all the updates in a single TCP/IP session.
  8. Use instantaneous update so that applications can use up-to-the-second information, even information that becomes available after the app has started. This requires storing data in specially structured files, usually an SQL (Structured Query Language) database. Ironically this type of update is much more evolved than the simpler types described above. See Oracle Distributed Databases and Oracle Replicated Databases.

This page is posted
on the web at:

http://mindprod.com/project/autoupdate.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.