Atomic FTP Uploader



Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

I added a new section on implementation details to this essay on 2005-07-08.

The Problem

FTP (File Transfer Protocol) software is notoriously difficult to use and notoriously unreliable. I have tried dozens of packages. FTP clients are all utterly hopeless at the basic task of keeping a server website identical to the client side. They are really dinosaurs left over from the days when people downloaded files over dial-up phone lines with FTP.

The problems are:

  1. The software gets confused and fails to upload or delete files on the server, or uploads them when it does not need to.
  2. When someone out on the net is reading a file on the server, that locks it from being updated, and bombs the update run.
  3. If I make a massive set of changes to the website, it may take hours to upload. During that time, people out on the web will see an incompatible mixture of old and new files. I don’t want the new files to be visible until they are all ready. Uploads should be atomic.
  4. Server and workstation clocks may be out of sync. This should not confuse the software. Usually the workstation is the one in trouble. Its clock should be reset from an atomic clock with software similar to SetClock. Your software should work even when the server’s clock is badly out of whack or in a different time zone.
  5. You upload entire files even if only a few bytes in them have changed.
Ideally you would like to do this without running any software on the server, since usually ISPs (Internet Service Providers) will not let you, or will charge you considerably more if you do run your own software.
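One way to attack problems 1 and 4 together, without any server software, is to stop asking the server what it has and stop comparing timestamps. Instead, the uploader keeps a local manifest that records a content hash for every file as of the last successful upload, and plans the next run from that manifest. The following is a minimal sketch under a few assumptions: the manifest is a map from relative path to a SHA-256 hex string, and the class and method names are hypothetical. It also assumes nobody else changes the server behind the uploader’s back. If that can happen, the manifest has to be rebuilt from a full upload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: plan an upload run from content hashes, never from server or client clocks. */
public class UploadPlanner {

    /** Files to send to the server and files to remove from it. */
    static final class Plan {
        final SortedSet<String> toUpload = new TreeSet<>();
        final SortedSet<String> toDelete = new TreeSet<>();
    }

    /** SHA-256 of the data as a lowercase hex string. */
    static String sha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform is required to support SHA-256
        }
    }

    /**
     * @param lastUploaded relative path -> hash recorded after the previous successful run (hypothetical manifest)
     * @param current      relative path -> hash of the file as it is now on the workstation
     */
    static Plan plan(Map<String, String> lastUploaded, Map<String, String> current) {
        Plan p = new Plan();
        for (Map.Entry<String, String> e : current.entrySet()) {
            // New file, or content changed since the last upload: send it.
            if (!e.getValue().equals(lastUploaded.get(e.getKey()))) {
                p.toUpload.add(e.getKey());
            }
        }
        for (String path : lastUploaded.keySet()) {
            // Uploaded last time but gone locally: delete it from the server.
            if (!current.containsKey(path)) {
                p.toDelete.add(path);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> last = Map.of(
                "index.html", sha256("old".getBytes(StandardCharsets.UTF_8)),
                "gone.html", sha256("x".getBytes(StandardCharsets.UTF_8)));
        Map<String, String> now = Map.of(
                "index.html", sha256("new".getBytes(StandardCharsets.UTF_8)),
                "about.html", sha256("a".getBytes(StandardCharsets.UTF_8)));
        Plan p = plan(last, now);
        System.out.println("upload " + p.toUpload); // upload [about.html, index.html]
        System.out.println("delete " + p.toDelete); // delete [gone.html]
    }
}
```

The manifest would be saved only after a run completes. That way, an interrupted run simply replans and resends whatever it had not yet finished.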

Approaches

  1. Upload files to a different directory branch. When they are all ready, delete the master, and rename the uploaded files. It might be possible to do this without server-side code since FTP supports a rename function. I know of no product that does this. It would be very useful since a website can fail when you have half old and half new files being served to the public, or old files referencing images in the process of being deleted or renamed.
  2. Use the Replicator as a core. You upload Replicator-style zips to the server and, only once they are all uploaded, unzip them. If a target file is busy, you get on with unzipping the next file and put the busy one on a queue to handle later.
  3. Use the Subversion version control system as a core. From the workstation’s point of view, uploading is just like checking in changes to a set of source files. How do you get the files in shape for the web server? Subversion handles the problems of atomicity and of picking up after a disconnect in the middle of an upload. It is also smart about uploading only changes, thus saving bandwidth.
  4. Look into Rsync for site mirroring. Its --delay-updates option gives reasonable atomicity. It has the usual Unix utility problems: novice-unfriendly documentation and the need to tweak and compile source code for your particular server. Perhaps you might write a wrapper to hide Rsync’s installation complexities.
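Approach 1 needs only standard FTP commands. The following is a minimal sketch of the command sequence the uploader would send on the control connection. The directory names www, www.new and www.old are hypothetical. The sketch departs slightly from the description of approach 1. Rather than deleting the master first, it renames the master aside and deletes it afterwards. This shrinks the window in which the site is missing to the gap between two renames. Some servers may refuse to rename a non-empty directory, so test this on your ISP’s server. Standard FTP (RFC 959) has no command to copy files already on the server, so the staging branch must hold a complete copy of the site, not just the changed files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of approach 1: stage the whole site, then swap it in with renames. */
public class StagedSwap {

    /**
     * Builds the FTP control-connection commands for one update.
     * liveDir, stagingDir and oldDir are hypothetical directory names on the server.
     * files are relative paths; creating their subdirectories (more MKDs) is left out for brevity.
     */
    static List<String> commands(String liveDir, String stagingDir, String oldDir, List<String> files) {
        List<String> cmds = new ArrayList<>();
        cmds.add("MKD " + stagingDir); // fresh staging branch, invisible to the public
        for (String f : files) {
            // Each STOR is followed by the file's bytes on a separate data connection.
            cmds.add("STOR " + stagingDir + "/" + f);
        }
        // Only after every upload has succeeded: move the live tree aside, then promote the staging tree.
        cmds.add("RNFR " + liveDir);
        cmds.add("RNTO " + oldDir);
        cmds.add("RNFR " + stagingDir);
        cmds.add("RNTO " + liveDir);
        // oldDir can now be removed at leisure: DELE each file, then RMD each directory.
        return cmds;
    }

    public static void main(String[] args) {
        commands("www", "www.new", "www.old", List.of("index.html", "logo.png"))
                .forEach(System.out::println);
    }
}
```

If any STOR fails, the uploader stops before the RNFR/RNTO pairs. The public then keeps seeing the complete old site.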

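The idea in approach 2 of queueing a busy file and coming back to it later also addresses problem 2, where a file locked by a reader bombs the whole run. The following is a minimal sketch. It assumes a hypothetical operation, tryApply, that installs one file and returns false when the target is busy. Each pass tries every pending item once, so one busy file never blocks the rest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of "skip busy files and come back to them later". */
public class RetryQueue {

    /**
     * @param items     files (or zip entries) to install
     * @param tryApply  hypothetical operation: true on success, false if the target is busy
     * @param maxPasses give up after this many passes over the remaining items
     * @return the items that were still busy after maxPasses
     */
    static List<String> applyAll(List<String> items, Predicate<String> tryApply, int maxPasses) {
        Deque<String> pending = new ArrayDeque<>(items);
        for (int pass = 0; pass < maxPasses && !pending.isEmpty(); pass++) {
            int n = pending.size(); // try each currently pending item exactly once this pass
            for (int i = 0; i < n; i++) {
                String item = pending.pollFirst();
                if (!tryApply.test(item)) {
                    pending.addLast(item); // busy: queue it and get on with the next one
                }
            }
            // A real program would pause between passes to give readers time to finish.
        }
        return new ArrayList<>(pending);
    }

    public static void main(String[] args) {
        // Simulate index.html being busy on its first attempt only.
        int[] attempts = {0};
        Predicate<String> op = name -> !(name.equals("index.html") && attempts[0]++ == 0);
        System.out.println(applyAll(List.of("index.html", "about.html"), op, 3)); // prints []
    }
}
```

Returning the leftovers, rather than throwing, lets the caller report them or try them again on the next scheduled run.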
Implementation Details

I would like to see a product specialized for FTP uploads that runs unattended. It would work in conjunction with my Replicator software for automatically distributing and keeping large file sets up to date without needing any server-side software.

However, you don’t have to know a thing about the Replicator to understand this project. I am just telling you I have a couple of paying customers ready for you if you decide to write this.

What I need is a streamlined FTP upload-only program designed specifically to upload website files to a server, with the following features:

I have been so frustrated with GUI-style FTP programs for uploading websites that I put writing my own implementation of this student project on my to-do list. It is a much simpler beast than something like FTP Voyager. You might use a GUI (Graphical User Interface) client like FTP Voyager to compose the connection information and test the configurations, so that you don’t have to compose that stuff from scratch in your scripts. Hopefully you can find a companion GUI that will export that information in an easy-to-use format.
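The following is a minimal sketch of how an unattended uploader might read exported connection information. It assumes the information arrives as a Java .properties file, written either by a GUI or by hand. The key names host, port, user, remoteDir and localDir are hypothetical. They are not a format that any particular client, FTP Voyager included, is known to export. The password is deliberately kept out of the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical connection settings for one website upload. */
public class ConnectionConfig {
    final String host;
    final int port;
    final String user;
    final String remoteDir;
    final String localDir;

    private ConnectionConfig(Properties p) {
        host = require(p, "host");
        port = Integer.parseInt(p.getProperty("port", "21").trim()); // 21 is the standard FTP control port
        user = require(p, "user");
        remoteDir = p.getProperty("remoteDir", "/").trim();
        localDir = require(p, "localDir");
    }

    /** Fails fast with a readable message instead of a mysterious failure mid-upload. */
    private static String require(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null || v.trim().isEmpty()) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return v.trim();
    }

    static ConnectionConfig load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in); // note: .properties treats backslash as an escape, so write paths with '/'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ConnectionConfig(p);
    }

    public static void main(String[] args) {
        String text = "host=ftp.example.com\nuser=webmaster\nlocalDir=C:/website\n";
        ConnectionConfig c = load(new StringReader(text));
        System.out.println(c.user + "@" + c.host + ":" + c.port + " " + c.localDir + " -> " + c.remoteDir);
    }
}
```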

You can get started with Peter van der Linden’s little LinLyn FTP class and by watching the conversation back and forth between a GUI-style FTP client and an FTP server during an upload. It is much simpler than you might imagine.
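When you watch such a conversation, a typical upload on the control connection runs roughly USER, PASS, TYPE I (binary), PASV, STOR and QUIT. The server answers each command with a three-digit reply code. The fiddliest reply to handle is the one to PASV. It tells the client which address and port to open for the data connection, encoded as six comma-separated numbers, with the port given as a high byte and a low byte. The following small parser was written from RFC 959. It is not taken from LinLyn, which may do this differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses the server's "227" reply to PASV into a host and port for the data connection. */
public class PasvReply {
    // RFC 959 fixes the 227 code and the six numbers h1,h2,h3,h4,p1,p2; the surrounding text varies by server.
    private static final Pattern SIX_NUMBERS =
            Pattern.compile("(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})");

    final String host;
    final int port;

    private PasvReply(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static PasvReply parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("not a PASV reply: " + reply);
        }
        Matcher m = SIX_NUMBERS.matcher(reply.substring(3)); // skip the reply code itself
        if (!m.find()) {
            throw new IllegalArgumentException("no address in: " + reply);
        }
        String host = m.group(1) + "." + m.group(2) + "." + m.group(3) + "." + m.group(4);
        int port = Integer.parseInt(m.group(5)) * 256 + Integer.parseInt(m.group(6)); // p1 is the high byte
        return new PasvReply(host, port);
    }

    public static void main(String[] args) {
        PasvReply r = parse("227 Entering Passive Mode (192,168,1,10,195,80)");
        System.out.println(r.host + ":" + r.port); // prints 192.168.1.10:50000
    }
}
```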

From Scratch, FTP Replacement

The FTP protocol is old and has a number of disadvantages. Perhaps what is needed is a completely fresh start. The catch is that you then need to write software for both the client and the server. It might work something like the Replicator.

You might implement deltas, compression, UDP (User Datagram Protocol), a SAX-like protocol, automatic recovery from disconnect…
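Deltas also address problem 5, where whole files are uploaded when only a few bytes have changed. The following is a minimal sketch of the simplest form of delta: split both versions into fixed-size blocks, hash each block, and send only the blocks whose hashes differ. This is a simpler technique than rsync’s rolling checksum. With fixed blocks, an insertion near the start of a file shifts every later block, so all of them look changed. The rolling checksum is designed to avoid that. Applying such a patch needs code on the server, which is why it belongs to a from-scratch replacement rather than to plain FTP. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: find which fixed-size blocks of a file changed, so only those need sending. */
public class BlockDelta {

    /** SHA-256 hash of each blockSize-byte block; the last block may be shorter. */
    static byte[][] blockHashes(byte[] data, int blockSize) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            int n = (data.length + blockSize - 1) / blockSize; // number of blocks, rounding up
            byte[][] hashes = new byte[n][];
            for (int i = 0; i < n; i++) {
                int from = i * blockSize;
                int to = Math.min(from + blockSize, data.length);
                md.update(data, from, to - from);
                hashes[i] = md.digest(); // digest() also resets md for the next block
            }
            return hashes;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Indices of blocks in newer that differ from, or did not exist in, older.
     * If newer is shorter, the receiver must also be told the new length so it can truncate.
     */
    static List<Integer> changedBlocks(byte[] older, byte[] newer, int blockSize) {
        byte[][] a = blockHashes(older, blockSize);
        byte[][] b = blockHashes(newer, blockSize);
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < b.length; i++) {
            if (i >= a.length || !Arrays.equals(a[i], b[i])) {
                changed.add(i);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        byte[] v1 = "aaaabbbbccccdddd".getBytes(StandardCharsets.US_ASCII);
        byte[] v2 = "aaaabXbbccccdddd".getBytes(StandardCharsets.US_ASCII);
        System.out.println(changedBlocks(v1, v2, 4)); // prints [1]
    }
}
```

In practice, the client could keep the block hashes from the previous upload locally. It would then not need to download the old version to compute the delta.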

finding closest download mirror project
FTP
FTP Voyager
NetLoad
rsync
SAX

This page is posted on the web at:

http://mindprod.com/project/smartftp.html

Contact Roedy. Please feel free to link to this page without explicit permission.