| Zip Postal Codes | Directories |
| Zip File Format | Nesting |
| Gotchas | GZIP vs Zip |
| Writing | Encryption |
| Reading Sequentially | TrueZip |
| Reading Randomly | Learning More |
| Verifying | Links |
PKZIP and WinZip use / as the directory separator character inside the archive. It is up to you to convert \ to / in element names when you write each ZipEntry, and back again on read. If you don’t bother, the \ will get into the zip file, and you will have a platform-dependent zip.
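The separator conversion can be sketched roughly like this. The class and method names (ZipEntryNames, toEntryName, toPlatformPath) are hypothetical helpers invented for illustration, not part of java.util.zip; only ZipEntry and File.separatorChar come from the JDK.

```java
import java.io.File;
import java.util.zip.ZipEntry;

final class ZipEntryNames {
    /** Converts a platform path such as docs\readme.txt to a portable zip element name. */
    static String toEntryName(String platformPath, char platformSeparator) {
        return platformPath.replace(platformSeparator, '/');
    }

    /** Converts a zip element name back to a path for the given platform separator. */
    static String toPlatformPath(String entryName, char platformSeparator) {
        return entryName.replace('/', platformSeparator);
    }

    public static void main(String[] args) {
        // On write: normalise a Windows-style path before creating the ZipEntry.
        String name = toEntryName("docs\\images\\logo.png", '\\');
        ZipEntry entry = new ZipEntry(name);
        System.out.println(entry.getName()); // docs/images/logo.png

        // On read: convert back to whatever this platform uses.
        System.out.println(toPlatformPath(entry.getName(), File.separatorChar));
    }
}
```

Passing the separator as a parameter, rather than always using File.separatorChar, is just a convenience so the conversion can be exercised on any platform.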
Apache VFS gives you a common API for files that works both for regular files and zip file members. Normally you do your work with ZipFile, ZipEntry, ZipInputStream and ZipOutputStream or, for simpler tasks, GZIPInputStream and GZIPOutputStream.
To read all the elements of a zip, you might think you would use ZipFile.entries() to enumerate all the entries. Unfortunately, this enumeration is in "random" order (Hashtable order, really). So you need to use the random access method below. To efficiently move the disk arms over the file, you really should sort the entries first in the order they appear in the zip.
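If all you need is to visit every element once, one alternative to enumerating and sorting is to read the archive as a stream. ZipInputStream hands back entries in the order they physically appear. What follows is only a minimal sketch: the class SequentialZipRead and its two helper methods are hypothetical names, and the sample zip is built in memory purely to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class SequentialZipRead {
    /** Lists element names in the order the entries appear in the stream. */
    static List<String> namesInFileOrder(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
                zis.closeEntry(); // skip the entry's data and move on
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a small in-memory zip so the sketch needs no files on disk. */
    static byte[] sampleZip(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip("b.txt", "a.txt", "c/d.txt");
        System.out.println(namesInFileOrder(new ByteArrayInputStream(zip)));
        // prints [b.txt, a.txt, c/d.txt]
    }
}
```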
There are three approaches to the problem:
When you archive and restore a file, it will no longer have a timestamp precisely matching the original. This is above and beyond the similar problem with Java using 1 millisecond precision and Microsoft Windows using 100 nanosecond increments. PKZIP format derives from MS DOS days and hence uses only 16 bits for time and 16 bits for date. An extended timestamp is defined in the revised PKZIP format, but Java does not use it.
Inside zip files, dates and times are stored in local time, not UTC as is conventional, using an ancient MS DOS format with one 16-bit field for the date and another for the time. Bit 0 is the least significant bit. The format is little-endian. There was not room in 16 bits to represent the time accurately even to the second, so the seconds field contains the seconds divided by two, giving accuracy only to the even second.
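To make the bit layout concrete, here is a rough sketch of packing and unpacking the two 16-bit fields using the conventional MS DOS layout: year minus 1980 in bits 9 to 15, month in bits 5 to 8 and day in bits 0 to 4 of the date; hour in bits 11 to 15, minute in bits 5 to 10 and seconds divided by two in bits 0 to 4 of the time. The class DosDateTime and its methods are hypothetical names for illustration, not JDK API, and the sketch truncates odd seconds rather than rounding up.

```java
import java.time.LocalDateTime;

final class DosDateTime {
    /** Packs the date part: (year - 1980) in bits 9-15, month in 5-8, day in 0-4. */
    static int packDate(LocalDateTime t) {
        return ((t.getYear() - 1980) << 9) | (t.getMonthValue() << 5) | t.getDayOfMonth();
    }

    /** Packs the time part: hour in bits 11-15, minute in 5-10, seconds/2 in 0-4. */
    static int packTime(LocalDateTime t) {
        return (t.getHour() << 11) | (t.getMinute() << 5) | (t.getSecond() >> 1);
    }

    /** Unpacks both 16-bit fields back into a local date and time. */
    static LocalDateTime unpack(int dosDate, int dosTime) {
        int year = ((dosDate >> 9) & 0x7F) + 1980;
        int month = (dosDate >> 5) & 0x0F;
        int day = dosDate & 0x1F;
        int hour = (dosTime >> 11) & 0x1F;
        int minute = (dosTime >> 5) & 0x3F;
        int second = (dosTime & 0x1F) * 2; // only even seconds survive
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2009, 7, 14, 13, 45, 37);
        LocalDateTime restored = unpack(packDate(original), packTime(original));
        System.out.println(original); // 2009-07-14T13:45:37
        System.out.println(restored); // 2009-07-14T13:45:36
    }
}
```

Note that nothing in the two fields records a time zone or UTC offset, which is the root of the problems described next.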
This means the apparent time of files inside a zip will suddenly differ by an hour compared with their uncompressed counterparts every time you have a daylight saving change. It also means that a zip utility will extract a different UTC time from a zip member date depending on the timezone in which the calculation is done. This is ridiculous. PKZIP format needs a modern UTC-based timestamp to avoid these anomalies.
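The timezone dependence is easy to demonstrate with java.time. The sketch below takes one stored local date and time and interprets it as two different extracting machines would. The class name ZipLocalTimeAmbiguity, the method interpret and the choice of the two zones are assumptions made purely for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

final class ZipLocalTimeAmbiguity {
    /** Interprets a zone-less stored local time as the extracting machine's zone would. */
    static Instant interpret(LocalDateTime storedLocalTime, ZoneId extractorZone) {
        return storedLocalTime.atZone(extractorZone).toInstant();
    }

    public static void main(String[] args) {
        // What the zip member records: a local date and time with no zone attached.
        LocalDateTime stored = LocalDateTime.of(2009, 7, 14, 13, 45, 36);

        Instant vancouver = interpret(stored, ZoneId.of("America/Vancouver"));
        Instant london = interpret(stored, ZoneId.of("Europe/London"));

        System.out.println(vancouver); // 2009-07-14T20:45:36Z
        System.out.println(london);    // 2009-07-14T12:45:36Z
        System.out.println(Duration.between(london, vancouver)); // PT8H
    }
}
```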
To make matters worse, standard tools like WinZip or PKZIP will always round the time up to the next even second when they restore, thereby possibly making the file one to two seconds younger. The JDK (i.e. javaToDosTime in ZipEntry) rounds the time down, thereby making the file one to two seconds older.
The format cannot represent dates prior to 1980-01-01 00:00, because the year is stored as an offset from 1980. Avoid file dates of 1980-01-01 or earlier (local or UTC time).
Wait! It gets even worse. Phil Katz, when he documented the Zip format, did not bother to specify whether the local time used in the archive should be daylight or standard time.
And to cap it off… Info-ZIP, Java SE and TrueZIP apply the DST schedule (the days where DST began and ended in any given year) for any date when converting times between system time and DOS date/time. This is as it should be. Vista’s Explorer, 7-Zip and WinZip apply only the current DST offset and ignore the schedule. So they use whatever DST offset is in effect today for any date when converting times between system time and DOS date/time. This is just sloppy.
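The difference between the two behaviours can be sketched with java.time. This is only my reading of the two strategies, not code taken from any of the tools named. DstConversion, withSchedule and withCurrentOffsetOnly are hypothetical names, and the zone and dates are arbitrary examples.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class DstConversion {
    /** Schedule-aware: uses the offset that was in force on the file's own date. */
    static Instant withSchedule(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant();
    }

    /** Schedule-blind: applies whatever offset is in force "now" to every date. */
    static Instant withCurrentOffsetOnly(LocalDateTime local, ZoneId zone, Instant now) {
        ZoneOffset current = zone.getRules().getOffset(now);
        return local.toInstant(current);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Vancouver");
        LocalDateTime winterFile = LocalDateTime.of(2009, 1, 15, 12, 0, 0);
        Instant summerNow = Instant.parse("2009-07-14T00:00:00Z"); // DST in effect

        System.out.println(withSchedule(winterFile, zone));                     // 2009-01-15T20:00:00Z
        System.out.println(withCurrentOffsetOnly(winterFile, zone, summerNow)); // 2009-01-15T19:00:00Z
    }
}
```

The same winter file comes out an hour apart depending on which strategy the tool uses, and the schedule-blind answer changes again if you extract in winter instead of summer.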
If you think this is bad, have a look at the goofiness in timestamps for FTP uploads.
Arrggh!
| You can get the freshest copy of this page from: | or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) | |
| http://mindprod.com/jgloss/zip.html | J:\mindprod\jgloss\zip.html | |
| Canadian Mind Products | ||