TrueZIP : Java Glossary


TrueZIP
The current version is 7.7.1. Last revised/verified: 2012-11-23. There are two variants: TrueZIP File, which works with Java version 1.6 or later, and TrueZIP Path, which works with Java version 1.7. TrueZIP is a Java library that lets you treat a zip file as if it were a directory. You can shuffle files around and, when you are done, call umount(), which creates a new zip reflecting all your changes.

TrueZIP gives you complete flexibility in how much path qualification you include with each member. You can have none, full qualification, anything in between, or even include the original drive letter (which maps into a subdirectory called E:, for example). It is up to you as a programmer to construct the file name of the zip entry, then pour your file or files into it. The name does not even have to bear any resemblance to the name of the file you are adding. You can’t just copy files into the root of the archive; you must first create a zip entry file with a name to contain each one. It behaves much more like a directory tree than a traditional archive.

TrueZIP compresses as you add, but does not construct the final archive file until you call umount(). Using the default settings, TrueZIP archives take up about 28% more space than WinZip archives made with WinZip’s proprietary compression algorithms. It is possible to squeeze more compression out of TrueZIP if you are willing to take more time.
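The time-versus-size trade-off can be seen with the JDK's own java.util.zip.Deflater, the class this page names as doing the actual ZIP compression. The sketch below uses only the JDK; it does not show how TrueZIP itself exposes the compression level, and the class name DeflateLevels and the sample data are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class DeflateLevels {

    /** Compresses data at the given Deflater level and returns the compressed size in bytes. */
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer); // only the byte count is needed here
            }
            return total;
        } finally {
            deflater.end(); // release the native zlib resources
        }
    }

    public static void main(String[] args) {
        // Hypothetical, fairly repetitive sample data.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("entry ").append(i % 97).append(" of the archive; ");
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("original         : " + data.length);
        System.out.println("BEST_SPEED       : " + compressedSize(data, Deflater.BEST_SPEED));
        System.out.println("BEST_COMPRESSION : " + compressedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```

How much the higher level gains depends heavily on the data, so measure with your own files before paying for the extra time.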

See the timestamp gotchas about PkZip format. They plague TrueZIP too. TrueZIP can use other formats that may avoid these problems.
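The two-second granularity and the missing time zone both follow from the MS-DOS date/time field that classic ZIP headers use: seconds are stored divided by two in five bits, and nothing in the field records a zone. Here is a minimal sketch of that packing, written by hand with java.time and not taken from TrueZIP; the class name DosTime is hypothetical.

```java
import java.time.LocalDateTime;

class DosTime {

    /** Packs a local date-time (years 1980 to 2107) into the 32-bit MS-DOS format. */
    static int pack(LocalDateTime t) {
        int date = ((t.getYear() - 1980) << 9) | (t.getMonthValue() << 5) | t.getDayOfMonth();
        int time = (t.getHour() << 11) | (t.getMinute() << 5) | (t.getSecond() / 2);
        return (date << 16) | time;
    }

    /** Unpacks the 32-bit MS-DOS value; odd seconds cannot be recovered. */
    static LocalDateTime unpack(int dos) {
        int date = dos >>> 16;
        int time = dos & 0xFFFF;
        return LocalDateTime.of(
                ((date >>> 9) & 0x7F) + 1980, // 7 bits: years since 1980
                (date >>> 5) & 0x0F,          // 4 bits: month
                date & 0x1F,                  // 5 bits: day
                (time >>> 11) & 0x1F,         // 5 bits: hour
                (time >>> 5) & 0x3F,          // 6 bits: minute
                (time & 0x1F) * 2);           // 5 bits: seconds divided by two
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2012, 11, 23, 10, 15, 31);
        System.out.println(unpack(pack(original))); // prints 2012-11-23T10:15:30
    }
}
```

Because the field is a bare local time, two machines in different time zones will read the same entry as different instants unless the archive carries an extra timestamp field.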

Java 7 has built-in TrueZIP-like features. Mark Hall is working on giving TrueZIP an API (Application Programming Interface) that will be compatible. That means you can write code for Java 7 that will work on earlier JDKs (Java Development Kits).

TrueVFS

If you are starting a new project, you should use TrueVFS instead of TrueZIP. It requires Java version 1.7. TrueVFS is more scalable in terms of memory and runtime (less impact on heap), provides new features (Apple Keychain, JMX (Java Management extensions)), truly employs convention over configuration (all features are locatable on the class path) and is better tested (runs new tests which are not available for TrueZIP 7).

TrueVFS logs through SLF4J, so you also need a jar that provides a logging binding. slf4j-nop.jar turns logging off. logback-classic.jar turns logging on.

TrueZIP 7

TrueZIP 7 is a rewrite of TrueZIP 6 with a slightly different API.

There are many other little changes, but they are pretty easy to discover by letting the compiler point out syntax errors and by scanning the Javadoc.

TrueZIP 6 works on Java version 1.4 or later. TrueZIP 7 works on Java version 1.6 or later.

Support

TrueZIP supports the following flavours of archive:

Archive Formats that TrueZIP supports

ZIP
  Canonical suffixes: zip
  Description: ZIP file: archive file with central directory and compressible entries.
  Advantages: Widely supported. People can easily access the archive without TrueZIP. Uses the standard Java java.util.zip.Deflater to do the actual compression, which is also a disadvantage because it is not particularly quick or strong. You can trade off time for additional compression.
  Disadvantages: Incompetent date-time stamp format. The stamps don’t understand time zones or daylight saving. They are only accurate to two seconds.

JAR
  Canonical suffixes: ear | jar | war
  Description: Java Archive: ZIP with custom directory tree layout.
  Advantages: Fully multiplatform because of Java support.
  Disadvantages: Uses rather lame compression techniques. Same advantages and disadvantages as zip. Jar is just a flavour of zip.

ODF
  Canonical suffixes: odb | odf | odg | odm | odp | ods | odt | otg | oth | otp | ots | ott
  Description: OpenDocument Format, like XML (extensible Markup Language) compressed with PkZip.
  Advantages: Works with OpenOffice.
  Disadvantages: Not a general format.

TZP
  Canonical suffixes: tzp | zip.rae | zip.raes
  Description: RAES encrypted ZIP file.
  Advantages: AES (Advanced Encryption Standard) is serious encryption.
  Disadvantages: Needs aux BouncyCastle bcprov.jar. Does not use JCE (Java Cryptography Extension) because JCE lacks the needed random access.

SFX/EXE
  Canonical suffixes: exe
  Description: ZIP file with a code preamble for self extraction.
  Advantages: If you send the archive to someone, they need no additional software at all to open it.
  Disadvantages: This driver is pretty slow. Windows only. Read-only.

TAR (Tape Archive)
  Canonical suffixes: tar
  Description: TAR: uncompressed tape archive file.
  Advantages: Universally supported under Unix.
  Disadvantages: Needs aux ant.jar.

TAR.BZ
  Canonical suffixes: tbz | tb2 | tar.bz2
  Description: TAR file wrapped in BZIP2 compression format.
  Advantages: More aggressive compression than ZIP.
  Disadvantages: Needs aux ant.jar.

TAR.GZ
  Canonical suffixes: tar.gz | tgz
  Description: TAR file wrapped in GZIP compression format.
  Advantages: Traditional Unix archive.
  Disadvantages: Not particularly aggressive compression. Needs aux ant.jar.

Sample Code for TrueVFS

Here is a simple program to add a file to a zip and display a directory of its contents.

Sample Code for TrueZIP 7

Here is a simple program to add a file to a zip and display a directory of its contents.

Sample Code for TrueZIP 6

Here is a simple program to add a file to a zip and display a directory of its contents.

BackupToZip Substantial TrueZIP Application

Here is the source code for a more complex TrueZIP program that maintains a mirror of a set of files in an archive.

Encoding Gotcha

ZIP files use IBM437, an eight-bit character set, to encode the filenames. Any name that is not representable in this charset gets rejected. You can change this in the File API and the ZIP API. For the File API, just do this:

Another way to do it is to create an empty zip file using ZipOutputStream with a specified encoding.

However, this will stop interoperability of the created ZIP files with older tools because support for UTF-8 has been added only fairly recently! So anybody else will probably not be able to extract these ZIP files. WinZip, however, can handle them.

If you need a better option, use the JAR file format — it supports UTF-8. The TAR file format is not an option either because it supports only US-ASCII.
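As an illustration of writing entry names in an explicitly chosen encoding, here is a minimal sketch that uses the JDK's java.util.zip.ZipOutputStream and ZipInputStream (Java 7 or later) as stand-ins. It is an assumption that this mirrors what the TrueZIP ZIP API does; the TrueZIP classes and their constructors are not shown here, and the class name Utf8ZipNames is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class Utf8ZipNames {

    /** Writes a one-entry ZIP in memory, encoding the entry name as UTF-8. */
    static byte[] writeZip(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the entry names back, decoding them as UTF-8. */
    static List<String> listNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip =
                new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // A name with characters outside IBM437, written with Unicode escapes.
        String name = "caf\u00e9/\u65e5\u672c.txt";
        byte[] zip = writeZip(name, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(listNames(zip).contains(name));
    }
}
```

A tool that assumes IBM437 when it reads such an archive may show garbled names, which is the interoperability risk described above.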

ANT/Genjar

It is fairly tricky to bundle just the parts of the TrueZIP jar that you need inside your own jar. Here is how. The TrueZIP jar contains some ant code in it. If you put that code in the ext directory, ant will stop working. Remove it from the jar first.

TrueZIP vs FileSystem

Now that Java 1.7 supports a TrueZIP-like Zip filesystem, which should you use? I asked Christian Schlichtherle, the author of TrueZIP, the advantages and disadvantages of both. Here is his reply.

Concept:

The most fundamental difference is that TrueZIP is designed as a true VFS (Virtual File System), while NIO.2 (New Input/Output version 2) is just an abstraction over particular file system implementations: NIO.2 lacks file systems federation, so an application cannot transparently traverse and access different file systems with a uniform addressing system.

For example, if an application wants to access a ZIP file, it has to address the ZIP file with a special URI (Uniform Resource Identifier) and do specific API calls for looking up the particular file system provider. Once it has obtained a Path object for that ZIP file, it can only resolve entries within that ZIP file, but not step out to the parent file system or step in to an inner archive file from that Path object. With TrueZIP however, an application doesn’t even need to know that it accesses an archive file: all access is fully transparent with the help of a uniform addressing scheme encapsulated in the TFile or TPath classes.
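For comparison, this is roughly what the plain NIO.2 route looks like with the ZIP file system provider bundled with the JDK since Java 7: a jar: URI, an explicit provider lookup through FileSystems.newFileSystem, and Path objects that live only inside that one ZIP. It is a minimal sketch; the class name ZipFsDemo and the file names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

class ZipFsDemo {

    /** Adds one text entry to a ZIP (creating the ZIP if needed) and lists the root entries. */
    static List<String> addAndList(Path zipFile, String entryName, String text) throws IOException {
        // The special URI that selects the ZIP file system provider.
        URI uri = URI.create("jar:" + zipFile.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // create the archive if it does not exist yet

        List<String> names = new ArrayList<>();
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            // This Path belongs to zipFs only; it cannot step out to the host file system.
            Path entry = zipFs.getPath("/" + entryName);
            Files.write(entry, text.getBytes(StandardCharsets.UTF_8));
            try (Stream<Path> children = Files.list(zipFs.getPath("/"))) {
                children.forEach(p -> names.add(p.toString()));
            }
        } // closing the file system writes the archive to disk
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempDirectory("zipfs-demo").resolve("demo.zip");
        System.out.println(addAndList(zip, "hello.txt", "hello"));
    }
}
```

Note that the target must either not exist or already be a valid ZIP; a zero-length file is rejected by the provider.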

Once I discussed these constraints about file system federation with Alan Bateman, who is the specification lead for JSR (Java Specification Request) 203 (alias NIO.2 ). At the time, his answer was that they didn’t want to replicate the functionality of JNDI (Java Naming and Directory Interface). My point of view is that JNDI is a terrible API in general and a horrible API to do any I/O in particular.

How does this matter? For example, if you are writing a search engine to index the platform file system, then with TrueZIP your application could easily traverse the directory tree in a simple, uniform way and it would transparently step into archive files (ZIP, TAR, etc) too if you want it to do so. Or if you are writing a software build tool, then with TrueZIP your application could easily compile nested EAR (Enterprise Archive file) / WAR (Web Archive)/JAR from various sources without needing to consider the nesting level and the different compression strategies at each nesting level.

The way I look at it is that the TrueZIP Path module isn’t just another implementation of an NIO.2 FileSystemProvider — it’s also a nice façade to the NIO.2 API.

Also, I have always strived to make simple things easy while making complex things possible. A result of this is the provision of the copy/move/delete operations for directory trees in the TFile class (alias bulk I/O). Thanks to file system federation and this feature, converting a ZIP file to a TAR.GZ file can be done in one line of code as shown on the home page. In comparison, with NIO.2 you need to implement the FileVisitor interface, which is cumbersome and provides very little benefit over implementing the traversal yourself because it accounts for all corner cases.
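To make the NIO.2 side of that comparison concrete, here is a minimal sketch of a recursive directory copy built on Files.walkFileTree and SimpleFileVisitor, using only the JDK. The class name CopyTree is hypothetical, and the sketch ignores symbolic links, file attributes and partial-failure cleanup, which a production version would have to handle.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

class CopyTree {

    /** Recursively copies the directory tree rooted at source to target. */
    static void copy(final Path source, final Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate each directory before its files are visited.
                // Resolving via toString() lets target belong to another file system provider.
                Files.createDirectories(target.resolve(source.relativize(dir).toString()));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.copy(file, target.resolve(source.relativize(file).toString()),
                        StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Even this stripped-down version needs a dozen lines of visitor plumbing for what is conceptually a single copy operation.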

Features

There are so many differences here that I can hardly address them all. Just looking at the ZIP drivers I can easily spot that Java 7 has no BZIP2 compression, no WinZip AES encryption, no RAES encryption, no appending to ZIP files and doesn’t know about the subtle differences between ZIP and JAR files when it comes to encoding entry names or date/time stamps.

How does this matter? The first time you hit a ZIP file with mojibake (I love this word) in the entry names you’ll know it.

Performance

Because TrueZIP provides convenient methods for copying data, it can also do some important optimizations. As a standard feature, TrueZIP splits reading and writing the data into separate threads. So whenever the application copies data, a pooled thread is used to read the input. Your mileage may vary, but my personal experience is that this easily cuts 30% of the runtime in comparison to a naïve read-stop-write-stop loop.
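The idea of splitting reading and writing across threads can be sketched with the JDK alone. The following is not TrueZIP's implementation, only an illustration of the technique under simple assumptions: a pooled task reads chunks into a small bounded queue while the calling thread writes them out. The class name PipelinedCopy, the queue depth and the buffer size are arbitrary choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class PipelinedCopy {

    private static final byte[] EOF = new byte[0]; // sentinel, compared by identity

    /** Copies in to out, reading on a pooled thread and writing on the calling thread. */
    static long copy(final InputStream in, OutputStream out, ExecutorService pool)
            throws IOException, InterruptedException {
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);

        Future<?> reader = pool.submit(() -> {
            byte[] buf = new byte[8192];
            try {
                int n;
                while ((n = in.read(buf)) != -1) {
                    if (n > 0) {
                        queue.put(Arrays.copyOf(buf, n)); // blocks while the writer is behind
                    }
                }
            } finally {
                queue.put(EOF); // always wake the writer, even after a read failure
            }
            return null;
        });

        long total = 0;
        try {
            for (byte[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
                out.write(chunk);
                total += chunk.length;
            }
            reader.get(); // surfaces a read failure, if there was one
        } catch (ExecutionException e) {
            throw new IOException("reading the input failed", e.getCause());
        } finally {
            reader.cancel(true); // unblocks the reader if writing failed; no-op otherwise
        }
        return total;
    }
}
```

Whether this beats a simple loop depends on the streams involved; the gain comes from overlapping slow reads with slow writes, so in-memory streams will show little or no benefit.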

Another optimization avoids redundant recompression: if the application copies a ZIP entry from one ZIP file to another, then the TrueZIP Driver ZIP recognizes that and avoids inflating the data from the input ZIP file just to deflate it again to the output ZIP file. This is called RDC (Raw Data Copy). This feature takes a significant burden off the CPU (Central Processing Unit) and may be more than welcome in a server app, e.g. a Continuous Integration system.

Robustness

Providing file system federation comes with some unique challenges: What if one of the addressed archive files in a path name is a false positive archive file, e.g. a regular directory or file or is non-existent? The TrueZIP Kernel recognizes this and deals with it according to the true state of the file system entry. So if for example a prospective archive file is in fact a regular directory, the application can still proceed with the operation.

Reliability

As a user, you might take this aspect for granted, but please allow me to stress its importance and how TrueZIP tries to make a difference: When you are storing data persistently, would you want to bet its fate on a file system which is unreliable? Of course not! And so this is the one aspect where I don’t want to make any compromises. In clear and bullish words, I want TrueZIP to be solid as a rock!

TrueZIP uses static code analysis, assertions, unit tests, functional tests and integration tests. Of course, such a complex system isn’t bug free, but with each version, I add more tests to cover even the strangest corner cases, e.g. parallel copying/moving/deleting of entries between different levels of nested archive files which get concurrently synced to their respective parent file system — yeah, I know it sounds weird.

In comparison, at the time Java 7 was released I asked Alan Bateman if there was a test suite for NIO.2 FileSystemProvider implementations. Of course, my intention was to run my implementation for TrueZIP 7.2 against this suite. Unfortunately, there was none, not even for ZipFileSystemProvider. To be fair, I don’t know if this situation has changed. I don’t care anymore because I have ported my integration tests from the TrueZIP File API to the TrueZIP Path API for the release of TrueZIP 7.2.

Disadvantages

NIO.2

There is now an NIO.2-compatible API for TrueZIP that works like the Oracle NIO (New Input/Output) code, but with more efficient TrueZIP internals. You can find a general introduction to the NIO.2 API at the Java Tutorials. Going from there, all you essentially need to know to use TrueZIP is that instead of calling Paths.get(*) to create a Path object, you should call new TPath(*) to leverage the power of the TrueZIP Kernel. So creating an InputStream works like this:
// use of TPath to get a TrueZip style Path.

import de.schlichtherle.truezip.nio.file.TPath;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
...
// Path path =  Paths.get( "archive.zip/entry" ); // wouldn't work
Path path = new TPath( "archive.zip/entry" ); // use this
InputStream in = Files.newInputStream( path );
Try the TrueZIP Archetype Path. It provides some ready-to-run Java and Scala sample code for the TrueZIP Path API and discusses some options you have with it, e.g. using the fastest way for copying streams.

This page is posted
on the web at:

http://mindprod.com/jgloss/truezip.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
