Introduction  Collisions 
Cryptographic Digests  Computing Collision Probability 
Hashing Functions  Benchmarking 
FNV1a hash function  Links 
A digest complicated checksum that is difficult to fake. A message digest function is an algorithm which takes a variablelength message and produces a fixedlength hash, 128 bits for MD5 (Message Digest algorithm 5), 160 bits for SHA1 (Secure Hash Algorithm 1). Given the hash, it is computationally all but impossible to find another message with that same hash; in fact one can’t determine any usable information about a message from the hash, not even a single bit.
SHA2 is a family of digest functions that produce digests of 224, 256, 384 or 512 bits. The functions are called SHA224, SHA256, SHA384, SHA512, SHA512/224, SHA512/25
Digests are intended to be cryptographically secure — very difficult to fudge a message to make it come out to any given digest. Hashing functions do not have that property. They are easy to fudge. They are not secure. They are, in general, faster to compute. They are typically used in implementing HashMaplike lookup.
The idea is if you compute a message digest, then if any bytes in the message change, the recomputed digest will change and you can detect the tampering when you recompute the hash. If there is no tampering, the digest will remain constant. With regular checksums, such as CRC (Cyclic Redundancy Check) or XOR (exclusive OR), it is not all that difficult to tamper and fiddle the tampered message so that the checksum still comes out the same, e. g. a clever virus could insert itself in a checksummed file and add gibberish to make the checksum come out the same as before so that a checksum verifier would be unable to detect its presence. A oneway hash function can be private or public, just like an encryption function. In other words, anyone can recompute it (public) or just the holder of the private key (private). With a private key scheme, anyone with the public key can verify the checksum is correct, even if they could not compute it from scratch.
A public oneway hash function can be used to speed up a publickey digital signature system. Rather than directly compute the signature of a long message which can take an inordinately long time, you compute the oneway hash of the message and then digitally sign just the hash. The result can be verified with a public key, but created only with a private key. The receiver can verify the digest and hence be assured the file indeed came from you and that none of its bytes have been modified since you signed it. There are two common digest algorithms SHA1 and MD5. Even though Jar files for signed Applets have a signed digest, oddly, Sun did not provide access to classes for computing them until Java version 1.3 when it introduced the java.security.MessageDigest class.
If you want digests that others can’t backwards guess to the original text, condition your original data with secret1 + data + secret2 where + represents concatenation. For a slightly fancier technique use HMAC (Hashed Message Authentication Code) described in RFC 2104.
MD5 and SHA1 are heavyweight digests. The SHA2 family are taking over.
If you want to compute them quickly and are not concerned about cryptographic maliciousness, just random error, you can get away with much simpler digests such as:
For earlier versions of Java, Mitch Gallant provided an MD5 and SHA1 but they are no longer available.
In theory, if you wanted a variable length digest, you could cook one up by concatenating a string of random numbers generated by using your value as a seed. However, making such a digest longer does not improve its quality. It actually makes it even easier to crack. It has no more variability than the size of the seed.
The FNV hash is very easy to compute and very fast. Here is the code for the 32bit version.
Here is the code for the 64bit version.
Imagine the digests for your filenames were evenly spread over all the possible hashes. What is the probability of have no collisions at all with 20,000 hashed filenames? Using the program below I discovered that :
Algorithm  Time in Seconds lower is better 
Number of Collisions lower is better 

32bit String.hashCode  0.393  0 
32bit FNV1a32  5.044  1 
64bit FNV1a64  4.217  0 
32bit SunCRC32  6.133  1 
32bit Adler  6.327  946 
128bit MD5  12.967  0 
160bit SHA1  23.390  0 
160bit SHA  22.690  0 
Here is the program I used to do the benchmarks. If you would like me to test your favourite hashing algorithm, implement the XXX interface and send me the code to include it.
This page is posted 
http://mindprod.com/jgloss/digest.html  
Optional Replicator mirror

J:\mindprod\jgloss\digest.html  
Please read the feedback from other visitors,
or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.  
Canadian
Mind
Products
IP:[65.110.21.43] Your face IP:[23.20.25.122] 
 
Feedback 
You are visitor number  