digest : Java Glossary

*0-9ABCDEFGHIJKLMNOPQRSTUVWXYZ (all)
The JDisplay Java Applet displays the large program listings on this web page. JDisplay requires an up-to-date browser and Java version 1.8+, preferably 1.8.0_102. If you can’t see the listings, or if you just want to learn more about JDisplay, click  here for help. Use Firefox for best results.

digest

Introduction Collisions
Cryptographic Digests Computing Collision Probability
Hashing Functions Benchmarking
FNV1a hash function Links

Introduction

A digest complicated checksum that is difficult to fake. A message digest function is an algorithm which takes a variable-length message and produces a fixed-length hash, 128 bits for MD5 (Message Digest algorithm 5), 160 bits for SHA-1 (Secure Hash Algorithm 1). Given the hash, it is computationally all but impossible to find another message with that same hash; in fact one can’t determine any usable information about a message from the hash, not even a single bit.

Cryptographic Digests

SHA-2 is a family of digest functions that produce digests of 224, 256, 384 or 512 bits. The functions are called SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/25

Digests are intended to be cryptographically secure — very difficult to fudge a message to make it come out to any given digest. Hashing functions do not have that property. They are easy to fudge. They are not secure. They are, in general, faster to compute. They are typically used in implementing HashMap-like lookup.

The idea is if you compute a message digest, then if any bytes in the message change, the recomputed digest will change and you can detect the tampering when you recompute the hash. If there is no tampering, the digest will remain constant. With regular checksums, such as CRC (Cyclic Redundancy Check) or XOR (exclusive OR), it is not all that difficult to tamper and fiddle the tampered message so that the checksum still comes out the same, e. g. a clever virus could insert itself in a checksummed file and add gibberish to make the checksum come out the same as before so that a checksum verifier would be unable to detect its presence. A one-way hash function can be private or public, just like an encryption function. In other words, anyone can recompute it (public) or just the holder of the private key (private). With a private key scheme, anyone with the public key can verify the checksum is correct, even if they could not compute it from scratch.

A public one-way hash function can be used to speed up a public-key digital signature system. Rather than directly compute the signature of a long message which can take an inordinately long time, you compute the one-way hash of the message and then digitally sign just the hash. The result can be verified with a public key, but created only with a private key. The receiver can verify the digest and hence be assured the file indeed came from you and that none of its bytes have been modified since you signed it. There are two common digest algorithms SHA-1 and MD5. Even though Jar files for signed Applets have a signed digest, oddly, Sun did not provide access to classes for computing them until Java version 1.3 when it introduced the java.security.MessageDigest class.

If you want digests that others can’t backwards guess to the original text, condition your original data with secret1 + data + secret2 where + represents concatenation. For a slightly fancier technique use HMAC (Hashed Message Authentication Code) described in  RFC 2104.

MD5 and SHA-1 are heavyweight digests. The SHA-2 family are taking over.

Hashing Functions

If you want to compute them quickly and are not concerned about cryptographic maliciousness, just random error, you can get away with much simpler digests such as:

For earlier versions of Java, Mitch Gallant provided an MD5 and SHA-1 but they are no longer available.

In theory, if you wanted a variable length digest, you could cook one up by concatenating a string of random numbers generated by using your value as a seed. However, making such a digest longer does not improve its quality. It actually makes it even easier to crack. It has no more variability than the size of the seed.

FNV1a hash function

The FNV hash is very easy to compute and very fast. Here is the code for the 32-bit version.

Here is the code for the 64-bit version.

Collisions

Imagine the digests for your filenames were evenly spread over all the possible hashes. What is the probability of have no collisions at all with 20,000 hashed filenames? Using the program below I discovered that :

  1. 16 bit hashes (with 65535 possible hash codes) for all practical purposes will always generate collisions.
  2. 32-bit hashes (with 4,294,967,296 possible hashes) avoids collisions once in 22 trials.
  3. 48-bit hashes (with 281,474,976,710,656 possible hashes) avoids collisions once in 1.4 million trials.
  4. 60-bit hashes (with 2,305,843,009,213,693,952 possible hashes) avoids collisions once in 11.5 million trials.
  5. 62-bit hashes (with 4,611,686,018,427,387,904 possible hashes) avoids collisions once in 23.1 million trials.
  6. 64-bit hashes (with 18,446,744,073,709,551,616 possible hashes) are beyond the limits of my program.
With 32-bit hashes you will have you need to deal with collisions once in a blue moon. You need code to handle them, but it does not need to be efficient. With 64-bit hashes the collisions are all but impossible.

Computing Collision Probability

Benchmarking

I did a benchmark of various hashing algorithms. I collected 50,000 real world filenames to hash and check for collisions. I ran each algorithm 25,000,000 times. Here are the results:
Algorithm Time in Seconds
lower is better
Number of Collisions
lower is better
32-bit String.hashCode 0.393 0
32-bit FNV1a32 5.044 1
64-bit FNV1a64 4.217 0
32-bit SunCRC32 6.133 1
32-bit Adler 6.327 946
128-bit MD5 12.967 0
160-bit SHA-1 23.390 0
160-bit SHA 22.690 0

Here is the program I used to do the benchmarks. If you would like me to test your favourite hashing algorithm, implement the XXX interface and send me the code to include it.

view

This page is posted
on the web at:

http://mindprod.com/jgloss/digest.html

Optional Replicator mirror
of mindprod.com
on local hard disk J:

J:\mindprod\jgloss\digest.html
logo
Please the feedback from other visitors, or your own feedback about the site.
Contact Roedy. Please feel free to link to this page without explicit permission.

IP:[65.110.21.43]
Your face IP:[54.242.9.97]
You are visitor number