Introduction | Collisions |
Cryptographic Digests | Computing Collision Probability |
Hashing Functions | Benchmarking |
FNV1a hash function | Links |
A digest complicated checksum that is difficult to fake. A message digest function is an algorithm which takes a variable-length message and produces a fixed-length hash, 128 bits for MD5 (Message Digest algorithm 5), 160 bits for SHA-1 (Secure Hash Algorithm 1). Given the hash, it is computationally all but impossible to find another message with that same hash; in fact one can’t determine any usable information about a message from the hash, not even a single bit.
SHA-2 is a family of digest functions that produce digests of 224, 256, 384 or 512 bits. The functions are called SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/25
Digests are intended to be cryptographically secure — very difficult to fudge a message to make it come out to any given digest. Hashing functions do not have that property. They are easy to fudge. They are not secure. They are, in general, faster to compute. They are typically used in implementing HashMap-like lookup.
The idea is if you compute a message digest, then if any bytes in the message change, the recomputed digest will change and you can detect the tampering when you recompute the hash. If there is no tampering, the digest will remain constant. With regular checksums, such as CRC (Cyclic Redundancy Check) or XOR (exclusive OR), it is not all that difficult to tamper and fiddle the tampered message so that the checksum still comes out the same, e. g. a clever virus could insert itself in a checksummed file and add gibberish to make the checksum come out the same as before so that a checksum verifier would be unable to detect its presence. A one-way hash function can be private or public, just like an encryption function. In other words, anyone can recompute it (public) or just the holder of the private key (private). With a private key scheme, anyone with the public key can verify the checksum is correct, even if they could not compute it from scratch.
A public one-way hash function can be used to speed up a public-key digital signature system. Rather than directly compute the signature of a long message which can take an inordinately long time, you compute the one-way hash of the message and then digitally sign just the hash. The result can be verified with a public key, but created only with a private key. The receiver can verify the digest and hence be assured the file indeed came from you and that none of its bytes have been modified since you signed it. There are two common digest algorithms SHA-1 and MD5. Even though Jar files for signed Applets have a signed digest, oddly, Sun did not provide access to classes for computing them until Java version 1.3 when it introduced the java.security.MessageDigest class.
If you want digests that others can’t backwards guess to the original text, condition your original data with secret1 + data + secret2 where + represents concatenation. For a slightly fancier technique use HMAC (Hashed Message Authentication Code) described in RFC 2104.
MD5 and SHA-1 are heavyweight digests. The SHA-2 family are taking over.
If you want to compute them quickly and are not concerned about cryptographic maliciousness, just random error, you can get away with much simpler digests such as:
For earlier versions of Java, Mitch Gallant provided an MD5 and SHA-1 but they are no longer available.
In theory, if you wanted a variable length digest, you could cook one up by concatenating a string of random numbers generated by using your value as a seed. However, making such a digest longer does not improve its quality. It actually makes it even easier to crack. It has no more variability than the size of the seed.
The FNV hash is very easy to compute and very fast. Here is the code for the 32-bit version.
Here is the code for the 64-bit version.
Imagine the digests for your filenames were evenly spread over all the possible hashes. What is the probability of have no collisions at all with 20,000 hashed filenames? Using the program below I discovered that :
Algorithm | Time in Seconds lower is better |
Number of Collisions lower is better |
---|---|---|
32-bit String.hashCode | 0.393 | 0 |
32-bit FNV1a32 | 5.044 | 1 |
64-bit FNV1a64 | 4.217 | 0 |
32-bit SunCRC32 | 6.133 | 1 |
32-bit Adler | 6.327 | 946 |
128-bit MD5 | 12.967 | 0 |
160-bit SHA-1 | 23.390 | 0 |
160-bit SHA | 22.690 | 0 |
Here is the program I used to do the benchmarks. If you would like me to test your favourite hashing algorithm, implement the XXX interface and send me the code to include it.
This page is posted |
http://mindprod.com/jgloss/digest.html | |
Optional Replicator mirror
|
J:\mindprod\jgloss\digest.html | |
Please read the feedback from other visitors,
or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission. | ||
Canadian
Mind
Products
IP:[65.110.21.43] Your face IP:[3.145.152.168] |
| |
Feedback |
You are visitor number | |