For Strings, Hashtables work pretty much as you would expect if you are familiar
with hashing. The only catch is remembering to import
java.util.Hashtable (with a small t in table). Here is how you might use one.
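A minimal sketch (the sample data are invented for illustration):

    import java.util.Hashtable;

    public class HashtableDemo {
        public static void main(String[] args) {
            Hashtable<String, String> capitals = new Hashtable<>();
            capitals.put("Canada", "Ottawa");
            capitals.put("France", "Paris");
            // get returns the matching value, or null if the key is absent.
            System.out.println(capitals.get("France")); // prints Paris
        }
    }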
The key point for novices to remember is that the capacity of the table has to be somewhat
larger than the number of elements you plan to look up. The more spare capacity you
give, the faster the lookup. There are, of course, diminishing returns. Typically, you
specify about a third more capacity than you have elements, and
a load factor of 0.75, which controls automatic growth of the lookup table when it
gets too full. More details on that later.
The capacity of a table is not the number of
elements you plan to add! If you specify a capacity of 100 and a load factor of .75,
Hashtable will allocate only 100 slots, not the 133 you
need for efficient access. The load factor kicks in only when deciding when it is
time to double the size of the table, in this case, just after you have added 75
elements, not in deciding its initial size. It is up to you to leave some breathing
room.
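To make the arithmetic concrete, here is a sketch of sizing the table yourself (the element counts and type are just examples):

    import java.util.Hashtable;

    public class Sizing {
        public static void main(String[] args) {
            // To hold 100 elements at a 0.75 load factor you need about
            // 100 / 0.75 = 133 slots; 137 is the next prime up
            // (see the tip on prime capacities below).
            Hashtable<String, Integer> counts = new Hashtable<>(137, 0.75f);
            // The table will not need to grow while holding your 100 elements.
            System.out.println(counts.size()); // 0 entries so far
        }
    }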
Tips
- Hashtables work fastest if the capacity is a prime number, since that reduces
collisions. Your hashCodes are unlikely to have a frequency distribution pattern
based on modulo a large prime number, whereas they easily could have one based on a
power of two. When you use a power of two, you are effectively masking off all but
the low order bits. In Java 1.4.1+, for HashMap there is no
longer a payback for specifying prime table lengths because HashMap rounds whatever capacity you specify up to a power of two,
to avoid the cost of a division (it can do an AND bit mask instead) and to help
keep memory from being fragmented. However, Hashtable,
which is synchronized, uses the value you give it, so giving it a prime is still
worthwhile.
Now it is more important than ever to have a high quality hashCode function with good distribution, especially in the
low order bits. It must not favour one bit pattern over another, e.g. not tend to
generate hashCodes all ending in binary 0000. In JDK (Java Development Kit)
1.4.1+ there is a rehash
function used inside HashMap to improve the quality of
your provided hashCode by redistributing the bits.
rehash has to be quick, or they might as well have
used tables with prime lengths and the modulus
operator.
In theory though, you should round your capacity up to a prime number. However,
avoid 31 or multiples of 31, since the hashCode algorithm for Strings in
Java version 1.2 or later uses 31 as a magic
number in the hashing formula. Each additional empty slot in a Hashtable costs
only 4 bytes, so rounding up is cheap. Here are some primes just larger than
various powers of two that would be good choices:
11, 17, 37, 67, 131, 257, 521, 1031, 2053, 4099, 8209, 16411, 32771, 65537,
131101, 262147, 524309, 1048583, 2097169, 4194319, 8388617
- You cannot have elements with duplicate keys. If you add a new element that
duplicates the key of an existing element, the new element replaces the old one.
- The initial capacity is not the number of elements you plan to
put in the table. If you specify a capacity of 101 and a load factor of .75f,
then the initial table can hold only 75 elements before it has to grow. This
means you must specify plenty of slop in your capacity setting. If you used a load
factor of 1.0, your performance would be slow because you would have many
collisions rather than direct lookups.
- Hashtables will automatically grow when you add too many elements. However,
growing requires copying, rehashing and rechaining, quite an involved process. That
is why you want an accurate estimate for the initial capacity, slightly on the high
side.
- A load factor of .75f means that the table will be expanded when it gets three
quarters full. Hashing works faster when the table is not full. A value of .25f
would use more RAM (Random Access Memory) but would work faster. A
value of 1.0f would use minimal RAM
but would run extremely slowly. The reason you want a decent supply of unused
slots is to reduce collisions, two entries that hash to the same slot. They have
to be resolved by chasing a linear chain. If there are very few empty slots left,
and new entries are assigned effectively random slots, the odds are they will be
assigned a slot already taken. If there is almost no room left, the chains of
elements that hash to the same slot rapidly grow longer. That is why you have a
load factor, to grow the entire structure when space gets tight and keep the
chains of elements that collide on the same slot as short as possible.
- HashCodes are usually not unique. If you think about it, how could they be?
Consider Strings of length 10 characters (20 bytes, 160
bits). How many different possible strings are there? 2^160. HashCodes are only
32 bits long. How many possible different hashCodes are
there? 2^32. You can see there are way more strings than hashCodes, so strings
would have to share hashCodes.
- If two keys hash to the same hashCode, or to the same hashCode modulo the size
of the table, all still works. It just takes a little longer to look up that key,
since get() has to chase a chain of clashing keys to
find the match. Keys are matched with the equals()
method, not ==, or simply by comparing hashCodes.
- Let me repeat this another way, since it is a point people seem to miss on first
reading. Collisions happen not only when two of the hashCodes are identical (quite infrequent), but when two of the
hashCodes modulo the current table size are equal (which almost always happens at
least a few times). Collisions are unavoidable. The Hashtable class deals with them automatically. They slow down
lookup, but do not derail it. The smaller the load factor, the more
RAM you waste,
but the fewer the collisions, so the faster the lookup. Using prime numbers for the
divisor/capacity also tends to reduce collisions.
- This is the same math that comes up when you ask how many people you need to
have in a room for it to be likely that two of them share a birthday.
There’s a > 50% chance with 23 people, which is a
bit more than the square root of 365.
- The source code for Hashtable is very short. It
takes an order of magnitude more English to describe how it works than does the
Java code itself. I suggest you peruse it if you are in the least curious how it
works inside. Hashing is something every programmer should
understand. You might also study String.hashCode().
- Hashtables use generic Objects for both keys and
values. You need to cast these back to Strings, or whatever you are using.
- If you want to use something other than Strings as your keys, make sure you
implement equals() and hashCode() methods for those key objects to override the default
Object equals() and hashCode() methods; see the first sketch after this list. The default equals() method just checks that the two objects are one and the
same (using ==), not that they consist of the same bit
patterns.
- There are two methods for enumerating the Hashtable,
keys() which gives you the keys and elements() which gives you the values. You could step
through both enumerations simultaneously instead of using get() as I showed. JDK
1.2 HashMap (the
unsynchronized replacement for Hashtable) lets you iterate over keys, values or
key-value pairs; see the iteration sketch after this list. Note that the enumeration is not in any
particular order, not even the order in which you added your elements. You need either
keySet().iterator(), values().iterator() or entrySet().iterator() for pairs. For sorted Hashtables, you need to
sort the output yourself with Arrays.sort(
hashtable.values().toArray( new X[hashtable.size()]) ) or use a TreeMap.
- The size method keeps track of how many unique keys
are in the Hashtable, not the size of the table including empty slots. Hashtables
never contain items with duplicate keys.
- Wherever possible use Vectors or ArrayLists in preference to Hashtables
when you can assign a unique dense int key.
Vectors are much faster to look up when you know the
index. The Vector.get index lookup process does not
scan through other objects (dragging them from backing store into real
RAM). Thus
Vectors are even better than Hashtables when your virtual RAM
is not fully backed by real RAM.
Vector.contains, in contrast, does a linear search,
comparing each object in turn. This is much slower than a Hashtable lookup, except for very small tables. If you are single
threaded, use ArrayList in preference to Vector. You can build a free slot chain through the embedded unused
Vector slots to rapidly recycle index numbers.
- Never disturb fields in the Hashtable/HashMap/ HashSet key objects that would
change the hashCode or the results of equals. Hashtable requires these to sit still
for it to work. You can change them, but only when the Objects are not actively
being used as keys.
- Hashtable and HashMap do
not cache a recent get. So if you frequently fetch the
same item over and over, you might consider keeping a copy of the previous
get result.
- When you serialize a Hashtable, it just outputs
key-entry pairs, none of the lookup mechanism. When you load it back again, it
rehashes and rebuilds the lookup structure.
- Serializing and generics do not mesh well. To get rid of all warning messages
and to ensure type safety, you are best off outputting an array and reconstituting the
Hashtable yourself.
- Hashtable is a Map,
not a Collection, though it is considered part of the
Java Collections Framework. However, the result of Hashtable.values and Hashtable.keySet is a Collection. You can use Collection.toArray to export either
to an array.
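To illustrate the equals()/hashCode() tip above, here is a minimal sketch of a custom key class (the class name and fields are invented for the example):

    import java.util.Hashtable;

    public final class PartKey {
        private final String vendor;
        private final int partNumber;

        public PartKey(String vendor, int partNumber) {
            this.vendor = vendor;
            this.partNumber = partNumber;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof PartKey)) return false;
            PartKey that = (PartKey) other;
            return partNumber == that.partNumber && vendor.equals(that.vendor);
        }

        @Override
        public int hashCode() {
            // Combine both fields so that equal keys get equal hashCodes.
            return vendor.hashCode() * 31 + partNumber;
        }

        public static void main(String[] args) {
            Hashtable<PartKey, String> stock = new Hashtable<>();
            stock.put(new PartKey("Acme", 42), "widget");
            // A freshly built but equal key finds the entry, because
            // equals() and hashCode() are overridden.
            System.out.println(stock.get(new PartKey("Acme", 42))); // prints widget
        }
    }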
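And here is a sketch of iterating over key-value pairs with the unsynchronized HashMap, as mentioned in the enumeration tip (sample data invented):

    import java.util.HashMap;
    import java.util.Map;

    public class IterationDemo {
        public static void main(String[] args) {
            HashMap<String, String> capitals = new HashMap<>();
            capitals.put("Canada", "Ottawa");
            capitals.put("France", "Paris");
            // entrySet() hands back the pairs in no particular order.
            for (Map.Entry<String, String> entry : capitals.entrySet()) {
                System.out.println(entry.getKey() + " -> " + entry.getValue());
            }
        }
    }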
Dealing With Duplicates
Hashtables don’t permit duplicates.
What do you do if some of your objects have duplicate keys? When you have duplicates,
what do you want to find when you do a lookup? Just the first? Just the last? Both?
If you are clever, you can get all those behaviours with a Hashtable.
To get the first duplicate, before you put, check
to see if the element is already in the Hashtable; if so,
do nothing.
To get the last duplicate, just put. It will replace
what was there before. To get both, store an ArrayList as
the value. Store each of the duplicates in one of the slots of the ArrayList. This is a bit inefficient for non-duplicates. For them, you
could store just the plain Object. And you might handle
the special case of just one duplicate with an array. Your logic might look a bit
like this:
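A simplified sketch of that logic: store a lone value bare, and upgrade to an ArrayList on the first collision (the two-element array special case is omitted here):

    import java.util.ArrayList;
    import java.util.Hashtable;

    public class DuplicateKeys {
        private final Hashtable<String, Object> table = new Hashtable<>();

        @SuppressWarnings("unchecked")
        public void add(String key, String value) {
            Object existing = table.get(key);
            if (existing == null) {
                // No duplicate yet: store the bare String to conserve RAM.
                table.put(key, value);
            } else if (existing instanceof ArrayList) {
                // Third or later value for this key: append to the list.
                ((ArrayList<String>) existing).add(value);
            } else {
                // First duplicate: replace the bare String with a list of both.
                ArrayList<String> values = new ArrayList<>();
                values.add((String) existing);
                values.add(value);
                table.put(key, values);
            }
        }
    }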
Of course, you are free to simplify the algorithm, going straight to ArrayLists even for singles. My complex recipe is designed to conserve
RAM on large
lists.
See also: Oracle’s Javadoc on the Hashtable class.