A HashMap lets you look up values by
exact key (always case-sensitive). It is very much like a Hashtable,
except that it is faster and not thread-safe. There
are some minor other differences:
- HashMaps work with Iterators
where the older Hashtables work with Enumerations
- Hashtables work best with capacities
that are prime numbers. HashMaps round
capacities up to powers of two.
Read the Hashtable
entry. Nearly all of it also applies to HashMap,
The capacity of a table is not
the number of elements you plan to add! If you specify a capacity of
128 and a load factor of .75, HashMap
will allocate only 128 slots, not the 256 you need for efficient
access. The load factor kicks in only when deciding when it is time
to double the size of the table, in this case, just after you have
added 96 elements, not in deciding its initial size. It is up to you
to leave some breathing room.
Don’t use a HashMap unless you need
the lookup feature. It takes orders of magnitude more time to build a
HashMap than an ArrayList.
Sample Use
In this case we use a String to look up a String
in a HashMap. We use the modern Java version 1.5 or later
techniques of generics and for:each loops. Note though that there is no
generic type checking on get. It will take
anything. At run time any objects of the wrong type are detected as
non-matching. This is more compatible with the old non-generic version.
This example also shows how to export the data in random and sorted
orders.
HashMap is a Map,
not a Collection, though it is
considered part of the Java Collections Framework. However, the result
of HashMap. values
and HashMap. keySet
is a Collection. You can use Collection.
toArray to export either to an array.
AutoBoxing Iterations
In this case we use a String to look up a
number in a HashMap. We use the
JDK (Java Development Kit) 1.5+ autoboxing features and the
Java version 1.5 or later style for:each
Old Style Manual Boxing Iterations
In this case we use a String to look up a
number in a HashMap. We use the Java version 1.1 or later
manual boxing and the old style for loop.
Removing Elements from a HashMap
You can’t simply iterate over a HashMap,
you must iterate over either the corresponding keySet,
values, or entrySet.
Further, you may not remove elements from the HashMap
while you are iterating, unless you remove using the Iterator’s
remove method.
Modifying HashMap Keys
and Values
There are four common types of modification you might want to do to the
keys or values in a HashMap.
- To change a HashMap key, you look up
the value object with get, then remove the old key and put
it with the new key.
- To change the fields in a value object, look the value object up
by key with get, then use its setter
methods.
- To replace the value object in its entirely, just put
a new value object at the old key.
- To replace the value object with one based on the old, look the
value object up with get, create a new
object, copy data over from the old one, then put
the new object under the same key.
Have a look at this code to see how it is done.
Accumulate: A Real World Code Example
The Accumulate
class uses a HashMap to accumulate values
against a variety of categories. Source is included. You might use it to
accumulate hours by category in a timesheet, or count how many times
various words appeared in a document. Studying the source will give you
some hints on what you can do with HashMaps.
Here is how you use it:
IdentityHashMap
IdentityHashMap uses ==
instead of equals to compare objects for
equality. It also uses the original Object.
hashCode() via
System.identityHashCode
( Object x). This
is useful when you want a HashMap that keeps
track of a set of heterogeneous objects. Serialization uses one to track
which objects have already been sent to the stream and where in the
stream they are. It does not use Entry objects and chains. I works by
alternating pointers to keys and values in a single table.
Having Your Cake and Eating It Too
Let’s say you can’t decide between the speed of an ArrayList
that looks up by an indexing int and TreeMap that looks up by surname and a HashMap
that looks up by social insurance number.
The beauty of Java Collections is you
can have all three. Just maintain three different Collections
on the same set of objects. The three Collections
don’t have to contain the exact same elements.
HashMap will give you an Iterator
or an array of the whole collection at any point, which may be
sufficient so that you don’t need an auxiliary ArrayList.
Speed
For fast HashMaps:
- Use the newer HashMap instead of Hashtable.
- Give HashMap plenty of spare space to
work in the constructor.
- Make sure your hash function is
quick, possibly caching the result. If you keep looking up the same
key over and over, HashMap goes through
all the work over and over, except for computing the hashCode.
- Make sure your equals function
compares just the fields needed, testing first the fields least
likely to match.
Initialisation
There is no HashMap constructor that
takes a pair of arrays or an array of pairs. You could write your own.
You could spell out the initialisation code e.g.
Putting pairs in a
CSV (Comma-Separated Value)
file and loading it in a static
initialiser makes it possible to maintain the list without
recompiling. There are all kinds of tools
to help you maintain
CSV
files. They are easier to key and proofread that Java source code.
If you are preparing a large HashMap to
go into an Applet where you want to keep
the size of the jar as small as possible, you can write a separate
class to initialise the table, then save it as a serialized gzipped
resource you embed in the jar. Then the only code you need in your Applet to initialise the HashMap
is a single readObject call.
You can us the Properties class which
lets you load from and save to an external file.
Tips
- It seems obvious that the key cannot be null,
but it is not so obvious than the value cannot be null
either. The null value is reserved to
indicate no matching key.
- When you serialize a HashMap, it just
outputs key-entry pairs, none of the lookup mechanism. When your
load it back again it rehashes and rebuilds the lookup structure.
- Serializing and generics do not mesh well. To get rid of all
warning messages and to ensure type safety you are best to output an
array and reconstitute the HashMap
yourself.
- When you extract a keySet(), values()
or
entrySet() nothing is duplicated/copied.
It just provides a view on the original HashMap.
Changes to the HashMap are reflected in
the derivative Collections.
- To create an effective case-insensitive HashMap,
just convert all keys to lower case before feeding them to the HashMap. It is on my to-do list to create CaseInsensitiveHashMap that does this for
you.
- HashMap.keySet
officially returns a Set<K>
but it is actually a
java.util.HashMap$KeySet,
which implements contains with the HashMap hash.
- HashMap.values
officially returns a Collection<V> but it is actually a
java.util.HashMap$Values,
which implements contains with a linear
search.
Digest Keys
Let’s say you had a large number of very long keys. All the
keys to look up would have to fit in memory. You might be tempted to
go to an
SQL (Standard Query Language)
database where they are stored on disk. There is another way. You
need to squish your keys in some way. Imagein if you removed all the
spaces from your keys. When you wanted to look something up, you
would first remove the spaces, then look for that. You can seriously
squish the keys down using a digest
function. If you use 64-bit digests there is almost no chance of two
keys sharing the same digest. If you use 32-bit keys there is slight
chance of a collision, but for some
purposes that might not be so terrible. Every one in a long while,
when you look a key up, you get the result for some different key.
Future
HashMaps create thousands of little glue Entry objects to point to key and value. I
suggest using contained objects that implement a getKey()
method
to find the key. Then you would not need the Entry
glue. In HashMap Entry
objects that hash to the same slot are chained together. This means a prev or next (or both)
pointer in each Entry. You could get rid of
the need for that by using either linear
probing or double
hashing. When you enumerated, the data objects automatically get
both key and value since a reference to the key is embedded in the
object.
Learning More
Oracle’s Javadoc on
HashMap class : available:
Oracle’s Javadoc on
IdentityHashMap class : available:
Oracle’s Javadoc on
<span class="interface">Set</span> interface returned by keySet and entrySet : available:
Oracle’s Javadoc on
IdentityHashMap class : available:
Oracle’s Javadoc on
java.util.Map class : available:
Oracle’s Javadoc on
LinkedHashMap class : available:
Oracle’s Javadoc on
HashSet class : available:
Oracle’s Javadoc on
ConcurrentHashMap class : available:
Oracle’s Javadoc on
EnumMap class : available:
Oracle’s Javadoc on
Collections.unmodifiableMap : available:
Oracle’s Javadoc on
Collections.emptyMap() : available:
To understand HashMap,
read up on Hashtable and HashCode.