is a sorting technique that mimics the old card sorters that worked column
by column sending punched cards into different bins. I first heard of the idea
from the late Tom Meikle who was a novice programmer when he independently
discovered it back in 1968 when we implemented it for the venerable LGP-30. He
called it "DigitSort". Only many years later after teaching it to
scores of people did I learn it was already well known by the name RadixSort.
 |
recommend book⇒The Art Of Computer Programming |
| | hardcover |
|---|
| ISBN10: | 0-201-48541-9 |
|---|
| ISBN13: | 978-0-201-48541-7 |
|---|
| publisher: | Addison-Wesley |
| published: | 1998-10-15 |
| by: | Donald Knuth |
| Knuth’s volumes 1, 2 and 3 are the reference works for standard algorithms. At his website he describes plans for volumes 4 and 5. |
|
The radix-distribution sort (which I call here simply RadixSort) is discussed at
length in Knuth’s Art of Computer Programming vol 3 #5.2.5, where he says
that it was first published in 1929 in connection with tabulating machines,
although it was probably known to the operators before that, and first described
in the form included here in 1954. There is also a related radix-exchange sort.
Advantages
In Java, RadixSort is faster than either QuickSort or HeapSort in the benchmarks
I ran. For small numbers of items, almost any sorting algorithm will work just
fine. There is rarely any reason to use anything but the built-in tournament
sort, unless the sort is too big to fit in RAM, where you need something like opt-sort.
RadixSort is stable, meaning it preserves existing order of equal keys. It works
in linear time, unlike most other sorts. In other words, it does not bog down
when you have large numbers of items to sort. The time per item to sort is
constant. With other sorts, the time to sort per time increases with the number
of items. Mathematicians would put it that most sorts run in O(n
log(n)) or O(n²) time, where RadixSort
runs in O(n) time.
Drawbacks
- RadixSort does not work well when you have very long keys, because the the total
sorting time is proportional to key length and to the number of items to sort.
If you insisted that all records have unique keys, necessarily the keys would
have to be at least log2(n) bits long,
which would tend to defeat the advantage of RadixSort. RadixSort shines when you
have large numbers of records to sort with short keys.
- Unfortunately, in this particular implementation, 16 bit characters count as two
key slots, unless you can guarantee you never user the high order parts. In
other words Unicode takes twice as long to sort as byte arrays
- The biggest drawback with RadixSort is that you have to write an unconventional
compare routine.
- The other drawback is RadixSort also uses somewhat more working storage than a
traditional sort, because it copies references to the elements to sort back and
forth between two arrays for each pass. Many other sorts make do with one array
and some stack space.
Tuning
If you want a high speed version of RadixSort for a particular application, you
will have to inline the comparison delegate code since delegate.getKeyByteAt,
is where it chews up all the CPU time.
Since all sorts can use the same Comparator
interface, it is possible to experiment with various sorts to figure out which
one works best for your situation.
Variants
There are other ways to implement the same basic algorithm by chaining the
objects to be sorted together, but that tends to be inefficient in Java since
you need separate glue objects to do the linking unless the items to be sorted
themselves implement some chaining interface.