Putting items in ascending or descending order by key fields. The English
language meaning of sort is simply to categorise, e.g. to put all your bills
into folders by category, e.g. all the electric bills together, all the phone
bills together. An internal sort is done strictly in RAM. An external sort uses
hard disk temporary files. A sort is stable if it
naturally preserves existing order when you have duplicate keys. If you want
existing order preserved and you are using an unstable sort, you must add an
extra sequence number key to your records and use that as a secondary tie-breaking
key.
Internal Sort
Here is James
Goslings’s QuickSort.
In the JDK 1.2 there are built-in sorts such as Arrays.sort
and Collections.sort . See java.util.Comparator
and java.util.Comparable for different sorting
orders. For String sorting in various international orders see java.text.Collator,
java.text.RuleBasedCollator, and java.text.CollationKey.
In the simplest case, you can sort an array like this:
import java.util.Arrays;
Arrays.sort( myArray );
or an ArrayList (or any other Collection) like this:
import java.util.ArrayList;
import java.util.Collections;
...
Collections.sort( myArrayList );
Note that there is no such method as ArrayList.sort.
Comparators
For more complex sorts you need a class that implements
Comparator. Often you can piggyback that
code on some other class. The following example does the same as the code above,
but could be extended to sort by several fields.
To sort in descending order, simply write a Comparator
with a compare that returns the negative of the value
it does now i.e.
return -( (String )a ).compareTo( (String)b );
Comparables
If you write a class, you can define its natural sort
order, by having that class implement the Comparable
interface. Then you can sort arrays or ArrayLists of
your Objects without needing to specify a Comparator.
Instead of a compare method, an Comparable
interface has a compareTo method.
Sorting Speed
In the following table, speed is measured relative to RadixSort on a 100,000
item set of 5-character random keys of the form A9999 where SunSort is
arbitrarily assigned 100. Tests done under Java 1.4.1 java.exe on a 233 MHz
Pentium 128 MB RAM. Bigger is better.
| Sort |
speed
(bigger is better) |
stable? |
notes |
| SunSort (Java’s Built-In Sort) |
100 |
yes |
This is a most excellent sort. It uses a modified QuickSort for primitives
and a MergeSort (sometimes called a tournament sort for objects). The merge sort
makes a clone of the array (but not the objects) before it starts, so be aware
of the memory usage. There is no reason to use any but this sort except for very
large sorts with keys of 4 characters or less where RadixSort might beat it out.
Of course you can’t use this in the older Java versions before it was
available. Invoke with Arrays.sort and Collections.sort.
Beware when using the versions with startIndex and endIndex.
The endIndex points one past the last element. |
| RadixSort |
86 |
yes |
Good for very large sorts, takes same time per item no matter how many items.
Ordinary sorts take more time per item as the number of items goes up. More
complicated Comparator routine to write. Likes
short keys. If you double the length of the key the sort time doubles. |
| HeapSort |
48 |
no |
Works best when data are almost already sorted. |
| ShellSort |
34 |
no |
Small and quick-loading. Good for small sorts under 2000 items. |
| QuickSort |
14 |
no |
Likes random data to sort. Order in the data can cause it to take a very
long time to sort. Not good on very small sorts. Varies wildly in time to sort,
sometimes a speed demon, sometimes a pig. It just depends on the luck of the
draw. I’m told the version I have could be made much better with a few
tweaks. |
External Sort
If you have too much data to sort to fit in RAM at once, you will need to use an
external sort such as OptTech
neé Opt-Tech Data Processing. It works by sorting RAM-fulls of records,
then doing an N-way merge on the sorted batches, to create fewer larger sorted
batches. Eventually it ends up with one large sorted batch.
Abundance uses Op-Tech sort. It rolls itself out to disk, calls in Opt-Tech sort,
giving it all of RAM, then rolls itself back in.
Merge Sort
A merge sort is used primarily in large sorts where the data are to big to fit
in RAM. You sort bundles and write them to disk, using any of the usual internal
sorting techniques, but not a merge sort. Then you merge N bundles,
reading them sequentially and interleaving them into one bundle N times as long.
You repeat until you have one big bundle. To get maximum speed, to eliminate
head motion, your scratch intermediate disk files should be on separate physical
drives. You tweak N to use RAM/virtual RAM most efficiently. Merge sorts are not
particularly good for internal sorts.
You might experiment with a merge sort to use on a dual core CPU. You split the
data in two. Sort each on a separate thread then merge the results in a single
pass.
Tag Sort
One problem you often run into is having two arrays you want to keep in sync
when you sort one. There are three approaches.
- Combine the two arrays into one, with objects having two or more fields. Write a Comparator
or implement Comparable for these new combined
objects.
- Sorts don’t actually sort the records, they just order an array of
references. The objects themselves don’t move. If the things you are
sorting are not collected nicely together into objects you can use a tag sort.
Tag records are short records consisting of the sorting key fields plus a
reference back to the original record. This reference may be an array index or a
pointer to the original object. The tag records need not contain any data at all,
just a way to find it when the comparator is called.
- Use the cern.colt.GenericSorting
class.
Bad Sorts
There are a few simple sorts you should almost never use since the take time o(n²).
In order words they turn into utter pigs when the number of elements to sort
gets large.
One is a selection sort:
- Select the largest of the N keys and move its record to the output area.
- Select the largest of the remaining N-1 keys and move its record to the output
area. etc.
Another is the bubble sort, where you pairwise swap
elements in a pass through all the elements to percolate the biggest element to
the top. Then repeat for the list not containing the biggest elements so far.
etc.
A variant of the bubble sort, that is a tiny bit faster is the bi-directional
sort that percolates the smallest element to the bottom and the biggest
element to the top in alternating direction passes, gradually working toward the
middle. Sun has built an
to let you watch bubble sort and bidirectional
BubbleSort race against QuickSort. It becomes obvious why BubbleSort should not
be used.
Another is the insertion sort, where you create a new
list and gradually add to it, always keeping the new list in sorted order by
inserting the new element in the place where it belongs. You find the place by
binary search, and slide all the subsequent elements down to make room for it.
You sometimes have to use this type of sort when you don’t have all the
elements at the start of the sort. They are just given to you one at a time, and
you want a sorted list immediately output after each element is added.
Ordered Collections
Instead of sorting, you can maintain a set in order with TreeSet
or its brother TreeMap. Whether you go the TreeSet
route or the export with toArray and sort route
depends on whether you need sorted order only on occasion. If you do, the export
method is better. If you need the sorted order maintained up to the second, then
the TreeSet method is better. The TreeSet
method consumes more memory and more CPU time, and is perhaps a trifle more
complex to code. With either technique, it is possible view the same data in
several sorted orders by sorting the exported array in different orders or by
maintaining multiple TreeSets on the same data
objects.
Ascending/Descending Arrow Indicators
If you do a GUI, you will want some way of indicating which way your data are
sorted ascending or descending. The problem is, no matter what you do you are
going to confuse some of you users. Why is this? What does the direction mean to
various people?
- Up is ascending because ascending means climbing up. It has nothing to do with
the order of the items on the page.
- Up is ascending because up connotes normal and happy and so does ascending.
- Up is ascending because the indicator represents the keyword the programmer gave
to the SQL sort to get items to paint top to bottom. He did not actually paint
bottom to top for descending order. By default, you get ascending. When you talk
of up and down, usually up is the default. So the two are associated by both
being the default.
- Up is ascending because when things on the page are ordered in ascending order,
you see the small easy numbers first then work toward the more difficult.
Climbing up, i.e. ascending stairs, is the difficult direction.
- Up is ascending because ascending elevators and escalators go up.
- Down is ascending because the bigger items are at the bottom of the page.
- Down is ascending because down is the direction you read to see ever-bigger
items.
- Down is ascending, because the indicator means the direction of flow and of pull
of gravity. It points downward toward the logical gravitation point attracting
the bigger/heavier items.
- Whatever you think it is, it should be the opposite, because the arrow does not
tell you how things are, but how they will be if you click the icon. The icon
represents a command to sort, not an indictor of how things are.
Perhaps some day the association and icon choices will be a configurable part of
look an feel so users will have consistent conventions across all the apps they
use. Consistency is the problem.
I have found the majority up application use an upward arrow to indicate that
items are displayed in ascending order.
My convention is to use an arrow pointing up and to the right to indicate the
data are sorted in ascending order. This helps break the connection with reading
direction, and suggests the climbing association.
No matter what you do, you are going do it “backwards” for some
people, and irritate them. So you might use indicators that don’t have any
arrows or at least no confusing up/down arrows. Let the user figure out what
your colours or letters A/D mean for example.
Generic Sorts
If you write a sort for a List, e.g. ArrayList,
with proper generics, it will work on collections of any type that supports Comparable
or Comparator. To see how to pull it off, have a
look at the source for any of my sorts, or Sun’s sort.
However, because of Java’s lack of orthogonality, your List
sort won’t work for arrays of such Objects.
You need to write very similar code to do that. Even that array version won’t
sort an array of primitives such as long, int
or byte. You have to write yet another slightly
different version of the sort to handle each type of primitive.
Learning More
Sun’s Javadoc on
Collections.
sort : available:
Sun’s Javadoc on
Arrays.
sort : available: