benchmark : Java Glossary


benchmark
Comparing the speeds of different ways of doing the same thing, using different algorithms, different hardware or different compilers. You might do this to fine-tune software you are writing or to help make a hardware or software purchase decision.
Designing a Benchmark
Signum Benchmark
Conformance Results
32-bit Timing Results
64-bit Timing Results
Discoveries
The Bottom Line
Links

Designing a Benchmark

Designing and running a benchmark is great fun. Just like any horse race, it attracts a crowd of onlookers.

Signum Benchmark Source Code


Signum Benchmark Conformance Results

Checking conformance on crucial corner values…
stephan fails at -9,223,372,036,854,775,808 giving 0
sgn fails at 0 giving 1
Checking conformance on 20,000,000 random longs…
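The conformance phase above can be sketched roughly as follows. This is not the TestSignum source, which is not reproduced on this page. It is a minimal illustration, assuming each candidate is a LongToIntFunction. A strict candidate must return exactly what Long.signum returns. A relaxed one need only agree in sign. The twoIfs body is a guess based on the candidate's name. brokenSgn is a hypothetical faulty candidate, included to show how a report like "sgn fails at 0 giving 1" arises.

```java
import java.util.Random;
import java.util.function.LongToIntFunction;

public class ConformanceCheck {
    // Corner values where signum implementations typically go wrong.
    static final long[] CORNERS = {
        Long.MIN_VALUE, Long.MIN_VALUE + 1, -1L, 0L, 1L,
        Integer.MIN_VALUE, Integer.MAX_VALUE,
        0x1_0000_0000L, -0x1_0000_0000L, Long.MAX_VALUE
    };

    // Hypothetical guess at a "two ifs" signum.
    static int twoIfs(long x) {
        if (x > 0) return 1;
        if (x < 0) return -1;
        return 0;
    }

    // Hypothetical faulty candidate: wrongly treats 0 as positive.
    static int brokenSgn(long x) {
        return x < 0 ? -1 : 1;
    }

    /**
     * Checks the corner values, then randomTrials random longs.
     * Returns the first value where the candidate disagrees with Long.signum,
     * or null if none was found. If strict is false, only the sign must agree.
     */
    static Long firstFailure(LongToIntFunction candidate, boolean strict,
                             int randomTrials, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < CORNERS.length + randomTrials; i++) {
            long x = i < CORNERS.length ? CORNERS[i] : random.nextLong();
            int got = candidate.applyAsInt(x);
            int expected = Long.signum(x);
            boolean ok = strict ? got == expected : Integer.signum(got) == expected;
            if (!ok) {
                return x;
            }
        }
        return null;
    }

    static void report(String name, LongToIntFunction candidate) {
        Long bad = firstFailure(candidate, true, 1_000_000, 42L);
        if (bad == null) {
            System.out.println(name + " passes");
        } else {
            System.out.printf("%s fails at %,d giving %d%n",
                    name, bad, candidate.applyAsInt(bad));
        }
    }

    public static void main(String[] args) {
        report("twoIfs", ConformanceCheck::twoIfs);
        report("brokenSgn", ConformanceCheck::brokenSgn);
    }
}
```

Checking the corner values first means a failure at a value such as Long.MIN_VALUE or 0 is caught deterministically. A random sweep alone could easily miss it.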
Running timing tests with 200,000,000 iterations per candidate. Warm means the program has been allowed to run for a while, so that HotSpot optimisation can kick in. Cold means timing starts from the initial load.
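Cold and warm timings might be taken with System.nanoTime along the lines of the sketch below. It is not the TestSignum code. The class name, the xorshift input generator and the baseline-subtraction scheme are all assumptions made for illustration. The notes in the tables suggest TestSignum subtracts a measured overhead, but its exact method is not shown. Each pass feeds pseudo-random longs to the candidate and adds the results into a sink, so the JIT cannot discard the calls as dead code. An identical pass with a trivial candidate estimates the loop overhead.

```java
import java.util.function.LongToIntFunction;

public class ColdWarmTimer {
    // Accumulates results so the JIT cannot eliminate the calls as dead code.
    static long sink;

    /** Times one pass of the given number of calls; returns elapsed nanoseconds. */
    static long timePass(LongToIntFunction candidate, long iterations) {
        long x = 0x9E3779B97F4A7C15L; // arbitrary nonzero seed
        long acc = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            // xorshift64 step: cheap pseudo-random inputs of both signs
            x ^= x << 13;
            x ^= x >>> 7;
            x ^= x << 17;
            acc += candidate.applyAsInt(x);
        }
        long elapsed = System.nanoTime() - start;
        sink += acc;
        return elapsed;
    }

    /** Nanoseconds per call after subtracting an estimate of the loop overhead. */
    static double nsPerCall(LongToIntFunction candidate, long iterations) {
        long total = timePass(candidate, iterations);
        // Baseline: loop, input generation and a trivial call.
        long overhead = timePass(v -> 0, iterations);
        return Math.max(0L, total - overhead) / (double) iterations;
    }

    public static void main(String[] args) {
        long iterations = 20_000_000L;
        // Cold: the very first measurement, before HotSpot has compiled much.
        double cold = nsPerCall(Long::signum, iterations);
        // Run the same code repeatedly so the JIT has a chance to optimise it.
        for (int i = 0; i < 5; i++) {
            nsPerCall(Long::signum, iterations);
        }
        double warm = nsPerCall(Long::signum, iterations);
        System.out.printf("cold %.2f ns/call, warm %.2f ns/call (sink %d)%n",
                cold, warm, sink);
    }
}
```

Subtracting a baseline is inherently noisy. The kobzda note in the Eclipse table, where cold and warm rank differently depending on whether you read elapsed or cpu time, shows how an overhead estimate can distort the adjusted figures.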

Signum Benchmark Timing Results from Roedy’s 32-bit Athlon Machine

C:\sys\TestSignum.exe com.mindprod.example.TestSignum JET (Just Enough Time) "Athlon dual core 3800-1 2GHz 1GB Vista JET 5.0"
JET Athlon dual core 3800-1 2GHz 1GB Vista JET 5.0 Cold
elapsed ns cpu ns call ns SI candidate notes
3,344,281,520 967,472,560 0.48 252.20 piotr Overall champ. Best algorithm and smartest compiler. Congratulations to Piotr and the JET compiler team.
4,137,639,200 1,760,830,240 0.88 138.57 kobzda
4,322,604,320 1,945,795,360 0.97 125.40 wibble
6,524,701,120 4,147,892,160 2.07 58.83 signOf
8,044,919,480 5,668,110,520 2.83 43.05 lohi32
12,694,140,200 10,317,331,240 5.16 23.65 signHalf
13,431,391,400 11,054,582,440 5.53 22.07 twoifs
37,676,389,560 35,299,580,600 17.65 6.91 Oracle Long.signum
JET Athlon dual core 3800-1 2GHz 1GB Vista JET 5.0 Warm
elapsed ns cpu ns call ns SI candidate notes
3,362,579,280 984,360,880 0.49 247.88 piotr
4,129,492,920 1,751,274,520 0.88 139.33 kobzda
4,304,273,600 1,926,055,200 0.96 126.68 wibble
6,525,282,320 4,147,063,920 2.07 58.84 signOf
8,072,396,280 5,694,177,880 2.85 42.85 lohi32
12,666,196,560 10,287,978,160 5.14 23.72 signHalf
13,476,031,320 11,097,812,920 5.55 21.99 twoifs
37,330,209,000 34,951,990,600 17.48 6.98 Oracle Long.signum
F:\Program Files\java\jdk1.6.0_04\jre\bin\java.exe -server com.mindprod.example.TestSignum
java -server
Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -server
java -server Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -server Cold
elapsed ns cpu ns call ns SI candidate notes
10,365,912,640 3,405,380,960 1.70 71.65 signOf
10,887,718,160 3,927,186,480 1.96 62.13 piotr
10,916,370,920 3,955,839,240 1.98 61.68 kobzda
10,949,379,320 3,988,847,640 1.99 61.17 wibble
11,818,289,520 4,857,757,840 2.43 50.23 Oracle Long.signum
14,373,547,000 7,413,015,320 3.71 32.92 lohi32
14,923,013,480 7,962,481,800 3.98 30.64 twoifs
15,439,985,360 8,479,453,680 4.24 28.78 signHalf
java -server Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -server Warm
elapsed ns cpu ns call ns SI candidate notes
10,287,585,920 3,274,022,320 1.64 74.53 signOf Best for Java -server
10,926,949,360 3,913,385,760 1.96 62.35 piotr
10,930,609,120 3,917,045,520 1.96 62.29 wibble
10,963,141,200 3,949,577,600 1.97 61.78 kobzda
11,840,139,800 4,826,576,200 2.41 50.55 Oracle Long.signum
14,332,857,400 7,319,293,800 3.66 33.34 lohi32
14,941,113,040 7,927,549,440 3.96 30.78 twoifs
15,539,811,080 8,526,247,480 4.26 28.62 signHalf
F:\Program Files\java\jdk1.6.0_04\jre\bin\java.exe -client com.mindprod.example.TestSignum
java -client
Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -client
java -client Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -client Cold
elapsed ns cpu ns call ns SI candidate notes
15,670,817,720 9,251,776,360 4.63 26.37 kobzda Best for Java -client
16,377,826,160 9,958,784,800 4.98 24.50 wibble
16,478,109,161 10,059,067,801 5.03 24.26 piotr
17,164,541,520 10,745,500,160 5.37 22.71 signHalf
17,889,199,200 11,470,157,840 5.74 21.27 signOf
19,007,007,520 12,587,966,160 6.29 19.38 lohi32
22,342,446,400 15,923,405,040 7.96 15.32 Oracle Long.signum
27,039,031,280 20,619,989,920 10.31 11.83 twoifs
java -client Athlon dual core 3800-1 2GHz 1GB Vista Oracle Java 1.6.0_04 -client Warm
elapsed ns cpu ns call ns SI candidate notes
16,333,153,040 10,183,484,960 5.09 23.96 wibble
17,596,315,320 11,446,647,240 5.72 21.32 piotr
17,604,726,760 11,455,058,680 5.73 21.30 kobzda Oddly, it slows down with optimisation.
18,821,540,160 12,671,872,080 6.34 19.26 signHalf
18,935,385,120 12,785,717,040 6.39 19.08 signOf
21,277,398,960 15,127,730,880 7.56 16.13 lohi32
23,869,804,600 17,720,136,520 8.86 13.77 Oracle Long.signum
C:\sys\TestSignum.exe com.mindprod.example.TestSignum JET "Athlon XP 2000+ 1.692GHz 512MB Win2K JET 1.4"
JET Athlon XP 2000+ 1.692GHz 512MB Win2K JET 1.4 Cold
elapsed ns cpu ns call ns SI candidate notes
5,281,784,696 2,365,144,732 1.18 103.16 piotr
5,506,891,519 2,590,251,555 1.30 94.20 kobzda
5,932,717,706 3,016,077,742 1.51 80.90 wibble
7,728,695,127 4,812,055,163 2.41 50.71 signOf
9,421,796,625 6,505,156,661 3.25 37.51 lohi32
14,259,883,868 11,343,243,904 5.67 21.51 signHalf
16,185,063,465 13,268,423,501 6.63 18.39 twoifs
49,186,516,163 46,269,876,199 23.13 5.27 Oracle Long.signum
JET Athlon XP 2000+ 1.692GHz 512MB Win2K JET 1.4 Warm
elapsed ns cpu ns call ns SI candidate notes
5,243,209,123 2,335,581,199 1.17 104.47 piotr
5,479,713,763 2,572,085,839 1.29 94.86 kobzda
5,908,412,103 3,000,784,179 1.50 81.31 wibble
7,736,785,262 4,829,157,338 2.41 50.53 signOf
9,405,542,604 6,497,914,680 3.25 37.55 lohi32
14,295,691,212 11,388,063,288 5.69 21.43 signHalf
16,225,925,362 13,318,297,438 6.66 18.32 twoifs
48,860,881,480 45,953,253,556 22.98 5.31 Oracle Long.signum
F:\Program Files\java\jdk1.5.0_06\jre\bin\java.exe -server com.mindprod.example.TestSignum
java -server Athlon XP 2000+ 1.692GHz 512MB "Win2K Oracle Java1 1.5.0_06 -server"
java -server Athlon XP 2000+ 1.692GHz 512MB Win2K Oracle Java1 1.5.0_06 -server Cold
elapsed ns cpu ns call ns SI candidate notes
11,855,804,578 5,330,495,077 2.67 45.77 lohi32 Ironically, lohi32 is much faster before java -server optimises it.
12,970,420,263 6,445,110,762 3.22 37.86 wibble
13,029,356,245 6,504,046,744 3.25 37.52 kobzda
13,102,492,356 6,577,182,855 3.29 37.10 piotr
14,446,682,190 7,921,372,689 3.96 30.80 signOf
17,703,303,073 11,177,993,572 5.59 21.83 twoifs
17,797,852,241 11,272,542,740 5.64 21.65 signHalf
19,075,627,489 12,550,317,988 6.28 19.44 Oracle Long.signum
java -server Athlon XP 2000+ 1.692GHz 512MB Win2K Oracle Java1 1.5.0_06 -server Warm
elapsed ns cpu ns call ns SI candidate notes
10,638,250,951 8,546,872,019 4.27 28.55 signHalf
12,405,689,550 10,314,310,618 5.16 23.66 signOf
12,620,232,460 10,528,853,528 5.26 23.17 lohi32 Ironically, lohi32 is much faster before java -server optimises it.
13,899,270,996 11,807,892,064 5.90 20.66 twoifs
14,287,446,588 12,196,067,656 6.10 20.01 kobzda
14,291,250,983 12,199,872,051 6.10 20.00 wibble
14,874,393,254 12,783,014,322 6.39 19.09 piotr
19,402,170,946 17,310,792,014 8.66 14.10 Oracle Long.signum
C:\Program Files\java\jre1.5.0_06\bin\java.exe -client com.mindprod.example.TestSignum
java -client "Athlon XP 2000+ 1.692GHz 512MB Win2K Oracle Java1 1.5.0_06 -client"
java -client Athlon XP 2000+ 1.692GHz 512MB Win2K Oracle Java1 1.5.0_06 -client Cold
elapsed ns cpu ns call ns SI candidate notes
21,099,934,210 10,438,392,311 5.22 23.38 kobzda
22,147,754,533 11,486,212,634 5.74 21.24 piotr
24,760,747,246 14,099,205,347 7.05 17.31 twoifs
24,996,750,704 14,335,208,805 7.17 17.02 signOf
25,543,174,342 14,881,632,443 7.44 16.40 wibble
29,203,051,785 18,541,509,886 9.27 13.16 signHalf
29,446,785,834 18,785,243,935 9.39 12.99 lohi32
31,703,339,391 21,041,797,492 10.52 11.60 Oracle Long.signum
java -client Athlon XP 2000+ 1.692GHz 512MB Win2K Oracle Java1 1.5.0_06 -client Warm
elapsed ns cpu ns call ns SI candidate notes
23,360,680,757 12,805,052,877 6.40 19.05 piotr
23,392,697,117 12,837,069,237 6.42 19.01 kobzda
23,556,589,176 13,000,961,296 6.50 18.77 wibble
24,707,445,220 14,151,817,340 7.08 17.24 twoifs
27,133,499,090 16,577,871,210 8.29 14.72 signOf
27,842,990,101 17,287,362,221 8.64 14.11 signHalf
30,607,897,651 20,052,269,771 10.03 12.17 Oracle Long.signum
30,873,779,488 20,318,151,608 10.16 12.01 lohi32
java.exe com.mindprod.example.TestSignum Eclipse 3.1 Athlon XP 2000+ 1.692 GHz 512MB "Win2K Eclipse 3.1 JDK 1.5.0_06"
Eclipse 3.1 Athlon XP 2000+ 1.692 GHz 512MB Win2K Eclipse 3.1 JDK 1.5.0_06 Cold
elapsed ns cpu ns call ns SI candidate notes
22,594,342,577 8,545,130,457 4.27 28.55 kobzda kobzda looks as if it runs better cold, but the elapsed times show it actually runs better warm. This is presumably a problem with measuring the overhead.
23,378,348,086 9,329,135,966 4.66 26.15 wibble
23,608,897,779 9,559,685,659 4.78 25.52 piotr
26,063,079,525 12,013,867,405 6.01 20.31 twoifs
29,183,365,205 15,134,153,085 7.57 16.12 signHalf
30,359,019,094 16,309,806,974 8.15 14.96 lohi32
30,661,922,396 16,612,710,276 8.31 14.69 signOf
34,313,033,919 20,263,821,799 10.13 12.04 Oracle Long.signum
Eclipse 3.1 Athlon XP 2000+ 1.692 GHz 512MB Win2K Eclipse 3.1 JDK 1.5.0_06 Warm
elapsed ns cpu ns call ns SI candidate notes
21,875,532,225 10,672,795,564 5.34 22.86 kobzda
23,023,551,317 11,820,814,656 5.91 20.64 twoifs
23,298,356,356 12,095,619,695 6.05 20.17 piotr
23,373,526,803 12,170,790,142 6.09 20.05 wibble
24,893,875,897 13,691,139,236 6.85 17.82 signOf
27,303,411,467 16,100,674,806 8.05 15.15 signHalf
27,691,829,548 16,489,092,887 8.24 14.80 lohi32
33,171,950,066 21,969,213,405 10.98 11.11 Oracle Long.signum

Signum Benchmark Timing Results from Rob Skedgell’s 64-bit Opteron Machine

Rob’s machine has almost the same raw clock speed as my 32-bit machine but gets about 3.5 times the performance. I don’t know how much of this is due to the hardware, how much to Oracle’s JVM (Java Virtual Machine) and how much to the 64-bit vs 32-bit instruction set.
java.exe com.mindprod.example.TestSignum Oracle java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB "Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.5.0_06-b05, mixed mode)"
Oracle java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.5.0_06-b05, mixed mode) Cold
elapsed ns cpu ns call ns SI candidate notes
8,723,701,000 1,051,999,000 0.53 231.94 lohi32 Ironically, lohi is 2.25 times faster before java -server optimises it.
9,487,317,000 1,815,615,000 0.91 134.39 wibble
9,504,710,000 1,833,008,000 0.92 133.11 kobzda
9,826,029,000 2,154,327,000 1.08 113.26 piotr
9,855,598,000 2,183,896,000 1.09 111.73 Oracle Long.signum
10,589,746,000 2,918,044,000 1.46 83.62 signOf
14,435,002,000 6,763,300,000 3.38 36.08 signHalf
16,302,964,000 8,631,262,000 4.32 28.27 twoifs
Oracle java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.5.0_06-b05, mixed mode) Warm
elapsed ns cpu ns call ns SI candidate notes
4,741,824,000 2,491,146,000 1.25 97.95 signHalf
5,435,073,000 3,184,395,000 1.59 76.62 lohi32
5,643,139,000 3,392,461,000 1.70 71.92 Oracle Long.signum
5,955,541,000 3,704,863,000 1.85 65.86 signOf
6,656,652,000 4,405,974,000 2.20 55.38 twoifs
6,721,896,000 4,471,218,000 2.24 54.57 piotr
7,129,801,000 4,879,123,000 2.44 50.01 wibble
7,143,288,000 4,892,610,000 2.45 49.87 kobzda
java.exe com.mindprod.example.TestSignum Mustang beta java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB "Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.6.0-beta-b59g, mixed mode)"
Mustang beta java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.6.0-beta-b59g, mixed mode) Cold
elapsed ns cpu ns call ns SI candidate notes
8,948,042,000 698,080,000 0.35 349.53 Oracle Long.signum
9,205,451,000 955,489,000 0.48 255.37 signOf
9,484,249,000 1,234,287,000 0.62 197.68 kobzda
9,494,643,000 1,244,681,000 0.62 196.03 wibble
9,688,812,000 1,438,850,000 0.72 169.58 piotr
13,622,480,000 5,372,518,000 2.69 45.42 lohi32
14,125,964,000 5,876,002,000 2.94 41.52 twoifs
15,621,429,000 7,371,467,000 3.69 33.10 signHalf
Mustang beta java -server 64bit Opteron 244 (x86_64 SMP) 1793MHz 2048MB Linux/2.6.15-git12-6-smp Java HotSpot(TM) 64-bit Server VM (build 1.6.0-beta-b59g, mixed mode) Warm
elapsed ns cpu ns call ns SI candidate notes
8,966,063,000 700,519,000 0.35 348.31 Oracle Long.signum Overall winner in the 64-bit category. It is 40% faster than the best of the competition. The algorithm moved from last place in the 32-bit category to first when run on the sort of hardware and JVM it was designed for. It is a very simple algorithm that needs no optimising to perform well.
9,236,622,000 971,078,000 0.49 251.27 signOf
9,477,793,000 1,212,249,000 0.61 201.28 wibble
9,503,623,000 1,238,079,000 0.62 197.08 kobzda
9,704,868,000 1,439,324,000 0.72 169.52 piotr
13,674,181,000 5,408,637,000 2.70 45.11 lohi32
14,141,835,000 5,876,291,000 2.94 41.52 twoifs
15,650,443,000 7,384,899,000 3.69 33.04 signHalf

Discoveries

What have we discovered from all this attention to triviality?
  1. The best algorithm I could come up with in assembler took three machine cycles, which should take about 1.5 nanoseconds per call. Piotr’s algorithm under JET beats that at 1.17 nanoseconds. This is a feather in the cap of both Piotr and the JET team, doing even better than hand-written assembler.
  2. Comparing their best work, the 32-bit optimisers rank in this order:
    Ranking 32-bit Optimisers From Signum Benchmark Results
    Runtime        Warm SI   Best Warm Algorithm   Cold SI   Best Cold Algorithm
    JET 5.0         247.88   piotr                  252.20   piotr
    Java -server     74.53   signOf                  71.65   signOf
    Java -client     23.96   wibble                  26.37   kobzda
  3. The fastest code depends on the JVM. You don’t know until you measure the alternatives. The rankings change drastically depending on the platform and on whether you run warm or cold.
  4. On any given platform, the fastest algorithm is anywhere from twice as fast as the slowest to twenty times as fast, so the choice of algorithm is crucial.
  5. The fastest 32-bit time was piotr under JET. This is 3.5 times faster than piotr under Java -client. The fastest 64-bit time overall was Oracle’s Long.signum running under Mustang (the Java 1.6 beta), by a huge margin.
  6. The secret of the piotr algorithm’s speed is that it does its work at a low level on ints, not longs. On 32-bit hardware, 64-bit longs are clumsily simulated. It also takes advantage of the fact that hardware can most easily split a long exactly down the middle. In the compiler listing entry, I speculate on ways the algorithm could be implemented in assembler. JET does better than I can with hand-written assembler.
  7. Though Oracle’s Long.signum algorithm is terse, it is pretty much a pig in 32-bit mode. Oracle’s Long.signum is implemented as (int) ((i >> 63) | (-i >>> 63)); it does much better on the 64-bit benchmarks.
  8. Even though piotr, signOf, signHalf and kobzda each beat Long.signum, they are not true signums, so they can’t replace Oracle’s algorithm. However, twoifs beats Oracle’s Long.signum in most of the 32-bit runs, so it should replace it in 32-bit JVMs (Java Virtual Machines).
  9. JRockit doubles its speed between the cold and warm trials. It is thus important to warm up high-level routines, not just the leaf methods, by calling them many times so that inline optimisation kicks in. If you call a routine only once and it then runs for hours rattling around inside, it may never become a candidate for optimisation. I removed the JRockit results since they were done with an older version of the program and it would be misleading to compare them with the new results.
  10. Even with nanosecond-accurate timers, benchmarks are not that repeatable. Presumably random background processes interfere (even on a nominally idle machine), or perhaps random processing inside the JVM itself does.
  11. The lohi32 candidate was expressly designed for a 32-bit processor with aggressive peephole optimisation. Yet in the 64-bit Java -server 1.5 competition it won, massively, on a 64-bit machine in the cold run, before optimisation. Strange! Even stranger, optimisation made it run 2.25 times slower! I suspect the problem is simply the immaturity of the 64-bit -server mode of JVM 1.5. It is a 32-bit JVM in 64-bit drag, not through the full sex change yet. I suspect the 1.5 JVM essentially uses a 64-bit machine to emulate a 32-bit one, with pockets of true 64-bit.
  12. Oracle’s team might heed the physician’s advice: above all, do no harm. Their optimisations often do serious damage because they fail to take into account the relative speed characteristics of the particular processor model, not just the general architecture. There is little incentive for them to do all the work of collecting the lore about different models. If they optimise for Oracle hardware and let the chips fall where they may on other hardware, that gives Oracle hardware an edge running Java benchmarks, which is as it should be, given that this edge is the only thing financing Java development.
  13. Java 1.6.0_04 -client is about half the speed of the 1.5 -client runtime.
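Items 6 to 8 above can be illustrated with three small functions. oracleFormula is the expression quoted in item 7. Its comments trace why it returns -1, 0 or 1, including at Long.MIN_VALUE. viaHalves is a hypothetical illustration of working on the two 32-bit halves of a long, as item 6 describes. It is not Piotr’s code, which is not reproduced here. relaxedSign is likewise hypothetical. It returns some negative, zero or positive int rather than exactly -1, 0 or 1, which is the sense in which a candidate is "not a true signum" in item 8.

```java
public class SignumVariants {
    /** The expression item 7 gives for Long.signum. */
    static int oracleFormula(long i) {
        // i >> 63 (arithmetic shift) is -1 for negative i, else 0.
        // -i >>> 63 (logical shift) is 1 when -i is negative, i.e. for positive i.
        // For Long.MIN_VALUE, -i overflows back to Long.MIN_VALUE, so the second
        // term is 1, but OR-ing 1 into -1 (all bits set) still gives -1.
        return (int) ((i >> 63) | (-i >>> 63));
    }

    /** Hypothetical: a true signum computed from the two 32-bit halves. */
    static int viaHalves(long x) {
        int hi = (int) (x >>> 32); // upper half carries the sign bit
        int lo = (int) x;          // lower half
        if (hi != 0) {
            return (hi >> 31) | 1; // -1 if hi is negative, otherwise 1
        }
        return lo != 0 ? 1 : 0;    // hi == 0 means 0 <= x < 2^32
    }

    /** Hypothetical relaxed sign: correct sign, but not necessarily -1 or 1. */
    static int relaxedSign(long x) {
        int hi = (int) (x >>> 32);
        if (hi != 0) {
            return hi;             // any nonzero int with the right sign
        }
        return (int) x != 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -5_000_000_000L, -1L, 0L, 1L,
                          5_000_000_000L, Long.MAX_VALUE};
        for (long x : samples) {
            System.out.printf("%,d: signum=%d formula=%d halves=%d relaxed=%d%n",
                    x, Long.signum(x), oracleFormula(x), viaHalves(x), relaxedSign(x));
        }
    }
}
```

A caller that only tests relaxedSign(x) < 0 or > 0 gets correct answers. Code that compares the result with -1 or 1, or uses it as a multiplier, does not. That is why a relaxed candidate cannot simply be dropped in as a replacement for Long.signum.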

The Bottom Line

The bottom line: JET is the best optimiser, with JRockit coming in at about ¾ of JET’s speed, Java -server at ½ of JET’s speed and Java -client at ¼ of JET’s speed.

Piotr, kobzda or signHalf are the best algorithms, depending on the platform. Wibble does best before the optimiser has time to kick in.

This is a very narrow test and should not be taken as indicative of overall speed in real world situations.

I would like to thank everyone who participated. I especially congratulate Peter Kobzda, who designed the overall fastest algorithm, and the JET compiler team, whose optimisation skills, embedded in the JET AOT (Ahead Of Time) compiler, converted it into the fastest machine code.

If you think you can beat these results, please feel free to submit more entries.

It would be nice to see some JET results on a 64-bit machine (you can use the free or evaluation version).

It would also be nice to see what happens on some high-end 64-bit hardware, especially various Oracle workstations. After all, this is the hardware the Java developers were using to tune the JVM code.

Coaxing the Optimiser Along

Both HotSpot and JRockit do hotspot detection and optimisation lazily in the background, so they can take some time to kick in, especially in a CPU-intensive benchmark like TestSignum. For HotSpot you can speed up the process with -Xbatch; for JRockit, with -XXaggressive:opt.

This is why TestSignum does a pretrial, to give the optimiser time to notice what needs optimising and collect statistics. The main task sleeps for 5 seconds after the warmup to give the background optimising threads plenty of time to complete their optimisations. It would be silly to do this in a real application, but it is a reasonable solution for a microbenchmark.
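The pretrial-then-pause approach described above might look something like the sketch below. It is an illustration under stated assumptions, not the TestSignum source. The class and method names, the input pattern, the trial count and the use of the median are all illustrative choices. The median of several timed trials is one common way to blunt run-to-run noise from background processes.

```java
import java.util.Arrays;
import java.util.function.LongToIntFunction;

public class WarmupThenMeasure {
    static long sink; // keeps results alive so the calls are not optimised away

    /** Times the given number of calls; returns elapsed nanoseconds. */
    static long timeCalls(LongToIntFunction candidate, int iterations) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Spread inputs across negative, zero and positive values.
            long x = (i - iterations / 2) * 0x9E3779B9L;
            acc += candidate.applyAsInt(x);
        }
        long elapsed = System.nanoTime() - start;
        sink += acc;
        return elapsed;
    }

    /** Median nanoseconds per call over several trials, after a pretrial and a pause. */
    static double medianNsPerCall(LongToIntFunction candidate, int iterations,
                                  int trials, long pauseMillis) throws InterruptedException {
        // Pretrial: give the JIT a chance to notice the hot loop and compile it.
        timeCalls(candidate, iterations);
        // Pause so background compiler threads can finish their work.
        Thread.sleep(pauseMillis);
        long[] times = new long[trials];
        for (int t = 0; t < trials; t++) {
            times[t] = timeCalls(candidate, iterations);
        }
        Arrays.sort(times);
        // Median (the upper middle element when trials is even).
        return times[trials / 2] / (double) iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        // 5-second pause, matching the description of TestSignum above.
        double ns = medianNsPerCall(Long::signum, 50_000_000, 5, 5_000);
        System.out.printf("Long.signum: %.3f ns per call (sink %d)%n", ns, sink);
    }
}
```

As with TestSignum’s own pause, sleeping between warm-up and measurement only makes sense in a microbenchmark. A real application would never do it.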

To discover how much warm-up time you need with JRockit, run with -Xverbose:opt, which prints trace details on optimisations as they happen. When messages stop appearing, you have reached a stable state. The time varies from a few seconds (for a microbenchmark) to many minutes (for a large application with thousands of classes).

Atto hard disk benchmark software
CrystalDiskMark: hard disk benchmarking software
Google Caliper: microbenchmarking tool
JET
JMH: tool for writing micro to macro benchmarks
optimiser
optimising
time
time sources
Tutorial on writing microbenchmarks

This page is posted on the web at: http://mindprod.com/jgloss/benchmark.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
