endian : Java Glossary


endian
Java’s binary I/O classes (DataInputStream, DataOutputStream and RandomAccessFile) read and write values MSB (Most Significant Byte) first, i.e. high order part first. This is referred to as big-endian byte sex or sometimes network order. Java binary files, Java sockets and OpenType font files all use big-endian order. What do you do if your data files are in little-endian format, as is the case for most Windows binary files?

Your Options for Solving the Endian Problem: a Summary

Everything in Java binary format files is stored big-endian, MSB (Most Significant Byte) first. This is sometimes called network order. This is good news. It means that if you use only Java, all files are written the same way on all platforms: Mac, PC (Personal Computer), Solaris, etc. You can freely exchange binary data electronically over the Internet or on CD/floppy without any concerns about endianness. The problem comes when you must exchange data files with some program not written in Java that uses little-endian order, most commonly C on the PC. Some platforms use big-endian order internally (older 68K and PowerPC Macs, IBM (International Business Machines) 390); some use little-endian order (Intel). Java hides that internal endianness from you.

In a binary file, there are no separators between fields. The files are in binary, not readable ASCII (American Standard Code for Information Interchange).
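
To see both points at once, here is a small sketch (BigEndianDemo and writeTwoInts are illustrative names, not part of any library) that writes two ints with java.io.DataOutputStream into memory and dumps the bytes in hex. Each int comes out most significant byte first, and the second int follows the first with no delimiter:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows that DataOutputStream writes each int
// MSB first (big-endian) with no separator bytes between fields.
class BigEndianDemo {
    static byte[] writeTwoInts(int a, int b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(a); // 4 bytes, most significant first
            out.writeInt(b); // follows immediately, no delimiter
        } catch (IOException e) {
            // Cannot happen with an in-memory stream, but writeInt declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : writeTwoInts(0x01020304, 0x0A0B0C0D)) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints 01 02 03 04 0a 0b 0c 0d
    }
}
```

A little-endian C program on a PC writing the same two ints would produce 04 03 02 01 0d 0c 0b 0a instead.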

What do you do if you want to read data not in this standard format, usually prepared by some non-Java program?

You have five options:

  1. Rewrite the export program that is providing the imported file. It might export directly in either big-endian binary DataOutputStream format or a character-based (human-readable) format. See binary formats.
  2. Write a separate translator program that reads and rearranges bytes. You could write this in any language.
  3. Read the data as bytes, and rearrange them on the fly.
  4. Use LEDataInputStream, LEDataOutputStream and LERandomAccessFile analogs of DataInputStream, DataOutputStream and RandomAccessFile that work with little-endian binary data. You can read about LEDataStream. You can download the code and source free. You can get help from the File I/O Amanuensis to show you how to use the classes. Just tell it you have little-endian binary data. This is the easiest way.
  5. If you are using Java version 1.4 or later, you can use nio and the ByteBuffer.order( ByteOrder.LITTLE_ENDIAN ) technique. This is the most efficient way.

You Probably Don’t Even Have a Problem!

Most people new to Java coming from C think that they need to code differently depending on whether the machine they are using internally represents integers as big or little endian. In Java it does not matter. Further, apart from asking ByteOrder.nativeOrder() (Java version 1.4 or later) or resorting to native code, there is no way you can even tell how they are stored. The JVM (Java Virtual Machine) may store them either way internally, but Java is cleverly constructed so that it never matters. Java has no struct I/O, no unions, and none of the other endian-sensitive language constructs.

The only time endianness becomes a concern is in communicating with legacy little-endian C/C++ applications.

The following code will produce the same result on either a big or little endian machine:

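// take 16-bit short apart into two 8-bit bytes.
short x = (short) 0xabcd; // cast needed: 0xabcd is an int literal too big for a short

byte high = (byte) (x >>> 8);

byte low = (byte) x; /* cast implies & 0xff */

// prints x=-21555 high=-85 low=-51 on any platform
System.out.println( "x=" + x + " high=" + high + " low=" + low );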

Reading Little-Endian Binary Files

The most common problem is dealing with files stored in little-endian format.

I had to implement routines parallel to those in java.io.DataInputStream, which reads raw binary, in my LEDataInputStream and LEDataOutputStream classes. Don’t confuse this raw binary format with a human-readable, character-based file-interchange format.

If you want to do it yourself, without the overhead of the full LEDataInputStream and LEDataOutputStream classes, here is the basic technique. If you are not familiar with how to fudge unsigned data in Java by masking off the high order bits, you might want to read the unsigned and masking entries first.

Reading Little-Endian Shorts

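Presuming your integers are in 2’s complement little-endian format, shorts are pretty easy to handle: read the low byte, then the high byte, mask each to 0..255, and combine them as high << 8 | low. Or, if you want to get clever and puzzle your readers, you can avoid one mask, since the high bits will later be shaved off by the cast to short.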
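
Here is a minimal sketch of both versions, assuming the bytes arrive through a java.io.DataInputStream. The class and method names (LittleEndianShort, readShortLittleEndian, readShortLittleEndianClever) are hypothetical, not the LEDataInputStream API:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helpers (not the LEDataInputStream API): read a 2-byte
// two's complement little-endian short from a DataInputStream.
class LittleEndianShort {
    static short readShortLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;  // first byte is least significant
        int high = in.readByte() & 0xff; // mask to 0..255 to undo sign extension
        return (short) (high << 8 | low);
    }

    // The "clever" variant: high is left unmasked. Its sign-extended bits
    // end up above bit 15 after the shift, and the (short) cast shaves
    // them off. The mask on low is still essential; otherwise a negative
    // low byte would smear 1-bits over the high byte.
    static short readShortLittleEndianClever(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;
        int high = in.readByte();
        return (short) (high << 8 | low);
    }
}
```

Fed the bytes 0xcd 0xab, either method returns (short) 0xabcd, i.e. -21555. In Java version 1.5 or later, Short.reverseBytes( in.readShort() ) gives the same result.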

Reading Little-Endian Longs
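
Longs follow the same pattern as shorts, only with eight bytes. A minimal sketch, again assuming a java.io.DataInputStream source; LittleEndianLong and readLongLittleEndian are hypothetical names. The one trap is widening each byte to long before shifting:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper (not the LEDataInputStream API): reads an 8-byte
// two's complement little-endian long from a DataInputStream.
class LittleEndianLong {
    static long readLongLittleEndian(DataInputStream in) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 8) {
            // Mask with 0xffL so the byte is widened to long BEFORE the shift.
            // Java uses only the low 5 bits of an int shift distance, so an
            // int shifted by 32 or more would silently wrap around.
            result |= (in.readByte() & 0xffL) << shift;
        }
        return result;
    }
}
```

In Java version 1.5 or later, Long.reverseBytes( in.readLong() ) does the same job in one line.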

Reading Little-Endian Chars

In a similar way to short we handle char. You can also use Character.reverseBytes in Java version 1.5 or later.
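
A minimal sketch of both routes, assuming a java.io.DataInputStream source (LittleEndianChar and its methods are hypothetical names). Since char is unsigned, there are no sign-extension surprises in the result:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helpers (not the LEDataInputStream API) for 2-byte
// little-endian chars, e.g. UTF-16LE code units.
class LittleEndianChar {
    static char readCharLittleEndian(DataInputStream in) throws IOException {
        int low = in.readByte() & 0xff;  // least significant byte first
        int high = in.readByte() & 0xff;
        return (char) (high << 8 | low);
    }

    // Java 1.5+: read big-endian with readChar, then swap the two bytes.
    static char readCharLittleEndianViaReverse(DataInputStream in) throws IOException {
        return Character.reverseBytes(in.readChar());
    }
}
```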

Reading Little-Endian Ints

In a similar way to short we handle int.
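
A minimal sketch, assuming a java.io.DataInputStream source (LittleEndianInt and readIntLittleEndian are hypothetical names). Every byte is masked, and no final cast is needed because the top byte shifted left by 24 already lands in the sign bit:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper (not the LEDataInputStream API): reads a 4-byte
// two's complement little-endian int from a DataInputStream.
class LittleEndianInt {
    static int readIntLittleEndian(DataInputStream in) throws IOException {
        int b0 = in.readByte() & 0xff; // least significant byte comes first
        int b1 = in.readByte() & 0xff;
        int b2 = in.readByte() & 0xff;
        int b3 = in.readByte() & 0xff; // most significant byte comes last
        return b3 << 24 | b2 << 16 | b1 << 8 | b0;
    }
}
```

In Java version 1.5 or later, Integer.reverseBytes( in.readInt() ) gives the same result.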

Reading Little-Endian Doubles

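Floating point doubles are a little trickier. Presuming your data are in IEEE 754 floating point little-endian format, you need to assemble the eight bytes, least significant first, into a long, then reinterpret those bits with Double.longBitsToDouble.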
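
A minimal sketch of that, assuming a java.io.DataInputStream source (LittleEndianDouble and readDoubleLittleEndian are hypothetical names):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper (not the LEDataInputStream API): reads an 8-byte
// IEEE 754 little-endian double from a DataInputStream.
class LittleEndianDouble {
    static double readDoubleLittleEndian(DataInputStream in) throws IOException {
        long bits = 0;
        for (int shift = 0; shift < 64; shift += 8) {
            // Widen each masked byte to long before shifting it into place.
            bits |= (in.readByte() & 0xffL) << shift;
        }
        // Reinterpret the 64 assembled bits as a double; no arithmetic conversion.
        return Double.longBitsToDouble(bits);
    }
}
```

In Java version 1.5 or later, an equivalent one-liner is Double.longBitsToDouble( Long.reverseBytes( in.readLong() ) ).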

Reading Little-Endian Floats

Floats are much like doubles. Again, presuming your data are in IEEE 754 floating point little-endian format, you assemble the four bytes, least significant first, into an int, then reinterpret those bits with Float.intBitsToFloat.
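
A minimal sketch, assuming a java.io.DataInputStream source (LittleEndianFloat and readFloatLittleEndian are hypothetical names):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper (not the LEDataInputStream API): reads a 4-byte
// IEEE 754 little-endian float from a DataInputStream.
class LittleEndianFloat {
    static float readFloatLittleEndian(DataInputStream in) throws IOException {
        int b0 = in.readByte() & 0xff; // least significant byte first
        int b1 = in.readByte() & 0xff;
        int b2 = in.readByte() & 0xff;
        int b3 = in.readByte() & 0xff; // sign and high exponent bits last
        int bits = b3 << 24 | b2 << 16 | b1 << 8 | b0;
        // Reinterpret the 32 assembled bits as a float.
        return Float.intBitsToFloat(bits);
    }
}
```

In Java version 1.5 or later, Float.intBitsToFloat( Integer.reverseBytes( in.readInt() ) ) is equivalent.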

Reading Little-Endian Bytes

You don’t need a readByteLittleEndian since the code would be identical to readByte, though you might create one just for consistency:
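byte readByteLittleEndian( )
   {
   // 1 byte signed -128 .. 127. Nothing special needed in addition.
   // Assumes the enclosing class has a readByte() method,
   // e.g. by wrapping or extending a DataInputStream.
   return readByte();
   }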
Big and little endian byte data are identical. There is nothing to rearrange. If you wanted to reverse the

Nio and ByteBuffer for handling Little Endian Files

In Java version 1.4 or later, you can use ByteBuffer.order( ByteOrder.LITTLE_ENDIAN ) to set the byte sex of the buffer to little-endian. Then when you use ByteBuffer.getInt( int offset ), it will collect the bytes least significant first. Note that the offset is specified in bytes, not ints.
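
Here is a small sketch of the technique on an in-memory byte array (NioLittleEndian and intAt are hypothetical names). It shows the byte offset in action, the relative gets that advance the position, and what happens if you forget to set the order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: decode a little-endian int at a given byte offset.
class NioLittleEndian {
    static int intAt(byte[] data, int byteOffset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(byteOffset); // absolute get; offset counts bytes
    }

    public static void main(String[] args) {
        byte[] data = { 0x01, 0x00, 0x00, 0x00,   // little-endian int 1
                        0x02, 0x00, 0x00, 0x00 }; // little-endian int 2
        System.out.println(intAt(data, 0)); // prints 1
        System.out.println(intAt(data, 4)); // prints 2: second int starts at byte 4, not 1

        // Relative gets advance the position by the size of each value.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int first = buf.getInt();  // position 0 -> 4
        int second = buf.getInt(); // position 4 -> 8
        System.out.println(first + second); // prints 3

        // A fresh buffer defaults to BIG_ENDIAN and misreads the same bytes.
        System.out.println(ByteBuffer.wrap(data).getInt(0)); // prints 16777216
    }
}
```

For a file, you could fill the buffer from a FileChannel, or get a MappedByteBuffer from FileChannel.map, and set the order on it the same way.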

Unicode

Unicode comes in both big and little endian variants. Sometimes the order is marked, sometimes not. For details, read about BOMs (Byte Order Marks) in the Unicode entry and read up on all the variant Unicode UTF (Unicode Transformation Format) encodings.
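
As a quick illustration (Utf16ByteOrder is a hypothetical demo class), Java’s standard charsets cover both cases: UTF-16LE states the order explicitly, while plain UTF-16 lets a leading BOM decide the order and then discards it:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the letter 'A' in three UTF-16 byte layouts.
class Utf16ByteOrder {
    static final byte[] LE_NO_BOM = { 0x41, 0x00 };
    static final byte[] BE_WITH_BOM = { (byte) 0xfe, (byte) 0xff, 0x00, 0x41 };
    static final byte[] LE_WITH_BOM = { (byte) 0xff, (byte) 0xfe, 0x41, 0x00 };

    public static void main(String[] args) {
        // UTF-16LE: order stated explicitly, so no BOM is needed.
        System.out.println(new String(LE_NO_BOM, StandardCharsets.UTF_16LE)); // A
        // UTF-16: the BOM selects the byte order and is then dropped.
        System.out.println(new String(BE_WITH_BOM, StandardCharsets.UTF_16)); // A
        System.out.println(new String(LE_WITH_BOM, StandardCharsets.UTF_16)); // A
        // Encoding with UTF-16 writes a big-endian BOM first: fe ff 00 41
        for (byte b : "A".getBytes(StandardCharsets.UTF_16)) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Beware the unmarked case: with no BOM, UTF-16 assumes big-endian, so the little-endian bytes 41 00 decode as U+4100 rather than A.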

History

In Gulliver’s Travels the Lilliputians liked to break their eggs on the small end and the Blefuscudians on the big end. They fought wars over this. There is a computer analogy: should numbers be stored most or least significant byte first? This is sometimes referred to as byte sex.

Those in the big-endian camp (most significant byte stored first) include the Java virtual machine, the Java binary file format, the IBM 360 and its follow-on mainframes such as the 390, most other mainframes, and the Motorola 68K. The Power PC is endian-agnostic.

Blefuscudians (big-endians) assert this is the way God intended integers to be stored, most important part first. At the assembler level, fields of mixed positive integers and text can be sorted as if they were one big text field key. Real programmers read hex dumps, and big-endian is a lot easier to comprehend.

In the little-endian camp (least significant byte first) are the Intel 8080, 8086, 80286, Pentium and follow-ons, and the MOS 6502 popularised by the Apple ][.

Lilliputians (little-endians) assert that putting the low order part first is more natural because when you do arithmetic manually, you start at the least significant part and work toward the most significant part. This ordering makes writing multi-precision arithmetic easier since you work up, not down. It made implementing 8-bit microprocessors easier. At the assembler level (not in Java) it also lets you cheat and pass the address of a 32-bit positive int to a routine expecting only a 16-bit parameter and still have it work, as long as the value fits in 16 bits. Real programmers read hex dumps, and little-endian is more of a stimulating challenge.

If a machine is word addressable, with no finer addressing supported, the concept of endianness means nothing, since words are fetched from RAM (Random Access Memory) in parallel, all bits at once.

What Sex Is Your CPU (Central Processing Unit)?

Byte Sex (Endianness) of CPUs
CPU | Endianness | Notes
AMD (Advanced Micro Devices) Opteron | little | 64-bit
AMD Sempron, Athlon, Phenom | little | 64-bit
AMD Sempron, Thunderbird, Duron, Athlon | little | 32-bit; W95, W98, Me, NT, W2K, XP, W2003, Vista, W2008 and W7-32
Apple ][ 6502 | little |
Apple Mac 68000 | big | Uses the Motorola 68000.
Apple Power PC | big | The CPU is bisexual but stays big-endian in the Mac OS (Operating System).
ARM | both | Chips used in handhelds and cellphones. Endianness is controlled by a programmable mode bit.
Burroughs 1700, 1800, 1900 | ? | Bit addressable. Used different interpreter firmware instruction sets for each language.
Burroughs B5000 | word addressable | 48-bit, Algol stack machine, first virtual memory.
Burroughs 7800 | word addressable | 48-bit, Algol stack machine.
Librascope LGP-30 | word addressable only, hence no endianness | 31½-bit words. Low order bit must be 0 on the drum, but can be 1 in the accumulator.
CDC (Control Data Corporation) 3300, 6600, Cyber | word addressable, so no endianness | 60-bit words on the 6600 and its Cyber successors.
Compaq (née DEC (Digital Equipment Corporation)) Alpha Servers | little |
Cray X1 | big | 64-bit
DEC PDP-11 | little | 16-bit. However, when it stored 32-bit ints, it stored the most significant 16-bit chunk first.
DEC Vax | little | 32-bit
IBM 360, 370, 380, 390, eSeries, zSeries | big | 32-bit (64-bit on zSeries)
IBM 7044, 7090 | word addressable | 36-bit
IBM AS-400 | big | 64-bit
Power PC | either | The endian-agnostic Power PCs have a foot in both camps. They are bisexual, but the OS usually imposes one convention or the other, e.g. Mac PowerPCs are big-endian.
IBM Power PC G5 | big | The pseudo-little-endian mode has been dropped. This caused Microsoft Virtual PC a major headache in emulating the Pentium on a Mac Power PC G5.
Intel 8080, 8088, 8086, 80286 | little | 16-bit; chips used in PCs
Intel 80386, 80486, Pentium I, II, III, IV | little | 32-bit; chips used in PCs
Intel 8051 | big |
Intel Xeon | little | 32-bit, used in Unisys Clearpath servers. Like a Pentium designed to be used in groups, with 144 extra SIMD (Single Instruction Multiple Data) instructions for web servers.
Intel Itanium | either | 64-bit
MIPS (Microprocessor without Interlocked Pipeline Stages) R4000, R5000, R10000 | big | Used in Silicon Graphics IRIX.
MOS 6502 | little | Used in the Apple ][
Motorola 68000, 6800, 6809, 680x0, 68HC11 | big | Early Macs and the Amiga used the 68000.
NCR (National Cash Register) 8500 | big |
NCR Century | big |
Palm | big | Motorola 68K or ARM
SGI (Silicon Graphics Inc) MIPS | both | Machines with Cray ancestry are big, those with SGI ancestry are little.
Sun Sparc and UltraSparc | big | Oracle’s Solaris. Normally used as big-endian, but also supports operating in little-endian mode, including switching endianness under program control for particular loads and stores.
Univac 1100 | word addressable | 36-bit words
Univac 90/30 | big | IBM 370 clone
Zilog Z80 | little | Used in CP/M (Control Program for Microcomputers) machines.
If you know the endianness of other CPUs/OSes/platforms, please email feedback to Roedy Green at Canadian Mind Products.

Four Byte Sexes

In theory, data can have two different byte sexes, but CPUs can have four. Let us give thanks, in this world of mixed left- and right-hand drive, that we do not have real CPUs in all four sexes to contend with.
The Four Possible Byte Sexes for CPUs
Which Byte Is Stored in the Lower-Numbered Address? | Which Byte Is Addressed? | Used In
LSB (Least Significant Byte) | LSB | Intel, AMD, Power PC, DEC
LSB | MSB | None that I know of.
MSB | LSB | Perhaps one of the old word mark architecture machines.
MSB | MSB | Mac, IBM 390, Power PC

reverseBytes

In Java version 1.5 or later, Integer, Long, Short and Character each have a method called reverseBytes that reverses the byte sex. These are most useful for dealing with a handful of little-endian fields. Unfortunately there is no Float.reverseBytes or Double.reverseBytes.
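
A small sketch of how these might be used on values read by a DataInputStream (ReverseBytesDemo and its method names are hypothetical). For floats and doubles, one workaround is to swap the raw bits obtained from readInt or readLong, and only then reinterpret them with Float.intBitsToFloat or Double.longBitsToDouble:

```java
// Hypothetical demo class: converting values read big-endian (e.g. by
// DataInputStream) into the little-endian values the file really held.
class ReverseBytesDemo {
    // e.g. swapInt( in.readInt() )
    static int swapInt(int readBigEndian) {
        return Integer.reverseBytes(readBigEndian);
    }

    // No Float.reverseBytes exists, so swap the raw bits from readInt()
    // and only then reinterpret them as a float.
    static float floatFromSwappedBits(int readBigEndian) {
        return Float.intBitsToFloat(Integer.reverseBytes(readBigEndian));
    }

    // Likewise for doubles, using the raw bits from readLong().
    static double doubleFromSwappedBits(long readBigEndian) {
        return Double.longBitsToDouble(Long.reverseBytes(readBigEndian));
    }

    public static void main(String[] args) {
        System.out.printf("%08x%n", swapInt(0x01020304));                // 04030201
        System.out.printf("%04x%n", Short.reverseBytes((short) 0x1234)); // 3412
        System.out.println((int) Character.reverseBytes('\u4100'));      // 65
        System.out.println(floatFromSwappedBits(0x0000803f));            // 1.0
    }
}
```

Swapping the int bits rather than a float read with readFloat keeps the raw bit pattern from ever passing through a float value before it is in the right order.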

Learning More

Oracle’s Javadoc on Integer.reverseBytes
Oracle’s Javadoc on Collections.reverse. Nothing to do with reversing bytes.

This page is posted on the web at: http://mindprod.com/jgloss/endian.html
