UTF (Unicode Transformation Format) BOM (Byte Order Mark) Unicode-encoding Endian Indicators

| 0xfeff BOM as it appears encoded | Description |
|---|---|
| ef bb bf | UTF-8. Endianness, strictly speaking, does not apply, though UTF-8 uses a big-endian, most-significant-byte-first representation. |
| fe ff | UTF-16 for 16-bit internal UCS-2, big-endian, Java/network order. |
| ff fe | UTF-16 for 16-bit internal UCS-2, little-endian, Intel/Microsoft order. Note that you must examine the following two bytes to tell this apart from the little-endian UTF-32 BOM, since both start with ff fe. |
| 00 00 fe ff | UTF-32 for 32-bit internal UCS-4, big-endian, Java/network order. |
| ff fe 00 00 | UTF-32 for 32-bit internal UCS-4, little-endian, Intel/Microsoft order. |
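
The table can be turned into a small detection routine. What follows is only a sketch: the class `BomSniffer`, its `Bom` enum and the `detect` method are hypothetical names invented for illustration, not part of any Java library. It checks the four-byte signatures before the two-byte ones, so that `ff fe 00 00` is reported as UTF-32 rather than UTF-16. That choice is an assumption: those same four bytes could also be a little-endian UTF-16 BOM followed by a NUL character, and no amount of sniffing can settle that for certain.

```java
import java.util.Arrays;

/** Hypothetical helper: identifies a BOM at the start of a byte array. */
final class BomSniffer {

    /** BOMs from the table, longest first so UTF-32 LE is tried before UTF-16 LE. */
    enum Bom {
        UTF_32BE(0x00, 0x00, 0xFE, 0xFF),
        UTF_32LE(0xFF, 0xFE, 0x00, 0x00),
        UTF_8(0xEF, 0xBB, 0xBF),
        UTF_16BE(0xFE, 0xFF),
        UTF_16LE(0xFF, 0xFE);

        final byte[] signature;

        Bom(int... bytes) {
            this.signature = new byte[bytes.length];
            for (int i = 0; i < bytes.length; i++) {
                signature[i] = (byte) bytes[i];
            }
        }
    }

    /** Returns the matching BOM, or null if the data does not start with one. */
    static Bom detect(byte[] data) {
        for (Bom bom : Bom.values()) {
            int n = bom.signature.length;
            if (data.length >= n
                    && Arrays.equals(Arrays.copyOf(data, n), bom.signature)) {
                return bom;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // ff fe followed by a non-zero byte: little-endian UTF-16, not UTF-32.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(detect(sample)); // prints UTF_16LE
    }
}
```
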
There are also variants of these encodings, such as UTF-16BE and UTF-16LE, whose names imply the byte order, so they need no BOM.
Unfortunately, applications, even Javac.exe, often choke on these byte order marks. Java Readers don’t filter them out automatically. There is not much you can do except remove them manually.
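
One way to remove a BOM by hand is to peek at the first decoded character and discard it if it is U+FEFF. This is a minimal sketch: `BomSkipper` and `skipBom` are hypothetical names, and it assumes the stream has already been decoded with the correct charset, so that the BOM arrives as the single character U+FEFF.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: hides a leading BOM from code that reads characters. */
final class BomSkipper {

    /** Wraps a Reader and discards a leading U+FEFF, if there is one. */
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pushback = new PushbackReader(in, 1);
        int first = pushback.read();
        // Anything other than a BOM (or end of stream) goes back for the caller.
        if (first != -1 && first != 0xFEFF) {
            pushback.unread(first);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = skipBom(new StringReader("\uFEFFhello"));
        StringBuilder text = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            text.append((char) c);
        }
        System.out.println(text); // prints hello
    }
}
```

The same wrapper can sit between an `InputStreamReader` and whatever parser consumes the text, so the rest of the program never sees the mark.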
This program tests how Java handles BOMs. It discovers that Java never inserts a BOM and never removes one on its own. You have to bypass, insert and delete them explicitly.
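
As a rough illustration of the behaviour described, the sketch below encodes and decodes a one-character string as UTF-8. It is not the author's test program; `BomRoundTrip` and `encodeUtf8` are hypothetical names. It covers UTF-8 only, and other charsets, such as Java's `UTF-16`, may treat the BOM differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: UTF-8 BOMs must be written and removed explicitly. */
final class BomRoundTrip {

    /** Encodes text as UTF-8, optionally writing U+FEFF first (bytes ef bb bf). */
    static byte[] encodeUtf8(String text, boolean withBom) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            if (withBom) {
                out.write('\uFEFF'); // the BOM only appears because we ask for it
            }
            out.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = encodeUtf8("A", false);
        byte[] marked = encodeUtf8("A", true);
        System.out.println(plain.length);  // prints 1: no BOM was inserted
        System.out.println(marked.length); // prints 4: ef bb bf plus 'A'

        // Decoding keeps the BOM as an ordinary leading character.
        String decoded = new String(marked, StandardCharsets.UTF_8);
        System.out.println(decoded.length());            // prints 2
        System.out.println(decoded.charAt(0) == 0xFEFF); // prints true
    }
}
```
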
Available on the web at: http://mindprod.com/jgloss/bom.html