UTF (Unicode Transformation Format) BOM (Byte Order Mark) Unicode-encoding Endian Indicators

0xfeff BOM as it appears encoded | Description |
---|---|
ef bb bf | UTF-8. Endianness, strictly speaking, does not apply, though UTF-8 uses a big-endian, most-significant-byte-first representation. |
fe ff | UTF-16 for 16-bit internal UCS-2, big-endian, Java network order. |
ff fe | UTF-16 for 16-bit internal UCS-2, little-endian, Intel/Microsoft order. Note that you must examine the subsequent bytes to tell this apart from a little-endian UTF-32 BOM, since both start with ff fe. |
00 00 fe ff | UTF-32 for 32-bit internal UCS-4, big-endian, Java network order. |
ff fe 00 00 | UTF-32 for 32-bit internal UCS-4, little-endian, Intel/Microsoft order. |
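There are also variants of these encodings that have an implied endian marker: UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE state the byte order in the encoding name itself, so they need no BOM.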
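The byte patterns in the table can be checked mechanically. What follows is a minimal sketch, not a definitive implementation: the class name `BomSniffer`, the method `detect` and the returned labels are hypothetical names chosen for illustration. It tests the four-byte UTF-32 patterns before the two-byte UTF-16 ones, because ff fe 00 00 also begins with ff fe. Note the remaining ambiguity: ff fe 00 00 could in principle be a little-endian UTF-16 BOM followed by a NUL character, and this sketch simply assumes UTF-32 in that case.

```java
class BomSniffer {
    /**
     * Returns a label for the encoding implied by a leading BOM,
     * or null if the bytes do not start with one of the BOMs in the table.
     */
    static String detect(byte[] b) {
        // Longest patterns first: ff fe 00 00 would otherwise match ff fe.
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if b begins with the given unsigned byte values. */
    static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned; Java bytes are signed.
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(sample)); // prints UTF-8
    }
}
```

Returning null for "no BOM" leaves the choice of a default encoding to the caller, which seems the safest behaviour for a sketch.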
Unfortunately, applications, even javac.exe, often choke on these byte order marks. Java Readers don’t automatically filter them out. There is not much you can do but remove them manually.
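One way to remove a BOM manually is to peek at the first decoded character and discard it if it is U+FEFF. The following is a minimal sketch under stated assumptions: the class `BomSkipper` and its methods are hypothetical names, the input is assumed to be UTF-8, and the whole input is read into memory for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkipper {
    /** Wraps a Reader so that a leading U+FEFF, if present, is skipped. */
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pr = new PushbackReader(in, 1);
        int first = pr.read();
        // Anything other than a BOM (or end of stream) goes back on the stream.
        if (first != -1 && first != 0xFEFF) {
            pr.unread(first);
        }
        return pr;
    }

    /** Decodes UTF-8 bytes to a String, dropping a leading BOM if there is one. */
    static String decodeUtf8WithoutBom(byte[] bytes) {
        try (Reader r = skipBom(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decodeUtf8WithoutBom(withBom).length()); // prints 2
    }
}
```

Working on decoded characters rather than raw bytes means the same `skipBom` wrapper can be reused with any Reader whose decoder leaves the BOM in place.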
This program tests how Java handles BOMs. It discovers that Java never inserts a BOM and never removes one on its own. You have to bypass, insert and delete them explicitly.
Here is how I discovered this:
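A minimal sketch of this kind of check, not the original test program, might look like the following. It covers UTF-8 only; the class `BomBehaviour` and its method names are hypothetical, and other charsets, such as UTF-16, may behave differently and are not checked here.

```java
import java.nio.charset.StandardCharsets;

class BomBehaviour {
    /** True if encoding a String as UTF-8 puts ef bb bf at the front. */
    static boolean encoderWritesBom() {
        byte[] out = "hi".getBytes(StandardCharsets.UTF_8);
        return out.length >= 3
                && (out[0] & 0xFF) == 0xEF
                && (out[1] & 0xFF) == 0xBB
                && (out[2] & 0xFF) == 0xBF;
    }

    /** True if decoding UTF-8 bytes that start with a BOM leaves U+FEFF in the String. */
    static boolean decoderKeepsBom() {
        byte[] in = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String s = new String(in, StandardCharsets.UTF_8);
        return s.length() == 3 && s.charAt(0) == '\uFEFF';
    }

    public static void main(String[] args) {
        System.out.println("encoder writes BOM: " + encoderWritesBom()); // false
        System.out.println("decoder keeps BOM:  " + decoderKeepsBom());  // true
    }
}
```

For UTF-8 both results agree with the observation above: the encoder adds no BOM, and the decoder passes an existing one through as an ordinary character.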
This page is posted at http://mindprod.com/jgloss/bom.html