Internally, Java uses 16-bit characters. Unicode has been extended to include some 32-bit characters (actually only 20-bit at this point). Instead of flipping to RAM-gobbling 32-bit characters, Sun decided to handle the new characters with a pair of 16-bit characters. The added support for them in a half-hearted way.
Java does not even have 32-bit String literals, like C style code points e.g. \U0001d504. Note the capital U vs the usual \ud504 I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs to let you use 32-bit Unicode glyphs in your programs.
To pull this off, Unicode reserves two bands of 16-bit characters for use in encoding the high characters.
This page is posted |
http://mindprod.com/jgloss/surrogatepair.html | |
Optional Replicator mirror
|
J:\mindprod\jgloss\surrogatepair.html | |
Please read the feedback from other visitors,
or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission. | ||
Canadian
Mind
Products
IP:[65.110.21.43] Your face IP:[3.145.12.181] |
| |
Feedback |
You are visitor number | |