Unicode™ : Java Glossary

Unicode
Unicode Glyph Ranges
What Is Unicode?
Symbols
Arrows
Hyphens
Viewing Glyphs
Creating Unicode Documents
Unicode Literals in Java
BOMs : Byte Order Marks
What’s Missing From Unicode?
Unicode Editors
Viewer Applet
Notepad Unicode
Books
Links

Unicode 6.3 Glyph Ranges

16-bit and 32-bit Unicode Glyphs in Downloadable Acrobat PDF (Portable Document Format)
Columns: hex code (⁶=Unicode 6), size, sample glyph, description
0000 383k A Basic Latin
0080 412k Latin-1 Supplement: accented letters, basic symbols
0100 191k Latin Extended-A: Esperanto accented letters
0180 362k Ɖ Latin Extended-B: African
0250 246k IPA Extensions: International Phonetic Alphabet
02B0 195k ˤ Spacing Modifier Letters
0300 214k Combining Diacritical Marks
0370 281k Ω Greek
0400 242k Д Cyrillic
0500 115k Ԏ Cyrillic Supplement
0530 106k Մ Armenian
0590 109k א Hebrew
0600 172k ص Arabic
0700 91k ܛ Syriac
0780 74k ޘ Thaana: Maldives
0840 69k Mandaic: Iraq and Iran
0900 110k Devanagari: Hindi
0980 103k Bengali
0A00 98k Gurmukhi: Punjabi
0A80 96k Gujarati: Gujarat
0B00 105k Oriya: Odiya Orissa
0B80 136k Tamil: India and Sri Lanka
0C00 137k Telugu: Andhra Pradesh
0C80 122k Kannada: Karnataka
0D00 123k Malayalam: Kerala
0D80 104k Sinhala: Sri Lanka
0E00 100k Thai
0E80 100k Lao
0F00 219k Tibetan
1000 116k Myanmar
10A0 100k Georgian
1100 131k Hangul Jamo: Korean
1200 179k Ethiopic
13A0 85k Cherokee
1400 183k Canadian Aboriginal Syllabics
1680 106k Ogham: Old Irish
16A0 122k Runic
1700 73k Tagalog: Filipino
1720 76k Hanunoo: Mindoro in the Philippines
1740 68k Buhid: Mindoro in the Philippines, used to write Tagalog
1760 73k Tagbanwa: Philippines
1780 128k Khmer: Cambodian
1800 146k Mongolian
1900 83k Limbu: a Tibeto-Burman language of Nepal and Sikkim
1950 72k Tai Le: China
19E0 75k Khmer Symbols: Cambodian
1BC0 69k Batak: Sumatra Indonesia
1D00 250k Phonetic Extensions
1E00 247k Latin Extended Additional: dotted letters, letters with two accents.
1F00 175k Greek Extended
2000 283k General Punctuation
2070 108k Superscripts and Subscripts
20A0 238k Currency Symbols: including new 20b9 Rupee
20D0 145k Combining Marks for Symbols
2100 276k Letterlike Symbols
2150 184k Number Forms ⅐ ⅑ ⅒
2190 109k Arrows
2200 309k Mathematical Operators: ∇ del, ∈ element, ∃ there exists, ∀ for all, ∪ union, ∩ intersection, ∋ contains member, ⋅ dot product, ∴ therefore, √ square root, ∧ logical and, ∨ logical or, ∑ summation, ∏ product, ≠ not equal, ≤ less or equal
2300 263k Miscellaneous Technical: APL operators.
2400 88k Control Pictures: for displaying unprintable ASCII control characters.
2440 73k Optical Character Recognition
2460 140k Enclosed Alphanumerics: see Dingbats 2700 for more circled digits.
2500 121k Box Drawing: single/double lines also triangles
2580 78k Block Elements
25A0 182k Geometric Shapes
2600 337k Miscellaneous Symbols: chess, astrology, I-ching, telephones, hazards, religious symbols, hammer and sickle.
2700 215k Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens. See 2460 for more circled digits.
27C0 150k Miscellaneous Mathematical Symbols-A: including SQL left, right and full joins.
27F0 95k Supplemental Arrows-A
2800 95k Braille Patterns
2900 134k Supplemental Arrows-B
2980 196k Miscellaneous Mathematical Symbols-B
2A00 164k Supplemental Mathematical Operators: including variants of + - × ÷
2B00 158k Miscellaneous Symbols and Arrows
2C00 128k Glagolitic: pre-Cyrillic Bulgarian
2E80 184k CJK Radicals Supplement: Chinese Japanese Korean
2F00 184k Kangxi Radicals: fragments combined to write Chinese
2FF0 67k Ideographic Description Characters
3000 206k CJK Symbols and Punctuation: Chinese Japanese Korean
3040 142k Hiragana: (Japanese) Used when no Kanji character exists.
30A0 148k Katakana: (Japanese) mainly for foreign names
3100 125k Bopomofo: phonetic script for Mandarin
3130 124k Hangul Compatibility Jamo: Korean
3190 124k Kanbun: used by Japanese to annotate classic Chinese
31A0 102k Bopomofo Extended: phonetic script for Mandarin
31F0 84k Katakana Phonetic Extensions: Japanese
3200 250k Enclosed CJK Letters and Months: Chinese Japanese Korean
3300 261k CJK Compatibility: Chinese Japanese Korean
3400 5781k CJK Unified Ideographs Extension A: Chinese Japanese Korean
4DC0 75k Yijing Hexagram Symbols: I Ching symbols
4E00 25871k CJK Unified Ideographs: Chinese Japanese Korean including Kanji digits 零 一 二 三 四 五 六 七 八 九
A000 424k Yi Syllables: classical Yi language of China
A490 83k Yi Radicals: classical Yi language of China
AB00 79k Ethiopic Extended-A
AC00 701k Hangul Syllables: Korean
D800 23k High Surrogates
DC00 23k Low Surrogates
E000 23k Private Use Area
F900 590k CJK Compatibility Ideographs: Chinese Japanese Korean
FB00 116k Alphabetic Presentation Forms: ligatures including Hebrew
FB50 236k Arabic Presentation Forms-A
FE00 69k Variation Selectors: non-printing control characters
FE20 82k Combining Half Marks
FE30 129k CJK Compatibility Forms: Chinese, Japanese, Korean vertical brackets
FE50 148k Small Form Variants: small punctuation
FE70 117k Arabic Presentation Forms-B
FF00 274k Halfwidth and Fullwidth Forms: wide and narrow letters, digits and punctuation
FFF0 72k Specials: replacement character FFFD, and FFFE, the byte-swapped form of the byte order mark FEFF.
00010000 93k Linear B Syllabary ancient Cretan
00010080 123k Linear B Ideograms
00010100 84k Aegean Numbers
00010300 102k Old Italic
00010330 97k Gothic
00010380 100k Ugaritic: Cuneiform
00010400 108k 𐐁 Deseret: Mormon
00010450 112k 𐑻 Shavian: George Bernard Shaw’s alphabet
00010480 102k 𐒁 Osmanya: Somali
00010800 106k Cypriot Syllabary
00011000 81k Brahmi: ancient Indian scripts
00016800 322k Bamum Supplement: Cameroon
0001B000 95k Kana Supplement: Japanese
0001D000 230k Byzantine Musical Symbols
0001D100 172k Musical Symbols
0001D300 125k 𝍎 Tai Xuan Jing Symbols: Look like I-Ching hexagrams truncated to four lines.
0001D400 418k Mathematical Alphanumeric Symbols
0001F0A0 106k Playing Cards
0001F300 625k Miscellaneous Symbols and Pictographs
0001F600 119k Emoticons
0001F680 130k Transport and Map Symbols
0001F700 193k Alchemical Symbols
00020000 28317k CJK Unified Ideographs Extension B: Chinese Japanese Korean
0002B740 212k CJK Unified Ideographs Extension D: Chinese Japanese Korean
0002F800 548k CJK Compatibility Ideographs Supp.: Chinese Japanese Korean
000E0000 136k Tags: control characters.
000E0100 84k Variation Selectors Supp.: non printing control characters
000F0000 23k Supplementary Private Use Area-A
00100000 23k Supplementary Private Use Area-B

What Is Unicode?

Informally, Unicode is a 16-bit character encoding, with surrogate pairs to handle characters beyond 16 bits, used internally in programs written in Java. More precisely, Unicode is not a character encoding but a character set whose code points run from U+0000 to U+10FFFF, which needs 21 bits and is usually stored in 32. UTF-8, UTF-16 and UTF-32 are character encodings in which the Unicode character set can be encoded.
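
To make the difference between the character set and its encodings concrete, here is a minimal sketch that measures the same characters in UTF-8 and UTF-16. The class and method names (EncodingSizes, utf8Bytes and so on) are made up for illustration; only the standard String and StandardCharsets calls are real.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of Unicode code points, counting a surrogate pair as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Bytes needed to store s in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store s in big-endian UTF-16 without a byte order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Latin A, division sign, check mark, and a musical G clef written as a surrogate pair.
        String[] samples = {"A", "\u00f7", "\u2713", "\ud834\udd1e"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-8 bytes, %d UTF-16 bytes, %d Java chars%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), s.length());
        }
    }
}
```

The same code point costs one to four bytes in UTF-8 and two or four bytes in UTF-16, which is why the encoding has to be known before a file can be read.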

See the example glyphs, in PDF format. Requires Adobe Acrobat to view. Also available as an ASCII text file describing the glyphs with cross references to similar glyphs. Unicode does not standardise the precise shapes of the letters, i.e. the glyphs. It does, however, provide example glyphs. This distinction is most important for unified Han, which encodes Chinese, Japanese and Korean ideographs. They use the same Unicode code points, but quite different looking renderings of the characters. These differences are handled by the font designer, who uses Chinese, Japanese or Korean style.

Unicode is sometimes called UCS (Universal Character Set) or ISO (International Organization for Standardization) 10646. Unicode allows Java to handle international characters for most of the world’s living languages, including Arabic, Armenian, Bengali, Bopomofo, Chinese (via unified Han), Cyrillic, English, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Kanji via unified Han, plus Hiragana and Katakana), Kannada, Korean (Hangul, plus Hanja via unified Han), Lao, Malayalam, Oriya, Thai, Tamil, Telugu, Tibetan… Unicode will make it much easier for non-English speaking programmers to write programs for English speaking users and vice versa.

To get musical symbols, which start at 0001D100, you need 32-bit Unicode support, i.e. support for code points that do not fit in 16 bits.

In Java, you get at the exotic characters by encoding them in hex in your strings like this: \u00f7\u2713 to produce ÷ ✓. See String literals for more details.
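
As a small illustration of such escapes, the sketch below stores the two characters from the sentence above and prints their code values. The class name UnicodeLiterals and the helper describe are invented for this example.

```java
class UnicodeLiterals {
    // The example from the text: division sign followed by check mark.
    static final String SAMPLE = "\u00f7\u2713";

    // Formats each 16-bit char of s as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE)); // prints U+00F7 U+2713
    }
}
```

The compiler translates these escapes before it does anything else, so an escape for a quote or a line terminator inside a string literal behaves as if the raw character had been typed there.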

In HTML (Hypertext Markup Language), you get at the exotic characters by encoding them as entities such as &amp;divide; and &amp;#x2713; to produce ÷ ✓.
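
One way to produce such entities from Java is sketched below. It rewrites every non-ASCII code point as a hexadecimal numeric entity; HtmlEntities and toNumericEntities are hypothetical names, and the sketch deliberately does not escape markup characters such as the ampersand or angle brackets.

```java
class HtmlEntities {
    // Replaces each code point above 7-bit ASCII with a numeric entity of the form &#x...;
    static String toNumericEntities(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 128) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            // Step over both halves of a surrogate pair when there is one.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericEntities("\u00f7\u2713")); // prints &#xf7;&#x2713;
    }
}
```

Numeric entities work for every code point, including those beyond 16 bits, whereas named entities exist only for a limited set of characters.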

Unicode Symbols

There are even codes for:
apple '\uf000' unofficial, private use area
British pound sign £ '\u00a3'
checkmark ✓ '\u2713'
copyright © '\u00a9'
degree ° '\u00b0'
dharma wheel ☸ '\u2638'
division ÷ '\u00f7'
bullet • '\u2022'
euro € '\u20ac'
female ♀ '\u2640'
funeral urn ⚱ '\u26b1'
heart ♥ '\u2665'
bullet (as mathematical operator) ∙ '\u2219'
infinity ∞ '\u221e'
integral ∫ '\u222b'
male ♂ '\u2642'
pi π '\u03c0'
PI Π '\u03a0'
registered trade mark ® '\u00ae'
sun ☀ '\u2600'
telephone ☎ '\u260e'
trademark ™ '\u2122'
This does not mean your fonts will support all these wonders, of course.

In addition there are all kinds of interesting special characters such as: Alphabetic Presentation Forms, APL (A Programming Language), Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures (icons for control chars), Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA (International Phonetic Alphabet) Extensions, Letterlike Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (32-bit Unicode), Mathematical Operators, Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical (del, grad, integral), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition: the OCR-A (Optical Character Recognition font-A) and MICR (Magnetic Ink Character Recognition) characters used in magnetic ink cheque encoding), Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials, Superscripts and Subscripts, Tags (invisible control characters for tagging text, not price tags), Unified Canadian Aboriginal Syllabics and Variation Selectors.

Unicode Arrows

There are also arrows:
← \u2190
↑ \u2191
→ \u2192
↓ \u2193
↔ \u2194
↕ \u2195
↢ \u21a2
↬ \u21ac
↭ \u21ad
↰ \u21b0
↶ \u21b6
⇅ \u21c5
⇎ \u21ce
⇐ \u21d0
⇑ \u21d1
⇒ \u21d2
⇓ \u21d3
⇔ \u21d4
⇕ \u21d5
⇜ \u21dc
There are even more arrows defined in Unicode, in the range 2190-21ff. To use these characters in HTML, you need to code them as &… entities.
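
To see the whole arrow range rather than the selection above, a loop over the code values is enough. This is only a sketch; ArrowsBlock and arrows are names invented here, and whether each arrow renders still depends on your fonts.

```java
class ArrowsBlock {
    // Builds a string holding every char from 2190 through 21ff, the Arrows range.
    static String arrows() {
        StringBuilder sb = new StringBuilder();
        for (char c = '\u2190'; c <= '\u21ff'; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String all = arrows();
        System.out.println(all.length() + " arrows: " + all);
        // Character.UnicodeBlock reports which named range a character belongs to.
        System.out.println(Character.UnicodeBlock.of(all.charAt(0)));
    }
}
```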

Hyphens

There is also a variety of hyphen and dash characters:
- \u002d hyphen-minus
­ \u00ad soft hyphen
\u2010 hyphen
\u2011 non-breaking hyphen
\u2012 figure dash
\u2013 en dash
\u2014 em dash
\u2212 minus sign
𐆑 0x10191 (\ud800\udd91) roman uncia sign
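
Because these characters look alike, text pasted from a word processor often fails to match a search for the plain keyboard hyphen. The sketch below folds the hyphen and dash variants listed above into hyphen-minus and drops soft hyphens. DashNormalizer is a hypothetical helper, and the assumption that you want to lose the typographic distinction only holds for tasks such as searching or parsing.

```java
class DashNormalizer {
    // Maps 2010 through 2014 and the minus sign 2212 to '-', and removes soft hyphens (00ad).
    static String toAsciiHyphens(String s) {
        return s.replaceAll("[\u2010-\u2014\u2212]", "-")
                .replace("\u00ad", "");
    }

    public static void main(String[] args) {
        // An en dash, a minus sign and a soft hyphen mixed into ordinary letters.
        System.out.println(toAsciiHyphens("a\u2013b\u2212c\u00add")); // prints a-b-cd
    }
}
```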

Viewing Unicode Glyphs

Nic Fulton of Reuters has written a Java test applet that can display all 64 thousand 16-bit Unicode characters including the Chinese/Korean Han. How many of them actually display on your screen depends on the font handling ability of your browser and operating system, and which fonts you have installed. In Java programs, Unicode characters that are hard to type are represented in the form '\uffff', with four hex digits. Ordinary characters like 'A' are actually 16-bit Unicode too.

Creating Unicode Documents

How do you create and edit the various flavours of Unicode documents? You can create them in some specific encoding then convert them. To write a little utility to do that, read up on encoding and ask the File I/O Amanuensis for sample code. You can use lowly Notepad in Windows NT/W2K/XP, but not in earlier Windows versions, to edit existing documents. You would have to acquire an almost empty Unicode document to get started with new documents. Notepad is even clever enough to deal with byte order (endian) marks. Recent versions of MS Word in Windows NT/W2K/XP/W2K3 also work.
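
A conversion utility of the kind described can be very small. The following is a sketch under stated assumptions, not the File I/O Amanuensis output: the class name Transcode is invented, the input is assumed to be ISO-8859-1, and the whole file is read into memory, which suits only modest file sizes.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Transcode {
    // Decodes the bytes with one encoding and re-encodes the resulting text with another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Usage: java Transcode input-file output-file
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Transcode input-file output-file");
            return;
        }
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        // Assumption for this sketch: the input is ISO-8859-1 and UTF-16 output is wanted.
        byte[] out = transcode(in, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16);
        Files.write(Paths.get(args[1]), out);
    }
}
```

When encoding, Java's UTF-16 charset writes a big-endian byte order mark at the front, so the result should open directly in editors that look for one.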

Java

See the literals section for a full explanation of how to code 16-bit Unicode characters in Java programs.

Java does not have 32-bit String literals like C-style code points, e.g. \U0001d504. Note the capital U, versus the lower-case u of the usual 16-bit escape such as \ud504. I wrote the SurrogatePair applet to convert C-style code points to arcane surrogate pairs, to let you use 32-bit Unicode glyphs in your programs.
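
The conversion itself can be sketched with the standard Character.toChars method. This is not the source of the SurrogatePair applet; SurrogateDemo and toJavaEscapes are names made up for the example.

```java
class SurrogateDemo {
    // Returns the Java escape text for a code point: one escape for 16-bit values,
    // a high and low surrogate escape for anything above ffff.
    static String toJavaEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The C-style code point from the text.
        System.out.println(toJavaEscapes(0x1d504));
        // Building the string at run time avoids the escapes altogether.
        String fraktur = new String(Character.toChars(0x1d504));
        System.out.println(fraktur.length() + " chars, "
                + fraktur.codePointCount(0, fraktur.length()) + " code point");
    }
}
```

For 0x1d504 the method yields the pair d835 and dd04, and for the roman uncia sign 0x10191 it yields d800 and dd91.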

Byte Order Marks

There are two different layers: the Unicode character set, which assigns numbers to characters, and the UTFs (Unicode Transformation Formats), which describe how you encode these numbers in a file. Byte order marks belong to the UTF encoding layer, not to the character assignments. See more on BOMs (Byte Order Marks).
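
A program can use the byte order mark to guess which UTF a file uses. The sketch below checks the first few bytes against the well-known marks; BomSniffer and detect are hypothetical names, and a file with no mark simply yields null.

```java
import java.nio.charset.StandardCharsets;

class BomSniffer {
    // True if data begins with the given byte values.
    private static boolean startsWith(byte[] data, int... bom) {
        if (data.length < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((data[i] & 0xff) != bom[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the encoding name implied by a leading byte order mark, or null if there is none.
    static String detect(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // The UTF-32 marks must be tested before UTF-16, because FF FE is a prefix of FF FE 00 00.
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = "\ufeffhi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(detect(sample)); // prints UTF-16LE
    }
}
```

The guess is not foolproof: a little-endian UTF-16 file whose first character after the mark is a null looks the same as UTF-32LE.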

What’s Missing From Unicode?

There are no Unicode glyphs for the following: Unicode is not concerned with typesetting, just with raw text. In other words, it is about characters (logical letters), not glyphs (how letters are precisely shaped). Unicode has various flavours of digits that look much the same, but they are intended to be used in different contexts.
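
Java treats the different flavours of decimal digit uniformly, which the following sketch illustrates. DigitFlavours and decimalValue are invented names; Character.digit and Integer.parseInt are the standard library methods doing the work.

```java
class DigitFlavours {
    // Decimal value of a digit from any script, or -1 if the character is not a decimal digit.
    static int decimalValue(int codePoint) {
        return Character.digit(codePoint, 10);
    }

    public static void main(String[] args) {
        // The digit three in ASCII, Arabic-Indic, Devanagari and fullwidth form.
        int[] threes = {'3', 0x0663, 0x0969, 0xFF13};
        for (int cp : threes) {
            System.out.printf("U+%04X -> %d%n", cp, decimalValue(cp));
        }
        // Integer.parseInt relies on Character.digit, so it accepts such digits too.
        System.out.println(Integer.parseInt("\u0661\u0662\u0663")); // prints 123
    }
}
```

Characters that merely look like digits, such as superscript three at 00b3, are not decimal digits and give -1 here.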

To typeset, you need separate fonts to handle such variants, with the letters encoded with the same Unicode character. The word processor automatically selects the appropriate variant. I don’t know the mechanism by which a word processor can tell which fonts are related, and which styles and font-weights each supports. Presumably it is encoded somehow in the font files.

To a large extent ligatures are handled outside Unicode by automatically combining Unicode characters, though there are a few ligatures that rate a special Unicode character.
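
For the few ligatures that do have their own Unicode character, compatibility normalisation turns them back into ordinary letters. The sketch below uses the standard java.text.Normalizer class; LigatureExpander is a hypothetical name. Note that the NFKC form also folds other compatibility characters, such as fullwidth letters, so it is a blunt tool.

```java
import java.text.Normalizer;

class LigatureExpander {
    // Applies NFKC normalisation, which replaces ligature characters with their component letters.
    static String expand(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // The fi ligature at fb01 followed by "sh".
        System.out.println(expand("\ufb01sh")); // prints fish
    }
}
```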

Unicode Editors

Where do Unicode files come from? You can edit or create UTF-8 or UTF-16 files with Windows Notepad.

Unicode 6.3

Unicode 6.3 was the latest version of the Unicode Standard when this entry was written. The character data in JDK (Java Development Kit) 1.8.0_05 is based on Unicode 6.2.

Books

Recommended book: The Unicode 5.0 Standard
by The Unicode Consortium (founded 1991)
ISBN 978-0-321-48091-0, hardcover
publisher Addison-Wesley
published 2006-11-19
Unicode 5.0 adds the following:
  • Security mechanisms.
  • A standard collation algorithm for various national orderings.
  • A common locale data repository.
  • Improvements to the encoding model for UTF-8.
  • Rigorous stability of case folding.
  • A systematic framework covering combining characters, Unicode strings, line breaking, and segmentation.

available on the web at:

http://mindprod.com/jgloss/unicode.html
Contact Roedy.