Unicode Glyphs
Unicode 16 and Unicode 32 Glyphs in Downloadable Acrobat PDF Format
| code | Description | code († = 32 bit) | Description |
|---|---|---|---|
| 0000 | Basic Latin | 2600 | Miscellaneous Symbols: chess, astrology, I-ching, telephones, hazards, religious symbols, hammer and sickle. |
| 0080 | Latin-1 Supplement: accented letters, basic symbols | 2700 | Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens. |
| 0100 | Latin Extended-A: Esperanto accented letters | 27C0 | Miscellaneous Mathematical Symbols-A, including SQL left, right and full joins. |
| 0180 | Latin Extended-B: African | 27F0 | Supplemental Arrows-A |
| 0250 | IPA (International Phonetic Alphabet) Extensions | 2800 | Braille Patterns |
| 02B0 | Spacing Modifier Letters | 2900 | Supplemental Arrows-B |
| 0300 | Combining Diacritical Marks | 2980 | Miscellaneous Mathematical Symbols-B |
| 0370 | Greek | 2A00 | Supplemental Mathematical Operators, including variants of + - × ÷ |
| 0400 | Cyrillic | 2B00 | Miscellaneous Symbols and Arrows |
| 0500 | Cyrillic Supplement | 2C00 | Glagolitic: pre-Cyrillic Bulgarian |
| 0530 | Armenian | 2E80 | CJK Radicals Supplement (Chinese Japanese Korean) |
| 0590 | Hebrew | 2F00 | Kangxi Radicals |
| 0600 | Arabic | 2FF0 | Ideographic Description Characters |
| 0700 | Syriac | 3000 | CJK Symbols and Punctuation (Chinese Japanese Korean) |
| 0780 | Thaana | 3040 | Hiragana (Japanese): used when no Kanji character exists. |
| 0900 | Devanagari: Hindi | 30A0 | Katakana (Japanese): mainly for foreign names |
| 0980 | Bengali | 3100 | Bopomofo: phonetic script for Mandarin |
| 0A00 | Gurmukhi | 3130 | Hangul Compatibility Jamo |
| 0A80 | Gujarati | 3190 | Kanbun: used by Japanese to annotate classic Chinese |
| 0B00 | Oriya | 31A0 | Bopomofo Extended |
| 0B80 | Tamil | 31F0 | Katakana Phonetic Extensions |
| 0C00 | Telugu | 3200 | Enclosed CJK Letters and Months (Chinese Japanese Korean) |
| 0C80 | Kannada | 3300 | CJK Compatibility (Chinese Japanese Korean) |
| 0D00 | Malayalam | 3400 | CJK Unified Ideographs Extension A (Chinese Japanese Korean) |
| 0D80 | Sinhala | 4DC0 | Yijing Hexagram Symbols |
| 0E00 | Thai | 4E00 | CJK Unified Ideographs (Chinese Japanese Korean) |
| 0E80 | Lao | A000 | Yi Syllables |
| 0F00 | Tibetan | A490 | Yi Radicals |
| 1000 | Myanmar | AC00 | Hangul Syllables |
| 10A0 | Georgian | D800 | High Surrogates |
| 1100 | Hangul Jamo | DC00 | Low Surrogates |
| 1200 | Ethiopic | E000 | Private Use Area |
| 13A0 | Cherokee | F900 | CJK Compatibility Ideographs (Chinese Japanese Korean) |
| 1400 | Canadian Aboriginal Syllabic | FB00 | Alphabetic Presentation Forms, ligatures including Hebrew |
| 1680 | Ogham | FB50 | Arabic Presentation Forms-A |
| 16A0 | Runic | FE00 | Variation Selectors, non-printing control characters |
| 1700 | Tagalog | FE20 | Combining Half Marks |
| 1720 | Hanunoo | FE30 | CJK Compatibility Forms (Chinese Japanese Korean) |
| 1740 | Buhid | FE50 | Small Form Variants |
| 1760 | Tagbanwa | FE70 | Arabic Presentation Forms-B |
| 1780 | Khmer | FF00 | Halfwidth and Fullwidth Forms |
| 1800 | Mongolian | FFF0 | Specials, byte order marks. |
| 1900 | Limbu | †0001 0000 | Linear B Syllabary (32-bit) |
| 1950 | Tai Le | †0001 0080 | Linear B Ideograms (32-bit) |
| 19E0 | Khmer Symbols | †0001 0100 | Aegean Numbers (32-bit) |
| 1D00 | Phonetic Extensions | †0001 0300 | Old Italic (32-bit) |
| 1E00 | Latin Extended Additional, dotted letters, letters with two accents. | †0001 0330 | Gothic (32-bit) |
| 1F00 | Greek Extended | †0001 0380 | Ugaritic Cuneiform (32-bit) |
| 2000 | General Punctuation | †0001 0400 | Deseret Mormon (32-bit) |
| 2070 | Superscripts and Subscripts | †0001 0450 | Shavian (32-bit) |
| 20A0 | Currency Symbols | †0001 0480 | Osmanya: Somalian (32-bit) |
| 20D0 | Combining Marks for Symbols | †0001 0800 | Cypriot Syllabary (32-bit) |
| 2100 | ™ Letterlike Symbols | †0001 D000 | Byzantine Musical Symbols (32-bit) |
| 2150 | Number Forms, Roman Numerals and fractions | †0001 D100 | Musical Symbols (32-bit) |
| 2190 | Arrows | †0001 D300 | Tai Xuan Jing Symbols (32-bit). Look like I-Ching hexagrams truncated to four lines. |
| 2200 | Mathematical Operators, del, grad, element, there exists, for all, union, intersection, contains, dot product, cross product, therefore, square root, logical and, logical or, summation, product. | †0001 D400 | Mathematical Alphanumeric Symbols (32-bit) |
| 2300 | Miscellaneous Technical, APL operators. | †0002 0000 | CJK Unified Ideographs Extension B (32-bit) (Chinese Japanese Korean) |
| 2400 | Control Pictures for displaying unprintable ASCII control characters. | †0002 F800 | CJK Compatibility Ideographs Supp. (32-bit) (Chinese Japanese Korean) |
| 2440 | Optical Character Recognition | †000E 0000 | Tags, control characters. (32-bit) |
| 2460 | Enclosed Alphanumerics | †000E 0100 | Variation Selectors Supp., non-printing control characters (32-bit) |
| 2500 | Box Drawing | †000F 0000 | Supplementary Private Use Area-A (32-bit) |
| 2580 | Block Elements | †0010 0000 | Supplementary Private Use Area-B (32-bit) |
| 25A0 | Geometric Shapes | | |
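The block start codes in the table can be cross-checked from Java. The following is a minimal sketch, not part of the original page: the class name BlockLookup and its helper methods are hypothetical, and the block names it prints are Java's own constant names, which depend on the Unicode version your JDK supports.

```java
// Sketch: map a code from the table to the Unicode block Java reports for it.
class BlockLookup {
    /** Returns Java's name for the block containing codePoint, or "unassigned". */
    static String blockName(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "unassigned" : block.toString();
    }

    /** Parses a code as printed in the table, e.g. "2600" or "0001 0000". */
    static int parseTableCode(String code) {
        // Drop the space that splits 32-bit codes into two groups of four hex digits.
        return Integer.parseInt(code.replace(" ", ""), 16);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"0000", "2600", "27C0", "0001 0000"}) {
            int cp = parseTableCode(code);
            System.out.printf("U+%04X %s%n", cp, blockName(cp));
        }
    }
}
```

Codes marked † lie above FFFF, so they must be handled as int code points rather than as a single char.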
What Is Unicode?
A character encoding used in Java, where each char is a 16-bit UTF-16 code unit.
See the glyphs, in PDF format, which requires Adobe Acrobat
to view. They are also available as an ASCII
text file describing the glyphs with cross references to similar glyphs.
Sometimes called UCS or ISO 10646. Unicode allows Java to handle international
characters for most of the world’s living languages, including Arabic,
Armenian, Bengali, Bopomofo, Chinese (via unified Han), Cyrillic, English,
Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Kanji
via unified Han, plus Hiragana and Katakana), Kannada, Korean (Hangul),
Lao, Malayalam, Oriya, Tai, Tamil, Telugu, Tibetan… Unicode will make it
much easier for non-English speaking programmers to write programs for English
speaking users and vice versa.
In Java, you get at the exotic characters by encoding them in hex in your
strings like this: "\u00f7\u2713" to
produce ÷ ✓. See String
literals for more details.
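As a small illustration of the escape syntax just described, the sketch below stores the two characters from the text in a string and then turns the string back into escapes. The class name EscapeDemo and the method toEscapes are made up for this example.

```java
// Sketch: Unicode escapes in Java string literals, and the reverse mapping.
class EscapeDemo {
    // U+00F7 is the division sign and U+2713 is the check mark.
    static final String SYMBOLS = "\u00f7\u2713";

    /** Renders every char of s as a backslash-u escape with four hex digits. */
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(SYMBOLS.length());   // two chars, one per escape
        System.out.println(toEscapes(SYMBOLS)); // the escapes, spelled out again
    }
}
```

Whether the two symbols display properly on a console depends on the console's encoding and font, not on the program.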
In HTML, you get at the exotic characters by encoding them as entities
such as &amp;divide;&amp;#x2713; to produce ÷
✓.
Unicode Symbols
There are even codes for:
| symbol | glyph | Java char literal |
|---|---|---|
| apple | | '\uf000' unofficial, private use area |
| British pound sign | £ | '\u00a3' |
| checkmark | ✓ | '\u2713' |
| copyright | © | '\u00a9' |
| degree | ° | '\u00b0' |
| dharma wheel | ☸ | '\u2638' |
| division | ÷ | '\u00f7' |
| bullet | • | '\u2022' |
| euro | € | '\u20ac' |
| female | ♀ | '\u2640' |
| funeral urn | ⚱ | '\u26b1' |
| heart | ♥ | '\u2665' |
| bullet (as mathematical operator) | ∙ | '\u2219' |
| infinity | ∞ | '\u221e' |
| integral | ∫ | '\u222b' |
| male | ♂ | '\u2642' |
| pi | π | '\u03c0' |
| PI | Π | '\u03a0' |
| registered trade mark | ® | '\u00ae' |
| sun | ☀ | '\u2600' |
| telephone | ☎ | '\u260e' |
| trademark | ™ | '\u2122' |
This does not mean your fonts will support all these wonders, of course.
In addition there are all kinds of interesting special characters, such as:
Alphabetic Presentation Forms, APL, Arrows, Bengali, Block Elements, Box Drawing,
Braille Patterns, Byzantine Musical Symbols, Combining Diacritical Marks,
Combining Half Marks, Combining Marks for Symbols, Control Pictures
(icons for control chars), Currency Symbols, Dingbats, Enclosed Alphanumerics,
General Punctuation, Geometric Shapes, Halfwidth and Fullwidth Forms, High
Surrogates, Ideographic Description Characters, IPA Extensions, Letterlike
Symbols, Low Surrogates, Mathematical Alphanumeric Symbols (32 bit Unicode),
Mathematical Operators, Mathematical Symbols, Miscellaneous Symbols (astrology,
chess, playing cards), Miscellaneous Technical (del, grad, integral), Musical
Symbols, Number Forms (e.g. Roman numerals), OCR (Optical Character Recognition:
the OCR-A MICR characters used in magnetic ink cheque encoding), Old
Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials,
Superscripts and Subscripts, Tags (invisible language-tag control characters),
Unified Canadian Aboriginal Syllabic and Variation Selectors.
Unicode Arrows
There are also arrows:
| glyph | code |
|---|---|
| ← | \u2190 |
| ↑ | \u2191 |
| → | \u2192 |
| ↓ | \u2193 |
| ↔ | \u2194 |
| ↕ | \u2195 |
| ↢ | \u21a2 |
| ↬ | \u21ac |
| ↭ | \u21ad |
| ↰ | \u21b0 |
| ↶ | \u21b6 |
| ⇅ | \u21c5 |
| ⇎ | \u21ce |
| ⇐ | \u21d0 |
| ⇑ | \u21d1 |
| ⇒ | \u21d2 |
| ⇓ | \u21d3 |
| ⇔ | \u21d4 |
| ⇕ | \u21d5 |
| ⇜ | \u21dc |
There are even more arrows defined in Unicode, in the range 2190-21ff.
To use these characters in HTML, you need to code them as &…
entities.
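One way to generate such entities for the whole arrows range is sketched below. It assumes the entity form wanted is the standard hexadecimal numeric character reference (&amp;#x2190; and so on); the class name ArrowEntities and the method toHtmlEntity are hypothetical.

```java
// Sketch: emit an HTML numeric character reference for each arrow in 2190-21ff.
class ArrowEntities {
    /** Returns a hexadecimal numeric character reference for a code point. */
    static String toHtmlEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder();
        // Walk the range quoted in the text, one code point at a time.
        for (int cp = 0x2190; cp <= 0x21ff; cp++) {
            html.append(toHtmlEntity(cp)).append(' ');
        }
        System.out.println(html.toString().trim());
    }
}
```

Pasting the output into an HTML page shows which of the arrows your browser fonts can actually render.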
Viewing Unicode Glyphs
Nic Fulton of Reuters has written a Java
Test Applet that can display all 64 thousand Unicode characters including
the Chinese/Korean Han. How many of them actually display on your screen
depends on the font handling ability of your browser and operating system, and
which fonts you have installed. In Java programs, Unicode characters that are
awkward to type are represented in the form '\uffff', with four hex
digits. Ordinary characters like 'A' are actually 16-bit
Unicode too.
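The sketch below shows what that means in practice: an ordinary character such as 'A' fits in one 16-bit char, while a 32-bit code from the † entries in the glyph table needs two chars, a high surrogate followed by a low surrogate. The class name CodeUnits and the method describe are invented for illustration.

```java
// Sketch: list the 16-bit chars Java uses to store a given code point.
class CodeUnits {
    /** Formats a code point followed by its UTF-16 code units in hex. */
    static String describe(int codePoint) {
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder(String.format("U+%04X ->", codePoint));
        for (char unit : units) {
            sb.append(String.format(" %04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe('A'));     // U+0041 -> 0041
        System.out.println(describe(0x1D11E)); // U+1D11E -> D834 DD1E (a Musical Symbols code)
    }
}
```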
Creating Unicode Documents
How do you create and edit the various flavours of Unicode documents? You can
create them in some specific encoding, then convert
them. To write a little utility to do that, read up on encoding and ask the File
I/O Amanuensis for sample code. You can use lowly Notepad in Windows NT/W2K/XP,
but not in earlier Windows versions, to edit existing documents. For new
documents you would have to acquire an almost empty Unicode document to get started.
Notepad is even clever enough to deal with byte order (endian)
marks. Recent versions of MS Word in Windows NT/W2K/XP/W2K3 also work.
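A conversion utility of the kind mentioned above might look roughly like this. It is a minimal sketch rather than the Amanuensis output: the class name Transcode and its methods are hypothetical, the whole file is read into memory, and the sample text and encodings in main are arbitrary choices.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: re-encode text from one character encoding to another.
class Transcode {
    /** Decodes data using the from charset, then encodes the text using the to charset. */
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        // Malformed input is silently replaced by U+FFFD when decoding this way.
        return new String(data, from).getBytes(to);
    }

    /** Converts a whole file; suitable only for files that fit in memory. */
    static void convert(Path in, Charset from, Path out, Charset to) throws IOException {
        Files.write(out, transcode(Files.readAllBytes(in), from, to));
    }

    public static void main(String[] args) throws IOException {
        Path latin1 = Files.createTempFile("latin1", ".txt");
        Path utf16 = Files.createTempFile("utf16", ".txt");
        Files.write(latin1, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
        convert(latin1, StandardCharsets.ISO_8859_1, utf16, StandardCharsets.UTF_16LE);
        System.out.println(Files.size(latin1) + " bytes -> " + Files.size(utf16) + " bytes");
    }
}
```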
Byte Order Marks
There are two different layers: Unicode, which assigns numbers to characters, and
UTF, which describes how you encode these numbers in a file. Byte order marks
belong to the UTF encoding layer, not to the character assignments themselves. See
more on BOMs (Byte Order
Marks).
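To make the idea concrete, here is a minimal sketch that inspects the first bytes of some data for a byte order mark. The class name BomSniffer and the method detect are hypothetical, and the sketch deliberately ignores the UTF-32 marks, so data beginning FF FE 00 00 would be reported as UTF-16LE.

```java
import java.nio.charset.StandardCharsets;

// Sketch: guess the encoding of some bytes from a leading byte order mark.
class BomSniffer {
    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "unknown" if no mark is present. */
    static String detect(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xff) == 0xEF && (b[1] & 0xff) == 0xBB && (b[2] & 0xff) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xff) == 0xFE && (b[1] & 0xff) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xff) == 0xFF && (b[1] & 0xff) == 0xFE) {
            return "UTF-16LE";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian mark before the text.
        System.out.println(detect("A".getBytes(StandardCharsets.UTF_16)));   // UTF-16BE
        // The explicit LE and BE charsets write no mark at all.
        System.out.println(detect("A".getBytes(StandardCharsets.UTF_16LE))); // unknown
    }
}
```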
What’s Missing From Unicode?
There are no Unicode glyphs for the following:
- bold
- italic
- small caps
- old style numerals
- variant forms of Arabic letters used at the beginning, middle and end of words (the Arabic Presentation Forms blocks exist only for compatibility).
Unicode is not concerned with typesetting, just with raw text. In other words,
it is about
characters (logical letters), not glyphs
(how letters are precisely shaped). Unicode has various flavours of digits that
look much the same, but they are intended to be used in different
contexts.
To typeset, you need separate fonts to handle such variants, with the letters
encoded with the same Unicode character. The word processor automatically
selects the appropriate variant. I don’t know the mechanism by which a
word processor can tell which fonts are related, and which styles and font-weights
each supports. Presumably it is encoded somehow in the font files.
To a large extent ligatures are handled outside
Unicode by automatically combining Unicode characters, though there are a few
ligatures that rate a special Unicode character.
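One of the ligatures that does have its own character is the Latin "fi" ligature at FB01, in the Alphabetic Presentation Forms block. The sketch below uses the standard java.text.Normalizer class to fold it back into its two plain letters; the class name LigatureDemo and the method expand are made up, and compatibility normalization (NFKC) is only one possible way to do this.

```java
import java.text.Normalizer;

// Sketch: replace ligature characters with the plain letters they stand for.
class LigatureDemo {
    /** Applies compatibility normalization, which splits ligatures such as U+FB01. */
    static String expand(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String withLigature = "\ufb01nd";                  // three chars: ligature, n, d
        System.out.println(withLigature.length());         // 3
        System.out.println(expand(withLigature));          // find
        System.out.println(expand(withLigature).length()); // 4
    }
}
```

NFKC also folds other compatibility characters, such as fullwidth forms, so it is not suitable where those distinctions must be preserved.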
Unicode Editors
Where do Unicode files come from? You can create them with:
- A custom Java program that uses a FileWriter with UTF-16,
UTF-16BE, UTF-16LE,
or UTF-8 encoding.
- native2ascii.exe,
Sun’s encoding translation utility.
- Eclipse IDE.
- jEdit: a programmer’s
text editor that also supports a few dozen other encodings, and has piles of
plugins for various purposes, plus syntax highlighting for lots of languages.
You can also edit or create UTF-8 or UTF-16 files with Windows Notepad.
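The custom-program route in the first bullet can be sketched as follows. This version uses Files.newBufferedWriter rather than a FileWriter, writes to temporary files, and dumps the resulting bytes in hex so the four encodings can be compared. The class name EncodingSizes and the method hex are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write the same text in several encodings and compare the bytes.
class EncodingSizes {
    /** Returns the bytes of text in the given charset as space-separated hex. */
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00f7\u2713"; // division sign and check mark
        Charset[] charsets = {
            StandardCharsets.UTF_16, StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            Path file = Files.createTempFile("unicode", ".txt");
            try (Writer w = Files.newBufferedWriter(file, cs)) {
                w.write(text);
            }
            System.out.println(cs + ": " + Files.size(file) + " bytes: " + hex(text, cs));
        }
    }
}
```

Of the four, only plain UTF-16 starts with a byte order mark when written from Java; the BE, LE and UTF-8 variants write the text alone.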
Books
Recommended book: The Unicode 5.0 Standard
| format: | hardcover |
|---|---|
| ISBN13: | 978-0-321-48091-0 |
| ISBN10: | 0-321-48091-0 |
| publisher: | Addison-Wesley |
| published: | 2006-11-19 |
| by: | The Unicode Consortium |
Unicode 5.0 adds the following:
- Security mechanisms.
- A standard collation algorithm for various national orderings.
- A common locale data repository.
- Improvements to the encoding model for UTF-8.
- Rigorous stability of case folding.
- A systematic framework covering combining characters, Unicode strings, line breaking, and segmentation.