Unicode Glyphs
Unicode 16 and Unicode 32 Glyphs in Downloadable Acrobat PDF Format
| code | Description | code († = 32-bit) | Description |
|---|---|---|---|
| 0000 | Basic Latin | 2600 | Miscellaneous Symbols: chess, astrology, I-ching, telephones, hazards, religious symbols, hammer and sickle. |
| 0080 | Latin-1 Supplement: accented letters, basic symbols | 2700 | Dingbats: asterisks, ornaments, hands, right-pointing arrows, pencils, scissors, pens. |
| 0100 | Latin Extended-A: Esperanto accented letters | 27C0 | Miscellaneous Mathematical Symbols-A: including SQL left, right and full joins. |
| 0180 | Latin Extended-B: African | 27F0 | Supplemental Arrows-A |
| 0250 | IPA (International Phonetic Alphabet) Extensions | 2800 | Braille Patterns |
| 02B0 | Spacing Modifier Letters | 2900 | Supplemental Arrows-B |
| 0300 | Combining Diacritical Marks | 2980 | Miscellaneous Mathematical Symbols-B |
| 0370 | Greek | 2A00 | Supplemental Mathematical Operators: including variants of + - × ÷ |
| 0400 | Cyrillic | 2B00 | Miscellaneous Symbols and Arrows |
| 0500 | Cyrillic Supplement | 2C00 | Glagolitic: pre-Cyrillic Bulgarian |
| 0530 | Armenian | 2E80 | CJK Radicals Supplement: Chinese Japanese Korean |
| 0590 | Hebrew | 2F00 | Kangxi Radicals: fragments combined to write Chinese |
| 0600 | Arabic | 2FF0 | Ideographic Description Characters |
| 0700 | Syriac | 3000 | CJK Symbols and Punctuation: Chinese Japanese Korean |
| 0780 | Thaana: Maldives | 3040 | Hiragana: (Japanese) Used when no Kanji character exists. |
| 0900 | Devanagari: Hindi | 30A0 | Katakana: (Japanese) mainly for foreign names |
| 0980 | Bengali | 3100 | Bopomofo: phonetic script for Mandarin |
| 0A00 | Gurmukhi: Punjabi | 3130 | Hangul Compatibility Jamo: Korean |
| 0A80 | Gujarati: Gujarat | 3190 | Kanbun: used by Japanese to annotate classic Chinese |
| 0B00 | Oriya: Odiya, Orissa | 31A0 | Bopomofo Extended: phonetic script for Mandarin |
| 0B80 | Tamil | 31F0 | Katakana Phonetic Extensions: Japanese |
| 0C00 | Telugu: Andhra Pradesh | 3200 | Enclosed CJK Letters and Months: Chinese Japanese Korean |
| 0C80 | Kannada: Karnataka | 3300 | CJK Compatibility: Chinese Japanese Korean |
| 0D00 | Malayalam: Kerala | 3400 | CJK Unified Ideographs Extension A: Chinese Japanese Korean |
| 0D80 | Sinhala: Sri Lanka | 4DC0 | Yijing Hexagram Symbols: I Ching |
| 0E00 | Thai | 4E00 | CJK Unified Ideographs: Chinese Japanese Korean including Kanji digits 零 一 二 三 四 五 六 七 八 九 |
| 0E80 | Lao | A000 | Yi Syllables |
| 0F00 | Tibetan | A490 | Yi Radicals |
| 1000 | Myanmar | AC00 | Hangul Syllables: Korean |
| 10A0 | Georgian | D800 | High Surrogates |
| 1100 | Hangul Jamo: Korean | DC00 | Low Surrogates |
| 1200 | Ethiopic | E000 | Private Use Area |
| 13A0 | Cherokee | F900 | CJK Compatibility Ideographs: Chinese Japanese Korean |
| 1400 | Canadian Aboriginal Syllabic | FB00 | Alphabetic Presentation Forms: ligatures including Hebrew |
| 1680 | Ogham: Old Irish | FB50 | Arabic Presentation Forms-A |
| 16A0 | Runic | FE00 | Variation Selectors: non-printing control characters |
| 1700 | Tagalog: Filipino | FE20 | Combining Half Marks |
| 1720 | Hanunoo: Mindoro in the Philippines | FE30 | CJK Compatibility Forms: Chinese Japanese Korean |
| 1740 | Buhid: Mindoro in the Philippines | FE50 | Small Form Variants: small punctuation |
| 1760 | Tagbanwa: Philippines | FE70 | Arabic Presentation Forms-B |
| 1780 | Khmer: Cambodian | FF00 | Halfwidth and Fullwidth Forms |
| 1800 | Mongolian | FFF0 | Specials: byte order marks. |
| 1900 | Limbu: Tibet/Burma | †0001 0000 | Linear B Syllabary (32-bit) |
| 1950 | Tai Le: China | †0001 0080 | Linear B Ideograms (32-bit) |
| 19E0 | Khmer Symbols: Cambodian | †0001 0100 | Aegean Numbers: (32-bit) |
| 1D00 | Phonetic Extensions | †0001 0300 | Old Italic: (32-bit) |
| 1E00 | Latin Extended Additional: dotted letters, letters with two accents. | †0001 0330 | Gothic: (32-bit) |
| 1F00 | Greek Extended | †0001 0380 | Ugaritic Cuneiform (32-bit) |
| 2000 | General Punctuation | †0001 0400 | Deseret: Mormon: (32-bit) |
| 2070 | Superscripts and Subscripts | †0001 0450 | Shavian: (32-bit) |
| 20A0 | Currency Symbols | †0001 0480 | Osmanya: Somalian (32-bit) |
| 20D0 | Combining Marks for Symbols | †0001 0800 | Cypriot Syllabary (32-bit) |
| 2100 | ™ Letterlike Symbols | †0001 D000 | Byzantine Musical Symbols: (32-bit) |
| 2150 | Number Forms: Roman Numerals and fractions | †0001 D100 | Musical Symbols: (32-bit) |
| 2190 | Arrows | †0001 D300 | Tai Xuan Jing Symbols (32-bit) Look like I-Ching hexagrams truncated to four lines. |
| 2200 | Mathematical Operators: del, grad, element, there exists, for all, union, intersection, contains, dot product, cross product, therefore, square root, logical and, logical or, summation, product. | †0001 D400 | Mathematical Alphanumeric Symbols: (32-bit) |
| 2300 | Miscellaneous Technical: APL operators. | †0002 0000 | CJK Unified Ideographs Extension B: (32-bit) Chinese Japanese Korean |
| 2400 | Control Pictures: for displaying unprintable ASCII control characters. | †0002 F800 | CJK Compatibility Ideographs Supp.: (32-bit) Chinese Japanese Korean |
| 2440 | Optical Character Recognition | †000E 0000 | Tags: control characters. (32-bit) |
| 2460 | Enclosed Alphanumerics | †000E 0100 | Variation Selectors Supp.: non-printing control characters (32-bit) |
| 2500 | Box Drawing, also triangles | †000F 0000 | Supplementary Private Use Area-A: (32-bit) |
| 2580 | Block Elements | †0010 0000 | Supplementary Private Use Area-B: (32-bit) |
| 25A0 | Geometric Shapes | | |
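The block table can also be queried from Java. The following is a minimal sketch, assuming only the standard Character.UnicodeBlock class; the class name BlockLookup and the helper blockName are made up for illustration. Which blocks are recognised depends on the Unicode version your Java runtime supports.

```java
// Sketch: ask the Java runtime which Unicode block a code point belongs to.
final class BlockLookup {

    /** Returns the name of the block containing codePoint, or "unassigned" if Java knows of none. */
    static String blockName(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return block == null ? "unassigned" : block.toString();
    }

    public static void main(String[] args) {
        // A few block starting points taken from the table, 16-bit and 32-bit.
        int[] samples = {0x0000, 0x2600, 0x4E00, 0x10000, 0x1D400};
        for (int codePoint : samples) {
            System.out.printf("U+%04X  %s%n", codePoint, blockName(codePoint));
        }
    }
}
```

The lookup takes an int rather than a char so that it also works for the 32-bit entries marked with †.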
What Is Unicode?
A character set that Java stores in its strings as 16-bit code units. See the example glyphs, in PDF format. Requires
Adobe Acrobat to view. Also available as an ASCII text file
describing the glyphs with cross references to similar glyphs. Unicode does not standardise the precise shapes of
the letters, i.e. the glyphs. It does, however, provide example glyphs. This distinction is most important for
unified Han, which encodes the ideographs shared by Chinese, Japanese and Korean. The three languages use the same
Unicode encodings, but quite different looking renderings of the characters. These differences are handled by the
font designer, who uses Chinese, Japanese or Korean style.
Unicode is sometimes called UCS or ISO 10646. It allows Java to handle international characters for most of the
world’s living languages, including Arabic, Armenian, Bengali, Bopomofo, Chinese (via unified Han),
Cyrillic, English, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Hindi (Devanagari), Japanese (Hiragana and
Katakana, plus Kanji via unified Han), Kannada, Korean (Hangul, plus Hanja via unified Han), Lao, Malayalam, Oriya,
Thai, Tamil, Telugu, Tibetan… Unicode will make it much easier for non-English speaking programmers to write
programs for English speaking users and vice versa.
In Java, you get at the exotic characters by encoding them in hex in your strings like this: "\u00f7\u2713" to produce ÷ ✓. See String literals for more details.
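As a small illustration of those escapes, here is a sketch that stores the two characters in a string and then spells them back out as escapes. The class name EscapeDemo and the helper toJavaEscapes are hypothetical, chosen only for this example.

```java
// Sketch: hex escapes in a Java string literal, and the reverse direction.
final class EscapeDemo {

    /** Spells out each 16-bit char of s as a backslash-u escape with four hex digits. */
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String symbols = "\u00f7\u2713"; // division sign, check mark
        System.out.println(symbols.length());       // number of 16-bit chars in the string
        System.out.println(toJavaEscapes(symbols)); // the same characters, written as escapes
    }
}
```

The compiler translates the escapes before it does anything else, so the string holds two characters, not twelve.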
In HTML, you get at the exotic characters by encoding them as entities such as &divide;&#x2713; to produce
÷ ✓.
Unicode Symbols
There are even codes for:
| apple | | '\uf000' unofficial, private use area |
| British pound sign | £ | '\u00a3' |
| checkmark | ✓ | '\u2713' |
| copyright | © | '\u00a9' |
| degree | ° | '\u00b0' |
| dharma wheel | ☸ | '\u2638' |
| division | ÷ | '\u00f7' |
| bullet | • | '\u2022' |
| euro | € | '\u20ac' |
| female | ♀ | '\u2640' |
| funeral urn | ⚱ | '\u26b1' |
| heart | ♥ | '\u2665' |
| bullet (as mathematical operator) | ∙ | '\u2219' |
| infinity | ∞ | '\u221e' |
| integral | ∫ | '\u222b' |
| male | ♂ | '\u2642' |
| pi | π | '\u03c0' |
| PI | Π | '\u03a0' |
| registered trade mark | ® | '\u00ae' |
| sun | ☀ | '\u2600' |
| telephone | ☎ | '\u260e' |
| trademark | ™ | '\u2122' |
This does not mean your fonts will support all these wonders, of course.
In addition there are all kinds of interesting special characters, such as: Alphabetic Presentation
Forms, APL, Arrows, Bengali, Block Elements, Box Drawing, Braille Patterns, Byzantine Musical Symbols, Combining
Diacritical Marks, Combining Half Marks, Combining Marks for Symbols, Control Pictures (icons for control
chars), Currency Symbols, Dingbats, Enclosed Alphanumerics, General Punctuation, Geometric Shapes, Halfwidth and
Fullwidth Forms, High Surrogates, Ideographic Description Characters, IPA Extensions, Letterlike Symbols, Low
Surrogates, Mathematical Alphanumeric Symbols (32-bit Unicode), Mathematical
Operators, Mathematical Symbols, Miscellaneous Symbols (astrology, chess, playing cards), Miscellaneous Technical
(del, grad, integral), Musical Symbols, Number Forms (e.g. Roman numerals), OCR (Optical
Character Recognition: the OCR-A and MICR characters used in
magnetic ink cheque encoding), Old Italic, Runic, Small Form Variants, Spacing Modifier Letters, Specials,
Superscripts and Subscripts, Tags (invisible tagging control characters), Unified Canadian Aboriginal Syllabics and
Variation Selectors.
Unicode Arrows
There are also arrows:
| ← | \u2190 |
| ↑ | \u2191 |
| → | \u2192 |
| ↓ | \u2193 |
| ↔ | \u2194 |
| ↕ | \u2195 |
| ↢ | \u21a2 |
| ↬ | \u21ac |
| ↭ | \u21ad |
| ↰ | \u21b0 |
| ↶ | \u21b6 |
| ⇅ | \u21c5 |
| ⇎ | \u21ce |
| ⇐ | \u21d0 |
| ⇑ | \u21d1 |
| ⇒ | \u21d2 |
| ⇓ | \u21d3 |
| ⇔ | \u21d4 |
| ⇕ | \u21d5 |
| ⇜ | \u21dc |
There are even more arrows defined in Unicode: 2190-21ff. To use these characters in HTML, you need to code
them as &… entities.
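One way to generate such entities is with a hexadecimal numeric character reference, which HTML accepts for any code point. The sketch below assumes that form; the class name ArrowEntities and the helper toHtmlEntity are invented for illustration.

```java
// Sketch: produce HTML numeric character references for part of the Arrows block.
final class ArrowEntities {

    /** Returns a hexadecimal numeric character reference for codePoint. */
    static String toHtmlEntity(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        // The first ten arrows of the block that runs from 2190 to 21ff.
        for (int codePoint = 0x2190; codePoint <= 0x2199; codePoint++) {
            String glyph = new String(Character.toChars(codePoint));
            System.out.println(toHtmlEntity(codePoint) + "  " + glyph);
        }
    }
}
```

Whether the glyph column prints properly depends on your console encoding and font, but the entity column is plain ASCII and can be pasted into HTML as is.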
Viewing Unicode Glyphs
Nic Fulton of Reuters has written a Java test applet that
can display all 64 thousand 16-bit Unicode characters including the Chinese/Korean Han. How many of them actually
display on your screen depends on the font handling ability of your browser and operating system, and
which fonts you have installed. In Java programs, Unicode characters that are awkward to type are represented in
the form '\uffff', with four hex digits. Ordinary characters like 'A' are actually 16-bit Unicode too.
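Four hex digits cannot reach the 32-bit characters marked † in the table. Java stores each of those as a pair of 16-bit chars, one from the High Surrogates block and one from the Low Surrogates block. A minimal sketch, using only standard Character and String methods; the class name SupplementaryDemo and the helper fromCodePoint are made up for this example.

```java
// Sketch: a code point above 0xFFFF occupies two 16-bit chars in a Java String.
final class SupplementaryDemo {

    /** Builds a String holding exactly one code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // U+1D11E, MUSICAL SYMBOL G CLEF, lives in the Musical Symbols block at 0001 D100.
        String clef = fromCodePoint(0x1D11E);
        System.out.println("chars:       " + clef.length());
        System.out.println("code points: " + clef.codePointCount(0, clef.length()));
        System.out.printf("surrogates:  %04X %04X%n", (int) clef.charAt(0), (int) clef.charAt(1));
    }
}
```

Code that counts or indexes characters with length() and charAt() therefore sees two chars for one such character; the codePoint methods see one.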
Creating Unicode Documents
How do you create and edit the various flavours of Unicode documents? You can create them in some specific
encoding, then convert them. To write a little utility to do that, read up on encoding
and ask the File I/O Amanuensis for sample code. You can use
lowly Notepad in Windows NT/W2K/XP, but not in earlier Windows versions, to edit existing documents. To get started
with a new document, you would have to acquire an almost empty Unicode document. Notepad is even clever enough to
deal with byte order (endian) marks. Recent versions of MS Word in Windows NT/W2K/XP/W2K3
also work.
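Such a conversion utility can be quite short. This is only a sketch, not the Amanuensis output: the class name Transcode and its command-line arguments are invented for illustration, and it assumes both encoding names are ones your Java runtime supports.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: copy a text file, decoding with one encoding and encoding with another.
final class Transcode {

    /** Reads in using the from encoding and writes the same characters to out using the to encoding. */
    static void convert(Path in, Charset from, Path out, Charset to) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, from);
             BufferedWriter writer = Files.newBufferedWriter(out, to)) {
            char[] buffer = new char[8192];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: java Transcode infile inEncoding outfile outEncoding");
            return;
        }
        convert(Paths.get(args[0]), Charset.forName(args[1]),
                Paths.get(args[2]), Charset.forName(args[3]));
    }
}
```

For example, java Transcode in.txt ISO-8859-1 out.txt UTF-8 would turn a Latin-1 file into UTF-8. The reader throws an exception on bytes that are not valid in the input encoding, rather than silently substituting.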
Byte Order Marks
There are two different layers: Unicode proper, which assigns numbers to characters, and the UTF encodings, which
describe how you encode these numbers in a file. Byte order marks belong to the UTF encoding layer, not to the
character assignments. See more on BOMs (Byte Order Marks).
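To see the difference between the character number and its encoded bytes, you can dump the bytes Java produces for the letter A in several UTF encodings. A minimal sketch; the class name BomDemo and the helper hex are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same character encoded four ways, with and without a byte order mark.
final class BomDemo {

    /** Formats bytes as two-digit uppercase hex, separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16,
        };
        for (Charset charset : charsets) {
            System.out.println(charset.name() + ": " + hex("A".getBytes(charset)));
        }
    }
}
```

In Java, only the plain UTF-16 encoder writes a byte order mark (FE FF, big-endian) ahead of the data. The UTF-16BE, UTF-16LE and UTF-8 encoders write none, so a program that wants one must write the character U+FEFF itself.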
What’s Missing From Unicode?
There are no Unicode characters for the following:
- bold
- italic
- small caps
- old style numerals
- Variant forms for Arabic letters used at the beginnings, middles and ends of words (apart from the
compatibility-only Arabic Presentation Forms blocks).
Unicode is not concerned with typesetting, just with raw text. In other words, it is about characters (logical
letters), not glyphs (how letters are precisely shaped). Unicode has various flavours of
digits that look much the same, but they are intended to be used in different contexts.
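Java can tell you the numeric value of any of these digit flavours. A small sketch, assuming only the standard Character.digit method; the class name DigitFlavours and the helper decimalValue are hypothetical.

```java
// Sketch: different Unicode characters that all mean the decimal digit three.
final class DigitFlavours {

    /** Returns the value 0..9 of any Unicode decimal digit, or -1 if c is not one. */
    static int decimalValue(char c) {
        return Character.digit(c, 10);
    }

    public static void main(String[] args) {
        // ASCII, Arabic-Indic, Devanagari and fullwidth forms of the digit three.
        char[] threes = {'3', '\u0663', '\u0969', '\uff13'};
        for (char c : threes) {
            System.out.printf("U+%04X -> %d%n", (int) c, decimalValue(c));
        }
    }
}
```

The four chars compare unequal to each other, which matters when you validate or parse numeric input.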
To typeset, you need separate fonts to handle such variants, with the letters encoded with the same Unicode
character. The word processor automatically selects the appropriate variant. I don’t know the mechanism by
which a word processor can tell which fonts are related, and which styles and font-weights each supports.
Presumably it is encoded somehow in the font files.
To a large extent ligatures are handled outside Unicode by automatically combining
Unicode characters, though there are a few ligatures that rate a special Unicode character.
Unicode Editors
Where do Unicode files come from? You can create them with:
- A custom Java program that uses an OutputStreamWriter (or, from Java 11 on, a FileWriter) with UTF-16, UTF-16BE,
UTF-16LE, or UTF-8 encoding.
- native2ascii.exe, Sun’s encoding
translation utility.
- Eclipse IDE.
- jEdit: a programmer’s text editor that also
supports a few dozen other encodings, and has piles of plugins for various purposes, plus syntax highlighting
for lots of languages.
You can edit or create UTF-8 or UTF-16 files with Windows Notepad.
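The custom-program route might look like the sketch below. The class name WriteUnicodeFile, the helper write and the output file name sample-utf16.txt are invented for illustration. It uses OutputStreamWriter because that class has accepted an explicit encoding in every Java version.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: create a Unicode text file in an explicitly chosen encoding.
final class WriteUnicodeFile {

    /** Writes text to file in the given encoding, replacing any existing content. */
    static void write(File file, String text, Charset charset) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), charset)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output file; plain UTF-16 makes Java write a byte order mark first.
        write(new File("sample-utf16.txt"), "\u00f7 \u2713 \u20ac", StandardCharsets.UTF_16);
    }
}
```

Naming the encoding explicitly avoids depending on the platform default, which differs between machines.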
Books
Recommended book: The Unicode 5.0 Standard
| format: | hardcover |
| ISBN13: | 978-0-321-48091-0 |
| publisher: | Addison-Wesley |
| published: | 2006-11-19 |
| by: | The Unicode Consortium |
Unicode 5.0 adds the following:
- Security mechanisms.
- A standard collation algorithm for various national orderings.
- A common locale data repository.
- Improvements to the encoding model for UTF-8.
- Rigorous stability of case folding.
- A systematic framework covering combining characters, Unicode strings, line breaking, and segmentation.