accents : Java Glossary


English, French, German, Italian and Swedish use modified letters such as é (e acute), ê (e circumflex), è (e grave) and ç (c cedilla). These appear in the range 0x00c0 to 0x00ff in the Latin-1 Supplement part of Unicode.

Eastern European languages have additional accents such as š (s caron) in the range 0x0100 to 0x017f in the Latin Extended-A section of Unicode.

Esperanto has 6 accented letters: ĉ (c circumflex), ĝ (g circumflex), ĥ (h circumflex), ĵ (j circumflex), ŝ (s circumflex) and ŭ (u breve).
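
The Unicode ranges mentioned above can be checked from Java with Character.UnicodeBlock. The following is a minimal sketch; the class name AccentBlocks and its helper methods are hypothetical names chosen for illustration.

```java
final class AccentBlocks {
    /** True if c falls in the Latin-1 Supplement block (0x0080 to 0x00ff). */
    static boolean isLatin1Supplement(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.LATIN_1_SUPPLEMENT;
    }

    /** True if c falls in the Latin Extended-A block (0x0100 to 0x017f). */
    static boolean isLatinExtendedA(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.LATIN_EXTENDED_A;
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the result does not depend on source file encoding.
        System.out.println(isLatin1Supplement('\u00e9')); // e acute
        System.out.println(isLatinExtendedA('\u0161'));   // s caron
        System.out.println(isLatinExtendedA('\u0109'));   // c circumflex (Esperanto)
    }
}
```

Note that the Latin-1 Supplement block also holds non-letters such as the multiplication sign, so a block test alone does not prove that a character is an accented letter.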

Detecting Accented Vowels

com.mindprod.common18.ST.isVowel will tell you if a given character is a vowel, including accented vowels. You can download the source as part of the COMMON18 distributable. This works in JDK (Java Development Kit) 1.+.
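
If you do not have the COMMON18 library at hand, the idea can be sketched with the standard library alone. This is not the ST.isVowel implementation, whose source is not shown here. It is a hypothetical stand-in that assumes a vowel is any character whose canonical decomposition starts with a, e, i, o or u, and it needs java.text.Normalizer, so Java 1.6 or later.

```java
import java.text.Normalizer;

final class VowelCheck {
    // Assumption: only these base letters count as vowels (y is excluded).
    private static final String PLAIN_VOWELS = "aeiouAEIOU";

    /** Hypothetical helper: true if c is a plain vowel or a vowel carrying an accent. */
    static boolean isVowel(char c) {
        // NFD splits an accented letter into its base letter followed by combining marks.
        String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
        return PLAIN_VOWELS.indexOf(decomposed.charAt(0)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isVowel('\u00e9')); // e acute
        System.out.println(isVowel('\u0161')); // s caron
    }
}
```

Letters that Unicode does not decompose, such as ø or æ, are not recognised by this sketch and would need to be listed explicitly.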

Removing Accents

Here is how you can convert accented chars to unaccented ones, in Java version 1.6 or later.
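
A minimal sketch of one common way to do this, assuming the approach is built on java.text.Normalizer (added in Java 1.6): decompose each accented character into a base letter plus combining marks, then delete the marks. The class name AccentStripper and the method stripAccents are hypothetical names for illustration.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class AccentStripper {
    // Matches runs of characters in the Combining Diacritical Marks block (0x0300 to 0x036f).
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    /** Returns s with accents removed from every letter that has a canonical decomposition. */
    static String stripAccents(String s) {
        // NFD turns e acute into 'e' followed by a combining acute accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Dropping the combining marks leaves only the base letters.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("gar\u00e7on caf\u00e9"));
    }
}
```

Letters without a canonical decomposition, for example ø or ß, pass through unchanged, so this technique handles accents but is not a general conversion to ASCII.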


Learning More

Oracle’s Javadoc on Normalizer class
Oracle’s Javadoc on Normalizer.Form

