classifying characters : Java Glossary

classifying characters

Character has several methods for classifying characters: getType, isWhitespace, isIdentifierIgnorable, isLetter, isDigit, isUpperCase, isLowerCase, etc.
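Here is a minimal sketch of a few of these methods in use. The class name UnicodeClassify and the text labels are made up for illustration. Because the tests are Unicode-aware, they accept non-ASCII characters too: the Arabic-Indic digit three (U+0663) counts as a digit, and the CJK ideograph U+4E2D counts as a letter that is neither upper nor lower case.

```java
// Sketch: label a char using only java.lang.Character's Unicode-aware tests.
// The class name and label strings are hypothetical.
public class UnicodeClassify {

    public static String describe(char c) {
        if (Character.isWhitespace(c)) return "whitespace";
        if (Character.isDigit(c)) return "digit";
        if (Character.isUpperCase(c)) return "uppercase letter";
        if (Character.isLowerCase(c)) return "lowercase letter";
        // Letters with no case, such as CJK ideographs, land here.
        if (Character.isLetter(c)) return "letter";
        // Fall back to the raw Unicode general category number from getType.
        return "other (type " + Character.getType(c) + ")";
    }

    public static void main(String[] args) {
        char[] samples = { 'a', 'Q', '7', ' ', '\u00e9', '\u0663', '\u4e2d', '$' };
        for (char c : samples) {
            System.out.println("U+" + String.format("%04X", (int) c) + " -> " + describe(c));
        }
    }
}
```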

These methods are quite complex internally since they deal with the full Unicode character set. If you are dealing only with ASCII (American Standard Code for Information Interchange) characters, you can use simpler range tests such as:

if ( '0' <= c && c <= '9' )... // ASCII digit
if ( 'a' <= c && c <= 'z' )... // ASCII lowercase letter
if ( 'A' <= c && c <= 'Z' )... // ASCII uppercase letter
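A minimal sketch that wraps those range tests in methods. The names AsciiClassify, isAsciiDigit, isAsciiLower and isAsciiUpper are hypothetical. The example also shows the trade-off: the ASCII test rejects the Arabic-Indic digit U+0663, which Character.isDigit accepts.

```java
// Sketch: ASCII-only range tests; method names are illustrative, not a standard API.
public class AsciiClassify {

    public static boolean isAsciiDigit(char c) {
        return '0' <= c && c <= '9';
    }

    public static boolean isAsciiLower(char c) {
        return 'a' <= c && c <= 'z';
    }

    public static boolean isAsciiUpper(char c) {
        return 'A' <= c && c <= 'Z';
    }

    public static void main(String[] args) {
        // U+0663 is ARABIC-INDIC DIGIT THREE: a Unicode digit, but not an ASCII one.
        char arabicThree = '\u0663';
        System.out.println(isAsciiDigit('7'));                            // true
        System.out.println(isAsciiDigit(arabicThree));                    // false
        System.out.println(Character.isDigit(arabicThree));               // true
        System.out.println(isAsciiUpper('Q') + " " + isAsciiLower('Q'));  // true false
    }
}
```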

An easy way to detect a lowercase vowel would be:

if ( "aeiou".indexOf( c ) >= 0 )...
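A minimal sketch of that test as a method. VowelCheck and isVowelAnyCase are hypothetical names. String.indexOf returns -1 when the character is absent, so ">= 0" means "found". Because "aeiou" contains only lowercase letters, the example also includes a variant that accepts uppercase vowels.

```java
// Sketch: vowel detection via String.indexOf; class and method names are hypothetical.
public class VowelCheck {

    // Lowercase ASCII vowels only; indexOf returns -1 when c is not in the string.
    public static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    // Variant that also accepts uppercase ASCII vowels.
    public static boolean isVowelAnyCase(char c) {
        return "aeiouAEIOU".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        String word = "Education";
        int count = 0;
        for (char c : word.toCharArray()) {
            if (isVowelAnyCase(c)) count++;
        }
        System.out.println(word + " has " + count + " vowels"); // Education has 5 vowels
    }
}
```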

A switch statement with a case for each character lets the compiler figure out how to categorise efficiently, but the categories must be fixed at compile time.
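A minimal sketch of switch-based classification. The Kind categories and the characters assigned to each one are made up for illustration. javac compiles a switch on a char into a tableswitch or lookupswitch bytecode instruction. The case labels must be compile-time constants, so changing a category means recompiling.

```java
// Sketch: classification with a switch; the categories are hypothetical.
public class SwitchClassify {

    public enum Kind { VOWEL, DIGIT, PUNCTUATION, OTHER }

    // Every case label is a compile-time constant, which lets javac emit a
    // tableswitch or lookupswitch instead of a chain of comparisons.
    public static Kind classify(char c) {
        switch (c) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
            case 'A': case 'E': case 'I': case 'O': case 'U':
                return Kind.VOWEL;
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return Kind.DIGIT;
            case '.': case ',': case ';': case ':': case '!': case '?':
                return Kind.PUNCTUATION;
            default:
                return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        for (char c : "Hi, 42!".toCharArray()) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
    }
}
```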

The traditional classifying method is a translate table of byte classifications indexed by character. With one byte for each of the 65,536 char values, it consumes 64K per table. It is fast, but gobbles RAM (Random Access Memory). If you only need a yes/no answer for a single property, a BitSet with one bit per char value shrinks that to 8K. To look up a sparse set of characters, consider a HashMap indexed by Character. Another technique is to keep a sorted table of special characters and look them up with a binary search. You might look inside the Sun character-classifying methods in Character and Collate to learn a few clever tricks.
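A minimal sketch that shows the four lookup structures side by side. The class name, the classification codes and the HTML-escape sample data are all hypothetical. The byte table and the BitSet are filled once at class-load time from the standard Character methods. After that, each lookup is a single array or bit access.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch: four table-driven classifiers; codes and sample data are hypothetical.
public class TableClassify {

    public static final byte OTHER = 0, LETTER = 1, DIGIT = 2, SPACE = 3;

    // 1. Translate table: one byte per char value = 65,536 bytes (64K).
    static final byte[] TABLE = new byte[Character.MAX_VALUE + 1];

    // 2. BitSet: one bit per char value = 65,536 bits (8K), but only a yes/no answer.
    static final BitSet IS_DIGIT = new BitSet(Character.MAX_VALUE + 1);

    // 3. HashMap: stores only the sparse set of interesting chars.
    static final Map<Character, String> ESCAPES = new HashMap<>();

    // 4. Sorted array for Arrays.binarySearch; it must be kept in ascending order.
    static final char[] SORTED_SPECIAL = { '"', '&', '\'', '<', '>' };

    static {
        // Build the byte table and the BitSet once, using the Unicode-aware methods.
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            char ch = (char) c;
            if (Character.isLetter(ch)) {
                TABLE[c] = LETTER;
            } else if (Character.isDigit(ch)) {
                TABLE[c] = DIGIT;
                IS_DIGIT.set(c);
            } else if (Character.isWhitespace(ch)) {
                TABLE[c] = SPACE;
            }
        }
        ESCAPES.put('&', "&amp;");
        ESCAPES.put('<', "&lt;");
        ESCAPES.put('>', "&gt;");
    }

    public static byte classify(char c) { return TABLE[c]; }

    public static boolean isDigit(char c) { return IS_DIGIT.get(c); }

    // Returns null for chars that have no entry in the sparse map.
    public static String escape(char c) { return ESCAPES.get(c); }

    public static boolean isSpecial(char c) {
        return Arrays.binarySearch(SORTED_SPECIAL, c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(classify('x') == LETTER); // true
        System.out.println(isDigit('5'));            // true
        System.out.println(escape('<'));             // &lt;
        System.out.println(isSpecial('"'));          // true
    }
}
```

The byte table answers many categories in one lookup. The BitSet trades that flexibility for an eighth of the memory. The HashMap and the binary-search table pay off when only a handful of characters are special.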

	This page is posted on the web at:	http://mindprod.com/jgloss/classifying.html
	Contact Roedy. Please feel free to link to this page without explicit permission.
	Canadian Mind Products