classifying characters : Java Glossary


classifying characters
The Character class has several methods for classifying characters: getType, isWhitespace, isIdentifierIgnorable, isLetter, isDigit, isUpperCase, isLowerCase, etc.
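As a rough illustration of those methods, here is a minimal sketch. The class name CharacterMethodsDemo, the describe helper and its labels are invented for this example; only the Character calls come from the JDK.

```java
final class CharacterMethodsDemo {
    /** Returns a short label for c, using the Unicode-aware Character methods. */
    static String describe(char c) {
        if (Character.isWhitespace(c)) return "whitespace";
        if (Character.isDigit(c)) return "digit";
        if (Character.isUpperCase(c)) return "upper-case letter";
        if (Character.isLowerCase(c)) return "lower-case letter";
        if (Character.isLetter(c)) return "other letter"; // letters that have no case
        return "other";
    }

    public static void main(String[] args) {
        // Sample characters: ASCII, an accented letter, an Arabic-Indic digit and a CJK ideograph.
        char[] samples = {'A', 'z', '7', ' ', '\u00e9', '\u0663', '\u4e2d', '?'};
        for (char c : samples) {
            System.out.println(Integer.toHexString(c) + " -> " + describe(c)
                    + " (getType = " + Character.getType(c) + ")");
        }
    }
}
```

Note that these methods accept far more than ASCII: an Arabic-Indic digit counts as a digit and an accented letter counts as a letter.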

These methods are quite complex internally since they deal with the full Unicode character set. If you are dealing only with ASCII (American Standard Code for Information Interchange) characters, you can use simpler logic such as:

if ( '0' <= c && c <= '9' )...
if ( 'a' <= c && c <= 'z' )...
if ( 'A' <= c && c <= 'Z' )...
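The range tests above could be wrapped in small helper methods, roughly like this. The class and method names are hypothetical, chosen only for this sketch.

```java
final class AsciiClassifier {
    /** True only for the ten ASCII digits 0 to 9. */
    static boolean isAsciiDigit(char c) {
        return '0' <= c && c <= '9';
    }

    /** True only for the ASCII lower-case letters a to z. */
    static boolean isAsciiLower(char c) {
        return 'a' <= c && c <= 'z';
    }

    /** True only for the ASCII upper-case letters A to Z. */
    static boolean isAsciiUpper(char c) {
        return 'A' <= c && c <= 'Z';
    }

    /** True for any ASCII letter, either case. */
    static boolean isAsciiLetter(char c) {
        return isAsciiLower(c) || isAsciiUpper(c);
    }
}
```

Unlike Character.isDigit, isAsciiDigit rejects digits from other scripts, which is often what you want when parsing fixed-format data.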

An easy way to detect a vowel would be:

if ( "aeiou".indexOf( c ) >= 0 )...
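Here is a small sketch of the same indexOf trick as a reusable method. The names VowelCheck, isVowel and countVowels are made up for illustration, and adding the upper-case vowels to the lookup string is an assumption that goes beyond the one-liner above.

```java
final class VowelCheck {
    // Assumption: treat upper-case vowels as vowels too; y is not counted.
    private static final String VOWELS = "aeiouAEIOU";

    /** True if c appears in the vowel lookup string. */
    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    /** Counts the vowels in s by testing each character in turn. */
    static int countVowels(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

The linear scan inside indexOf is perfectly adequate for a string this short.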

A switch statement with a case for each character lets the compiler figure out how to categorise efficiently, but the categories must be fixed at compile time.
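A minimal sketch of the switch approach follows. The Kind categories and the characters assigned to each are arbitrary choices for this example.

```java
final class SwitchClassifier {
    /** Hypothetical categories, fixed at compile time. */
    enum Kind { DIGIT, OPERATOR, BRACKET, OTHER }

    /** Classifies c with a switch; the compiler chooses the dispatch strategy. */
    static Kind classify(char c) {
        switch (c) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                return Kind.DIGIT;
            case '+': case '-': case '*': case '/':
                return Kind.OPERATOR;
            case '(': case ')': case '[': case ']':
                return Kind.BRACKET;
            default:
                return Kind.OTHER;
        }
    }
}
```

To change which characters belong to a category you must edit the source and recompile, which is the compile-time restriction mentioned above.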

The traditional classifying method uses a translate table of byte classifications indexed by character. It is fast, but gobbles RAM (Random Access Memory): a table covering all 65,536 char values consumes 64K. If you need only a yes/no answer for each character, you could use a BitSet to shrink that to 8K. Consider using a HashMap indexed by Character to look up a sparse set of characters. Another technique is to use a binary search on a sorted table of special characters. You might look inside the Sun character-classifying methods in Character and Collator to learn a few clever tricks.
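The four table techniques might be sketched as follows. Everything here, including the class name, the category codes and the choice of special characters, is an assumption made for illustration rather than code from any particular library.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class TableClassifiers {
    // Hypothetical category codes stored in the translate table.
    static final byte OTHER = 0;
    static final byte DIGIT = 1;
    static final byte LETTER = 2;

    // Translate table: one byte for each of the 65,536 char values (64K).
    static final byte[] TABLE = new byte[Character.MAX_VALUE + 1];

    // BitSet: one bit per char value (about 8K), good for a single yes/no question.
    static final BitSet IS_LETTER = new BitSet(Character.MAX_VALUE + 1);

    // HashMap: suits a sparse set of characters that each carry some data.
    static final Map<Character, String> NAMES = new HashMap<>();

    // Table of special characters, sorted below so it can be binary searched.
    static final char[] SPECIALS = {'@', '#', '!', '&', '%', '$'};

    static {
        for (char c = '0'; c <= '9'; c++) {
            TABLE[c] = DIGIT;
        }
        for (char c = 'a'; c <= 'z'; c++) {
            TABLE[c] = LETTER;
            IS_LETTER.set(c);
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            TABLE[c] = LETTER;
            IS_LETTER.set(c);
        }
        NAMES.put('&', "ampersand");
        NAMES.put('@', "at sign");
        Arrays.sort(SPECIALS); // binarySearch requires sorted input
    }

    /** Table lookup: a single array index, no comparisons. */
    static byte classify(char c) {
        return TABLE[c];
    }

    /** BitSet lookup: yes/no membership only. */
    static boolean isLetter(char c) {
        return IS_LETTER.get(c);
    }

    /** HashMap lookup: returns null for characters not in the sparse set. */
    static String nameOf(char c) {
        return NAMES.get(c);
    }

    /** Binary search over the sorted table of special characters. */
    static boolean isSpecial(char c) {
        return Arrays.binarySearch(SPECIALS, c) >= 0;
    }
}
```

The byte table can hold up to 256 distinct categories per character, while the BitSet answers only one question; which trade-off is better depends on how many categories you need and how much RAM you can spare.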


This page is posted on the web at: http://mindprod.com/jgloss/classifying.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.