finite state automaton : Java Glossary


finite state automaton

A finite state automaton is a way of looking at a computer, a way of solving parsing problems and a way of designing special purpose hardware. At any time the automaton is in one of a finite number of internal states. It is a black box. You feed it input, usually a character at a time. It turns its internal crank, applies its rules, goes into a new state and accepts more input. Every once in a while it also produces some output. A simple parser might classify Java source as either code or comment. It would have states for having seen /, //, /*, /*… *, /*… */ etc.



Lexers and regex packages are tools for creating finite state automata.

In the days of C, the state was stored as an integer 0 .. n. You had a complicated nest of switch statements that examined both the current state and the input to determine the next state.
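To make the C-style approach concrete, here is a minimal sketch written in Java. The problem (accepting an optionally signed integer such as -42) and all the names in it (SignedIntegerFsa, S_START, C_DIGIT and so on) are assumptions chosen for illustration. They do not come from any real program.

```java
/** Sketch: accepts an optionally signed decimal integer such as "-42". */
final class SignedIntegerFsa {
    // States, stored as plain integers in the C style.
    static final int S_START = 0;  // nothing seen yet
    static final int S_SIGN = 1;   // seen a leading + or -
    static final int S_DIGITS = 2; // seen at least one digit (the accepting state)
    static final int S_REJECT = 3; // seen something illegal (dead state)

    // Input categories, also plain integers.
    static final int C_DIGIT = 0;
    static final int C_SIGN = 1;
    static final int C_OTHER = 2;

    static int categorise(char c) {
        if (c >= '0' && c <= '9') return C_DIGIT;
        if (c == '+' || c == '-') return C_SIGN;
        return C_OTHER;
    }

    /** Nested switches: the outer one on current state, the inner one on input category. */
    static int next(int state, char c) {
        switch (state) {
            case S_START:
                switch (categorise(c)) {
                    case C_DIGIT: return S_DIGITS;
                    case C_SIGN: return S_SIGN;
                    default: return S_REJECT;
                }
            case S_SIGN:
            case S_DIGITS:
                switch (categorise(c)) {
                    case C_DIGIT: return S_DIGITS;
                    default: return S_REJECT;
                }
            default: // S_REJECT: once rejected, always rejected
                return S_REJECT;
        }
    }

    /** Feeds the input through the automaton one character at a time. */
    static boolean accepts(String input) {
        int state = S_START;
        for (int i = 0; i < input.length(); i++) {
            state = next(state, input.charAt(i));
        }
        return state == S_DIGITS;
    }
}
```

With only four states this is still readable. The nest of switches grows quickly as states and categories are added, which is what the later techniques try to tame.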

In Java version 1.4 or earlier, one way of writing finite state automata was to have a singleton class represent each possible state. A state variable holds the current state. You feed the input to a standard method of the class’s interface and it computes the next state. That way, states that are very similar can inherit default behaviours. You can have a static method to categorise the input and a separate instance method to handle each category, or use a common method with a switch that dispatches on input category to decide the next state. You don’t need any switch code based on current state. The dynamic method overriding features of Java handle that.
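Here is a minimal sketch of the class-per-state style. It tracks whether the automaton is inside a double-quoted string, allowing backslash escapes. The problem and the names (QuoteState, onQuote, onBackslash, onOther) are assumptions made up for this example. It uses no annotations or generics, in keeping with pre-1.5 Java.

```java
/** Sketch: one singleton object per state; overriding replaces the switch on current state. */
abstract class QuoteState {
    static final QuoteState OUTSIDE = new Outside(); // not inside a quoted string
    static final QuoteState INSIDE = new Inside();   // inside a quoted string
    static final QuoteState ESCAPED = new Escaped(); // inside, just after a backslash

    /** Common entry point: categorises the character and dispatches on category only. */
    final QuoteState next(char c) {
        switch (c) {
            case '"': return onQuote();
            case '\\': return onBackslash();
            default: return onOther();
        }
    }

    // Default behaviour inherited by every state: stay where you are.
    QuoteState onQuote() { return this; }
    QuoteState onBackslash() { return this; }
    QuoteState onOther() { return this; }

    private static final class Outside extends QuoteState {
        QuoteState onQuote() { return INSIDE; }
    }

    private static final class Inside extends QuoteState {
        QuoteState onQuote() { return OUTSIDE; }
        QuoteState onBackslash() { return ESCAPED; }
    }

    private static final class Escaped extends QuoteState {
        // Whatever follows a backslash is taken literally.
        QuoteState onQuote() { return INSIDE; }
        QuoteState onBackslash() { return INSIDE; }
        QuoteState onOther() { return INSIDE; }
    }

    /** Runs the automaton over the input and returns the final state. */
    static QuoteState run(String input) {
        QuoteState state = OUTSIDE;
        for (int i = 0; i < input.length(); i++) {
            state = state.next(input.charAt(i));
        }
        return state;
    }
}
```

Outside overrides only one handler and inherits the rest, which is the inheritance of default behaviours described above.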

In Java version 1.5 or later, one way to write finite state automata is to use an enum constant to represent each state. A custom next method on each enum constant examines the input and calculates the next state. The various parsers used by JDisplay work using enums. The problem with this approach is you must use statics for variables shared between states. You can’t instantiate several different finite state automata.
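Here is a minimal sketch of the enum style, applied to the earlier example of telling Java code from comments. It is not JDisplay’s parser. The names (CommentState, strip) are invented for illustration, and it assumes the source contains no string or character literals that happen to contain // or /*. To sidestep the statics problem in this small case, the sketch passes the output buffer to next as a parameter. That is a choice made for this example only.

```java
/** Sketch: strips // and slash-star comments, one enum constant per state. */
enum CommentState {
    BLOCK_COMMENT { // inside /* ... */
        @Override CommentState next(char c, StringBuilder out) {
            return c == '*' ? BLOCK_STAR : BLOCK_COMMENT;
        }
    },
    BLOCK_STAR { // inside a block comment, just seen '*'
        @Override CommentState next(char c, StringBuilder out) {
            if (c == '/') return CODE;
            return c == '*' ? BLOCK_STAR : BLOCK_COMMENT;
        }
    },
    CODE { // ordinary code
        @Override CommentState next(char c, StringBuilder out) {
            if (c == '/') return SLASH;
            out.append(c);
            return CODE;
        }
    },
    LINE_COMMENT { // inside // ... up to end of line
        @Override CommentState next(char c, StringBuilder out) {
            if (c == '\n') {
                out.append(c); // keep the line break itself
                return CODE;
            }
            return LINE_COMMENT;
        }
    },
    SLASH { // seen one '/', which may start a comment or be a division
        @Override CommentState next(char c, StringBuilder out) {
            if (c == '/') return LINE_COMMENT;
            if (c == '*') return BLOCK_COMMENT;
            out.append('/').append(c); // it was just a division operator
            return CODE;
        }
    };

    /** Consumes one character, possibly emits output, and returns the next state. */
    abstract CommentState next(char c, StringBuilder out);

    /** Runs the automaton over the whole source and returns the code without comments. */
    static String strip(String source) {
        StringBuilder out = new StringBuilder();
        CommentState state = CODE;
        for (int i = 0; i < source.length(); i++) {
            state = state.next(source.charAt(i), out);
        }
        if (state == SLASH) out.append('/'); // a trailing lone slash is code
        return out.toString();
    }
}
```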

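Finite state automata come in two flavours: DFA (Deterministic Finite Automaton) and NFA (Non-deterministic Finite Automaton). In a DFA, each state and input determine exactly one next state. In an NFA, a state and input may allow several possible next states, and the automaton accepts if any sequence of choices leads to an accepting state. Every NFA can be converted to an equivalent DFA.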
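In textbook automata theory, an NFA can be run on an ordinary computer by tracking the set of all states it could possibly be in. The sketch below does that for a small NFA that accepts strings ending in ab. The automaton, its state numbers and the class name EndsWithAbNfa are assumptions made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: simulates a three-state NFA by tracking every state it could be in. */
final class EndsWithAbNfa {
    /** All states reachable from one state on one input character. */
    static Set<Integer> moves(int state, char c) {
        Set<Integer> result = new HashSet<>();
        switch (state) {
            case 0:
                result.add(0);               // state 0 loops on every character
                if (c == 'a') result.add(1); // and may also guess the final "ab" starts here
                break;
            case 1:
                if (c == 'b') result.add(2); // state 2 is the accepting state
                break;
            default: // state 2 has no outgoing moves
                break;
        }
        return result;
    }

    /** Accepts if at least one sequence of choices ends in the accepting state. */
    static boolean accepts(String input) {
        Set<Integer> current = new HashSet<>();
        current.add(0);
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current) {
                next.addAll(moves(s, c));
            }
            current = next;
        }
        return current.contains(2);
    }
}
```

Treating each distinct set of states as a single state is the idea behind converting an NFA into a DFA.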

You might wonder what use a non-deterministic program, one that makes random choices, could be. Consider integration in two dimensions by the Monte Carlo method. All you need is a way of determining if a point is inside or outside the area you want to integrate. Then you generate random points over a surrounding region of known area and count how many land inside. The fraction of points that land inside, multiplied by the total area sampled, gives you an approximation of the area of the region to integrate. You want a random pattern of test points. Any regular pattern could be thrown off by regularities in the shape of the region you are trying to integrate.
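The Monte Carlo idea can be sketched in a few lines. The example below estimates the area of a unit circle by sampling points in the enclosing 2 by 2 square. The choice of region, the class name MonteCarloArea and the seed parameter are assumptions for illustration. The seed is there only so that a run can be repeated.

```java
import java.util.Random;

/** Sketch: Monte Carlo estimate of the area of the unit circle. */
final class MonteCarloArea {
    /** The only knowledge needed about the region: is a point inside it? */
    static boolean insideUnitCircle(double x, double y) {
        return x * x + y * y <= 1.0;
    }

    /** Samples random points in the square [-1, 1) x [-1, 1) and scales the hit fraction. */
    static double estimateArea(int samples, long seed) {
        Random random = new Random(seed);
        int inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = random.nextDouble() * 2.0 - 1.0;
            double y = random.nextDouble() * 2.0 - 1.0;
            if (insideUnitCircle(x, y)) {
                inside++;
            }
        }
        double squareArea = 4.0; // total area sampled
        return squareArea * inside / samples;
    }
}
```

The true area is pi, so with a large number of samples the estimate should come out close to 3.14. More samples narrow the error, but only slowly.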


Here is my strategy for writing finite state automata.
  1. You need two classes, both enums. One has an enum constant for each category of character input you process and a method for categorising characters. The other has one enum constant for each state. Each state constant implements an abstract method called nextState that takes the category of the next character and the next character as parameters and returns the enum constant for the next state. The nextState method may emit output.
  2. Typically a nextState method will have a switch based on the category of the incoming character.
  3. You start by removing complications to the problem and working out the states and categories for just that simplification. Gradually add the complications which will usually require more categories and states.
  4. If you see a combinatorial explosion in your number of states, factor your finite state automaton and track its state with two or more different state variables, often using simple auxiliary booleans.
  5. Write explicit descriptions of what it means when you are in each of your states. Proofread by focusing on each state and checking off the processing for each category of input, closing your eyes to everything else. It is amazing how easy it is to write code that works first time if you do this. Don’t use default to cover actual categories because it makes it too easy to forget a category.
  6. Keep all your switch statements and states in standard order, usually alphabetical. This will make them easier to find, easier to compare and easier to check for inadvertent deletions.
  7. Don’t implement any auxiliary routines you need, just write stubs. You may find as you work out the implementation, that you will make drastic changes to these methods. There is no point in implementing them until the design specification settles down.
  8. Write some unit tests for your various auxiliary routines.
  9. Put debug code in your main driver routine to dump the current state, the next character and the category of the next character. Run your automaton through on various test cases, just looking for anomalies.
  10. Proofread your state machine by looking for symmetries and repeating patterns. Make sure deviations from the symmetries are intentional.
  11. Test on a small document that contains all the corner (pathological) cases you can think of.
  12. Test on real world documents. I forgot all about comments in my HTML (Hypertext Markup Language) compactor finite state automaton on my first cut. I did not notice the problem until I started testing on real world documents.
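
The first few steps of the strategy can be sketched as follows. This is a deliberately simplified whitespace compactor for plain text, not the HTML Compactor itself. The names TextCategory, CompactState and compact are hypothetical, and the extra out parameter on nextState is an assumption of this sketch, used to collect output without statics. Each switch lists every category explicitly, with no default, and both constants and cases are kept in alphabetical order.

```java
/** One constant per category of input character, plus the categoriser. */
enum TextCategory {
    NEWLINE, SPACE, TEXT;

    static TextCategory categorise(char c) {
        if (c == '\n' || c == '\r') return NEWLINE;
        if (c == ' ' || c == '\t') return SPACE;
        return TEXT;
    }
}

/** One constant per state. Each run of white space collapses to one space or newline. */
enum CompactState {
    IN_TEXT { // last character was text and has been emitted
        @Override CompactState nextState(TextCategory category, char c, StringBuilder out) {
            switch (category) {
                case NEWLINE: return PENDING_NEWLINE;
                case SPACE: return PENDING_SPACE;
                case TEXT: out.append(c); return IN_TEXT;
            }
            throw new AssertionError(category);
        }
    },
    PENDING_NEWLINE { // in a run of white space containing at least one newline
        @Override CompactState nextState(TextCategory category, char c, StringBuilder out) {
            switch (category) {
                case NEWLINE: return PENDING_NEWLINE;
                case SPACE: return PENDING_NEWLINE;
                case TEXT: out.append('\n').append(c); return IN_TEXT;
            }
            throw new AssertionError(category);
        }
    },
    PENDING_SPACE { // in a run of spaces and tabs with no newline so far
        @Override CompactState nextState(TextCategory category, char c, StringBuilder out) {
            switch (category) {
                case NEWLINE: return PENDING_NEWLINE;
                case SPACE: return PENDING_SPACE;
                case TEXT: out.append(' ').append(c); return IN_TEXT;
            }
            throw new AssertionError(category);
        }
    },
    START { // no text seen yet, so leading white space is dropped
        @Override CompactState nextState(TextCategory category, char c, StringBuilder out) {
            switch (category) {
                case NEWLINE: return START;
                case SPACE: return START;
                case TEXT: out.append(c); return IN_TEXT;
            }
            throw new AssertionError(category);
        }
    };

    abstract CompactState nextState(TextCategory category, char c, StringBuilder out);

    /** Main driver. Trailing white space is never emitted. */
    static String compact(String input) {
        StringBuilder out = new StringBuilder();
        CompactState state = START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            TextCategory category = TextCategory.categorise(c);
            // Debug code to dump state, c and category would go here (step 9).
            state = state.nextState(category, c, out);
        }
        return out.toString();
    }
}
```

Because the states line up in the same order with the same three cases each, the symmetries and the intentional deviations from them are easy to see when proofreading.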

Sample Code

Here is the source code for the Compactor finite state automaton that removes excess white space from HTML to make it more compact for transmission. Compactor: compacts a group of files, a single file or a string.

HTMLCharCategory: categorises characters. HTMLState: the finite state automaton that parses the HTML and decides which spaces and newlines to remove.


Recommended book: Introduction to the Theory of Computation, second edition
by Michael Sipser
publisher: Course Technology
published: 2005-02-15
paperback: 978-0-534-95250-1
hardcover: 978-0-534-95097-2
eBook: 978-1-285-40106-5
Kindle: B002L6GJG0
This is a university textbook that will explain such mysteries as finite state automata and how to convert them into regular expressions. It covers Turing machines. This is about the theory of computing, not a cookbook on how to code anything. This is a thin but expensive book. Third edition 978-1-133-18779-0 is due 2012-05-30.

Contact Roedy. Please feel free to link to this page without explicit permission.
