parser : Java Glossary


parser

a program that analyses syntax. It might, for example, look at a piece of Java source code and find all the variable names, method names and operators in order to compile it into JVM (Java Virtual Machine) byte code, or it might analyse HTML (Hypertext Markup Language), or your own invented language. The original tools, LEX, YACC and Bison, generated C code. There are now variants that generate Java code. My personal favourite, based mainly on its accessible documentation, is JavaCC.
People who write parsers have a strange language all their own. The writers of these tools are academics and are not interested in teaching you anything, just impressing you with how brilliant their programs are. This means the manuals are almost useless. You have to study examples, particularly the simple ones, and gradually the manuals will begin to make sense. Another learning technique is to examine the Java code generated from some sample grammars. Authors took six years of university courses to get to their level of parser understanding. Why should they make it any easier for you?
Roughly what happens is you describe your grammar in some Mickey Mouse syntax. Then a utility converts that into a Java program that will analyse text conforming to that grammar. I must admit I am shocked at how ugly the specification languages are. I would have thought they would be the most beautiful and regular of all languages, being composed by aficionados of language analysis.
Java has four simple built-in parsers: java.util.StringTokenizer, java.io.StreamTokenizer, java.text.BreakIterator and java.util.regex.Pattern.
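
Here is a rough sketch of the four at work on tiny inputs. The class name BuiltInParsers and its method names are made up for illustration; only the JDK classes themselves are real. Treat it as a starting point, not a recommended design.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or of this site's code.
class BuiltInParsers {

    // StringTokenizer: split on a fixed set of delimiter characters (space and comma here).
    static List<String> withStringTokenizer(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, " ,");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // StreamTokenizer: tells words, numbers and single punctuation characters apart.
    static List<String> withStreamTokenizer(String text) {
        List<String> out = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("word:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("number:" + st.nval);
                } else {
                    out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // BreakIterator: locale-aware word boundaries; keep only pieces that start with a letter or digit.
    static List<String> withBreakIterator(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ENGLISH);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.charAt(0))) {
                out.add(piece);
            }
        }
        return out;
    }

    // Pattern: pull out everything that looks like an identifier.
    static List<String> withPattern(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withStringTokenizer("a, b c"));
        System.out.println(withStreamTokenizer("x = 42"));
        System.out.println(withBreakIterator("Hello, world"));
        System.out.println(withPattern("x = y1 + 42"));
    }
}
```

None of these understands nesting or grammar rules. They only chop text into pieces, which is often all you need.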

Java version 1.5 or later also has a number of XML (eXtensible Markup Language) parsers built in. Check out the DOM, SAX, XSD, XPath and Schema entries.
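
As a minimal sketch of the built-in DOM parser, the following reads a small XML string and collects the text of every title element. The class DomDemo, the method titles and the sample element names are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; only the javax.xml and org.w3c.dom calls are real JDK API.
class DomDemo {

    // Parse the XML text into a DOM tree and return the text of each <title> element.
    static List<String> titles(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("title");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(titles(xml));
    }
}
```

DOM loads the whole document into memory as a tree. For large files the SAX entry describes the streaming alternative.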

Hand Rolled Parsers

I wrote a number of parsers as part of JDisplay, the tool that pretties up listings on this website. The problem I faced was that the parsers had to work with deliberately erroneous code and code fragments.

The traditional parsers are totally unforgiving. They want perfect, complete programs or data files to parse and give up totally on the first hiccough.
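
You can see the same all-or-nothing behaviour with the JDK's own XML parser, used here only as a convenient stand-in for a strict parser. It is not one of the generated parsers discussed above. The class StrictParserDemo and its method parses are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class showing that a strict parser rejects the whole input on one error.
class StrictParserDemo {

    // Returns true if the text is well-formed XML, false if the parser gave up on it.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // One unclosed tag is enough: no partial tree is returned.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a><b/></a>"));
        System.out.println(parses("<a><b></a>"));
    }
}
```

The default error handler may also print a complaint to standard error for the second input. Either way you get nothing usable back, which is no good for a listing that is meant to contain mistakes.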

So I wrote my own using finite state automata, using enum constants to represent each state.
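
To give the flavour of the technique, here is a much simplified sketch. It is not the JDisplay source. The class TolerantScanner, its State enum and the one-letter marks are all invented for this example, and it handles only // comments and double-quoted strings. It labels each character as code (c), string (s) or comment (k), and it never gives up, even on an unterminated string.

```java
// Hypothetical sketch of a finite state automaton with enum states; not the JDisplay code.
class TolerantScanner {

    enum State { CODE, SLASH, LINE_COMMENT, STRING, STRING_ESCAPE }

    // Returns one mark per input character: 'c' code, 's' string literal, 'k' comment.
    static String classify(String src) {
        StringBuilder marks = new StringBuilder(src.length());
        State state = State.CODE;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '"') {
                        state = State.STRING;
                        marks.append('s');
                    } else {
                        if (c == '/') {
                            state = State.SLASH; // might be the start of a comment
                        }
                        marks.append('c');
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        marks.setCharAt(i - 1, 'k'); // the previous slash was a comment opener
                        marks.append('k');
                        state = State.LINE_COMMENT;
                    } else if (c == '"') {
                        state = State.STRING;
                        marks.append('s');
                    } else {
                        state = State.CODE; // the slash was just a divide operator
                        marks.append('c');
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        state = State.CODE;
                        marks.append('c');
                    } else {
                        marks.append('k');
                    }
                    break;
                case STRING:
                    if (c == '\n') {
                        // Unterminated string: recover at end of line instead of failing.
                        state = State.CODE;
                        marks.append('c');
                    } else {
                        marks.append('s');
                        if (c == '\\') {
                            state = State.STRING_ESCAPE;
                        } else if (c == '"') {
                            state = State.CODE;
                        }
                    }
                    break;
                case STRING_ESCAPE:
                    marks.append('s'); // the escaped character, even a quote, stays in the string
                    state = State.STRING;
                    break;
            }
        }
        return marks.toString();
    }

    public static void main(String[] args) {
        System.out.println(classify("x = \"hi\"; // note"));
        System.out.println(classify("s = \"oops"));
    }
}
```

Each state only has to decide what the next character means, so a broken or partial listing just leaves the automaton in some state at the end rather than triggering an error.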

Download and have a look at the source.

Limitations

If you are colourising code, or rearranging code, ideally you want your parser to work even if the code contains syntax errors, or if you just have a snippet. Traditional parsers only work on syntactically perfect, complete programs. This is why I used finite state automata instead of ANTLR (Another Tool for Language Recognition) for the parsers I used for colourising on this site.

ANTLR (formerly PCCTS)
Barat (parses Java source and byte code)
BYACC
Coco/R
ComputerTools List of parsers
CSS parser
CUP
CyberNeko: HTML parser
DOM
finite state automaton
grammar
Grammar-Kit: parser for IntelliJ Plugins
Java Expression Parser
JavaCC (formerly Jack)
JavaCUP
Jericho HTML parser
JFlex
JLex
JParsec: a parser library not a generator
JTB
Koala XSL (parses XML)
lexer
List of HTML parsers
List of Open Source Parsers
MixedCC (for PHP, JSP)
Parboiled
PCCTS
recursive descent parser theory
regex
SableCC
SAX
schema
SMC
TagSoup
Wikipedia list of parsers
Xerces: parse XML
XML
XPath
XSD
YACC

	This page is posted on the web at:	http://mindprod.com/jgloss/parser.html
	Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.