XML : Java Glossary


XML was designed to make it easy to write a parser. I think this was an unfortunate decision. Only a handful of people in the world will ever write an XML parser, but hundreds of thousands have to compose XML. They should have designed it to be easy and terse to write. For example, the mandatory quotes around each attribute value are there solely for the convenience of the parser writer. The tag names in closing tags such as </mytag> are redundant and should be optional. They are not needed at all in XML designed solely for machine consumption. Even in human-read XML, they add nothing on the innermost nest on a single line.

Naming

Most letters are legal in an element or attribute name, but not every character is. You can use upper or lower case letters, accented letters, digits and the punctuation characters _ - . and :. _ is good for separating words. You may not use a space. It is considered poor style to use -, . and :, and the colon in particular is reserved for namespace prefixes. Names cannot start with a digit, - or a period. Names starting with the letters xml (in any case) are reserved. Names are case-sensitive.
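
One way to test a candidate name is to ask a DOM implementation whether it will accept it. The following is only a sketch: the class name XmlNameCheck and the sample names are made up for illustration, and it relies on Document.createElement throwing a DOMException when given an illegal name. It does not detect the reserved xml prefix.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

/** Hypothetical helper: checks whether a string is usable as an element name. */
class XmlNameCheck {
    static boolean isLegalName(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.createElement(name); // throws DOMException for an illegal name
            return true;
        } catch (DOMException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample names invented for this sketch.
        for (String name : new String[] {"unit_price", "unitPrice", "unit price", "2ndItem"}) {
            System.out.println(name + " -> " + isLegalName(name));
        }
    }
}
```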

Encoding

UTF-8 is the default encoding (parsers will also auto-detect UTF-16), but unfortunately a document may declare any ruddy encoding ever invented. Using other encodings undermines XML as an interchange format. Don’t do it!
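
If you generate XML from Java, one way to stay on UTF-8 is to say so explicitly both when opening the stream and in the XML declaration. Here is a minimal sketch using the StAX writer that ships with the JDK. The class name Utf8Writer and the element name greeting are assumptions made up for the example.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: writes a tiny document, encoded and declared as UTF-8. */
class Utf8Writer {
    static byte[] write(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // The encoding given here controls how characters become bytes.
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            // The encoding given here is what the XML declaration announces.
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("greeting");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return out.toByteArray();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The two mentions of UTF-8 must agree. A declaration that names one encoding over bytes written in another is exactly the sort of file that breaks interchange.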

Schemas

You describe your little XML subgrammar by writing a DTD (Document Type Definition) file. Optionally, you can include the DTD inline inside your XML file. There are other, more elaborate schema languages, including RELAX NG, Schematron and XSD (XML Schema Definition). I like XSD the best.
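
To give a feel for XSD, here is a minimal sketch that checks a document against a one-element schema using the javax.xml.validation classes in the JDK. The schema, the price element and the class name XsdCheck are all invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper: validates XML text against a tiny inline XSD. */
class XsdCheck {
    // Made-up schema: a single <price> element that must hold a decimal number.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='price' type='xs:decimal'/>"
          + "</xs:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this schema, <price>12.50</price> passes and <price>cheap</price> is rejected, something a DTD cannot express.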

Validation

Each schema language has its corresponding technique for checking that an XML file conforms to it. If you use a DTD, here:
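
What follows is a minimal sketch of DTD validation in Java, not necessarily the approach originally shown on this page. It assumes the DTD is inline in the document, and the note, to and body elements and the class name DtdCheck are invented for the example. The key points are turning validation on in the factory and installing an ErrorHandler, since by default validity errors are merely reported and parsing carries on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates a document against its inline DTD. */
class DtdCheck {
    // Made-up inline DTD: a note must contain a "to" followed by a "body".
    static final String DOCTYPE =
            "<!DOCTYPE note ["
          + "<!ELEMENT note (to, body)>"
          + "<!ELEMENT to (#PCDATA)>"
          + "<!ELEMENT body (#PCDATA)>"
          + "]>";

    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignore */ }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed or invalid
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```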

Parsing

I personally detest XML; however, it has caught on like a cocaine wave. It must have some redeeming features.

XML Benefits

XML Drawbacks

What Should Replace XML?

One possible candidate for the XML replacement job is the Java serialized object format. It can handle just about any data structure imaginable. It is platform independent. It has a simple DTD — Java source code for the corresponding class. Some claim it is Java-only. Not so. It is no more difficult for C++ to parse than any other similar newly concocted protocol. It is not tied to any hardware or OS (Operating System). It is just that Java has a head start implementing it. Java can implement it with no extra overhead.
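
To make the idea concrete, here is a minimal sketch of a round trip through the Java serialized object format. The Invoice class and its fields are invented for illustration. The class itself plays the role the DTD would play in XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical demo of writing and reading a serialized object. */
class SerializationDemo {
    /** Made-up record type; its source code documents the data layout. */
    static class Invoice implements Serializable {
        private static final long serialVersionUID = 1L;
        final String customer;
        final int quantity;

        Invoice(String customer, int quantity) {
            this.customer = customer;
            this.quantity = quantity;
        }
    }

    static byte[] toBytes(Invoice invoice) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(invoice);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Invoice fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Invoice) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The int field travels as an int, not as a string that has to be parsed and validated on arrival. As with any interchange format, bytes from an untrusted source should not be deserialized blindly.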

There have been some efforts made to patch up the shortcomings of XML. In fact, there are dozens of them. XML is no longer simple. It is a raggedy patchwork quilt. People were sucked in by the initial simplicity, then discovered that it was not really all that useful in its simple form. Schema was added to allow specifying types (but still only permitting strings). Yes, we need a standard interchange format, but XML was only a back-of-the-envelope stab at it. XML was destined to fail since it totally ignored so many factors in coming up with a good design.

One such effort is VTD (Virtual Token Descriptor). A VTD record is a 64-bit integer that encodes the starting offset, length, type and nesting depth of a token in an XML document. Because VTD records don’t contain data fields, they work alongside the original XML document, which is maintained intact in memory by the processing model.
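
The idea of packing a token description into one long can be sketched as follows. The bit layout here (30 bits of offset, 20 of length, 8 of depth, 4 of type) is an assumption chosen for illustration, not the actual layout VTD-XML uses, and the class name TokenRecord is made up. For simplicity the sketch indexes into a Java String rather than into raw bytes.

```java
/** Hypothetical 64-bit token record; the field widths are illustrative only. */
class TokenRecord {
    static final int OFFSET_BITS = 30;
    static final int LENGTH_BITS = 20;
    static final int DEPTH_BITS = 8;
    static final int TYPE_BITS = 4;

    static final int LENGTH_SHIFT = OFFSET_BITS;
    static final int DEPTH_SHIFT = LENGTH_SHIFT + LENGTH_BITS;
    static final int TYPE_SHIFT = DEPTH_SHIFT + DEPTH_BITS;

    private static long mask(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the four fields into one long; values too wide for a field are truncated. */
    static long pack(int offset, int length, int depth, int type) {
        return (offset & mask(OFFSET_BITS))
                | ((length & mask(LENGTH_BITS)) << LENGTH_SHIFT)
                | ((depth & mask(DEPTH_BITS)) << DEPTH_SHIFT)
                | ((type & mask(TYPE_BITS)) << TYPE_SHIFT);
    }

    static int offset(long record) { return (int) (record & mask(OFFSET_BITS)); }
    static int length(long record) { return (int) ((record >>> LENGTH_SHIFT) & mask(LENGTH_BITS)); }
    static int depth(long record)  { return (int) ((record >>> DEPTH_SHIFT) & mask(DEPTH_BITS)); }
    static int type(long record)   { return (int) ((record >>> TYPE_SHIFT) & mask(TYPE_BITS)); }

    /** The record holds no data itself; the text is sliced out of the intact document. */
    static String text(String document, long record) {
        int start = offset(record);
        return document.substring(start, start + length(record));
    }
}
```

An array of such longs describes every token in the document without creating a single String or node object until you actually ask for the text.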

Due to the stupidity, duplicity and/or greed of those promoting XML, we will likely be stuck with some committee-patched variant of it forever, something that will make even HTML look clean. We need a common data interchange format, but not one so inept.

DTD

Schema

Handling Awkward Characters, XML Entities

But what about awkward non-ASCII characters such as é and Ω and ⇔? There are six ways around the restriction that XML does not support the full set of HTML character entity references.

UTF-8 files using the five basic character entities, or ISO-8859-1 files using the five basic character entities (possibly excluding &apos;) plus decimal NCEs, are the easiest files to read and compose manually, which is XML’s saving grace.

Nearly all XML documents now use UTF-8 encoding, so the usual way to handle awkward characters is to code them with a UTF-8-aware text editor as ordinary characters. That leaves you with only < > " and & to worry about.
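
If you compose XML by program rather than by hand, the same rule can be sketched as a small escaping routine. The class name XmlEscape and the asciiOnly flag are invented for this example. With the flag off, non-ASCII characters pass through as ordinary characters, suitable for a UTF-8 file. With it on, they are written as decimal numeric character entities.

```java
/** Hypothetical helper: escapes text for use in XML element content or attribute values. */
class XmlEscape {
    static String escape(String text, boolean asciiOnly) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // handle characters beyond the 16-bit range too
            i += Character.charCount(cp);
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (asciiOnly && cp > 0x7F) {
                        sb.append("&#").append(cp).append(';'); // decimal numeric entity
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

The & case must be handled like any other character in a single pass, as here. Escaping in several passes with & done last would corrupt the entities already produced.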

Quoting

Writing

XML Serialization

Tools

Books

Learning More

XML Tools
XML Tool Comparison
Tool	Advantages	Disadvantages
Manual	A hand-written parser will run quickly. Writing XML by hand is conceptually simpler and faster than doing it with a tool. Writing XML by hand gives you complete control over layout, headers, encoding etc.	Not feasible for any but the simplest files. Hard to maintain.
DOM	You can navigate the tree in any way you please in any order.	Will not work for large files since the whole tree must reside in RAM. Slow parsing.
SAX/StAX	Fast parsing. You can represent the data with a different structure from the XML structure of the file. Uses only a little RAM.	Must process sequentially.
JAXB (Java Architecture for XML Binding)	Very little coding needed to read XML. JAXB generates most of the Java code for you from the schema. You deal with Java primitives and ordinary getters and setters.	Complicated to write XML files.
XPATH/XQUERY	You can avoid the low-level details of navigating and specify a search query instead to find what you want.	Slow.
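
To illustrate the SAX/StAX row, here is a minimal StAX sketch that walks a document sequentially and collects the text of every title element into a plain list, a structure quite different from the XML tree. The class name StaxTitles and the books/book/title element names are assumptions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: pulls every <title> out of a document with a StAX cursor. */
class StaxTitles {
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Only the current event is held in memory, never the whole tree.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    result.add(reader.getElementText()); // advances to the matching end tag
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

The price is the one in the Disadvantages column: you see each element once, in document order, and cannot go back.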

AELfred
Altova XMLSpy
Ant: XML validator
ASN.1
binaphobia
Binary XML: unfortunately still in the it-would-be-a-good-idea stage
Castor
Caucho Resin
cooktop
Crimson
Digitally Signing XML
Digitally Signing XML documents
DOM
DOM 1 spec
DOMValidator
DTD attributes
DTD: a language for describing XML file layouts
Elliotte Rusty Harold’s XML online book
Fluffiness of various file formats: student project
HTML entities
IBM’s tutorial
IBM’s XML page
JAXB
JAXP: Oracle’s XML manipulating classes
JDOM
JNLP (Java Web Start’s XML configuration language)
JSON
JUntotal: a more compact XML alternative
Liquid XML: code generator to read/write XML given schema
Mistakes with XML
NotXMLProposal: SDL streamlined XML proposal
online XML validator
Oracle’s Fast Web Services Project
RDF
Reading XML with DOM
Reading XML with SAX
RefleX: (XSLT and XQuery)
RELAX NG: a language for describing XML file layouts
SAX
Schematron: an XML description and pattern finding language
Serialization
StAX
Stylus Autogen: figures out a schema from sample XML
Stylus Schema Editor
Stylus Studio
Stylus XML tools
TagSoup
UBDDL (a Yahoo group working to define a more efficient replacement for XML)
UDDI
VTD-XML: faster, more efficient XML parsing
W3 online XML validator: via URL
W3 XML standard: lawyerly document
W3 XML xinclude standard on includes: lawyerly document
W3Schools XML validator
W3schools: XML tutorials
Wattle XML editor and schema converter
Woodstox
Writing XML with a DOM: by myong
Writing XML with DOM
Writing XML with SAX
x->Jen
Xalan
Xerces
XHTML
XML 1.0 spec
XML Compactor
XML databases
XML inventors
XML Validator tools
XML.ORG
xmlfiles.com (has lots of examples and tutorials)
XMLFox: free Windows XSD/XML editor/validator
XMLGlobal has some tutorials and information
xmlns
XPath
XQuery
XSD: schema to describe XML, friendlier and more specific than DTD
XSLT
XTP
XUL
YAML

	This page is posted on the web at:	http://mindprod.com/jgloss/xml.html
