XSD (XML Schema Definition). W3C’s
XML (eXtensible Markup Language) Schema language, which is itself a form of
XML. It is often simply called XML Schema. It offers much finer control of XML
document content than the older DTD (Document Type Definition)
style of schema inherited from SGML. XSD
has a schema, itself written in XSD, that is used to validate other schemas. You can download it. It is 88K. Unfortunately, by default, Opera treats it as raw text. IE nicely
lists it with colours.
The two main advantages of XSD over DTD for specifying an XML grammar are:
- The grammar of an XSD schema is simpler than that of a DTD. XSD is just a
flavour of XML.
- XSD lets you restrict in much more detail just what constitutes a valid file.
Sample XSD Schemas
The academics who wrote the XSD spec were
more interested in impressing you than informing you. Therefore there are no examples
or even anything remotely like English language descriptions of what the various
grammatical elements are for. Your only hope of making sense of it is to find example
documents. Even the primer is fairly tough slogging. Keep looking at the example
XML and XSD to clarify
the text. People can learn languages from a set of heavily commented examples that
gradually add features much more easily than from descriptions of the grammar in
some esoteric formal notation.
XSD allows forward and backward references, which permits a top-down description
of the document. In typical XML fashion, it is revoltingly verbose. Oddly, you specify
the attributes on a tag after you describe all the nested tags that tag may enclose,
even though when you write the actual XML
the schema describes, the attributes come first.
NMTOKEN is an atomic string without spaces, often the name of an enumeration value. The 2-letter
country codes would be NMTOKEN.
XSD lets you specify the types of the fields with a rich set of built-in types, which
include bounded integers, float, double, fixed decimal, dates, times, strings, URLs,
hex, Boolean, durations… You can set up enumerated types where you give a list
of the legal values of a field. There is even a pattern scheme, similar to Perl
regex, for describing legal string values. XSD
also allows you to enforce ordering of fields.
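As a rough illustration of enumerations, patterns and forward references, here is a minimal sketch that validates a few documents against a tiny schema using the javax.xml.validation API in the JDK. The schema, the element and type names (address, postcode, CountryType, PostcodeType) and the class name are all invented for this example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class EnumAndPatternDemo {
    // Hypothetical schema, made up for illustration only.
    // The element refers to CountryType and PostcodeType before they are defined.
    static final String XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:element name='address'>",
        "    <xs:complexType>",
        "      <xs:sequence>",
        "        <xs:element name='postcode' type='PostcodeType'/>",
        "      </xs:sequence>",
        "      <!-- attributes are declared after the nested tags -->",
        "      <xs:attribute name='country' type='CountryType' use='required'/>",
        "    </xs:complexType>",
        "  </xs:element>",
        "  <xs:simpleType name='CountryType'>",
        "    <xs:restriction base='xs:NMTOKEN'>",
        "      <xs:enumeration value='CA'/>",
        "      <xs:enumeration value='US'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name='PostcodeType'>",
        "    <xs:restriction base='xs:string'>",
        "      <xs:pattern value='[A-Z][0-9][A-Z] [0-9][A-Z][0-9]'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "</xs:schema>");

    /** Returns true if the XML text conforms to the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException for invalid input, IOException on read failure
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<address country='CA'><postcode>V8W 1A1</postcode></address>"));
        System.out.println(isValid("<address country='ZZ'><postcode>V8W 1A1</postcode></address>"));
        System.out.println(isValid("<address country='CA'><postcode>12345</postcode></address>"));
    }
}
```

Only the first document should pass: the second uses a country code that is not in the enumeration, and the third has a postcode that does not match the pattern.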
XSD allows you to
specify the minimum and maximum number of times a field may appear with the
minOccurs and maxOccurs attributes, e.g. minOccurs="0" maxOccurs="unbounded".
You can specify the types of fields with options such as type="xsd:positiveInteger", type="xsd:string" and type="xsd:anyURI".
You can specify the allowable low and high bounds on a numeric field with
minInclusive and maxInclusive. The names are case-sensitive.
There is a scheme, unique, to insist that a data value be unique.
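The occurrence and range options can be tried out with a sketch like the following, again assuming the JDK's javax.xml.validation API. The order, quantity and note elements and the bounds 1 to 10 are hypothetical, chosen only to exercise minOccurs, maxOccurs, minInclusive and maxInclusive.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class OccurrenceAndRangeDemo {
    // Hypothetical schema: one quantity between 1 and 10, then any number of notes.
    static final String XSD = String.join("\n",
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>",
        "  <xsd:element name='order'>",
        "    <xsd:complexType>",
        "      <xsd:sequence>",
        "        <xsd:element name='quantity'>",
        "          <xsd:simpleType>",
        "            <xsd:restriction base='xsd:positiveInteger'>",
        "              <xsd:minInclusive value='1'/>",
        "              <xsd:maxInclusive value='10'/>",
        "            </xsd:restriction>",
        "          </xsd:simpleType>",
        "        </xsd:element>",
        "        <xsd:element name='note' type='xsd:string'",
        "                     minOccurs='0' maxOccurs='unbounded'/>",
        "      </xsd:sequence>",
        "    </xsd:complexType>",
        "  </xsd:element>",
        "</xsd:schema>");

    /** Returns true if the XML text conforms to the schema above. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException for invalid input, IOException on read failure
            return false;
        }
    }

    public static void main(String[] args) {
        // note is optional and repeatable; quantity is mandatory and bounded
        System.out.println(isValid("<order><quantity>5</quantity></order>"));
        System.out.println(isValid("<order><quantity>5</quantity><note>a</note><note>b</note></order>"));
        System.out.println(isValid("<order><quantity>11</quantity></order>"));
        System.out.println(isValid("<order><note>a</note></order>"));
    }
}
```

The first two documents should validate; the third exceeds maxInclusive and the fourth omits the mandatory quantity.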
Sometimes the files are peppered with xs: and sometimes
with xsd:. This prefix is an arbitrary string that abbreviates the
XML namespace http://www.w3.org/2001/XMLSchema, which you bind with an xmlns declaration. You can make it anything you like so
long as you use it consistently. It lets the parser know that a word is a keyword.
This way you can use keywords for field names, even accidentally, without confusion.
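To see that the prefix really is arbitrary, the sketch below builds the same one-line schema with several different prefixes and validates a document against each. It assumes the JDK's javax.xml.validation API; the prefix w3c and the class name are invented. The schema deliberately declares a tag named element to show that a keyword can double as a field name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class PrefixDemo {
    /** Builds a tiny schema that uses the given namespace prefix throughout. */
    static String schemaWithPrefix(String p) {
        return "<" + p + ":schema xmlns:" + p + "='http://www.w3.org/2001/XMLSchema'>"
             // the declared tag is itself called "element"; the prefix keeps it distinct
             + "<" + p + ":element name='element' type='" + p + ":string'/>"
             + "</" + p + ":schema>";
    }

    /** Returns true if a schema written with this prefix accepts the sample document. */
    static boolean accepts(String prefix) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaWithPrefix(prefix))));
            schema.newValidator().validate(
                new StreamSource(new StringReader("<element>hello</element>")));
            return true;
        } catch (Exception e) { // SAXException or IOException
            return false;
        }
    }

    public static void main(String[] args) {
        for (String prefix : new String[] {"xs", "xsd", "w3c"}) {
            System.out.println(prefix + ": " + accepts(prefix));
        }
    }
}
```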
Understanding an XSD Schema
Understanding the keywords
used in schemas and comparing a schema with a known valid compliant
XML/jnlp
file will be almost all you need to make sense of the schema. After you read the
XSD tutorial,
this list will refresh your memory.
- all
- a group of tags that may appear in any order. You must supply exactly one of
each, unless minOccurs=0 makes a tag optional.
- attribute
- a keyword=value modifier on a tag.
- choice
- A group of possible tags. You can specify only one of them.
- complexType
- describes a tag that has attributes or contains other tags nested inside it.
- dateTime
- A date/time in the form 2009-12-13T12:25:00.0000000-08:00
- default
- Specifies the default value for an attribute.
- element
- declares a tag. Inside a sequence group, the tags must appear in the order
they are declared.
- enumeration
- describes one possible value of an attribute that has only a limited set of
legal values.
- length
- the precise length for this field in characters. More commonly you use
minLength and maxLength.
- maxLength
- the maximum length for this field in characters.
- maxOccurs
- the maximum number of times this tag can appear, unbounded for no upper limit. Oddly, must go on the ref not the target.
- minLength
- the minimum length for this field in characters.
- minOccurs
- the minimum number of times this tag can appear, 0
for optional. Oddly, must go on the ref not the
target.
- NCName
- describes a field holding an XML name without a colon: letters, digits,
periods, hyphens and underscores, starting with a letter or underscore. Spaces
are not allowed.
- NMTOKEN
- describes a field holding a single name token: letters, digits, periods,
colons, hyphens and underscores, but no commas or spaces. It is often the base
type of an enumerated field that can only have a value selected from a list. You
can also have enumerated values based on strings, which do allow spaces.
- restriction
- describes an attribute whose value is restricted in some way.
- sequence
- a group of tags that must appear in a particular order.
- simpleType
- usually describes an attribute that has restrictions on it.
- string
- a field that can include any Unicode character that is legal in XML.
- use
- either required or
optional. It applies to
attributes, not tags.
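The difference between the sequence, all and choice groups is easiest to see by running the same documents through each. The following is a minimal sketch assuming the JDK's javax.xml.validation API; the tags r, a and b and the class name are made up for the purpose.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class GroupKindsDemo {
    /** Builds a schema whose root r holds tags a and b inside the given group kind. */
    static String schemaFor(String group) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='r'><xs:complexType><xs:" + group + ">"
             + "<xs:element name='a' type='xs:string'/>"
             + "<xs:element name='b' type='xs:string'/>"
             + "</xs:" + group + "></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    /** Returns true if the XML text is valid under the schema for that group kind. */
    static boolean isValid(String group, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaFor(group))));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException or IOException
            return false;
        }
    }

    public static void main(String[] args) {
        String[] docs = {"<r><a/><b/></r>", "<r><b/><a/></r>", "<r><a/></r>"};
        for (String group : new String[] {"sequence", "all", "choice"}) {
            for (String doc : docs) {
                System.out.println(group + "  " + doc + "  -> " + isValid(group, doc));
            }
        }
    }
}
```

With sequence only the a-then-b document should pass, with all both orderings should pass, and with choice only the document holding a single tag should pass.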
Stylus Studio has a wizard that will
take a well-formed
XML document and compose an approximate
XSD schema for
it. You can then fine tune it. This greatly speeds up the work of composing
schemas. You can keep validating your schema as you work. As you type, it shows you
multiple choices for what you most likely want to type next.
Validating an XML File Conforming to an XSD Schema
Here is an example of validating XML
with an XSD schema. This schema describes a valid
JNLP (Java Network Launching Protocol) 1.0 XML file.
You can check that your JNLP file is correctly formed using an
XSD schema originally from Vampqh. You must copy the JNLP
1.0 XSD schema posted
below into the current directory as file jnlp1.xsd, or use
the JNLP 6.0 XSD schema
jnlp6.xsd, then run the Java validation posted below with:
java.exe ValidateJNLP jnlp6.xsd C:\mydir\myapp.jnlp
The above validator is not user friendly. If all is OK, it prints
nothing. If there is a problem, you get a cryptic exception. You can get a good idea
of what it is looking for by reading the XSD
file. I have composed three schemas for it: jnlp1.xsd,
jnlp5.xsd and jnlp6.xsd. Use the one that corresponds to the version of
your JNLP.
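A somewhat friendlier validator might report every problem with its line and column instead of dying on the first exception. The sketch below is one possible approach, assuming the JDK's javax.xml.validation API with a custom SAX ErrorHandler. It is not the ValidateJNLP program referred to above; the class name ValidateAgainstXsd and its output format are invented.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidateAgainstXsd {
    /** Validates xml against xsd and returns a list of problems, empty if none. */
    static List<String> validate(Source xsd, Source xml) {
        final List<String> problems = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(xsd);
            Validator validator = schema.newValidator();
            // Collect recoverable errors rather than throwing on the first one.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(describe("warning", e));
                }
                @Override public void error(SAXParseException e) {
                    problems.add(describe("error", e));
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not recoverable, e.g. the XML is not well-formed
                }
            });
            validator.validate(xml);
        } catch (SAXException | IOException e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    static String describe(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
             + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java ValidateAgainstXsd schema.xsd document.xml");
            return;
        }
        List<String> problems = validate(
            new StreamSource(new File(args[0])), new StreamSource(new File(args[1])));
        if (problems.isEmpty()) {
            System.out.println(args[1] + " is valid according to " + args[0]);
        } else {
            for (String problem : problems) {
                System.out.println(problem);
            }
        }
    }
}
```

You would run it much like the command shown above, substituting the class name, e.g. with jnlp6.xsd and your own JNLP file as the two arguments.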
XSDs (XML Schema Definitions)
are a bit like a BNF (Backus-Naur Form)
description of JNLP, written by someone with a terrible lexical
stutter.
Extracting data from an XML file with an
XSD schema is a
verbose undertaking. Unfortunately, the ranges, defaults etc. in the
XSD schema are
all ignored when you extract information from a conforming XML file. They are just used for validating. Some of the
classes you will need are Schema, SchemaFactory, Validator, XMLConstants and SAXParser.
There are dozens of classes in other packages with identical or similar names. You
have to make sure you use matching classes. Your
IDE (Integrated Development Environment)
might automatically import the wrong classes if you are not careful.
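As a rough illustration of extraction, here is a minimal sketch that pulls attribute values out of a document with a plain SAXParser, with the imports spelled out in full so the right packages are used. The prices and price tags, the currency attribute and the class name are hypothetical. The parser here is never given a schema, so an attribute that is absent from the file comes back as null no matter what default a schema might declare for it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ExtractWithSax {
    /** Returns the currency attribute of each price tag, or null where it is absent. */
    static List<String> currencies(String xml) {
        final List<String> found = new ArrayList<>();
        try {
            // A plain parser: no schema attached, so no validation and no defaults.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("price".equals(qName)) {
                        found.add(atts.getValue("currency"));
                    }
                }
            });
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalStateException("could not parse document", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<prices><price currency='CAD'>1</price><price>2</price></prices>";
        System.out.println(currencies(xml));
    }
}
```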
Learning More
- Oracle’s Javadoc on the Schema class
- Oracle’s Javadoc on the SchemaFactory class
- Oracle’s Javadoc on the Validator class
- Oracle’s Javadoc on the XMLConstants class
- Oracle’s Javadoc on the SAXParser class