26 Sep 2006
Parsing and validation represent the core of XML. Knowing how to use these
capabilities well is vital to the successful introduction of XML to your project. This
tutorial on XML processing teaches you how to parse and validate XML files as well
as use XQuery. It is the third tutorial in a series of five tutorials that you can use to
help prepare for the IBM certification Test 142, XML and Related Technologies.
XML processing
© Copyright IBM Corporation 1994, 2006. All rights reserved. Page 1 of 38
developerWorks® ibm.com/developerWorks
Anyone working in software development for the last few years is aware that XML
provides cross-platform capabilities for data, just as the Java® programming
language does for application logic. This series of tutorials is for anyone who wants
to go beyond the basics of using XML technologies.
This tutorial is written for Java programmers who have a basic understanding of
XML and whose skills and experience are at a beginning to intermediate level. You
should have a general familiarity with defining, validating, and reading XML
documents, as well as a working knowledge of the Java language.
Objectives
After completing this tutorial, you will know how to:
• Parse XML documents using the Simple API for XML 2 (SAX2) and
Document Object Model 2 (DOM2) parsers
• Validate XML documents against Document Type Definitions (DTDs) and
XML Schemas
• Access XML content from databases using XQuery
Prerequisites
This tutorial is written for developers who have a background in programming and
scripting and who have an understanding of basic computer-science models and
data structures. You should be familiar with the following XML-related,
computer-science concepts: tree traversal, recursion, and reuse of data. You should
be familiar with Internet standards and concepts, such as Web browser,
client-server, documenting, formatting, e-commerce, and Web applications.
Experience designing and implementing Java-based computer applications and
working with relational databases is also recommended.
System requirements
To run the examples in this tutorial, you need a Linux® or Microsoft® Windows® box
with at least 50MB of free disk space and administrative access to install software.
The tutorial uses, but does not require, the following software:
StAX
A new API, called Streaming API for XML (StAX), is to be released
in late 2006. It is a pull API, as opposed to SAX's push model, so it
keeps control with the application rather than the parser. You can
also use StAX to modify the document being parsed. Read more in
"An Introduction to StAX" (see Resources).
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "dvd.dtd">
<!-- DVD inventory -->
<catalog>
<dvd code="_1234567">
<title>Terminator 2</title>
<description>
A shape-shifting cyborg is sent back from the future
to kill the leader of the resistance.
</description>
<price>19.95</price>
<year>1991</year>
</dvd>
<dvd code="_7654321">
<title>The Matrix</title>
<price>12.95</price>
<year>1999</year>
</dvd>
<dvd code="_2255577" genre="Drama">
<title>Life as a House</title>
<description>
When a man is diagnosed with terminal cancer,
he takes custody of his misanthropic teenage son.
</description>
<price>15.95</price>
<year>2001</year>
</dvd>
<dvd code="_7755522" genre="Action">
<title>Raiders of the Lost Ark</title>
<price>14.95</price>
<year>1981</year>
</dvd>
</catalog>
These events are pushed out to the application in real time, as the parser moves
across the document contents. One benefit of this processing model is that you can
handle large documents with relatively little memory. A downside is that you have
more work to do to handle all these events.
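For contrast with the push model, here is a minimal sketch of the pull style that the StAX sidebar describes. It assumes the javax.xml.stream package that now ships with the Java platform; the class name StaxTitles and the inline sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {

    /** Pulls events from the parser and collects the text of every title element. */
    public static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    result.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Document is not well-formed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<catalog><dvd><title>Terminator 2</title></dvd>"
                   + "<dvd><title>The Matrix</title></dvd></catalog>";
        System.out.println(titles(xml));
    }
}
```

Because the loop belongs to the application, you can stop reading as soon as you have what you need, which is harder to arrange with callbacks.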
The org.xml.sax package contains a set of interfaces. One of these provides the
XMLReader interface to the parser. You can set up for parsing like this:
try {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.parse( "myDocument.xml" ); //complete path
} catch ( SAXParseException e ) {
    //document is not well-formed
} catch ( SAXException e ) {
    //could not find an implementation of XMLReader
} catch ( IOException e ) {
    //problem reading document file
}
Tip: Reuse the parser instance if possible. Creating a parser is expensive. If you
have multiple threads running, you can reuse parser instances from a resource pool.
This is all well and good so far, but how does your application get events from the
parser? I'm glad you asked.
To receive events from the parser, you implement the ContentHandler interface.
This interface has a number of methods that you can implement to process your
document. Alternatively, if you only want to handle one or two callbacks, you can
subclass DefaultHandler, which implements all the ContentHandler methods
(doing nothing), and override only the methods you need.
Either way, you write logic to do whatever processing you require upon receiving
startElement, characters, endDocument, and other callback methods invoked
by the SAX parser. You can see all the method calls from a document as they would
occur on pages 351-355 of XML in a Nutshell, Third Edition (see Resources).
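As a minimal sketch of the DefaultHandler approach, the following handler overrides just three callbacks to collect DVD titles. The class name TitleHandler is hypothetical, and the XMLReader is obtained through JAXP's SAXParserFactory rather than XMLReaderFactory, which is my substitution here; either route should give you a reader that accepts the same handler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder text;   // non-null only while inside a <title>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one run of text in several chunks, so append.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    /** Parses the given XML text and returns the titles in document order. */
    public static List<String> parseTitles(String xml) {
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TitleHandler handler = new TitleHandler();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.getTitles();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every other ContentHandler callback falls through to the empty implementation inherited from DefaultHandler.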
The callback events are the normal events from a document as it's being parsed.
You can also handle validity callbacks by implementing an ErrorHandler. I'll
discuss this topic after I go over validation, so stay tuned.
To learn more about parsing with SAX, check out Chapter 20 of XML in a Nutshell,
Third Edition or read "Serial Access with the Simple API for XML (SAX)" (see
Resources).
parser.setErrorHandler( saxEcho );
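Any object that implements ErrorHandler can be registered this way. The sketch below is one possible handler, not the tutorial's SAXEcho: it records validity errors and lets parsing continue, but stops on well-formedness errors. The class name CollectingErrorHandler is hypothetical, and the parser is again obtained through SAXParserFactory as an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class CollectingErrorHandler implements ErrorHandler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Validity errors are recoverable: record them and let parsing continue.
        problems.add("error: line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness errors are not recoverable, so stop the parse.
        throw e;
    }

    public List<String> getProblems() {
        return problems;
    }

    /** Parses xml with validation turned on and returns the problems reported. */
    public static List<String> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            CollectingErrorHandler handler = new CollectingErrorHandler();
            parser.setErrorHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.getProblems();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without a registered ErrorHandler, validity errors are easy to miss, because a SAX parser does not throw an exception for them by default.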
As an exercise for the SAX parser skills you've learned, use the SAXEcho.java code
in Listing 2 to output the parser events for the catalog.xml file.
package com.xml.tutorial;

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * A handler for SAX parser events that outputs certain event
 * information to standard output.
 *
 * @author mlorenz
 */
public class SAXEcho extends DefaultHandler {
    public static final String XML_DOCUMENT_DTD = "catalogDTD.xml"; //validates via catalog.dtd
    public static final String XML_DOCUMENT_XSD = "catalogXSD.xml"; //validates via catalog.xsd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;

    /**
     * Constructor
     */
    public SAXEcho() {
        super();
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        //-- Set up my instance to handle SAX events
        DefaultHandler eventHandler = new SAXEcho();
        //-- Echo to standard output
        writer = new OutputStreamWriter( System.out );
        try {
            //-- Create a SAX parser
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler( eventHandler );
            parser.setErrorHandler( eventHandler );
            parser.setFeature(
                "http://xml.org/sax/features/validation", true );
            //-- Validation via DTD --
            echo( "=== Parsing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );
            writer.flush();
        } catch (IOException e) {
            System.out.println( "I/O error during echo()" );
            e.printStackTrace();
        }
    }

    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#error(org.xml.sax.SAXParseException)
     * @see org.xml.sax.ErrorHandler interface
     */
    @Override
    public void error(SAXParseException e) throws SAXException {
        echo( NEW_LINE + "*** Failed validation ***" + NEW_LINE );
        super.error(e);
        echo( "* " + e.getMessage() + NEW_LINE +
              "* Line " + e.getLineNumber() +
              " Column " + e.getColumnNumber() + NEW_LINE +
              "*************************" + NEW_LINE );
        try {
            Thread.sleep( 10 );
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }
}
You can use the code in SAXEcho.java to see how SAX parsing all comes together.
Note that this code does not handle all events, so not everything from the original
document will be echoed (see Listing 3). Take a look at the ContentHandler
interface to see what other messages you might get.
</description>
<price>15.95</price>
<year>2001</year>
</dvd>
<dvd>
<title>Raiders of the Lost Ark</title>
<price>14.95</price>
<year>1981</year>
</dvd>
</catalog>
DOM doesn't specify an interface for the XML parser, so different vendors have
different parser classes. I'll continue to use the Xerces parser, which has a
DOMParser class.
DOMParser parser = new DOMParser(); // the Xerces DOM parser class
try {
    parser.parse( "myDocument.xml" );
    Document document = parser.getDocument();
} catch (DOMException e) {
    // take validity action here
} catch (SAXException e) {
    // well-formedness action here
} catch (IOException e) {
    // take I/O action here
}
DOM incurs an expense in time and memory to construct an entire document tree.
The payback comes from the many ways that you can traverse and manipulate the
document's content using the tree structure. Figure 3 shows a portion of the DVD
catalog document.
The tree has a root, which you can access through the
Document.getDocumentElement() method. From any Node, you can use
Node.getChildNodes() to get a NodeList of children of the current Node. Note
that attributes are not considered a child of the containing Node. You can create new
Nodes, append them, insert them, locate them by name, and remove them. These
are just a few of the available capabilities.
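The navigation calls above can be sketched as follows. This example assumes the JAXP DocumentBuilderFactory instead of the Xerces DOMParser class, so that it runs without extra libraries; the class name DomWalk and the small inline document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomWalk {

    /** Builds a DOM tree from XML text. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the Element children of a node; attributes never appear in this list. */
    public static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<catalog><dvd code=\"_1\"><title>T2</title><year>1991</year></dvd></catalog>");
        Element root = doc.getDocumentElement();
        Element dvd = (Element) root.getFirstChild();
        System.out.println(root.getTagName());
        System.out.println(countElementChildren(dvd));        // title and year
        System.out.println(dvd.getAttributes().getLength());  // code is reached separately
    }
}
```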
Client traversal
You can traverse the DOM tree in the client, and you can validate actions on an
XHTML page through JavaScript from within the browser. For example, the client
might need to find out if a Node with a particular name already exists:
Server traversal
On the server, you will certainly need to manipulate the tree, such as to add a new
child to a Node:
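A minimal sketch of that kind of change, assuming a tree created through JAXP; the class AddChild, its helper methods, and the sample values are hypothetical names chosen for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AddChild {

    /** Creates an empty document whose root element is catalog. */
    public static Document newCatalog() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("catalog"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends a new element holding the given text as the last child of parent. */
    public static Element appendTextElement(Element parent, String name, String text) {
        // New nodes must be created by the Document that owns the parent.
        Document owner = parent.getOwnerDocument();
        Element child = owner.createElement(name);
        child.appendChild(owner.createTextNode(text));
        parent.appendChild(child);
        return child;
    }

    public static void main(String[] args) {
        Document doc = newCatalog();
        Element dvd = doc.createElement("dvd");
        dvd.setAttribute("code", "_9999999");   // hypothetical code value
        doc.getDocumentElement().appendChild(dvd);
        appendTextElement(dvd, "title", "Example Title");
        System.out.println(dvd.getFirstChild().getTextContent());
    }
}
```

Use insertBefore() instead of appendChild() when the schema dictates where the new child must sit among its siblings.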
XHTML as an alternative
This tutorial works with a data document, but the document could
easily be an XHTML page, in which case you'd see Nodes such as
head, body, p, td, and li.
DOM3
DOM3 has added a DOMErrorHandler, which provides a callback
mechanism to use instead of DOMException. Here is some
example code:
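One way this can look, as a sketch under assumptions rather than the original listing: it uses the DOM Level 3 Load and Save interfaces (org.w3c.dom.ls) that ship with the Java platform and registers the handler through the "error-handler" configuration parameter. The class name Dom3Errors is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class Dom3Errors {

    /** Parses xml with an LSParser and returns the messages its error handler received. */
    public static List<String> parseAndCollect(String xml) {
        List<String> messages = new ArrayList<>();
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

            // The handler is a callback: record the problem and ask to continue.
            DOMErrorHandler handler = error -> {
                messages.add(error.getMessage());
                return true;
            };
            DOMConfiguration config = parser.getDomConfig();
            config.setParameter("error-handler", handler);

            LSInput input = impl.createLSInput();
            input.setStringData(xml);
            parser.parse(input);
        } catch (LSException e) {
            // The parse was abandoned; whatever was reported is already in messages.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM Level 3 LS implementation found", e);
        }
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(parseAndCollect("<catalog><dvd></catalog>"));
    }
}
```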
The DOM parser throws a DOMException if problems occur during parsing. This is
a RuntimeException, since some languages don't support checked exceptions,
but you should always catch it or throw it in your Java code.
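To see a DOMException in action, here is a small sketch that provokes two of them and reads the public code field. The class DomExceptionDemo and its codeOf helper are hypothetical, and the document is created through JAXP as an assumption.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class DomExceptionDemo {

    /** Creates an empty DOM document. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs a tree change and returns the DOMException code it raised, or -1 if it succeeded. */
    public static short codeOf(Runnable change) {
        try {
            change.run();
            return -1;
        } catch (DOMException e) {
            // DOMException is unchecked, so the compiler never forces this catch.
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("catalog"));

        // An XML name cannot start with a digit.
        short badName = codeOf(() -> doc.createElement("1dvd"));
        System.out.println(badName == DOMException.INVALID_CHARACTER_ERR);

        // A Document can hold only one root element.
        short secondRoot = codeOf(() -> doc.appendChild(doc.createElement("extra")));
        System.out.println(secondRoot == DOMException.HIERARCHY_REQUEST_ERR);
    }
}
```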
As an exercise for the DOM parser skills you've learned, use the DOMEcho.java
code in Listing 4 to output the contents of the DOM tree for the catalog.xml file. After
this code echoes the tree information, it then changes the tree and echoes the
updated tree.
package com.xml.tutorial;

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.SAXException;
import com.sun.org.apache.xerces.internal.parsers.DOMParser;

/**
 * A handler to output certain information about a DOM tree
 * to standard output.
 *
 * @author lorenzm
 */
public class DOMEcho {
    public static final String XML_DOCUMENT_DTD =
        "catalogDTD.xml"; //validates via catalog.dtd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;
    // Types of DOM nodes, indexed by nodeType value (e.g. Attr = 2)
    protected static final String[] nodeTypeNames = {
        "none",        //0
        "Element",     //1
        "Attr",        //2
        "Text",        //3
        "CDATA",       //4
        "EntityRef",   //5
        "Entity",      //6
        "ProcInstr",   //7
        "Comment",     //8
        "Document",    //9
        "DocType",     //10
        "DocFragment", //11
        "Notation",    //12
    };
    //-- DOMImplementation features (we only need one for now)
    protected static final String TRAVERSAL_FEATURE = "Traversal";
    //-- DOM versions (we're using DOM2)
    protected static final String DOM_2 = "2.0";

    /**
     * Constructor
     */
    public DOMEcho() {
        super();
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        //Echo to standard output
        writer = new OutputStreamWriter( System.out );
        //use the Xerces parser
        try {
            DOMParser parser = new DOMParser();
            parser.setFeature( "http://xml.org/sax/features/validation", true );
            parser.parse( XML_DOCUMENT_DTD ); //use DTD grammar for validation
            Document document = parser.getDocument();
            echoAll( document );
            //-- add description for Indiana Jones movie
            //---- find parent Node
            Element indianaJones = document.getElementById("_7755522");
            //---- insert a description before the price
            //     (anywhere else would be invalid)
            NodeList prices = indianaJones.getElementsByTagName("price");
            Node desc = document.createElement("description");
            desc.setTextContent(
                "Indiana Jones is hired to find the Ark of the Covenant");
            indianaJones.insertBefore( desc, prices.item(0) );
            //-- now, echo the document again to see the change
            echoAll( document );
        } catch (DOMException e) { //handle invalid manipulations
            short code = e.code;
            if( code == DOMException.INVALID_MODIFICATION_ERR ) {
                //take action when invalid manipulation attempted
            } else if( code == DOMException.NOT_FOUND_ERR ) {
                //take action when element or attribute not found
            } //add more checks here as desired
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Echo all the Nodes, in preorder traversal order, for aDocument
     * @param aDocument
     */
    protected static void echoAll(Document aDocument) {
        if( aDocument.getImplementation().hasFeature(
                TRAVERSAL_FEATURE,DOM_2) ) {
            echo( "=== Echoing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );
            Node root = (Node) aDocument.getDocumentElement();
            int whatToShow = NodeFilter.SHOW_ALL;
            NodeFilter filter = null;
            boolean expandRefs = false;
            //-- depth first, preorder traversal
e.printStackTrace();
}
}
}
This array maps the Node.getNodeType() int value to each of the types of
Nodes that you can encounter:
if( aDocument.getImplementation().hasFeature(
TRAVERSAL_FEATURE,DOM_2) ) {
This section of code is what changes the DOM tree. It adds a description in the
correct place so that the tree is still valid according to the document's schema.
Text:
=== Echoing catalogDTD.xml ===
Text:
Comment: DVD inventory
Text:
Element: dvd
Attr: code="_1234567"
Text:
Element: title
Text: Terminator 2
Text:
Element: description
Text: A shape-shifting cyborg is sent back from the future
to kill the leader of the resistance.
Text:
Element: price
Text: 19.95
Text:
Element: year
Text: 1991
Text:
Text:
Element: dvd
Attr: code="_7654321"
Text:
Element: title
Text: The Matrix
Text:
Element: price
Text: 10.95
Text:
Element: year
Text: 1999
Text:
Text:
Element: dvd
Attr: code="_2255577"
Attr: genre="Drama"
Text:
Element: title
Text: Life as a House
Text:
Element: description
Text: When a man is diagnosed with terminal cancer,
he takes custody of his misanthropic teenage son.
Text:
Element: price
Text: 15.95
Text:
Element: year
Text: 2001
Text:
Text:
Element: dvd
Attr: code="_7755522"
Attr: genre="Action"
Text:
Element: title
Text: Raiders of the Lost Ark
Text:
Element: description
Text: Indiana Jones is hired to find the Ark of the Covenant
Element: price
Text: 14.95
Text:
Element: year
Text: 1981
Text:
Text:
Whitespace
You'll notice a lot of Text Nodes in the DOMEcho output (Listing 6), many of them
with nothing apparent as content. Why would that be?
The parser reports whitespace (extra spaces, tabs, and carriage returns) that occurs
within the document's element contents.
The Text nodes that come from whitespace in Element content are called
ignorable whitespace. Ignorable whitespace is not part of validation, as you're about
to see in Figure 4.
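You can observe these whitespace-only Text nodes directly. The following sketch assumes a non-validating JAXP parser and a small inline document; the class name WhitespaceNodes is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WhitespaceNodes {

    /** Builds a DOM tree from XML text. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts child Text nodes that contain nothing but whitespace. */
    public static int whitespaceOnlyChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The line breaks and indentation between tags become Text nodes.
        String xml = "<catalog>\n  <dvd>\n    <title>T2</title>\n  </dvd>\n</catalog>";
        Node root = parse(xml).getDocumentElement();
        System.out.println(root.getChildNodes().getLength());
        System.out.println(whitespaceOnlyChildren(root));
    }
}
```

Here the catalog element has one dvd child but three child nodes in total, because the whitespace before and after the dvd is reported as Text.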
Schemas
Technically speaking, DTDs, XML Schemas (capital S), and RELAX
NG are all types of XML schema (little s). XML Schemas (capital S)
are strictly called W3C XML Schemas. In this tutorial, whenever you
see XML Schema, realize that it's the W3C language and not the
generic schema document description.
A DTD specifies the elements and attributes that an XML instance document must
contain to be considered valid. You can associate a document with a DTD by
including a DOCTYPE statement near the top of the document:
Now, go through the catalog.dtd file. To validate a document, you need to use a
validating parser and turn validation on. The following code turns on validation for
the SAX parser and for the Xerces DOM parser, which accepts the same SAX feature URI:
saxParser.setFeature(
"http://xml.org/sax/features/validation", true );
domParser.setFeature(
"http://xml.org/sax/features/validation", true );
The dvd+ specifies that a <catalog> element has one or more <dvd>s. Makes
sense; otherwise, you aren't going to be selling too many DVDs!
The title, ..., year is called a sequence. It means that the named elements
must appear in this order as children of a <dvd> element. The question mark after
description means that a <dvd> has zero or one description elements -- in other
words, it's optional but if it is specified, there can only be one (an asterisk means
zero or more, and a plus sign means one or more).
An ID type attribute must have a value that is unique within the document. You'll notice
that in the catalog.xml file, the IDs begin with an underscore. An ID value must be an
XML name, and an XML name cannot start with a digit, but an underscore (or letter or
many other nondigit characters) is fine. An element can have only one attribute of
type ID. REQUIRED, as you might
have guessed, means that a <dvd> must have a code.
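The rules above can be exercised with a short sketch. The DTD in this example is a reduced, hypothetical reconstruction of the rules just described (it leaves out genre), embedded as an internal subset so that the example needs no external files; the class name DtdRules is also made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdRules {
    // Hypothetical DTD rebuilt from the rules described in the text.
    static final String DTD =
        "<!DOCTYPE catalog [\n"
      + "  <!ELEMENT catalog (dvd+)>\n"
      + "  <!ELEMENT dvd (title, description?, price, year)>\n"
      + "  <!ATTLIST dvd code ID #REQUIRED>\n"
      + "  <!ELEMENT title (#PCDATA)>\n"
      + "  <!ELEMENT description (#PCDATA)>\n"
      + "  <!ELEMENT price (#PCDATA)>\n"
      + "  <!ELEMENT year (#PCDATA)>\n"
      + "]>\n";

    /** Validates a catalog element against the DTD and returns the validity errors. */
    public static List<String> validityErrors(String catalogXml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + catalogXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String dvd = "<title>T2</title><price>19.95</price><year>1991</year></dvd>";
        // Two dvd elements with the same code break the ID uniqueness rule.
        System.out.println(validityErrors(
            "<catalog><dvd code=\"_1\">" + dvd + "<dvd code=\"_1\">" + dvd + "</catalog>"));
    }
}
```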
Comedy or ...").
These remaining lines all specify parsed character data. None of these elements
may have children.
Now try to change the instance document to make sure the rules work correctly.
First, add a <description>, but put it at the end of the <dvd>. As expected, you
get an error (see Figure 6).
Why didn't that work?! Science fiction is in the list! D'oh -- XML is case-sensitive, as
you know, so "scifi" won't work. It needs to be "SciFi".
Now check to see if IDs really need to be unique. Copy the same code into another
<dvd> (see Figure 8).
Figure 8. ID error
Sure enough, you get an appropriate error. You get the idea. Feel free to use the
DTD and XML files to try out other changes (see Download for the source files).
To handle DTD validation errors, you must turn on validation. For Xerces, you set
the validation feature to true:
parser.setFeature(
"http://xml.org/sax/features/validation",
true );
You can read about the different Xerces parser features at The Apache Software
Foundation Web site (see Resources). To read more about validation with DTDs,
check out Chapter 3 of XML in a Nutshell, Third Edition (see Resources).
Now, check out the validation. Comment out the price for the Life as a House dvd
in the XML document and see the results, using both DTD and XSD files for
validation. Listing 6 shows the output.
<year>2001</year>
</dvd>
<dvd>
<title>Raiders of the Lost Ark</title>
<price>14.95</price>
<year>1981</year>
</dvd>
</catalog>
XSD
XML Schema is also known as XML Schema Definition, thus the file
extension .xsd.
Let's validate the same XML instance document that you used for DTD validation in
Listing 1. Listing 7 shows the XML Schema:
Notice that the XML Schema is a lot more involved than the corresponding DTD. In
fact, even taking out the comments and spacing, this schema is more than 50 lines
long, as opposed to the DTD schema that is nine lines long. (Granted, this schema
does more detailed checking than the DTD does). So, along with more granular
control comes more complexity -- a lot more complexity. The message is: If your
validation needs don't require an XML Schema, use a DTD.
Review the added value list for XML Schemas to see how the DVD catalog
documents benefit, in addition to enforcing comparable constraints from the DTD
you used before:
• Granular control over element and attribute values: Unlike the DTD,
which allows any character values, the XSD constrains the values of
descriptions (20 to 120 characters), prices (0.00 to 100.00), and years
(1900 to 2099).
• Complex data types: You created new data types that you can reuse
<xs:element name="dvd">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
...
A simpleType defines a text-only value with no child elements or attributes; it
can be used for an element's content or for an attribute's value:
<xs:simpleType name="yearString">
<xs:restriction base="xs:string">
<xs:pattern value="(19|20)\d\d"/>
</xs:restriction>
</xs:simpleType>
In this particular case, you define a new type called yearString that
must contain four digits and begin with either "19" or "20." You use the
xs:restriction element to derive a new, constrained type from an
existing (base) type. You use the xs:pattern facet element to compare
values to see if they match the specified expression (see Facets).
• xs:sequence. The child elements must appear in the exact order listed
(although minOccurs can make an element optional, as you saw):
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="description" type="descriptionString" minOccurs="0"/>
<xs:element name="price" type="priceValue"/>
<xs:element name="year" type="yearString"/>
</xs:sequence>
Facets
Schemas support a set of possible aspects for values. These
aspects are called facets and are used with a restriction to constrain
the valid values. The following facet types are available:
• pattern
• enumeration
• whiteSpace
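To watch a pattern facet and an xs:sequence being enforced from Java, you can use the javax.xml.validation API (see Resources). The sketch below embeds a cut-down, hypothetical schema that reuses the yearString type shown earlier; the class name YearSchemaCheck and the reduced dvd element are assumptions, not the tutorial's catalog.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class YearSchemaCheck {
    // Hypothetical, reduced schema: a dvd has a title, an optional description, and a year.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='yearString'>"
      + "<xs:restriction base='xs:string'>"
      + "<xs:pattern value='(19|20)\\d\\d'/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name='dvd'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='description' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='year' type='yearString'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
    }

    /** Returns true if the instance document satisfies the schema. */
    public static boolean isValid(String xml) {
        try {
            compileSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, the first validity error is thrown.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<dvd><title>The Matrix</title><year>1999</year></dvd>"));
        System.out.println(isValid("<dvd><title>The Matrix</title><year>2101</year></dvd>"));
    }
}
```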
Now make some edits and verify that your constraints are being enforced. Add a
genre of Adventure, enter a description more than 120 characters long, and
duplicate a dvd code (see Figure 9).
You can see that the genre, unique ID, and description length are all enforced.
• xs:all: Each of the child elements listed must appear once, but they
can appear in any order.
• xs:group: A named set of elements can be defined once and then
referenced (through ref=groupName).
• xs:attributeGroup: This is the corresponding indicator for attributes,
as xs:group is for elements.
• xs:date: This is a Gregorian calendar date as defined in ISO 8601,
formatted as YYYY-MM-DD.
• xs:time: The time is represented by hh:mm:ss, with or without "Z" for
UTC relative time.
• xs:duration: An amount of years, months, days, hours, minutes, and seconds.
As you can see, a lot of built-in power is available when you write an XML Schema.
Can't find what you need? Create a new type.
Data types
A powerful feature of XML Schema is the capability to create new data types. You
saw new types used extensively in the catalog.xsd file, including the creation of the
yearString and priceValue types. In this case, these types are only used in the
dvd type, but you could use them anywhere that years or prices appear in the
document.
As I mentioned before, you can specialize an existing type using the restriction
element in combination with one or more facets. If more than one facet exists, you
can use them in combination to determine which values are valid and which are not.
Pattern matching
The pattern facet element supports a rich expression syntax that is similar to Perl.
You saw it used for the yearString, where you can read the pattern
"(19|20)\d\d" as "the string must start with either one-nine or two-zero and must
be followed by two decimal digits." Table 1 shows a few more patterns.
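If you want to experiment with the yearString expression outside a schema, java.util.regex is a convenient stand-in. This is a sketch under the assumption that Java's regular-expression syntax behaves the same as the schema's for this particular pattern and for ASCII input; the two dialects are not identical in general. The class name YearPattern is hypothetical.

```java
import java.util.regex.Pattern;

public class YearPattern {
    // Same expression as the yearString pattern facet.
    private static final Pattern YEAR = Pattern.compile("(19|20)\\d\\d");

    /** matches() requires the whole value to match, as a pattern facet does. */
    public static boolean isYear(String value) {
        return YEAR.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"1991", "2001", "1899", "19915"}) {
            System.out.println(candidate + " -> " + isYear(candidate));
        }
    }
}
```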
To read more about the many possibilities for expressions, see pages 427-429 of
XML in a Nutshell, Third Edition or view Table 24-5 in Chapter 24 of XML Bible,
Second Edition online (see Resources).
To handle XML Schema manipulation errors, you must turn on validation. For
Xerces, set the schema validation feature to true:
parser.setFeature(
"http://apache.org/xml/features/validation/schema",
true );
You can read about the different Xerces parser features on The Apache Software
Foundation Web site (see Resources).
DOMEcho revisited
To read more about validation with XML Schemas, check out Chapter 17 of XML in
a Nutshell, Third Edition, W3Schools, or "Interactive XML tutorials" (see Resources).
XQuery expands upon XPath expressions, which the fourth part of this tutorial on
XML transformations discusses in detail. An XPath expression is also a valid XQuery
expression. So, why do you need XQuery? The value-add for XQuery is due to
clauses that XQuery adds to its expressions, allowing for more complicated
expressions much like a SELECT statement does in SQL.
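Since a plain XPath expression is the starting point, it helps to see one evaluated on its own. The Java platform has no built-in XQuery engine, so this sketch uses the javax.xml.xpath API instead; the class name XPathTitles and the inline sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTitles {

    /** Evaluates an XPath expression and returns the text of each selected node. */
    public static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><dvd><title>Terminator 2</title></dvd>"
                   + "<dvd><title>The Matrix</title></dvd></catalog>";
        // The same path could be used as-is inside an XQuery for clause.
        System.out.println(select(xml, "/catalog/dvd/title"));
    }
}
```

What XPath alone cannot do here is sort the results or wrap them in new markup, which is where the XQuery clauses come in.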
XQuery clauses
XQuery contains multiple clauses, represented by the acronym FLWOR: for, let,
where, order by, return. Table 2 shows these parts.
A where clause contains a condition that evaluates to true or false and makes up the
search criteria within the FLWOR clauses. Look at some examples. You can use the
dvd.xml file shown in Listing 8 as the XML instance document.
Listing 8. dvd.xml
<?xml version="1.0"?>
<!-- DVD inventory -->
<catalog>
<dvd code="1234567">
<title>Terminator 2</title>
<price>19.95</price>
<year>1991</year>
</dvd>
<dvd code="7654321">
<title>The Matrix</title>
<price>12.95</price>
<year>1999</year>
</dvd>
<dvd code="2255577">
<title>Life as a House</title>
<price>15.95</price>
<year>2001</year>
</dvd>
<dvd code="7755522">
<title>Raiders of the Lost Ark</title>
<price>14.95</price>
<year>1981</year>
</dvd>
</catalog>
Saxon
You can get the free Saxon tools at Saxonica if you want to try out
XQuery yourself (see Resources).
To try this out, I used the Saxon XQuery tools. All my files are in the directory I
unpacked Saxon into. To use XQuery to create an HTML page that lists all the DVD
titles in ascending order, I used the dvdTitles.xq file shown in Listing 9, which also
shows the output. I used the following command to execute this query:
dvdTitles.xq:
<html>
<body>
Available DVDs:
<br/>
<ol>
{
for $title in doc("dvd.xml")/catalog/dvd/title
order by $title
return <li>{data($title)}</li>
}
</ol>
</body>
</html>
dvdTitles.html:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
Available DVDs:
<br/>
<ol>
<li>Life as a House</li>
<li>Raiders of the Lost Ark</li>
<li>Terminator 2</li>
<li>The Matrix</li>
</ol>
</body>
</html>
In Listing 9, look at the XQuery logic in detail. First of all, the query must be
surrounded by curly braces ("{}"). You can see in this example that three of the
clauses are used (for, order by, and return). You use the doc() function to
open an XML document. $title is a variable that is set to each of the search
results during each loop. In this case, it is set to each result of the
/catalog/dvd/title expression -- thus, its name. The data() function in the
return clause pulls out just the value from the XML without the tags. If you just put
$title, you would get "<title>value</title>," which you don't want in your
HTML output. Notice that the XQuery is surrounded with all the HTML needed to
complete the page.
Now, suppose you want to output the prices for DVDs that cost less than US$15 in
descending order. Listing 10 shows the XQuery and output files.
dvdPriceThreshold.xq:
<html>
<body>
DVD prices below $15.00:
<br/>
<ol>
{
for $price in doc("dvd.xml")/catalog/dvd/price
where $price < 15.00
The main difference with this query is that you specified a where clause. Just for
fun, you also reversed the sort order.
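If it helps to map the clauses onto something familiar, here is a plain-Java analogue of the same for, where, order by, and return steps. It is not XQuery and not the remainder of Listing 10; the class name PriceThreshold and the inline copy of the dvd.xml prices are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PriceThreshold {

    /** Returns the prices below limit, highest first. */
    public static List<BigDecimal> pricesBelow(String xml, BigDecimal limit) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList prices = doc.getElementsByTagName("price");     // for
            List<BigDecimal> result = new ArrayList<>();
            for (int i = 0; i < prices.getLength(); i++) {
                BigDecimal price = new BigDecimal(prices.item(i).getTextContent().trim());
                if (price.compareTo(limit) < 0) {                     // where
                    result.add(price);
                }
            }
            result.sort(Comparator.reverseOrder());                   // order by ... descending
            return result;                                            // return
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog>"
            + "<dvd><price>19.95</price></dvd><dvd><price>12.95</price></dvd>"
            + "<dvd><price>15.95</price></dvd><dvd><price>14.95</price></dvd>"
            + "</catalog>";
        System.out.println(pricesBelow(xml, new BigDecimal("15.00")));   // prints [14.95, 12.95]
    }
}
```

The XQuery version expresses the same four steps declaratively, and it can emit the surrounding HTML in the same file.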
Obviously, you can do a lot more to learn the power of XQuery, but I've covered
enough to show you some of the possibilities. To learn more, check out "XQuery"
and "Five Practical XQuery Applications" (see Resources).
Section 5. Conclusion
The core of XML is parsing and validation. Knowing how to use these capabilities
well is vital to the successful introduction of XML to your project.
Summary
In this tutorial on XML processing, you've seen how to:
Downloads
Description                 Name                          Size   Download method
Sample DTD and XML files    x-cert1423-code-samples.zip   16KB   HTTP
Resources
Learn
• XML and Related Technologies certification prep (developerWorks, August -
October, 2006): With this series of five tutorials, prepare to take the IBM
certification Test 142, XML and Related Technologies, to attain the IBM
Certified Solution Developer - XML and Related Technologies certification.
• XML: A Manager's Guide, Second Edition (Kevin Dick, Addison-Wesley
Professional, 2002): Read about uses of XML technologies in enterprise
applications.
• XML in a Nutshell, 3rd Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, 2004, ISBN: 0596007647): Check out this comprehensive XML
reference, which covers fundamental syntax rules, DTD and XML Schema creation,
XSLT transformations, processing APIs, and XML 1.1, plus SAX2 and DOM Level 3.
• XQuery (Jim Keogh and Ken Davidson, McGraw-Hill/Osborne, 2005; ISBN:
0072262109): Learn to write XQuery expressions in this excerpt from chapter 9
of the book XML DeMYSTiFieD.
• Five Practical XQuery Applications (Tim Matthews and Srinivas Pandrangi, 9
May 2003): Add XQuery in your own apps to simplify difficult or tedious tasks.
• An Introduction to StAX (Elliotte Rusty Harold, O'Reilly Media, September 17,
2003): Read more about Streaming API for XML (StAX) in this article.
• Interactive XML tutorials: Explore a variety of XML topics, including SVG, DTD,
Schema, XSLT, DOM, and SAX, complete with student problems and access to
online parsers that process your answers for immediate feedback.
• W3Schools online Web tutorials: Discover Web-building tutorials, from basic
HTML and XHTML to advanced XML, SQL, Database, Multimedia and WAP.
• Java theory and practice: Screen-scraping with XQuery (Brian Goetz,
developerWorks, 22 Mar 2005): See how effectively you can use XQuery as an
HTML screen-scraping engine.
• Power your mashups with XQuery (Ning Yan, developerWorks, July 2006):
Create a mashup application that uses XQuery to couple Web content with XML
data and Web services.
• The Java XML Validation API (Elliotte Rusty Harold, developerWorks, August
2006): Check your documents for conformance to schemas with this XML
validation API.
• Saxonica: XSLT and XQuery Processing: Learn about this collection of tools for
processing XML documents that includes XSLT 2.0, XPath 2.0, XQuery 1.0,
and XML Schema 1.0 processors.
• DOMException from Chapter 9 of Processing XML with Java: A Guide to SAX,
DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley
Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM
Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.