
XML and Related Technologies certification prep,
Part 3: XML processing

Explore how to parse and validate XML documents plus how to
use XQuery
Skill Level: Intermediate
Mark Lorenz (mlorenz@nc.rr.com)

Senior Application Architect
Hatteras Software, Inc.
26 Sep 2006
Parsing and validation represent the core of XML. Knowing how to use these
capabilities well is vital to the successful introduction of XML to your project. This
tutorial on XML processing teaches you how to parse and validate XML files as well
as use XQuery. It is the third tutorial in a series of five tutorials that you can use to
help prepare for the IBM certification Test 142, XML and Related Technologies.
Section 1. Before you start

In this section, you'll find out what to expect from this tutorial and how to get the
most out of it.
About this series

This series of five tutorials helps you prepare to take the IBM certification Test 142,
XML and Related Technologies, to attain the IBM Certified Solution Developer - XML
and Related Technologies certification. This certification identifies an
intermediate-level developer who designs and implements applications that make
use of XML and related technologies such as XML Schema, Extensible Stylesheet
Language Transformation (XSLT), and XPath. This developer has a strong
understanding of XML fundamentals; has knowledge of XML concepts and related
technologies; understands how data relates to XML, in particular with issues
associated with information modeling, XML processing, XML rendering, and Web
services; has a thorough knowledge of core XML-related World Wide Web
Consortium (W3C) recommendations; and is familiar with well-known best
practices.
© Copyright IBM Corporation 1994, 2006. All rights reserved.
developerWorks® ibm.com/developerWorks
Anyone working in software development for the last few years is aware that XML
provides cross-platform capabilities for data, just as the Java® programming
language does for application logic. This series of tutorials is for anyone who wants
to go beyond the basics of using XML technologies.
About this tutorial

This tutorial is the third in the "XML and Related Technologies certification prep"
series that takes you through the key aspects of effectively using XML technologies
on Java projects. This third tutorial focuses on XML processing -- that is, how to
parse and validate XML documents. It lays the groundwork for Part 4, which focuses
on transformation, including the use of XSLT, XPath, and Cascading Style Sheets
(CSS).
This tutorial is written for Java programmers who have a basic understanding of
XML and whose skills and experience are at a beginning to intermediate level. You
should have a general familiarity with defining, validating, and reading XML
documents, as well as a working knowledge of the Java language.
Objectives
After completing this tutorial, you will know how to:
• Parse XML documents using the Simple API for XML 2 (SAX2) and
Document Object Model 2 (DOM2) parsers
• Validate XML documents against Document Type Definitions (DTDs) and
XML Schemas
• Access XML content from databases using XQuery
Prerequisites
This tutorial is written for developers who have a background in programming and
scripting and who have an understanding of basic computer-science models and
data structures. You should be familiar with the following XML-related,
computer-science concepts: tree traversal, recursion, and reuse of data. You should
be familiar with Internet standards and concepts, such as Web browser,
client-server, documenting, formatting, e-commerce, and Web applications.
Experience designing and implementing Java-based computer applications and
working with relational databases is also recommended.
System requirements
To run the examples in this tutorial, you need a Linux® or Microsoft® Windows® box
with at least 50MB of free disk space and administrative access to install software.
The tutorial uses, but does not require, the following software:
• Java software development kit (JDK) 1.4.2 or later (the listings use @Override
annotations, which require JDK 5.0 or later)
• Eclipse 3.1 or later
• XMLBuddy 2.0 or later (Note: Some portions of the series use capabilities
of XMLBuddy Pro, which is not free.)
See Resources for links to download the above software.
Section 2. Parsing XML documents

You can parse an XML document in multiple ways (see Part 1 of this series, which
focuses on architecture), but the SAX parser and the DOM parser constitute the
primary ways. Part 1 features a high-level comparison of the two (see Resources).
StAX
A new API, called Streaming API for XML (StAX), is to be included in the
Java platform starting with Java SE 6 in late 2006. It is a pull API, as
opposed to SAX's push model, so it keeps control with the application rather
than the parser. StAX also provides writer interfaces, so an application can
read a document and write a modified version as it goes. Read more in
"An Introduction to StAX" (see Resources).
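To make the pull model concrete, here is a minimal sketch using the javax.xml.stream API that ships with Java SE 6 and later (it is not available in JDK 1.4.2). The application asks the reader for the next event in its own loop, so control stays with your code rather than with parser callbacks. The class name StaxTitles and the inline sample document are illustrative assumptions, not part of the tutorial's download.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxTitles {
    // Pull the text of every <title> element out of a catalog document.
    static List<String> titles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
            factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<String>();
        try {
            // The application decides when to advance: this is the "pull".
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog>"
            + "<dvd code=\"_1234567\"><title>Terminator 2</title></dvd>"
            + "<dvd code=\"_7654321\"><title>The Matrix</title></dvd>"
            + "</catalog>";
        System.out.println(titles(xml)); // prints [Terminator 2, The Matrix]
    }
}
```

Because the loop is ordinary application code, you can stop reading as soon as you have what you need, something a SAX handler can only do by throwing an exception.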
XML instance document

This tutorial uses a store's catalog of available DVDs for purchase as the document
throughout. Conceptually, the catalog contains a collection of DVDs with information
about each DVD associated with it. The actual document is a short catalog with only
four DVDs in it, but it has enough complexity for you to learn about XML processing,
including validation. Listing 1 shows the file.
Listing 1. The XML instance document for the DVD catalog
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">

<catalog>
  <dvd code="_1234567">
    <title>Terminator 2</title>
    <description>
      A shape-shifting cyborg is sent back from the future
      to kill the leader of the resistance.
    </description>
    <price>19.95</price>
    <year>1991</year>
  </dvd>
  <dvd code="_7654321">
    <title>The Matrix</title>
    <year>1999</year>
  </dvd>
  <dvd code="_2255577" genre="Drama">
    <title>Life as a House</title>
    <description>
      When a man is diagnosed with terminal cancer,
      he takes custody of his misanthropic teenage son.
    </description>
    <year>2001</year>
  </dvd>
  <dvd code="_7755522" genre="Action">
    <title>Raiders of the Lost Ark</title>
    <year>1981</year>
  </dvd>
</catalog>
Using the SAX parser

As Part 1 of this series discussed, the SAX parser is an event-based parser. This
means that the parser sends events to callback methods as it parses a document
(see Figure 1). For simplicity, Figure 1 doesn't show all the events that would
actually occur.
Figure 1. SAX parser events
These events are pushed out to the application in real time, as the parser moves
across the document contents. One benefit of this processing model is that you can
handle large documents with relatively little memory. A downside is that you have
more work to do to handle all these events.
The org.xml.sax package contains a set of interfaces. One of these, XMLReader,
is the interface to the parser. You can set up for parsing like this:
import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
try {
    XMLReader parser = XMLReaderFactory.createXMLReader();
    parser.parse( "myDocument.xml" ); //system ID: a URI or file path
} catch ( SAXParseException e ) {
    //document is not well-formed
} catch ( SAXException e ) {
    //could not find an implementation of XMLReader, or another SAX error
} catch ( IOException e ) {
    //problem reading document file
}
Apache Xerces2 parser

If you need a parser, you can download the open source Apache
Xerces2 parser from The Apache Software Foundation Web site
(see Resources).
Tip: Reuse the parser instance if possible. Creating a parser is expensive. If you
have multiple threads running, you can reuse parser instances from a resource pool.
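One way to follow this tip is a small fixed-size pool. The sketch below, with the hypothetical class name ParserPool, hands out XMLReader instances from a java.util.concurrent queue (Java 5 or later) so that each thread borrows a parser, uses it, and returns it; an XMLReader should not be shared by two threads while it is parsing. The pool size and the blocking behavior of borrow() are assumptions, and a production pool would also need to reset any handlers or features a caller changed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ParserPool {
    private final BlockingQueue<XMLReader> idle;

    // Create all parsers up front, paying the creation cost once.
    public ParserPool(int size) throws SAXException {
        idle = new ArrayBlockingQueue<XMLReader>(size);
        for (int i = 0; i < size; i++) {
            idle.add(XMLReaderFactory.createXMLReader());
        }
    }

    // Blocks until a parser is free.
    public XMLReader borrow() throws InterruptedException {
        return idle.take();
    }

    // Give the parser back so another thread can reuse it.
    public void giveBack(XMLReader parser) {
        idle.offer(parser);
    }

    // Number of parsers currently waiting to be borrowed.
    public int available() {
        return idle.size();
    }
}
```

Callers should return the parser in a finally block so that a parse error does not leak it from the pool.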
This is all well and good so far, but how does your application get events from the
parser? I'm glad you asked.
Handling SAX events
To receive events from the parser, you implement the ContentHandler interface.
This interface has a number of methods that you can implement to process your
document. Alternatively, if you only want to handle one or two callbacks, you can
subclass DefaultHandler, which implements all the ContentHandler methods
(doing nothing) and overrides only the methods you need.
Either way, you write logic to do whatever processing you require upon receiving
startElement, characters, endDocument, and other callback methods invoked
by the SAX parser. You can see all the method calls from a document as they would
occur on pages 351-355 of XML in a Nutshell, Third Edition (see Resources).
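As a minimal sketch of the DefaultHandler approach, the hypothetical TitleCollector below overrides only startElement, characters, and endElement to gather the text of every title element. The parser may split one run of text across several characters() calls, so the handler buffers text until the end tag arrives. It obtains a parser through JAXP's SAXParserFactory (available since Java 1.4); the XMLReaderFactory setup shown earlier works as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<String>();
    private StringBuilder text; // non-null only while inside <title>

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString());
            text = null;
        }
    }

    public List<String> getTitles() {
        return titles;
    }

    // Parse a document held in a String and return its titles in order.
    public static List<String> parse(String xml) throws
            ParserConfigurationException, SAXException, IOException {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }
}
```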
The callback events are the normal events from a document as it's being parsed.
You can also handle validity callbacks by implementing an ErrorHandler. I'll
discuss this topic after I go over validation, so stay tuned.
To learn more about parsing with SAX, check out Chapter 20 of XML in a Nutshell,
Third Edition or read "Serial Access with the Simple API for XML (SAX)" (see
Resources).
SAX parser exception handling
By default, the parser ignores validity errors (well-formedness errors still
stop the parse with a SAXParseException). To take action upon an invalid or
non-well-formed document, you must implement an ErrorHandler (note that
DefaultHandler implements this as well as the ContentHandler interface) and
define an error() method:
public class SAXEcho extends DefaultHandler {
    ...
    //Handle validity errors
    public void error( SAXParseException e ) {
        echo( e.getMessage() );
        echo( "Line " + e.getLineNumber() +
              " Column " + e.getColumnNumber() );
    }
    ...
}
Then you must turn on the validation feature:
parser.setFeature( "http://xml.org/sax/features/validation", true );
Finally, call this code:
parser.setErrorHandler( saxEcho );
Remember, parser is an instance of XMLReader. The parser calls the error()
method if the document violates a schema (DTD or XML Schema) rule.
Other ErrorHandler methods

ErrorHandler also has warning and fatalError methods, for
conditions that are not errors and for well-formedness violations,
respectively. You don't normally need to do anything in these methods;
DefaultHandler's fatalError() throws the exception, which ends the parse.
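The sketch below puts all three ErrorHandler callbacks in one place. The hypothetical ErrorCollector records each report instead of printing it, returns normally from warning() and error() so that parsing continues, and rethrows from fatalError(), because a document that is not well-formed cannot be processed further. The tiny inline DTD (a catalog holds one or more dvd elements, and a dvd holds only text) is an assumption made so that the example needs no external files; validation is switched on through the same http://xml.org/sax/features/validation feature used above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class ErrorCollector implements ErrorHandler {
    public final List<String> warnings = new ArrayList<String>();
    public final List<String> errors = new ArrayList<String>();

    public void warning(SAXParseException e) {
        warnings.add("Line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // Validity errors: record them and keep parsing.
    public void error(SAXParseException e) {
        errors.add("Line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // Well-formedness errors: the input is not XML at all, so stop.
    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }

    // Hypothetical internal DTD used only for this example.
    static final String DTD = "<!DOCTYPE catalog [\n"
        + "<!ELEMENT catalog (dvd+)>\n"
        + "<!ELEMENT dvd (#PCDATA)>\n"
        + "]>\n";

    public static ErrorCollector validate(String body)
            throws SAXException, IOException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        ErrorCollector handler = new ErrorCollector();
        parser.setErrorHandler(handler);
        parser.parse(new InputSource(new StringReader(DTD + body)));
        return handler;
    }
}
```

A valid body such as <catalog><dvd>Terminator 2</dvd></catalog> leaves errors empty, an undeclared element adds entries to errors, and a mismatched end tag makes validate() throw a SAXParseException.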
Echoing SAX events
As an exercise for the SAX parser skills you've learned, use the SAXEcho.java code
in Listing 2 to output the parser events for the catalog.xml file.
Listing 2. Echoing SAX events
package com.xml.tutorial;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
/**
 * A handler for SAX parser events that outputs certain event
 * information to standard output.
 *
 * @author mlorenz
 */
public class SAXEcho extends DefaultHandler {
    public static final String XML_DOCUMENT_DTD = "catalogDTD.xml"; //validates via catalog.dtd
    public static final String XML_DOCUMENT_XSD = "catalogXSD.xml"; //validates via catalog.xsd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;
    /**
     * Constructor
     */
    public SAXEcho() {
        super();
    }
    /**
     * @param args
     */
    public static void main(String[] args) {
        //-- Set up my instance to handle SAX events
        DefaultHandler eventHandler = new SAXEcho();
        //-- Echo to standard output
        writer = new OutputStreamWriter( System.out );
        try {
            //-- Create a SAX parser
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler( eventHandler );
            parser.setErrorHandler( eventHandler );
            parser.setFeature(
                "http://xml.org/sax/features/validation", true );
            //-- Validation via DTD --
            echo( "=== Parsing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );
            //-- Parse my XML document, reporting DTD-related errors
            parser.parse( XML_DOCUMENT_DTD );
            //-- Validation via XSD --
            parser.setFeature(
                "http://apache.org/xml/features/validation/schema",
                true );
            echo( NEW_LINE + NEW_LINE + "=== Parsing " +
                XML_DOCUMENT_XSD + " ===" + NEW_LINE );
            //-- Parse my XML document, reporting XSD-related errors
            parser.parse( XML_DOCUMENT_XSD );
        } catch (SAXException e) {
            System.out.println( "Parsing Exception occurred" );
            e.printStackTrace();
        } catch (IOException e) {
            System.out.println( "Could not read the file" );
        }
        System.exit(0);
    }
    //--Implement SAX callback events of interest (default is do nothing) --
    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#startElement(java.lang.String,
     *      java.lang.String, java.lang.String, org.xml.sax.Attributes)
     * @see org.xml.sax.ContentHandler interface
     * Element and its attributes
     */
    @Override
    public void startElement( String uri,
                              String localName,
                              String qName,
                              Attributes attributes)
            throws SAXException {
        if( localName.length() == 0 )
            echo( "<" + qName );
        else
            echo( "<" + localName );
        if( attributes != null ) {
            for( int i=0; i < attributes.getLength(); i++ ) {
                if( attributes.getLocalName(i).length() == 0 ) {
                    echo( " " + attributes.getQName(i) +
                        "=\"" + attributes.getValue(i) + "\"" );
                }
            }
        }
        echo( ">" );
    }
    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#endElement(java.lang.String,
     *      java.lang.String, java.lang.String)
     * End tag
     */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        echo( "</" + qName + ">" );
    }
    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#characters(char[], int, int)
     * Character data inside an element
     */
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String s = new String(ch, start, length);
        echo(s);
    }
    //-- Add additional event echoing at your discretion --
    /**
     * Output aString to standard output
     * @param aString
     */
    protected static void echo( String aString ) {
        try {
            writer.write( aString );
            writer.flush();
        } catch (IOException e) {
            System.out.println( "I/O error during echo()" );
        }
    }
    /* (non-Javadoc)
     * @see org.xml.sax.helpers.DefaultHandler#error(org.xml.sax.SAXParseException)
     * @see org.xml.sax.ErrorHandler interface
     */
    @Override
    public void error(SAXParseException e) throws SAXException {
        echo( NEW_LINE + "*** Failed validation ***" + NEW_LINE );
        super.error(e);
        echo( "* " + e.getMessage() + NEW_LINE +
            "* Line " + e.getLineNumber() +
            " Column " + e.getColumnNumber() + NEW_LINE +
            "*************************" + NEW_LINE );
        try {
            Thread.sleep( 10 );
        } catch (InterruptedException e1) {
            e1.printStackTrace();
        }
    }
}
You can use the code in SAXEcho.java to see how SAX parsing all comes together.
Note that this code does not handle all events, so not everything from the original
document will be echoed (see Listing 3). Take a look at the ContentHandler
interface to see what other messages you might get.
Listing 3. Output from SAXEcho execution
=== Parsing catalogDTD.xml ===

<catalog><dvd><title>Terminator 2</title><description>
</description><price>19.95</price><year>1991</year>
</dvd><dvd><title>The Matrix</title><price>10.95</price>
<year>1999</year></dvd><dvd><title>Life as a House</title><description>
</dvd><dvd><title>Raiders of the Lost Ark</title><price>
14.95</price><year>1981</year></dvd></catalog>
=== Parsing catalogXSD.xml ===
<catalog>
<dvd>
<description>
</description>
<year>1991</year>
</dvd>
<dvd>
<year>1999</year>
</dvd>
<dvd>
<description>
</description>
<year>2001</year>
</dvd>
<dvd>
<year>1981</year>
</dvd>
</catalog>
Using the DOM parser

In contrast to the SAX parser, the DOM parser builds a tree structure based on the
XML document contents (see Figure 2). For simplicity, some parsing actions are not
shown.
Figure 2. DOM parser tree
DOM doesn't specify an interface for the XML parser, so different vendors have
different parser classes. I'll continue to use the Xerces parser, which has a
DOMParser class.
You set up a DOM parser like this:
DOMParser parser = new DOMParser();
try {
    parser.parse( "myDocument.xml" );
    Document document = parser.getDocument();
} catch (DOMException e) {
    // take DOM error action here
} catch (SAXException e) {
    // well-formedness action here
} catch (IOException e) {
    // take I/O action here
}
Traversing the DOM tree
DOM incurs an expense in time and memory to construct an entire document tree.
The payback comes from the many ways that you can traverse and manipulate the
document's content using the tree structure. Figure 3 shows a portion of the DVD
catalog document.
Figure 3. Traversing the DOM tree
The tree has a root, which you can access through the
Document.getDocumentElement() method. From any Node, you can use
Node.getChildNodes() to get a NodeList of children of the current Node. Note
that attributes are not considered children of the containing Node. You can create new
Nodes, append them, insert them, locate them by name, and remove them. These
are just a few of the available capabilities.
One of the more powerful methods is Document.getElementsByTagName(),
which returns a NodeList of all descendant elements with a given tag name, in
document order. The DOM tree is available on the client as well as the server.
Client traversal
You can traverse the DOM tree in the client, and you can validate actions on an
XHTML page through JavaScript from within the browser. For example, the client
might need to find out if a Node with a particular name already exists:
//-- make sure a new DVD's title is unique
//   (newTitle is assumed to be the new title element, obtained elsewhere)
var titles = document.getElementsByTagName("title");
var newTitleValue = newTitle.textContent;
for( var i = 0; i < titles.length; i++ ) {
  var nextTitle = titles.item(i); //NodeList access by index
  if( nextTitle.textContent === newTitleValue ) {
    //take some action
  }
}
Server traversal
On the server, you will certainly need to manipulate the tree, such as to add a new
child to a Node:
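//-- add a new DVD with title aName and the given description
public void createNewDvd( String aName, String description ) {
    Element catalog = document.getDocumentElement(); //root
    Element newDvd = document.createElement( "dvd" );
    Element dvdTitle = document.createElement( "title" );
    dvdTitle.appendChild( document.createTextNode( aName ) );
    Element dvdDescription = document.createElement( "description" );
    dvdDescription.appendChild( document.createTextNode( description ) );
    newDvd.appendChild( dvdTitle );
    newDvd.appendChild( dvdDescription );
    //to be valid, catalog.dtd also requires a code attribute, price, and year
    catalog.appendChild( newDvd ); //as last element
}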
XHTML as an alternative
This tutorial works with a data document, but the document could
easily be an XHTML page, in which case you'd see Nodes such as
head, body, p, td, and li.
Caution: Make sure to use DOM interfaces, such as NodeList or NamedNodeMap,
to manipulate the tree. The DOM tree is dynamic ("live"), meaning it is updated
immediately based on changes you're making, so if you use local variables to cache
values, they might be wrong. For example, NodeList.getLength() returns a different
value after a call to removeChild() removes one of the list's Nodes.
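The sketch below illustrates this caution with the live NodeList returned by getElementsByTagName(). Removing the matched elements while counting an index upward skips every other one, because each removal shifts the remaining items down and shrinks getLength(); removing item(0) until the list is empty avoids the problem. The class name LiveListDemo and the four-element sample document are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LiveListDemo {
    static final String XML =
        "<catalog><dvd/><dvd/><dvd/><dvd/></catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Buggy: getLength() shrinks after each removeChild(), so i skips items.
    static int removeNaively() throws Exception {
        Document doc = parse(XML);
        NodeList dvds = doc.getElementsByTagName("dvd");
        for (int i = 0; i < dvds.getLength(); i++) {
            Node dvd = dvds.item(i);
            dvd.getParentNode().removeChild(dvd);
        }
        return doc.getElementsByTagName("dvd").getLength(); // leftovers
    }

    // Safe: always remove item(0) until the live list is empty.
    static int removeSafely() throws Exception {
        Document doc = parse(XML);
        NodeList dvds = doc.getElementsByTagName("dvd");
        while (dvds.getLength() > 0) {
            Node dvd = dvds.item(0);
            dvd.getParentNode().removeChild(dvd);
        }
        return doc.getElementsByTagName("dvd").getLength(); // leftovers
    }
}
```

With four dvd elements, removeNaively() leaves two of them in place, while removeSafely() removes all four.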
DOM parser exception handling
DOM3
DOM3 has added a DOMErrorHandler, which provides a callback
mechanism to use instead of DOMException. Here is some
example code:
DOMConfiguration domConfig = document.getDomConfig();
domConfig.setParameter( "error-handler", handler ); //handler implements DOMErrorHandler
The class that implements the DOMErrorHandler interface has a
handleError(DOMError error) method, which returns true to
continue processing or false to stop processing (fatal errors
always stop processing). The handler is called during operations
that use this configuration, such as Document.normalizeDocument().
DOM operations throw a DOMException when a manipulation fails; parsing errors
are reported differently (the Xerces DOMParser.parse() method throws a
SAXException). DOMException is a RuntimeException, since some languages don't
support checked exceptions, but you should always catch it or throw it in your
Java code.
To detect manipulation problems, use the code of a DOMException. These codes
tell you what is wrong, such as an attempt to change the type of the underlying
object (DOMException.INVALID_MODIFICATION_ERR) or a target Node that
could not be found (DOMException.NOT_FOUND_ERR). The DOMException section
within Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM,
JAXP, and TrAX offers a complete list of DOMException codes with explanations
(see Resources).
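To see how the codes surface in practice, the sketch below provokes two of them on purpose: removing a Node that is not a child of the target (NOT_FOUND_ERR) and giving a Document a second root element (HIERARCHY_REQUEST_ERR). The class name DomErrors and its helper methods are hypothetical; the codes are the standard DOMException constants.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomErrors {
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
    }

    // NOT_FOUND_ERR: the Node to remove is not a child of catalog.
    static short removeNonChild() throws Exception {
        Document doc = newDocument();
        Element catalog = doc.createElement("catalog");
        doc.appendChild(catalog);
        Element orphan = doc.createElement("dvd"); // never appended
        try {
            catalog.removeChild(orphan);
            return 0; // no exception
        } catch (DOMException e) {
            return e.code;
        }
    }

    // HIERARCHY_REQUEST_ERR: a Document may have only one root element.
    static short addSecondRoot() throws Exception {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("catalog"));
        try {
            doc.appendChild(doc.createElement("catalog"));
            return 0; // no exception
        } catch (DOMException e) {
            return e.code;
        }
    }
}
```

Checking e.code, as Listing 4 does, lets a single catch block tell these cases apart.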
Echoing the DOM tree
As an exercise for the DOM parser skills you've learned, use the DOMEcho.java
code in Listing 4 to output the contents of the DOM tree for the catalog.xml file. After
this code echoes the tree information, it then changes the tree and echoes the
updated tree.
Listing 4. Echoing a DOM tree
package com.xml.tutorial;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.SAXException;
//JDK-internal copy of Xerces; with the Apache Xerces2 jar on the classpath,
//import org.apache.xerces.parsers.DOMParser instead
import com.sun.org.apache.xerces.internal.parsers.DOMParser;
/**
 * A handler to output certain information about a DOM tree
 * to standard output.
 *
 * @author lorenzm
 */
public class DOMEcho {
    public static final String XML_DOCUMENT_DTD =
        "catalogDTD.xml"; //validates via catalog.dtd
    public static final String NEW_LINE = System.getProperty("line.separator");
    protected static Writer writer;
    // Types of DOM nodes, indexed by nodeType value (e.g. Attr = 2)
    protected static final String[] nodeTypeNames = {
        "none",        //0
        "Element",     //1
        "Attr",        //2
        "Text",        //3
        "CDATA",       //4
        "EntityRef",   //5
        "Entity",      //6
        "ProcInstr",   //7
        "Comment",     //8
        "Document",    //9
        "DocType",     //10
        "DocFragment", //11
        "Notation",    //12
    };
    //-- DOMImplementation features (we only need one for now)
    protected static final String TRAVERSAL_FEATURE = "Traversal";
    //-- DOM versions (we're using DOM2)
    protected static final String DOM_2 = "2.0";
    /**
     * Constructor
     */
    public DOMEcho() {
        super();
    }
    /**
     * @param args
     */
    public static void main(String[] args) {
        //Echo to standard output
        writer = new OutputStreamWriter( System.out );
        //use the Xerces parser
        DOMParser parser = new DOMParser();
        try {
            parser.setFeature( "http://xml.org/sax/features/validation", true );
            parser.parse( XML_DOCUMENT_DTD ); //use DTD grammar for validation
            Document document = parser.getDocument();
            echoAll( document );
            //-- add description for Indiana Jones movie
            //---- find parent Node
            Element indianaJones = document.getElementById("_7755522");
            //---- insert a description before the price
            //     (anywhere else would be invalid)
            NodeList prices = indianaJones.getElementsByTagName("price");
            Node desc = document.createElement("description");
            desc.setTextContent(
                "Indiana Jones is hired to find the Ark of the Covenant");
            indianaJones.insertBefore( desc, prices.item(0) );
            //-- now, echo the document again to see the change
            echoAll( document );
        } catch (DOMException e) { //handle invalid manipulations
            short code = e.code;
            if( code == DOMException.INVALID_MODIFICATION_ERR ) {
                //take action when invalid manipulation attempted
            } else if( code == DOMException.NOT_FOUND_ERR ) {
                //take action when element or attribute not found
            } //add more checks here as desired
        } catch (SAXException e) { //parsing problem, such as a well-formedness error
            System.out.println( "Parsing Exception occurred" );
            e.printStackTrace();
        } catch (IOException e) { //problem reading the document file
            System.out.println( "Could not read the file" );
        }
    }
    /**
     * Echo all the Nodes, in preorder traversal order, for aDocument
     * @param aDocument
     */
    protected static void echoAll(Document aDocument) {
        if( aDocument.getImplementation().hasFeature(
                TRAVERSAL_FEATURE,DOM_2) ) {
            echo( "=== Echoing " + XML_DOCUMENT_DTD + " ===" + NEW_LINE );
            Node root = (Node) aDocument.getDocumentElement();
            int whatToShow = NodeFilter.SHOW_ALL;
            NodeFilter filter = null;
            boolean expandRefs = false;
            //-- depth first, preorder traversal
            DocumentTraversal traversal = (DocumentTraversal)aDocument;
            TreeWalker walker = traversal.createTreeWalker(
                (org.w3c.dom.Node) root, //where to start
                                         //(cannot go "above" the root)
                whatToShow,              //what to include
                filter,                  //what to exclude
                expandRefs);             //include referenced entities or not
            for( Node nextNode = (Node) walker.nextNode(); nextNode != null;
                    nextNode = (Node) walker.nextNode() ) {
                echoNode( nextNode );
            }
        } else {
            echo( NEW_LINE + "*** " + TRAVERSAL_FEATURE +
                " feature is not supported" + NEW_LINE );
        }
    }
    /**
     * Output aNode's name, type, and value to standard output.
     * @param aNode
     */
    protected static void echoNode( Node aNode ) {
        String type = nodeTypeNames[aNode.getNodeType()];
        String name = aNode.getNodeName();
        StringBuffer echoBuf = new StringBuffer();
        echoBuf.append(type);
        if( !name.startsWith("#") ) { //do not output duplicate names
            echoBuf.append(": ");
            echoBuf.append(name);
        }
        if( aNode.getNodeValue() != null ) {
            if( echoBuf.indexOf("ProcInst") == 0 )
                echoBuf.append( ", " );
            else
                echoBuf.append( ": " );
            //output only to first newline
            String trimmedValue = aNode.getNodeValue().trim();
            int nlIndex = trimmedValue.indexOf("\n");
            if( nlIndex >= 0 ) //found newline
                trimmedValue = trimmedValue.substring(0,nlIndex);
            echoBuf.append(trimmedValue);
        }
        echo( echoBuf.toString() + NEW_LINE );
        echoAttributes( aNode );
    }
    /**
     * Output aNode's attributes to standard output.
     * @param aNode
     */
    protected static void echoAttributes(Node aNode) {
        NamedNodeMap attr = aNode.getAttributes();
        if( attr != null ) {
            StringBuffer attrBuf = new StringBuffer();
            for( int i = 0; i < attr.getLength(); i++ ) {
                String type = nodeTypeNames[attr.item(i).getNodeType()];
                attrBuf.append(type);
                attrBuf.append( ": " + attr.item(i).getNodeName() + "=" );
                attrBuf.append( "\"" + attr.item(i).getNodeValue() + "\"" +
                    NEW_LINE );
            }
            echo( attrBuf.toString() );
        }
    }
    /**
     * Output aString to standard output
     * @param aString
     */
    protected static void echo( String aString ) {
        try {
            writer.write( aString );
            writer.flush();
        } catch (IOException e) {
            System.out.println( "I/O error during echo()" );
        }
    }
}
Look at some portions of the logic:
protected static final String[] nodeTypeNames = {
    ...
};
This array maps the Node.getNodeType() int value to each of the types of
Nodes that you can encounter.
if( aDocument.getImplementation().hasFeature(
        TRAVERSAL_FEATURE,DOM_2) ) {
DOM1 versus DOM2

DOM Level 1 offers only the navigation methods on Node itself, such as
getFirstChild(), getNextSibling(), and getParentNode(). The DOM Level 2
Traversal module adds two interfaces: NodeIterator, which presents the
Nodes in a "linear" fashion with previous and next Nodes, and TreeWalker,
which adds the concept of a current Node, with movement to parent, child,
and sibling. Both can use a NodeFilter to skip Nodes.
You can read about NodeIterator, NodeFilter, and TreeWalker in Chapter 12
of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX
(see Resources).
Bruno R. Preiss explains different tree traversals (see Resources).
DOMEcho takes advantage of the TreeWalker interface introduced in DOM2 (see
DOM1 versus DOM2). To be safe, check to make sure your parser supports this
feature. You can read about all the available features in the "DOM Modules" section
in Chapter 9 of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP,
and TrAX (see Resources).
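Here is a minimal sketch of the same technique outside DOMEcho: check hasFeature("Traversal", "2.0"), then let a TreeWalker visit the Nodes in preorder. The class name WalkerDemo is hypothetical, and it uses NodeFilter.SHOW_ELEMENT (rather than the SHOW_ALL that DOMEcho uses) so that whitespace Text Nodes and comments are skipped. The cast to DocumentTraversal works with the Xerces-based parser in the JDK, but not every DOM implementation supports it, which is why the feature check comes first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class WalkerDemo {
    // Element names in preorder, starting below the root element.
    static List<String> elementNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        if (!doc.getImplementation().hasFeature("Traversal", "2.0")) {
            throw new UnsupportedOperationException("Traversal not supported");
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
            doc.getDocumentElement(), // root: the walker cannot go above it
            NodeFilter.SHOW_ELEMENT,  // skip Text, Comment, and so on
            null,                     // no custom NodeFilter
            false);                   // do not expand entity references
        List<String> names = new ArrayList<String>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }
}
```

For a catalog with two dvd elements, the first holding title and year and the second holding only title, elementNames() returns [dvd, title, year, dvd, title]; the root itself is not returned because the walker starts positioned on it.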
Basically, DOMEcho has an echoAll(Document aDocument) method, which uses the
TreeWalker with no filtering to get the Nodes in preorder traversal order (see DOM1
versus DOM2). echoNode(Node aNode) is then called for each. In turn,
echoNode calls echoAttributes(Node aNode) for its Node.
Now look at this portion of main():
NodeList prices = indianaJones.getElementsByTagName("price");
Node desc = document.createElement("description");
desc.setTextContent(
    "Indiana Jones is hired to find the Ark of the Covenant");
indianaJones.insertBefore( desc, prices.item(0) );
This section of code is what changes the DOM tree. It adds a description in the
correct place so that the tree is still valid according to the document's schema.
Listing 5 shows the resulting output from DOMEcho.
Listing 5. Output from DOMEcho
=== Echoing catalogDTD.xml ===

Text:
Comment: DVD inventory
Text:
Element: dvd
Attr: code="_1234567"
Text:
Element: title
Text: Terminator 2
Text:
Element: description
Text: A shape-shifting cyborg is sent back from the future
Text:
Element: price
Text: 19.95
Text:
Element: year
Text: 1991
Text:
Text:
Element: dvd
Attr: code="_7654321"
Text:
Element: title
Text: The Matrix
Text:
Element: price
Text: 10.95
Text:
Element: year
Text: 1999
Text:
Text:
Element: dvd
Attr: code="_2255577"
Attr: genre="Drama"
Text:
Element: title
Text: Life as a House
Text:
Text: When a man is diagnosed with terminal cancer,
Text:
Element: price
Text: 15.95
Text:
Element: year
Text: 2001
Text:
Text:
Element: dvd
Attr: code="_7755522"
Attr: genre="Action"
Text:
Element: title
Text: Raiders of the Lost Ark
Text:
Element: price
Text: 14.95
Text:
Element: year
Text: 1981
Text:
Text:
=== Echoing catalogDTD.xml ===
Text:
Comment: DVD inventory
Text:
Element: dvd
Attr: code="_1234567"
Text:
Element: title
Text: Terminator 2
Text:
Text: A shape-shifting cyborg is sent back from the future
Text:
Element: price
Text: 19.95
Text:
Element: year
Text: 1991
Text:
Text:
Element: dvd
Attr: code="_7654321"
Text:
Element: title
Text: The Matrix
Text:
Element: price
Text: 10.95
Text:
Element: year
Text: 1999
Text:
Text:
Element: dvd
Attr: code="_2255577"
Attr: genre="Drama"
Text:
Element: title
Text: Life as a House
Text:
Text: When a man is diagnosed with terminal cancer,
Text:
Element: price
Text: 15.95
Text:
Element: year
Text: 2001
Text:
Text:
Element: dvd
Attr: code="_7755522"
Attr: genre="Action"
Text:
Element: title
Text: Raiders of the Lost Ark
Text:
Text: Indiana Jones is hired to find the Ark of the Covenant
Element: price
Text: 14.95
Text:
Element: year
Text: 1981
Text:
Text:
Whitespace
You'll notice a lot of Text Nodes in the DOMEcho output (Listing 5), many of them
with nothing apparent as content. Why would that be?
The parser reports whitespace (extra spaces, tabs, and carriage returns) that occurs
within the document's element contents.
Notice what's not reported: whitespace inside tags, such as the spaces surrounding
attributes. Not shown here, but also not reported, is whitespace in the prolog. Note
that there is a Text Node for the description; its value appears without the
surrounding whitespace only because echoNode() trims each value and echoes it
only up to the first newline.
The Text Nodes due to whitespace in element content are called
ignorable whitespace. Ignorable whitespace is not part of validation, as you're about
to see in Figure 4.
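If your application has no use for ignorable whitespace, JAXP can drop it while building the tree. The sketch below, with the hypothetical class WhitespaceDemo and a small inline DTD standing in for catalog.dtd, counts the children of the root element with and without DocumentBuilderFactory.setIgnoringElementContentWhitespace(true). The parser can recognize whitespace as ignorable only when a DTD declares an element-only content model, so the factory is also set to validate.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {
    static final String XML = "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE catalog [\n"
        + "  <!ELEMENT catalog (dvd+)>\n"   // element-only content
        + "  <!ELEMENT dvd (#PCDATA)>\n"
        + "]>\n"
        + "<catalog>\n"
        + "  <dvd>Terminator 2</dvd>\n"
        + "  <dvd>The Matrix</dvd>\n"
        + "</catalog>";

    // Number of child Nodes (Elements plus whitespace Text) under <catalog>.
    static int rootChildren(boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // needed to know the content model
        factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler ignores validity errors; without a handler,
        // JAXP prints a warning when validation is on
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }
}
```

Without the setting, catalog has five children (three whitespace Text Nodes around two dvd elements); with it, only the two dvd elements remain.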
Figure 4. Whitespace processing
Section 3. Validating XML documents

Validation consists of ensuring the proper structure and content of XML documents
using a grammar. You can specify a grammar by using an XML schema, which can
take the form of a DTD or XML Schema file (see Schemas). This section of the
tutorial discusses DTD and XML Schema files.
Schemas
Technically speaking, DTDs, XML Schemas (capital S), and RELAX
NG are all types of XML schema (little s). XML Schemas (capital S)
are strictly called W3C XML Schemas. In this tutorial, whenever you
see XML Schema, realize that it's the W3C language and not the
generic schema document description.
Validating using a DTD

A DTD defines constraints to put on an XML instance document. These constraints
are not related to well-formedness. In fact, a document that is not well-formed is not
considered an XML document at all. Constraints relate to business rules about
content that must hold true for you to be able to use the document with an
application.
A DTD specifies the elements and attributes that an XML instance document must
contain to be considered valid. You can associate a document with a DTD by
including a DOCTYPE statement near the top of the document:
<!DOCTYPE catalog SYSTEM "catalog.dtd">
Now, go through the catalog.dtd file. To validate a document, you need to turn
validation on and use a validating parser. With this code, turn on validation for the
SAX parser:
saxParser.setFeature(
    "http://xml.org/sax/features/validation", true );
With Xerces, the DOM parser uses the same feature URI (as Listing 4 does) to
turn on validation:
domParser.setFeature(
    "http://xml.org/sax/features/validation", true );
Figure 5 shows the catalog.dtd file.
Figure 5. Catalog DTD
Go line by line through the DTD to see what is being specified:
<!ELEMENT catalog (dvd+)>
The dvd+ specifies that a <catalog> element has one or more <dvd>s. Makes
sense; otherwise, you aren't going to be selling too many DVDs!
<!ELEMENT dvd (title, description?, price, year)>
The title, ..., year is called a sequence. It means that the named elements
must appear in this order as children of a <dvd> element. The question mark after
description means that a <dvd> has zero or one description elements -- in other
words, it's optional but if it is specified, there can only be one (an asterisk means
zero or more, and a plus sign means one or more).
<!ATTLIST dvd code ID #REQUIRED>
An ID type attribute must have a unique name within the document. You'll notice
that in the catalog.xml file, the IDs begin with an underscore. An XML name
cannot start with a number, but an underscore (or letter or many other nondigit
character) is fine. An element can only have one ID type. REQUIRED, as you might
have guessed, means that a <dvd> must have a code.
<!ATTLIST dvd genre ( Drama | Comedy | SciFi | Action | Romance ) #IMPLIED>
This is an enumeration. Since it is #IMPLIED, it is optional. However, if it does appear
in the document, it must be one of the enumerated values (read them as "Drama or
Comedy or ...").
<!ELEMENT title (#PCDATA)>

<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT year (#PCDATA)>
These remaining lines declare elements whose content is parsed character data
(#PCDATA) only, so none of these elements may have child elements.
Now try to change the instance document to make sure the rules work correctly.
First, add a <description>, but put it at the end of the <dvd>. As expected, you
get an error (see Figure 6).
Figure 6. Description error
Now, add a genre of "scifi" (see Figure 7).
Figure 7. Genre error
Why didn't that work?! Science fiction is in the list! D'oh -- XML is case-sensitive, as
you know, so "scifi" won't work. It needs to be "SciFi".
Now check whether ID values really need to be unique. Copy the same code value
into another <dvd> (see Figure 8).
Figure 8. ID error
Sure enough, you get an appropriate error. You get the idea. Feel free to use the
DTD and XML files to try out other changes (see Download for the source files).
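You can also repeat the three experiments from Figures 6 through 8 in code. The sketch below assumes the JDK's JAXP DocumentBuilderFactory with setValidating(true) for DTD validation, and it uses an internal copy of the DTD so that no files are needed. The class name DtdDomExperiments, the catalog() helper, and the sample values are illustrative. Each of the three documents should produce at least one validity error.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DtdDomExperiments {

    // Internal copy of catalog.dtd, so the sketch needs no external file.
    static final String DTD =
          "<!DOCTYPE catalog [\n"
        + "<!ELEMENT catalog (dvd+)>\n"
        + "<!ELEMENT dvd (title, description?, price, year)>\n"
        + "<!ATTLIST dvd code ID #REQUIRED>\n"
        + "<!ATTLIST dvd genre (Drama|Comedy|SciFi|Action|Romance) #IMPLIED>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT description (#PCDATA)>\n"
        + "<!ELEMENT price (#PCDATA)>\n"
        + "<!ELEMENT year (#PCDATA)>\n"
        + "]>\n";

    /** Wraps one or more dvd elements in a <catalog> governed by the internal DTD. */
    static String catalog(String dvds) {
        return DTD + "<catalog>" + dvds + "</catalog>";
    }

    /** Builds a DOM tree with DTD validation on and returns the validity errors. */
    static List<String> validate(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DOCTYPE's DTD
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String body = "<title>The Matrix</title><price>10.95</price><year>1999</year>";
        String[] experiments = {
            // Figure 6: <description> moved after <year>
            catalog("<dvd code=\"_7654321\">" + body + "<description>Sample text.</description></dvd>"),
            // Figure 7: genre value in the wrong case
            catalog("<dvd code=\"_7654321\" genre=\"scifi\">" + body + "</dvd>"),
            // Figure 8: the same ID value on two <dvd> elements
            catalog("<dvd code=\"_7654321\">" + body + "</dvd><dvd code=\"_7654321\">" + body + "</dvd>")
        };
        for (String xml : experiments) {
            System.out.println(validate(xml));
        }
    }
}
```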
DTD exception handling
To report DTD validation errors, you must turn on validation. For Xerces, set the
validation feature to true:
parser.setFeature( "http://xml.org/sax/features/validation", true );
You can read about the different Xerces parser features at The Apache Software
Foundation Web site (see Resources). To read more about validation with DTDs,
check out Chapter 3 of XML in a Nutshell, Third Edition (see Resources).
Validating with SAXEcho
Now, check out the validation. Comment out the price for the Life as a House dvd
in the XML document and see the results, using both DTD and XSD files for
validation. Listing 6 shows the output.
Listing 6. Output from SAXEcho execution
=== Parsing catalogDTD.xml ===

<catalog><dvd><title>Terminator 2</title><description>
</dvd><dvd><title>The Matrix</title><price>10.95
</price><year>1999</year></dvd><dvd><title>Life as a House</title><description>
</description><year>2001</year>
*** Failed validation ***
* The content of element type "dvd" must match "(title,description?,price,year)".
*************************
</dvd><dvd><title>Raiders of the Lost Ark</title><price>14.95
</price><year>1981</year></dvd></catalog>
=== Parsing catalogXSD.xml ===
<catalog>
<dvd>
<description>
</description>
<year>1991</year>
</dvd>
<dvd>
<year>1999</year>
</dvd>
<dvd>
<description>
</description>
*** Failed validation ***

* cvc-complex-type.2.4.a: Invalid content was found starting with
element 'year'. One of '{"":price}' is expected.
*************************
<year>2001</year>
</dvd>
<dvd>
<year>1981</year>
</dvd>
</catalog>
Validating using an XML schema

Perhaps you're wondering: If I have DTDs to make sure a document's structure and
content is valid, why do I need another way to validate documents? I'll give you a
few reasons:
• Granular control over element and attribute values: XML Schema
allows you to specify the format, length, and data type.
• Complex data types: XML Schema supports the creation of new data
types and specialization from existing types.
• Element occurrence: With XML Schema, you can control exactly how
many times an element may occur, using minOccurs and maxOccurs.
• Namespaces: XML Schema works with namespaces, which become
important for organizations that deal with other organizations.
The XML Schema language is more powerful than the DTD language and thus is
also more complicated. One nice aspect is that XML Schemas are written in XML,
whereas DTDs are not.
XSD
XML Schema is also known as XML Schema Definition, thus the file
extension .xsd.
Let's validate the same XML instance document that you used for DTD validation in
Listing 1. Listing 7 shows the XML Schema:
Listing 7. Catalog XML Schema
<?xml version="1.0" encoding="UTF-8"?>

<xs:schema elementFormDefault="qualified" xml:lang="EN"
xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="catalog">
<xs:complexType>
<xs:sequence minOccurs="4" maxOccurs="unbounded">
<xs:element ref="dvd"/>
</xs:sequence>
</xs:complexType>
</xs:element>

<xs:element name="dvd">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>

<xs:element name="description" type="descriptionString"
minOccurs="0"/>
<xs:element name="price" type="priceValue"/>
<xs:element name="year" type="yearString"/>
</xs:sequence>
<xs:attribute name="code" type="xs:ID"/> 
<xs:attribute name="genre"> 
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Drama"/>
<xs:enumeration value="Comedy"/>
<xs:enumeration value="SciFi"/>
<xs:enumeration value="Action"/>
<xs:enumeration value="Romance"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>

<xs:simpleType name="descriptionString">
<xs:restriction base="xs:string">
<xs:minLength value="10"/>
<xs:maxLength value="120"/>
</xs:restriction>
</xs:simpleType>

<xs:simpleType name="priceValue">
<xs:restriction base="xs:decimal">
<xs:totalDigits value="4"/>
<xs:fractionDigits value="2"/>
<xs:maxExclusive value="100.00"/>
</xs:restriction>
</xs:simpleType>

<xs:simpleType name="yearString">
<xs:restriction base="xs:string">
<xs:pattern value="(19|20)\d\d"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Notice that the XML Schema is a lot more involved than the corresponding DTD. In
fact, even without comments and blank lines, this schema is more than 50 lines
long, whereas the DTD is nine lines long. (Granted, this schema does more detailed
checking than the DTD does.) So, along with more granular control comes more
complexity, a lot more complexity. The message is: If your validation needs don't
require an XML Schema, use a DTD.
Review the added value list for XML Schemas to see how the DVD catalog
documents benefit, in addition to enforcing comparable constraints from the DTD
you used before:
• Granular control over element and attribute values: Unlike the DTD,
which allows any character values, the XSD constrains the values of
descriptions (10 to 120 characters), prices (less than 100.00, with at most
four digits, two of them after the decimal point), and years (1900 to 2099).
• Complex data types: You created new data types: the complex type of
the dvd element and the named simple types descriptionString,
priceValue, and yearString, which you can reuse and extend even further.
• Element occurrence: Since this tutorial has a small document, I set the
number of DVDs to be four or more so the document would be valid. In
reality, the minimum would probably be a larger number, but you can see
that these types of constraints are possible.
• Namespaces: You only used namespaces for XML Schema types, but
since XML Schemas are namespace-aware, you know that you can add
more namespaces to control name collisions.
Let's discuss some more points about the XML Schema to understand its contents:
• xs:complexType and xs:simpleType. A complexType describes an element
that contains child elements or attributes:
<xs:element name="dvd">
<xs:complexType>
<xs:sequence>
...
A simpleType describes text-only content, with no child elements or
attributes; you use it for attribute values and for elements that contain only text:
<xs:simpleType name="yearString">
<xs:restriction base="xs:string">
<xs:pattern value="(19|20)\d\d"/>
</xs:restriction>
</xs:simpleType>
In this particular case, you define a new type called yearString that
must contain four digits and begin with either "19" or "20". You use the
xs:restriction element to derive a new, constrained type from an
existing (base) type. You use the xs:pattern facet element to check whether
values match the specified expression (see Facets).
• xs:sequence. The child elements must appear in the exact order listed
(although minOccurs="0" can make an element optional, as you saw):
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="description" type="descriptionString" minOccurs="0"/>
<xs:element name="price" type="priceValue"/>
<xs:element name="year" type="yearString"/>
</xs:sequence>
The sequence declares that a dvd in a valid document must have a
title, optionally followed by a description of between 10 and 120
characters, followed by a price of less than US$100 with at most two digits
on each side of the decimal point (in the format "nn.nn"), and finally a year.
Facets
XML Schema defines a set of aspects of a value that you can
constrain. These aspects are called facets, and you use them within
a restriction to constrain the valid values. The following facet types
are available:
• pattern
• enumeration
• length, minLength, and maxLength
• minInclusive, maxInclusive, minExclusive, and maxExclusive
• totalDigits and fractionDigits
• whiteSpace
Note: Validating against XML Schemas in XMLBuddy requires XMLBuddy Pro.
Now make some edits and verify that your constraints are being enforced. Add a
genre of Adventure, enter a description more than 120 characters long, and
duplicate a dvd code (see Figure 9).
Figure 9. XSD errors
You can see that the genre, unique ID, and description length are all enforced.
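The note above ties XML Schema validation to XMLBuddy Pro, but you can also check the same kinds of constraints from Java. The sketch below assumes the standard JAXP validation API (javax.xml.validation, part of the JDK since Java 5) and an inline, trimmed-down copy of Listing 7 that keeps only title, description, code, and genre. The class name XsdValidation and the sample documents are illustrative. The ErrorHandler collects every problem instead of stopping at the first one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidation {

    // Trimmed-down version of Listing 7: price, year, and the 4-DVD minimum are left out.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
        + "<xs:element name='dvd' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='description' minOccurs='0'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:minLength value='10'/><xs:maxLength value='120'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence>"
        + "<xs:attribute name='code' type='xs:ID'/>"
        + "<xs:attribute name='genre'><xs:simpleType><xs:restriction base='xs:string'>"
        + "<xs:enumeration value='Drama'/><xs:enumeration value='Comedy'/>"
        + "<xs:enumeration value='SciFi'/><xs:enumeration value='Action'/>"
        + "<xs:enumeration value='Romance'/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Validates xml against XSD and returns every error message reported. */
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        StringBuilder longText = new StringBuilder();
        for (int i = 0; i < 121; i++) {
            longText.append('x'); // one character over maxLength
        }
        String[] edited = {
            "<catalog><dvd code='_1' genre='Adventure'><title>A</title></dvd></catalog>",
            "<catalog><dvd code='_1'><title>A</title><description>" + longText
                + "</description></dvd></catalog>",
            "<catalog><dvd code='_1'><title>A</title></dvd><dvd code='_1'><title>B</title></dvd></catalog>"
        };
        for (String xml : edited) {
            System.out.println(validate(xml));
        }
    }
}
```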
XML Schema is capable of much more. Here are a few highlights:
• xs:choice: Exactly one of the listed child elements must appear.
• xs:all: Each of the child elements listed must appear once, but they
can appear in any order.
• xs:group: A named group of elements can be defined once and then
referenced elsewhere (through ref="groupName").
• xs:attributeGroup: This is the corresponding indicator for attributes,
as xs:group is for elements.
• xs:date: This is a Gregorian calendar date as defined in ISO 8601,
formatted as YYYY-MM-DD.
• xs:time: The time is represented by hh:mm:ss, optionally followed by a
time zone such as "Z" for UTC.
• xs:duration: An amount of years, months, days, hours, minutes, and seconds.
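The date and time types in the list above have counterparts in the JDK's javax.xml.datatype package. The sketch below assumes that package (standard since Java 5) and parses one lexical value of each type; the class name SchemaDateTypes and the schemaTypeOf() helper are illustrative.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;
import javax.xml.namespace.QName;

class SchemaDateTypes {

    static final DatatypeFactory FACTORY = newFactory();

    private static DatatypeFactory newFactory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses an xs:date or xs:time lexical value and reports which schema type it is. */
    static QName schemaTypeOf(String lexical) {
        return FACTORY.newXMLGregorianCalendar(lexical).getXMLSchemaType();
    }

    public static void main(String[] args) {
        // xs:date: YYYY-MM-DD
        System.out.println(schemaTypeOf("2006-09-26"));
        // xs:time: hh:mm:ss, here followed by Z for UTC (offset 0 minutes)
        XMLGregorianCalendar time = FACTORY.newXMLGregorianCalendar("13:45:30Z");
        System.out.println(schemaTypeOf("13:45:30Z") + ", offset " + time.getTimezone() + " minutes");
        // xs:duration: P, then years/months/days, then T and hours/minutes/seconds
        Duration d = FACTORY.newDuration("P1Y2M3DT4H5M6S");
        System.out.println(d.getYears() + "y " + d.getMonths() + "m " + d.getDays() + "d "
            + d.getHours() + "h " + d.getMinutes() + "min " + d.getSeconds() + "s");
    }
}
```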
As you can see, a lot of built-in power is available when you write an XML Schema.
Can't find what you need? Create a new type.
Data types
A powerful feature of XML Schema is the capability to create new data types. You
saw new types used extensively in the catalog.xsd file, including the creation of the
yearString and priceValue types. In this case, these types are only used in the
dvd type, but you could use them anywhere that years or prices appear in the
document.
These types are derived by restriction from the existing decimal and string types:
<xs:simpleType name="priceValue">
<xs:restriction base="xs:decimal">
<xs:totalDigits value="4"/>
<xs:fractionDigits value="2"/>
<xs:maxExclusive value="100.00"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="yearString">
<xs:restriction base="xs:string">
<xs:pattern value="(19|20)\d\d"/>
</xs:restriction>
</xs:simpleType>
As I mentioned before, you can specialize an existing type using the restriction
element in combination with one or more facets. When a restriction contains more
than one facet, a value is valid only if it satisfies all of them.
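Because the facets of a restriction combine, a candidate price has to pass every one of them. The following sketch spells out the three priceValue checks in Java with BigDecimal. It is a hand-written approximation of what a schema validator does, not any validator's own code, and the class name PriceValueFacets is hypothetical. One consequence of the schema as written shows up here: with no minInclusive facet, a negative value such as -5.00 passes.

```java
import java.math.BigDecimal;

class PriceValueFacets {

    // <xs:maxExclusive value="100.00"/>
    private static final BigDecimal MAX_EXCLUSIVE = new BigDecimal("100.00");

    /** Returns true if lexical is an xs:decimal that satisfies all three priceValue facets. */
    static boolean isValidPrice(String lexical) {
        String s = lexical.trim(); // xs:decimal collapses surrounding whitespace
        // xs:decimal lexical form: optional sign, digits, optional fraction, no exponent
        if (!s.matches("[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)")) {
            return false;
        }
        // Facets apply to the value, so 12.950 and 12.95 count the same.
        BigDecimal value = new BigDecimal(s).stripTrailingZeros();
        if (value.scale() < 0) {
            value = value.setScale(0); // e.g. 1E+1 back to 10, so precision counts every digit
        }
        boolean totalDigitsOk = value.precision() <= 4;            // <xs:totalDigits value="4"/>
        boolean fractionDigitsOk = value.scale() <= 2;              // <xs:fractionDigits value="2"/>
        boolean belowMaximum = value.compareTo(MAX_EXCLUSIVE) < 0;  // <xs:maxExclusive value="100.00"/>
        return totalDigitsOk && fractionDigitsOk && belowMaximum;
    }

    public static void main(String[] args) {
        for (String price : new String[] {"14.95", "9.5", "12.950", "100.00", "99.999", "-5.00"}) {
            System.out.println(price + " -> " + isValidPrice(price));
        }
    }
}
```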
Pattern matching
The pattern facet element supports a rich expression syntax that is similar to Perl's.
You saw it used for yearString, where you can read the pattern "(19|20)\d\d" as
"the string must start with either 19 or 20 and be followed by exactly two decimal
digits." A pattern must match the entire value, not just part of it. Table 1 shows a
few more patterns.
Table 1. XML Schema pattern-matching expressions
Pattern   Matches
(A|B)     A string that matches A or B
A?        Zero or one occurrence of a string that matches A
A*        Zero or more occurrences of a string that matches A
A+        One or more occurrences of a string that matches A
[abcd]    A character that matches one of the specified characters
[^abc]    A character other than those specified
\t        A tab character
\\        A backslash character
\c        An XML name character
\s        A space, tab, carriage-return, or line-feed character
.         Any character except a carriage return or line feed
To read more about the many possibilities for expressions, see pages 427-429 of
XML in a Nutshell, Third Edition or view Table 24-5 in Chapter 24 of XML Bible,
Second Edition online (see Resources).
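One difference from the Perl-style regular expressions you may know: an XML Schema pattern is implicitly anchored, so it must match the whole value. In Java, Matcher.matches() behaves that way, while Matcher.find() accepts a match anywhere in the string. The sketch below (class name YearPattern is hypothetical) contrasts the two for the yearString pattern. Two further caveats when porting patterns: Java's \d matches only ASCII digits unless Pattern.UNICODE_CHARACTER_CLASS is set, whereas the schema \d matches any Unicode decimal digit, and the schema escape \c has no direct Java equivalent.

```java
import java.util.regex.Pattern;

class YearPattern {

    // The yearString pattern from the schema, written as a Java string literal.
    private static final Pattern YEAR = Pattern.compile("(19|20)\\d\\d");

    /** Like an XSD pattern facet: the whole value must match. */
    static boolean isValidYear(String value) {
        return YEAR.matcher(value).matches();
    }

    /** Ordinary regex search: true if any substring matches. */
    static boolean containsMatch(String value) {
        return YEAR.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1981", "2006", "2999", "19999"}) {
            System.out.println(v + " valid=" + isValidYear(v) + " contains=" + containsMatch(v));
        }
    }
}
```

The value "19999" shows why the anchoring matters: it contains a match, but it is not a valid yearString.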
XSD exception handling
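To report XML Schema validation errors, you must turn on validation. For Xerces,
set the schema validation feature to true; Xerces reports schema validation errors
only when the general validation feature is also on:
parser.setFeature( "http://xml.org/sax/features/validation", true );
parser.setFeature( "http://apache.org/xml/features/validation/schema", true );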
You can read about the different Xerces parser features on The Apache Software
Foundation Web site (see Resources).
I previously discussed DOMExceptions that can occur due to manipulation
problems. The DOMException's code indicates what type of problem has occurred.
DOMEcho revisited
Change the logic of DOMEcho.java to cause a DOMException. Here's the new logic:
NodeList years = indianaJones.getElementsByTagName("price");
Node desc = document.createTextNode( ... );
// indianaJones is not a child of itself, so this call throws a
// DOMException whose code is NOT_FOUND_ERR.
indianaJones.insertBefore( desc, indianaJones );
This results in the following code being executed:
short code = e.code;

...
} else if( code == DOMException.NOT_FOUND_ERR ) {
//take action when element or attribute not found
echo( "*** Element not found" );
System.exit(code);
}
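The snippet above depends on DOMEcho's surrounding code. As a self-contained sketch, the following class builds a small document in memory with the standard JAXP DocumentBuilder, repeats the faulty insertBefore() call, and inspects the exception code the same way. The element contents and the class name DomExceptionDemo are hypothetical, and instead of calling System.exit() the sketch returns the code so that it can be checked.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomExceptionDemo {

    /** Repeats the faulty insertBefore() call; returns the DOMException code, or -1 if none. */
    static int provokeNotFound() {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element catalog = document.createElement("catalog");
        document.appendChild(catalog);
        // Hypothetical stand-in for the Raiders of the Lost Ark <dvd> element.
        Element indianaJones = document.createElement("dvd");
        catalog.appendChild(indianaJones);
        Node desc = document.createTextNode("placeholder description");
        try {
            // refChild must be a child of indianaJones, and an element is not its own child.
            indianaJones.insertBefore(desc, indianaJones);
            return -1;
        } catch (DOMException e) {
            short code = e.code;
            if (code == DOMException.NOT_FOUND_ERR) {
                // Report and return rather than calling System.exit() as DOMEcho does.
                System.out.println("*** Element not found");
            }
            return code;
        }
    }

    public static void main(String[] args) {
        System.out.println("DOMException code: " + provokeNotFound());
    }
}
```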
To read more about validation with XML Schemas, check out Chapter 17 of XML in
a Nutshell, Third Edition, W3Schools, or "Interactive XML tutorials" (see Resources).
Section 4. Using XQuery

XML Query (XQuery) is a language for writing expressions that return matching
results from XML data, often in a database. The functionality is like that provided by
SQL for non-XML content:
"Like SQL, XQuery contains functions for extracting, summarizing,
aggregating, and joining data from multiple datasets."
--"Java theory and practice: Screen-scraping with XQuery" by Brian
Goetz (see Resources)
XQuery expands upon XPath expressions, which the fourth tutorial in this series, on
XML transformations, discusses in detail. An XPath expression is also a valid XQuery
expression. So, why do you need XQuery? XQuery's added value comes from the
clauses it adds to its expressions, which allow more complex queries, much like a
SELECT statement does in SQL.
XQuery clauses
XQuery contains multiple clauses, represented by the acronym FLWOR: for, let,
where, order by, return. Table 2 shows these parts.
Table 2. FLWOR clauses

Clause     Description
for        You use this looping construct to assign values to variables used within the other clauses. You declare the variables with a dollar sign, as in $name, and get values assigned to them from the search results.
let        You use let to bind a variable to the entire result of an expression at once, rather than to each item in turn as for does.
where      Much like in SQL, you use a where clause to filter the results based on some criteria.
order by   You use this clause to determine how to sort the result set (ascending or descending).
return     You use the return clause to determine the contents of the output of the query. The contents can include literals, XML document contents, HTML markup, or many other possibilities.
Within the FLWOR clauses, you express the search criteria as conditions that
evaluate to true or false. Look at some examples. You can use the dvd.xml file
shown in Listing 8 as the XML instance document.
Listing 8. dvd.xml
<?xml version="1.0"?>

<catalog>
<dvd code="1234567">
<year>1991</year>
</dvd>
<dvd code="7654321">
<year>1999</year>
</dvd>
<dvd code="2255577">
<year>2001</year>
</dvd>
<dvd code="7755522">
<year>1981</year>
</dvd>
</catalog>
Saxon
You can get the free Saxon tools at Saxonica if you want to try out
XQuery yourself (see Resources).
To try this out, I used the Saxon XQuery tools. All my files are in the directory I
unpacked Saxon into. To use XQuery to create an HTML page that lists all the DVD
titles in ascending order, I used the dvdTitles.xq file shown in Listing 9, which also
shows the output. I used the following command to execute this query:
java -cp saxon8.jar net.sf.saxon.Query -t dvdTitles.xq > dvdTitles.html
Listing 9. XQuery to list DVD titles in ascending order
dvdTitles.xq:
<html>
<body>
Available DVDs:
<br/>
<ol>
{
for $title in doc("dvd.xml")/catalog/dvd/title
order by $title
return <li>{data($title)}</li>
}
</ol>
</body>
</html>
dvdTitles.html:
<html>
<body>
Available DVDs:
<br/>
<ol>
<li>Life as a House</li>
<li>Raiders of the Lost Ark</li>
<li>Terminator 2</li>
<li>The Matrix</li>
</ol>
</body>
</html>
In Listing 9, look at the XQuery logic in detail. First of all, the query embedded in the
HTML must be surrounded by curly braces ("{}") so that it is evaluated rather than
copied as literal text. You can see in this example that three of the clauses are used
(for, order by, and return). You use the doc() function to open an XML document.
$title is a variable that is set to each of the search results in turn as the loop runs. In
this case, it is set to each result of the /catalog/dvd/title expression, hence its name.
The data() function in the return clause extracts just the value from the XML,
without the tags. If you put just $title, you would get "<title>value</title>", which
you don't want in your HTML output. Notice that the XQuery is surrounded with all
the HTML needed to complete the page.
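If you don't have Saxon at hand, you can approximate what the for, order by, and return clauses of Listing 9 do with the XPath API built into the JDK (javax.xml.xpath). This is an analogy, not an XQuery engine: the path expression plays the role of for, a Java sort plays order by, and string building plays return. The in-memory catalog is a hypothetical stand-in for dvd.xml that holds only the four titles from the output above, and the class name TitleListing is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TitleListing {

    // Hypothetical in-memory stand-in for dvd.xml, holding only the titles.
    static final String CATALOG = "<catalog>"
        + "<dvd><title>Terminator 2</title></dvd>"
        + "<dvd><title>The Matrix</title></dvd>"
        + "<dvd><title>Life as a House</title></dvd>"
        + "<dvd><title>Raiders of the Lost Ark</title></dvd>"
        + "</catalog>";

    static List<String> sortedTitles(String xml) {
        NodeList nodes;
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // for $title in doc("dvd.xml")/catalog/dvd/title
            nodes = (NodeList) xpath.evaluate("/catalog/dvd/title",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // data($title): the text value only, without the <title> tags
            titles.add(nodes.item(i).getTextContent());
        }
        // order by $title (ascending, String.compareTo order)
        Collections.sort(titles);
        return titles;
    }

    // return <li>{data($title)}</li>, wrapped in the same HTML as dvdTitles.xq
    static String toHtml(List<String> titles) {
        StringBuilder html = new StringBuilder("<html>\n<body>\nAvailable DVDs:\n<br/>\n<ol>\n");
        for (String title : titles) {
            String escaped = title.replace("&", "&amp;").replace("<", "&lt;");
            html.append("<li>").append(escaped).append("</li>\n");
        }
        return html.append("</ol>\n</body>\n</html>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml(sortedTitles(CATALOG)));
    }
}
```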
Now, suppose you want to output the prices for DVDs that cost less than US$15, in
descending order. Listing 10 shows the XQuery and output files.
Listing 10. DVD prices < US$15 in descending order
dvdPriceThreshold.xq
<html>
<body>
DVDs prices below $15.00:
<br/>
<ol>
{
for $price in doc("dvd.xml")/catalog/dvd/price
where $price < 15.00
order by $price descending

return <li>{data($price)}</li>
}
</ol>
</body>
</html>
dvdPrices.html
<html>
<body>
DVDs prices below $15.00:
<br/>
<ol>
<li>14.95</li>
<li>12.95</li>
</ol>
</body>
</html>
The main difference with this query is that you specified a where clause. Just for
fun, you also reversed the sort order.
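The where clause of Listing 10 maps naturally onto an XPath predicate. The following sketch, again using javax.xml.xpath as a stand-in for an XQuery processor, filters the prices with [. < 15.00] and then sorts them in descending order. The prices 10.95 and 14.95 come from Listing 6; the third entry is hypothetical, added so that the filter has something to remove, and the class name PriceThreshold is illustrative. The sketch sorts the prices as numbers. Be aware that XQuery's order by compares untyped values (such as text from a document without a schema) as strings, so for a numeric sort in the query itself you would write order by number($price) descending.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceThreshold {

    // Hypothetical stand-in for dvd.xml; the 19.95 entry is invented for the example.
    static final String CATALOG = "<catalog>"
        + "<dvd><title>The Matrix</title><price>10.95</price></dvd>"
        + "<dvd><title>Raiders of the Lost Ark</title><price>14.95</price></dvd>"
        + "<dvd><title>Hypothetical DVD</title><price>19.95</price></dvd>"
        + "</catalog>";

    static List<String> pricesBelow15Descending(String xml) {
        NodeList nodes;
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // for $price in .../price  where $price < 15.00  (as an XPath predicate)
            nodes = (NodeList) xpath.evaluate("/catalog/dvd/price[. < 15.00]",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        List<BigDecimal> prices = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            prices.add(new BigDecimal(nodes.item(i).getTextContent().trim()));
        }
        // order by $price descending, comparing numbers rather than strings
        Collections.sort(prices, Collections.reverseOrder());
        List<String> result = new ArrayList<>();
        for (BigDecimal price : prices) {
            result.add(price.toPlainString()); // return <li>{data($price)}</li>
        }
        return result;
    }

    public static void main(String[] args) {
        for (String price : pricesBelow15Descending(CATALOG)) {
            System.out.println("<li>" + price + "</li>");
        }
    }
}
```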
Obviously, you can do a lot more to learn the power of XQuery, but I've covered
enough to show you some of the possibilities. To learn more, check out "XQuery"
and "Five Practical XQuery Applications" (see Resources).
Section 5. Conclusion
The core of XML is parsing and validation. Knowing how to use these capabilities
well is vital to the successful introduction of XML to your project.
Summary
In this tutorial on XML processing, you've seen how to:
• Parse XML documents using the SAX2 and DOM2 parsers

• Validate XML documents against DTDs and XML Schemas
• Access XML content from databases using XQuery
Downloads
Description                 Name                          Size   Download method
Sample DTD and XML files    x-cert1423-code-samples.zip   16KB   HTTP
Resources
Learn
• XML and Related Technologies certification prep (developerWorks, August -
October, 2006): With this series of five tutorials, prepare to take the IBM
certification Test 142, XML and Related Technologies, to attain the IBM
Certified Solution Developer - XML and Related Technologies certification.
• XML: A Manager's Guide, Second Edition (Kevin Dick, Addison-Wesley
Professional, 2002): Read about uses of XML technologies in enterprise
applications.
• XML in a Nutshell, 3rd Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, 2004, ISBN: 0596007647): Check out this comprehensive XML
reference with everything from fundamental syntax rules, DTD and XML
Schema creation, XSLT transformations, processing APIs, XML 1.1, plus SAX2
and DOM Level 3.
• XQuery (Jim Keogh and Ken Davidson, McGraw-Hill/Osborne, 2005; ISBN:
0072262109): Learn to write XQuery expressions in this excerpt from chapter 9
of the book XML DeMYSTiFieD.
• Five Practical XQuery Applications (Tim Matthews and Srinivas Pandrangi, 9
May 2003): Add XQuery in your own apps to simplify difficult or tedious tasks.
• An Introduction to StAX (Elliotte Rusty Harold, O'Reilly Media, September 17,
2003): Read more about Streaming API for XML (StAX) in this article.
• Interactive XML tutorials: Explore a variety of XML topics, including SVG, DTD,
Schema, XSLT, DOM, and SAX, complete with student problems and access to
online parsers that process your answers for immediate feedback.
• W3Schools online Web tutorials: Discover Web-building tutorials, from basic
HTML and XHTML to advanced XML, SQL, Database, Multimedia and WAP.
• Java theory and practice: Screen-scraping with XQuery (Brian Goetz,
developerWorks, 22 Mar 2005): See how effectively you can use XQuery as an
HTML screen-scraping engine.
• Power your mashups with XQuery (Ning Yan, developerWorks, July 2006):
Create a mashup application that uses XQuery to couple Web content with XML
data and Web services.
• The Java XML Validation API (Elliotte Rusty Harold, developerWorks, August
2006): Check your documents for conformance to schemas with this XML
validation API.
• Saxonica: XSLT and XQuery Processing: Learn about this collection of tools for
processing XML documents that includes XSLT 2.0, XPath 2.0, XQuery 1.0,
and XML Schema 1.0 processors.
• DOMException from Chapter 9 of Processing XML with Java: A Guide to SAX,
DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley
Professional, 2002): Read about DOMException, a generic runtime exception.
• DOM Modules section in Chapter 9 of Processing XML with Java: A Guide to
SAX, DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley
Professional, 2002): Read about the fourteen modules in eight different
packages of DOM2.
• Chapter 12, The DOM Traversal Module of Processing XML with Java: A Guide
to SAX, DOM, JDOM, JAXP, and TrAX (Elliotte Rusty Harold, Addison-Wesley
Professional, 2002): Delve into this collection of utility interfaces that perform
most of the logic to traverse a DOM tree for simpler programs.
• Setting Features: Read how to set and query features from The Apache
Software Foundation, 2005.
• Serial Access with the Simple API for XML (SAX): Discover SAX -- the
event-driven, serial-access mechanism for accessing XML documents.
• Tree traversals: Bruno R. Preiss explains different tree traversals.
• XML Bible, Second Edition (Elliotte Rusty Harold): View Table 24-5 in Chapter
24 for a grammar of regular expressions symbols for XML schema.
• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and
related technologies.
• XML: See developerWorks XML Zone for a wide range of technical articles and
tips, tutorials, standards, and IBM Redbooks.
• developerWorks technical events and webcasts: Stay current with technology in
these sessions.
Get products and technologies
• Apache Xerces2 parser: Download the open source code for an XML-compliant
parser that includes the Xerces Native Interface (XNI) framework for building parser
components and configurations.
• Java software development kit (JDK) 1.4.2 or later: Download the JDK to build
standards-based, interoperable apps, applets, and Web services.
• Eclipse 3.1 or later: Download this open source, extensible development
platform and application frameworks for building software.
• XMLBuddy 2.0 or later: Download and start to work in XML-related technology,
including XML, DTD, XML Schema, RELAX NG, RELAX NG compact syntax
and XSLT. You can get XMLBuddy as an Eclipse plugin.
Discuss
• XML zone discussion forums: Participate in any of several XML-centered
forums.
• developerWorks blogs: Get involved in the developerWorks community.
About the author

Mark Lorenz
Mark Lorenz is the founder of Hatteras Software, an object-oriented consulting firm,
and the author of multiple books on software development. He is certified in
object-oriented analysis and design (OOAD), XML, RAD, and Java. He uses XHTML,
Web services, Ajax, JSF, Spring, BIRT, and related Eclipse-based tools to develop
Java enterprise applications. You can read Mark's blog on technology.
Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM
Corporation in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.