XML Material

Part I
XML
Coalesce
Chapter 1
Introduction
1.1 Introducing XML
XML stands for eXtensible Markup Language, but what is a markup language? How did they evolve
and what are the goals of XML?
1.1.1 Markup Languages
Ever since the invention of the printing press, writers have made notes on manuscripts to instruct the
printers on matters such as typesetting and other production issues. These notes were called
markup. A collection of such notes that conform to a defined syntax and grammar can certainly be
called a language. Proofreaders use a handwritten symbolic markup language to communicate
corrections to editors and printers. Even the modern use of punctuation is actually a form of markup
that remains with the text to advise the reader how to interpret that text.
Thus, markup is a method of conveying metadata. Metadata is simply a description of data, i.e. of
the content.
1.2 Evolution of XML
XML is a relatively new language, but it is a subset of, and is based upon, a mature markup language
called SGML (Standard Generalized Markup Language). HTML (Hypertext Markup Language)
is also based upon SGML.
1.2.1 SGML History
In 1969, Ed Mosher, Ray Lorie and Charles F. Goldfarb of IBM research invented the first modern
markup language, Generalized Markup Language (GML). GML was a self-referential language
for marking the structure of an arbitrary set of data, and was intended to be a meta-language: a
language that could be used to describe other languages, their grammar and vocabularies. GML later
became SGML. In 1986, SGML was adopted as an international data storage and exchange standard
by the ISO.
GML led to SGML, the parent of both HTML and XML. The complexity of SGML and lack of
content tagging in HTML led to the need for a new markup language for the WWW and beyond: XML.
1.3 Goals of XML
In 1996, the principal design organization for technologies related to the WWW, the World Wide
Web Consortium (W3C), began the process of designing an extensible markup language that would
combine the flexibility of SGML and the widespread acceptance of HTML. That language is XML.
The W3C home page is at www.w3.org and its XML pages begin with an overview at
www.w3.org/XML. Most technical documents can be found at www.w3.org/TR.
XML version 1.0 was defined in a February 1998 W3C recommendation.
The design goals of XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs that process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
1.4 What is XML?
XML stands for eXtensible Markup Language.
XML is a markup language much like HTML.
XML was designed to describe data.
XML tags are not predefined. You must define your own tags.
XML is self-describing.
XML can use a DTD (Document Type Definition) to formally describe the structure of the data.
1.4.1 The main difference between XML and HTML
XML is not a replacement for HTML. XML and HTML were designed with different goals: XML
was designed to describe data and to focus on what data is, while HTML was designed to display data
and to focus on how data looks. HTML is about displaying information; XML is about
describing information.
1.4.2 XML is Extensible
The tags used to mark up HTML documents and the structure of HTML documents are predefined.
The author of an HTML document can only use tags that are defined in the HTML standard. XML
allows authors to define their own tags and their own document structure.
1.4.3 XML is a complement to HTML
It is important to understand that XML is not a replacement for HTML. In the future
development of the Web it is most likely that XML will be used to structure and describe the Web
data, while HTML will be used to format and display the same data.
1.4.4 XML in future Web development
We have been participating in XML development since its creation. It has been amazing to see how
quickly the XML standard has been developed, and how quickly a large number of software vendors
have adopted the standard.
We strongly believe that XML will be as important to the future of the Web as HTML has been to
the foundation of the Web. XML is the future for all data transmission and data manipulation over
the Web.
1.5 How can XML be used?
XML can keep data separated from your HTML
XML can be used to store data inside HTML documents
XML can be used as a format to exchange information
XML can be used to store data in files or in databases
1.5.1 XML can keep data separated from your HTML
HTML pages are used to display data. Data is often stored inside HTML pages. With XML this data
can now be stored in a separate XML file. This way you can concentrate on using HTML for
formatting and display, and be sure that changes in the underlying data will not force changes to any
of your HTML code.
1.5.2 XML can also store data inside HTML documents
XML data can also be stored inside HTML pages as "Data Islands" (a feature of Internet Explorer).
You can still concentrate on using HTML for formatting and displaying the data.
1.5.3 XML can be used to exchange data
In the real world, computer systems and databases contain data in incompatible formats. One of the
most time-consuming challenges for developers has been to exchange data between such systems
over the Internet. Converting the data to XML can greatly reduce this complexity and create data that
can be read by different types of applications.
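To make the exchange idea concrete, here is a minimal Python sketch. It uses the standard-library `xml.etree.ElementTree` module: one program writes a flat record as XML, and another program reads it back without knowing anything else about the first. The helper names `record_to_xml` and `xml_to_record` are hypothetical. The sketch assumes simple string values with no nesting.

```python
import xml.etree.ElementTree as ET

def record_to_xml(tag, record):
    """Serialize a flat dict of strings as <tag><key>value</key>...</tag>."""
    root = ET.Element(tag)
    for key, value in record.items():
        child = ET.SubElement(root, key)   # one child element per field
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_record(text):
    """Parse the format produced by record_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

# System A exports a record as plain XML text...
exported = record_to_xml("note", {"to": "Tove", "from": "Jani", "heading": "Reminder"})
print(exported)
# ...and system B, written independently, reads it back.
imported = xml_to_record(exported)
print(imported)
```

Both sides only need to agree on the element names, not on each other's internal data formats.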
1.5.4 XML can be used to store data
XML can also be used to store data in files or in databases. Applications can be written to store and
retrieve information from the store, and generic applications can be used to display the data.
Chapter 2
XML Tags
Let us start learning the eXtensible Markup Language with an example.
Example first.xml
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line of the document is the XML declaration, which should always be included. It defines
the XML version of the document. In this case the document conforms to the 1.0 specification of XML:
<?xml version="1.0"?>
The next line defines the first element of the document (the root element):
<note>
The next four lines define the four child elements of the root (to, from, heading and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
The last line defines the end of the root element:
</note>
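The same structure can be observed from a program. The following sketch parses first.xml with Python's standard `xml.etree.ElementTree` module and walks the tree. The file content is inlined as a string, an assumption made so the example is self-contained.

```python
import xml.etree.ElementTree as ET

FIRST_XML = """<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(FIRST_XML)       # raises ParseError if the document is not well-formed
print("root element:", root.tag)      # the single root element
for child in root:                    # the child elements, in document order
    print(f"  <{child.tag}> = {child.text!r}")
```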
2.1 XML Tags
XML tags have the following properties:
1. Every XML element must have a closing tag (empty elements may use the short form <tag/>)
2. XML tags are case sensitive
3. All XML elements must be properly nested
4. Every XML document must have a root element
5. Attribute values must be quoted
1. XML tags must have a closing tag
In HTML some elements do not have to have a closing tag. The following code is legal in HTML:
<p>This is first paragraph
<p>this is another paragraph
But in XML, all elements must have a closing tag:
<p>this is my first document</p>
2. XML tags are case sensitive
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>, so opening and
closing tags must be written in the same case:
<Message>this is correct statement</Message>
<Message> this is wrong statement</message>
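A short Python sketch (standard `xml.etree.ElementTree`) shows both consequences of case sensitivity. Names that differ only in case are treated as different elements. A start tag and end tag that differ only in case do not match. The `<mail>` wrapper element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<mail><Letter>upper</Letter><letter>lower</letter></mail>")

# <Letter> and <letter> are two different element types.
print(doc.find("Letter").text)
print(doc.find("letter").text)

# A start tag and end tag that differ only in case do not match.
try:
    ET.fromstring("<Message>this is wrong statement</message>")
except ET.ParseError as err:
    print("not well-formed:", err)
```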
3. All XML elements must be properly nested
In HTML some elements can be improperly nested within each other like this:
<b><i>this is HTML document</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>this is XML document</i></b>
4. All XML documents must have a root tag
All XML documents must contain a single tag pair to define the root element. All other elements
must be nested within the root element. All elements can have sub (children) elements. Sub elements
must be in pairs and correctly nested within their parent element:
<note>
<to>Tove</to>
<from>Jani</from>
</note>
5. All attribute values must be quoted
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute
value must always be quoted. Study the two XML documents below. The first one is incorrect, the
second is correct:
<?xml version=1.0?>
<note date=10/06/2004>
<to>Tove</to>
<from>Jani</from>
</note>
<?xml version="1.0"?>
<note date="10/06/2004">
<to>Tove</to>
<from>Jani</from>
</note>
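The five rules above can be tried out with any XML parser, because a parser must reject a document that breaks them. The sketch below uses Python's standard `xml.etree.ElementTree`. The helper name `is_well_formed` and the sample strings are illustrative assumptions. The exact error messages come from the underlying expat parser and may vary between versions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

samples = {
    "missing close tag": "<note><p>first paragraph</note>",            # rule 1
    "case mismatch": "<Message>text</message>",                        # rule 2
    "bad nesting": "<b><i>text</b></i>",                               # rule 3
    "two root elements": "<to>Tove</to><from>Jani</from>",            # rule 4
    "unquoted attribute": "<note date=10/06/2004></note>",            # rule 5
    "correct": '<note date="10/06/2004"><to>Tove</to></note>',
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    print(f"{label:20} -> {'well-formed' if ok else 'error: ' + error}")
```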
2.1.1 Empty Tags
Empty tags can be represented in either of the following ways in XML:
<note/> or <note></note>
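A parser does not distinguish between the two forms. In this Python sketch (standard `xml.etree.ElementTree`), both produce an element with the same name, no attributes, no children and no text.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<note/>")
long_form = ET.fromstring("<note></note>")

# Print name, attributes, number of children and text content for each form.
for elem in (short_form, long_form):
    print(elem.tag, elem.attrib, len(elem), elem.text)
```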
2.2 XML Attributes
XML attributes are normally used to describe XML elements, or to provide additional information
about elements. From HTML you can remember this construct: <IMG SRC="computer.gif">. In
this HTML example SRC is an attribute to the IMG element. The SRC attribute provides additional
information about the element.
Attributes are always contained within the start tag of an element. Here are some examples:
HTML Example
<img src="computer.gif">
<a href="demo.asp">
XML Example
<file type="gif">
<person id="3344">
Attributes are most commonly used to provide information that is not part of the content of the
XML document. Put another way, attribute data is often more important to the XML parser than to
the reader. In the example above, the person id is a counter value that is irrelevant to the reader, but
important to software that wants to manipulate the person element.
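Here is how software typically reaches that attribute data, sketched with Python's standard `xml.etree.ElementTree`. Attributes are read from the start tag through `get` or the `attrib` dictionary, separately from the element's content. Attribute values always arrive as strings. The `<firstname>` child is borrowed from the next section's example.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person id="3344"><firstname>Anna</firstname></person>')

print(person.get("id"))                # 3344 (attribute values are always strings)
print(person.attrib)                   # {'id': '3344'}
print(person.find("firstname").text)   # Anna (element content, meant for the reader)
print(person.get("sex", "unknown"))    # unknown (default used when the attribute is absent)
```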
2.2.1 Use of Elements vs. Attributes
Take a look at these examples:
Using an Attribute for sex:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
Using an Element for sex:
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In the first example sex is an attribute. In the last example sex is an element. Both examples provide
the same information to the reader.
There are no fixed rules about when to use attributes to describe data, and when to use elements. My
experience, however, is that attributes are handy in HTML, but in XML you should try to avoid them
as long as the same information can be expressed using elements. Here is another example,
demonstrating how elements can be used instead of attributes. The following three XML documents
contain exactly the same information. A date attribute is used in the first, a date element is used in
the second, and an expanded date element is used in the third:
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
</note>
<note>
<date>12/11/99</date>
<to>Tove</to>
<from>Jani</from>
</note>
<note>
<date>
<day>12</day>
<month>11</month>
<year>99</year>
</date>
<to>Tove</to>
<from>Jani</from>
</note>
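The practical difference shows up in code that reads these documents. The Python sketch below (standard `xml.etree.ElementTree`) extracts the date from all three versions. The first two forms force the program to split a string. The expanded form lets it pick out each part by name. The helper `note_date` is hypothetical, and it assumes the dates are day/month/year, as the third version implies.

```python
import xml.etree.ElementTree as ET

VERSIONS = [
    '<note date="12/11/99"><to>Tove</to><from>Jani</from></note>',
    '<note><date>12/11/99</date><to>Tove</to><from>Jani</from></note>',
    '<note><date><day>12</day><month>11</month><year>99</year></date>'
    '<to>Tove</to><from>Jani</from></note>',
]

def note_date(note):
    """Return the note's date as (day, month, year) strings, whichever form is used."""
    if "date" in note.attrib:                     # form 1: date attribute
        return tuple(note.get("date").split("/"))
    date = note.find("date")
    if len(date) == 0:                            # form 2: date element holding text
        return tuple(date.text.split("/"))
    # form 3: expanded date element, each part addressed by name
    return (date.findtext("day"), date.findtext("month"), date.findtext("year"))

for text in VERSIONS:
    print(note_date(ET.fromstring(text)))
```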
2.2.2 Avoid using attributes? (I say yes!)
Why should you avoid using attributes? Should you just take my word for it? These are some of the
problems using attributes:
attributes cannot contain multiple values (elements can)
attributes are not expandable (for future changes)
attributes cannot describe structures (like child elements can)
attributes are more difficult to manipulate by program code
attribute values are not easy to test against a DTD
If you start using attributes as containers for XML data, you might end up with documents that are
both difficult to maintain and to manipulate. What I'm trying to say is that you should use elements
to describe your data. Use attributes only to provide information that is not relevant to the reader.
Please don't end up like this:
<note day="12" month="11" year="99" to="Tove" from="Jani"
heading="Reminder"
body="Don't forget me this weekend!">
</note>
2.2.3 An Exception to my Attribute rule
Rules always have exceptions. My rule about not using attributes has one too:
Sometimes I assign ID references to elements in my XML documents. These ID references can be
used to access XML elements in much the same way as the NAME or ID attributes in HTML. This
example demonstrates this:
<messages>
<note ID="501">
<to>Tove</to>
<from>Jani</from>
</note>
<note ID="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
ID in these examples is just a counter, or a unique identifier, to identify the different notes in the
XML file.
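As a sketch of how such identifiers are used in practice, the Python code below looks up a note by its ID with the standard `xml.etree.ElementTree` module. That module supports simple `[@attr='value']` path predicates. Note that the attribute here is just named "ID": without a DTD declaring it as an ID type, the parser treats it like any other attribute.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
<note ID="501"><to>Tove</to><from>Jani</from></note>
<note ID="502"><to>Jani</to><from>Tove</from>
<heading>Re: Reminder</heading><body>I will not!</body></note>
</messages>"""

messages = ET.fromstring(MESSAGES)

# Find a single note by its identifier using a path predicate.
reply = messages.find(".//note[@ID='502']")
print(reply.findtext("body"))

# Or build an index once and look notes up by identifier afterwards.
by_id = {note.get("ID"): note for note in messages.iter("note")}
print(sorted(by_id))
```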
Chapter 3
XML Validation
3.1 "Well Formed" XML documents
A document can only be well-formed if it obeys the syntax of XML. A document that includes
sequences of markup characters that cannot be parsed or are invalid cannot be well-formed.
In addition, the document must meet all of the following conditions (understanding some of these
conditions may require experience with SGML):
The document instance must conform to the grammar of XML documents. In particular,
some markup constructs (parameter entity references, for example) are only allowed in
specific places. The document is not well-formed if they occur elsewhere, even if the
document is well-formed in all other ways.
The replacement text for all parameter entities referenced inside a markup declaration
consists of zero or more complete markup declarations. (No parameter entity used in the
document may consist of only part of a markup declaration.)
No attribute may appear more than once on the same start-tag.
String attribute values cannot contain references to external entities.
Non-empty tags must be properly nested.
Parameter entities must be declared before they are used.
All entities except the following: amp, lt, gt, apos, and quot must be declared.
An unparsed (binary) entity cannot be referenced in the flow of content; it can only be used in an
attribute declared as ENTITY or ENTITIES.
Neither text nor parameter entities are allowed to be recursive, directly or indirectly.
By definition, if a document is not well-formed, it is not XML. This means that there is no such thing
as an XML document which is not well-formed, and XML processors are not required to do
anything with such documents.
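Several of these conditions can be observed with a non-validating parser such as Python's standard `xml.etree.ElementTree`. It checks well-formedness but not DTD validity. In the sketch below, the five predefined entities work without a declaration. Any other entity (here `&copy;`) must be declared, for example in an internal DTD subset. A repeated attribute is rejected. The sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

CASES = {
    # the five predefined entities need no declaration
    "predefined entities": "<p>&lt;tag&gt; &amp; &quot;quote&quot; &apos;apos&apos;</p>",
    # any other entity must be declared before use
    "undeclared entity": "<p>&copy; 2004</p>",
    # the same entity, declared in an internal DTD subset
    "declared entity": '<!DOCTYPE p [<!ENTITY copy "(c)">]><p>&copy; 2004</p>',
    # an attribute may not appear more than once on the same start-tag
    "duplicate attribute": '<note date="1" date="2"/>',
}

for label, text in CASES.items():
    try:
        root = ET.fromstring(text)
        print(f"{label}: well-formed, text = {root.text!r}")
    except ET.ParseError as err:
        print(f"{label}: NOT well-formed ({err})")
```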
The following is a "Well Formed" XML document:
<note>
<to>Tove</to>
<from>Jani</from>
<to>Jane</to>
<from>Thomas</from>
<heading>Call</heading>
<body>come for interview on Friday </body>
</note>
3.2 "Valid" XML documents
A "Valid" XML document is a "Well Formed" XML document which conforms to the rules of a
Document Type Definition (DTD). The following is a similar document with an added
reference to a DTD:
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
</note>
Chapter 4
Browsers
Many applications support XML in a number of ways. In this tutorial we focus on the XML support in
Internet Explorer 5.0. Some readers have complained about this, but we don't do it because IE5 is
the only performer in the XML field. We do it because it is the only practical way to demonstrate
XML to a large audience over the Web.
So, while we are waiting for Netscape, most of our software examples will work only with IE5. If
you want to learn XML the easy way, with lots of examples for you to try out, you will have to live
with that.
4.1 XML in Netscape Navigator 5
Netscape has promised full XML support in its new Navigator 5 browser. We hope that this will
include standard support for the W3C XML, just as it does in Internet Explorer 5.
Based on previous experience we can only hope that Navigator and Explorer will be fully compatible
in the future XML field.
Your option at the moment - if you want to work with Netscape and XML - is to work with XML on
your server and transform your XML to HTML before it is sent to the browser.
4.2 XML in Internet Explorer 5
Internet Explorer 5 fully supports the international standards for both XML 1.0 and the XML
Document Object Model (DOM). These standards are set by the World Wide Web Consortium
(W3C).
Internet Explorer 5.0 has the following XML support:
Viewing of XML documents
Full support for W3C DTD standards
XML embedded in HTML as Data Islands
Binding XML data to HTML elements
Formatting XML with XSL
Formatting XML with CSS
Support for CSS Behaviors
Access to the XML DOM
In Internet Explorer, an XML file is displayed as follows:
[Figure: an XML file displayed in Internet Explorer]
Chapter 5
Displaying XML
5.1 Displaying XML with JavaScript
To display XML data inside an HTML page you can use JavaScript to import data from an XML file.
To see how XML and HTML complement each other this way, first look at the XML document
(note.xml), then open the HTML document (note.html) that contains a JavaScript which reads an
XML file and displays the information inside the HTML page.
Example note.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
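note.html
<html>
<head>
<script language="JavaScript"
for="window" event="onload">
// Internet Explorer only: load note.xml with the MSXML ActiveX object
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async = false              // boolean false, so load() waits for the file
xmlDoc.load("note.xml")
var nodes = xmlDoc.documentElement.childNodes
// copy the text of each child of <note> into the matching <span>
document.getElementById("to").innerText = nodes.item(0).text
document.getElementById("from").innerText = nodes.item(1).text
document.getElementById("header").innerText = nodes.item(2).text
document.getElementById("body").innerText = nodes.item(3).text
</script>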
<title>HTML using XML data</title>
</head>
<body bgcolor="yellow">
<h1>Refsnes Data Internal Note</h1>
<b>To: </b><span id="to"></span>
<br>
<b>From: </b><span id="from"></span>
<hr>
<b><span id="header"></span></b>
<hr>
<span id="body"></span>
</body>
</html>
The output will be:
[Figure: note.html displayed in Internet Explorer]
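The script above depends on Internet Explorer's `Microsoft.XMLDOM` ActiveX object and reads the children by position (`nodes.item(0)`, `nodes.item(1)`, and so on). As an illustration only, and not what the page itself runs, the same extraction can be sketched outside the browser with Python's standard `xml.etree.ElementTree`. This version looks the children up by name, so it keeps working if elements are reordered or new ones are added. note.xml is inlined as a string for self-containment.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(NOTE_XML)

# Look each field up by element name; a missing element yields "".
fields = {name: note.findtext(name, default="") for name in ("to", "from", "heading", "body")}

print(f"To: {fields['to']}")
print(f"From: {fields['from']}")
print(fields["heading"])
print(fields["body"])
```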
5.2 Displaying XML with CSS
XML can also be displayed using Cascading Style Sheets. The following example shows how to
attach a CSS file to an XML document.
Example (cd_catalog.css)
CATALOG
{
background-color: #ffffff;
width: 100%;
}
CD
{
display: block;
margin-bottom: 30pt;
margin-left: 0;
}
TITLE
{
color: #FF0000;
font-size: 20pt;
}
ARTIST
{
color: #0000FF;
font-size: 20pt;
}
COUNTRY,PRICE,YEAR,COMPANY
{
display: block;
color: #000000;
margin-left: 20pt;
}
XML file
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
<CD>
<TITLE>Greatest Hits</TITLE>
<ARTIST>Dolly Parton</ARTIST>
<COMPANY>RCA</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1982</YEAR>
</CD>
<CD>
<TITLE>Still got the blues</TITLE>
<ARTIST>Gary Moore</ARTIST>
<COMPANY>Virgin records</COMPANY>
<YEAR>1990</YEAR>
</CD>
<CD>
<TITLE>Eros</TITLE>
<ARTIST>Eros Ramazzotti</ARTIST>
<COUNTRY>EU</COUNTRY>
<COMPANY>BMG</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1997</YEAR>
</CD>
<CD>
<TITLE>One night only</TITLE>
<ARTIST>Bee Gees</ARTIST>
<COMPANY>Polydor</COMPANY>
<YEAR>1998</YEAR>
</CD>
<CD>
<TITLE>Sylvias Mother</TITLE>
<ARTIST>Dr.Hook</ARTIST>
<COMPANY>CBS</COMPANY>
<PRICE>8.10</PRICE>
<YEAR>1973</YEAR>
</CD>
<CD>
<TITLE>Maggie May</TITLE>
<ARTIST>Rod Stewart</ARTIST>
<COMPANY>Pickwick</COMPANY>
<PRICE>8.50</PRICE>
<YEAR>1990</YEAR>
</CD>
<CD>
<TITLE>Romanza</TITLE>
<ARTIST>Andrea Bocelli</ARTIST>
<COMPANY>Polydor</COMPANY>
<YEAR>1996</YEAR>
</CD>
<CD>
<TITLE>When a man loves a woman</TITLE>
<ARTIST>Percy Sledge</ARTIST>
<COMPANY>Atlantic</COMPANY>
<PRICE>8.70</PRICE>
<YEAR>1987</YEAR>
</CD>
<CD>
<TITLE>Black angel</TITLE>
<ARTIST>Savage Rose</ARTIST>
<COMPANY>Mega</COMPANY>
<YEAR>1995</YEAR>
</CD>
<CD>
<TITLE>1999 Grammy Nominees</TITLE>
<ARTIST>Many</ARTIST>
<COMPANY>Grammy</COMPANY>
<YEAR>1999</YEAR>
</CD>
<CD>
<TITLE>For the good times</TITLE>
<ARTIST>Kenny Rogers</ARTIST>
<COMPANY>Music Master</COMPANY>
<PRICE>8.70</PRICE>
<YEAR>1995</YEAR>
</CD>
</CATALOG>
The output will be:
[Figure: the CD catalog formatted with cd_catalog.css]
5.3 Displaying XML with XSL
Example Menu.xml
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<breakfast-menu>
<food>
<name>Belgian Waffles</name>
<price>$5.95</price>
<description>two of our famous Belgian Waffles with plenty of real
maple syrup</description>
<calories>650</calories>
</food>
<food>
<name>Strawberry Belgian Waffles</name>
<description>light Belgian waffles covered with strawberries and
whipped cream</description>
</food>
<food>
<name>Berry-Berry Belgian Waffles</name>
<description>light Belgian waffles covered with an assortment of
fresh berries and whipped cream</description>
</food>
</breakfast-menu>
simple.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>breakfast-menu</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th align="left">name</th>
<th align="left">price</th>
</tr>
<xsl:for-each select="breakfast-menu/food">
<tr>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="price"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The output will be:
[Figure: Menu.xml transformed by simple.xsl]
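To clarify what the stylesheet does, here is a Python analog of it using the standard `xml.etree.ElementTree` and `html.escape`. This is not XSLT and not how the browser applies simple.xsl. It produces one table row per `<food>` element, with a name cell and a price cell. As with `xsl:value-of`, a missing `<price>` yields an empty cell. The menu is abridged (descriptions omitted), and `menu_to_html` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from html import escape

MENU_XML = """<breakfast-menu>
<food><name>Belgian Waffles</name><price>$5.95</price></food>
<food><name>Strawberry Belgian Waffles</name></food>
<food><name>Berry-Berry Belgian Waffles</name></food>
</breakfast-menu>"""

def menu_to_html(xml_text):
    """Mimic simple.xsl: one table row per <food>, with name and price cells."""
    menu = ET.fromstring(xml_text)
    rows = [
        "<h2>breakfast-menu</h2>",
        '<table border="1">',
        '<tr bgcolor="#9acd32"><th align="left">name</th><th align="left">price</th></tr>',
    ]
    for food in menu.findall("food"):               # like select="breakfast-menu/food"
        name = escape(food.findtext("name", ""))     # like <xsl:value-of select="name"/>
        price = escape(food.findtext("price", ""))   # empty string when <price> is missing
        rows.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    rows.append("</table>")
    return "<html><body>" + "\n".join(rows) + "</body></html>"

print(menu_to_html(MENU_XML))
```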
Part II
XHTML
Chapter 6
Introduction to XHTML
Although many people have never heard of it, XHTML is really the future of the internet. It is the
newest generation of HTML (coming after HTML 4) but has many new features which mean that
it is, in some ways, like XML. In this tutorial I will explain how XHTML differs from HTML and
how you can update your pages to support it.
6.1 What is XHTML
XHTML stands for eXtensible HyperText Markup Language and is a cross between HTML and
XML. XHTML was created for two main reasons:
1. To create a stricter standard for making web pages, reducing incompatibilities between
browsers
2. To create a standard that can be used on a variety of different devices without changes
The great thing about XHTML, though, is that it is almost the same as HTML, although it is much
more important that you create your code correctly. Badly formed code cannot be XHTML
compatible. Unlike HTML, where simple errors such as a missing closing tag are ignored by the
browser, XHTML code must be written exactly as specified. This is because browsers in handheld
devices and similar equipment don't have the power to show badly formatted pages, so XHTML
makes sure that the code is correct and can be used on any type of browser.
XHTML is a web standard which has been agreed by the W3C and, as it is backwards compatible,
you can start using it in your webpages now. Also, even if you don't think it's really necessary to
update to XHTML yet, there are three very good reasons to do so:
1. It will help you to create better formatted code on your site
2. It will make your site more accessible (both in the future and now, because it will
also mean you have correct HTML and most browsers will show your page better)
3. XHTML is planned to replace HTML 4 in the future
There is really no excuse not to start writing your web pages using XHTML as it is so easy to pick up
and will bring many benefits to your site.
6.2 The Main Changes
There are several main changes in XHTML from HTML:
All tags must be in lower case
All documents must have a doctype
All documents must be properly formed
All tags must be closed
All attribute values must be quoted
The name attribute has been replaced by id
Attributes cannot be shortened
All tags must be properly nested
At a glance, this seems like a huge number of changes, but once you start checking through the list
you will find that very little on your site actually needs to be changed. In this tutorial I will go
through each of these changes, explaining exactly what is different.
6.3 The Doctype
The first change which will appear on your page is the Doctype. When using HTML it is
considered good practice to add a Doctype to the beginning of the page.
Although this was optional in HTML, XHTML requires you to add a Doctype. There are three
available for use in XHTML 1.0:
Strict - This is used mainly when the markup is very clean and there is no 'extra' markup to aid
the presentation of the document. This is best used if you are using Cascading Style Sheets for
presentation.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Transitional - This should be used if you want to use presentational features of HTML in your page.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Frameset - This should be used if you want to have frames on your page.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
The Doctype should be the very first line of your document (preceded only by an optional XML
declaration) and should be the only thing on that line. You don't need to worry about this confusing
older browsers: they simply ignore the Doctype declaration. It is used to find out the language in
which the page is written, but only by browsers/validators which support it, so this will cause no problems.
6.4 Document Formation
After the Doctype line, the actual XHTML content can be placed. As with HTML, XHTML has
<html> <head> <title> and <body> tags but, unlike with HTML, they must all be included in a
valid XHTML document. The correct setup of your file is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Page Title</title>
OTHER HEAD DATA
</head>
<body>
CONTENT
</body>
</html>
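It is important that your document follows this basic pattern. This example uses the Transitional
Doctype, but you can use either of the others. A complete document using the XHTML 1.1 Doctype,
preceded by an optional XML declaration, looks like this: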
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>
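Because an XHTML document is XML, an ordinary XML parser can read it. The `xmlns` attribute places every element in the XHTML namespace, so Python's standard `xml.etree.ElementTree` reports element names in `{namespace}name` form. The sketch below parses the page above. It assumes, as is the case for this parser, that the external DTD named in the Doctype is not downloaded.

```python
import xml.etree.ElementTree as ET

XHTML_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Virtual Library</title></head>
<body><p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>
</html>"""

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
root = ET.fromstring(XHTML_PAGE)

print(root.tag)                                                # namespaced root element name
print(root.get("{http://www.w3.org/XML/1998/namespace}lang"))  # the xml:lang attribute
link = root.find(f".//{XHTML_NS}a")                            # first <a> anywhere in the page
print(link.get("href"), link.text)
```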
Chapter 7
XHTML Tags
7.1 Introduction
One of the major changes to HTML which was introduced in XHTML is that tags must always be
properly formed. With the old HTML specification you could be very sloppy in your coding, with
missing tags and incorrect formation, without many problems, but in XHTML correct formation is very important.
7.2 Lower Case
Probably the biggest change in XHTML is that now not only the tags you use but also the way in which
you write them must be correct. Luckily, the major change can be easily implemented in a normal
HTML document without much problem.
In XHTML, tags must always be lower case. This means that:
<FONT>
<Font>
<FoNT>
are all incorrect tags and must not be used. The font tag must now be used as follows:
<font>
If you do not write your code by hand but instead use a WYSIWYG editor, you can still begin to migrate
your documents to XHTML by setting the editor to output all code in lower case. For example, in
Dreamweaver 4 you can do this by going to:
Edit -> Preferences -> Code Format
and making sure that Case For Tags is set to:
<lowercase>
and also that Case For Attributes is set to:
lowercase="value"
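For pages you have already converted, a quick check for tag and attribute names that are not in lower case can be sketched in Python with the standard `xml.etree.ElementTree`. The helper `uppercase_names` is hypothetical, not a feature of any editor or validator. It also assumes the page is already well-formed XML, because the parser rejects anything else before the check runs.

```python
import xml.etree.ElementTree as ET

def uppercase_names(xhtml_text):
    """List tag and attribute names that are not entirely lower case."""
    problems = []
    for elem in ET.fromstring(xhtml_text).iter():   # every element, in document order
        tag = elem.tag.split("}")[-1]               # drop any {namespace} prefix
        if tag != tag.lower():
            problems.append(f"tag <{tag}>")
        for attr in elem.attrib:
            name = attr.split("}")[-1]
            if name != name.lower():
                problems.append(f"attribute {name} on <{tag}>")
    return problems

print(uppercase_names('<p><FONT Color="#FF0000">Your Text</FONT><font color="red">ok</font></p>'))
```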
7.3 Nesting
The second change to the HTML tags in XHTML is that they must all be properly nested. This
means that if you have multiple tags applying to something on your page you must make sure you
open and close them in the correct order. For example, if you have some bold red text in a paragraph,
the correct nesting would be one of the following:
<p><b><font color="#FF0000">Your Text</font></b></p>
<b><p><font color="#FF0000">Your Text</font></p></b>
<p><font color="#FF0000"><b>Your Text</b></font></p>
These are only examples, though, and there are other possibilities for these tags. What you must not
do is close tags in the wrong order, for example:
<p><b><font color="#FF0000">Your Text</p></font></b>
Although browsers would display code in this form correctly as HTML, it is incorrect under the XHTML
specification, and you must be very careful to nest your tags correctly.
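Proper nesting can be checked with a stack. Each start tag is pushed. Each end tag must match the innermost open element, which sits on top of the stack. The Python sketch below implements this idea with a regular expression. It is a deliberate simplification, not a real parser: it ignores comments, CDATA sections, and a `>` inside attribute values. The helper name `first_nesting_error` is hypothetical.

```python
import re

# Start tags, end tags and empty-element tags such as <br />.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")

def first_nesting_error(markup):
    """Return a description of the first improperly nested tag, or None."""
    stack = []
    for match in TAG.finditer(markup):
        closing, name, self_closing = match.groups()
        if self_closing:                 # <br /> opens and closes at once
            continue
        if not closing:
            stack.append(name)           # open elements, innermost last
        elif not stack or stack[-1] != name:
            expected = stack[-1] if stack else "nothing"
            return f"found </{name}> but expected </{expected}>"
        else:
            stack.pop()                  # matched the innermost open element
    return f"unclosed <{stack[-1]}>" if stack else None

print(first_nesting_error('<p><b><font color="#FF0000">Your Text</font></b></p>'))
print(first_nesting_error('<p><b><font color="#FF0000">Your Text</p></font></b>'))
```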
7.4 Closing Tags
The previous two changes to HTML should not be a particular problem to you if your HTML code
is already well formed. The final change to HTML tags probably will require quite a lot of changes to
your HTML documents to make them XHTML compliant.
All tags in XHTML must be closed. Most tags in HTML are already closed (for example <p></p>,
<font></font>, <b></b>) but there are several which are standalone tags which do not get
closed. The main three are:
<br>
<img>
<hr>
There are two ways in which you can deal with the change in specification. The first way is quite
obvious if you know HTML. You can just add a closing tag to each one, e.g.
<br></br>
<img></img>
<hr></hr>
However, you must be careful not to accidentally place anything between the opening and
closing tags, as this would be incorrect coding. The second way is slightly different but will be familiar
to anyone who has written WML. You can include the closing in the actual tag:
<br />
<img />
<hr />
This is probably the best way to close your tags, as it is the way recommended by the W3C, who set
the XHTML standard. You should notice that, in these examples, there is a space before the />.
This is not actually necessary in the XHTML specification (you could have <br/>), but the reason
why I have included it is that, as well as being correct XHTML, it also makes the tag compatible
with older browsers. As every other XHTML change is backwards compatible, it would be a shame
for a single missing space to cause problems with site compatibility.
In case you are wondering how the <img> tag works if it has all the normal attributes included, here
is an example (again, notice the space before the />):
<img src="myimage.gif" alt="My Image" width="400" height="300" />
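XML tools generally accept all of these forms. Some also emit the space-before-`/>` style on their own. The Python sketch below uses the standard `xml.etree.ElementTree`, whose serializer writes empty elements as `<br />`, or as `<br></br>` when `short_empty_elements=False` is passed. It assumes Python 3.8 or later, where attributes are written in insertion order.

```python
import xml.etree.ElementTree as ET

# Build the img element from the example above in code.
img = ET.Element("img", {"src": "myimage.gif", "alt": "My Image", "width": "400", "height": "300"})
print(ET.tostring(img, encoding="unicode"))

br = ET.Element("br")
print(ET.tostring(br, encoding="unicode"))                              # short form with a space
print(ET.tostring(br, encoding="unicode", short_empty_elements=False))  # explicit end tag

# All three spellings parse to the same empty element.
print([ET.fromstring(s).tag for s in ("<br/>", "<br />", "<br></br>")])
```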
Chapter 8
Attributes
8.1 Introduction
In this part of the XHTML tutorial, I will show you the changes to HTML attributes in XHTML.
HTML attributes are the extra parts you can add onto tags (such as src in the img tag) to change the
way in which they behave or are shown. There are four changes to the way in which attributes are written.
8.2 Lowercase
As with XHTML tags, the attributes for them must be in lowercase. This means that, although in the
past, code like:
<table Width="100%">
would have worked, this must now be given as:
<table width="100%">
Although this is quite a minor issue, it is important to check your code for this mistake as it is quite a
common one.
8.3 Correct Quotation
Another change in the HTML syntax is that all attributes in XHTML must be quoted. In HTML you
could have used the following:
<table width=100%>
with absolutely no compatibility problems. This all changes in XHTML. If you use code in this
format with XHTML it will be incorrect and must be changed. From now on, all attribute values must be
surrounded by quotes (double quotes are used here; XML also allows single quotes), so the correct format of this code would be:
<table width="100%">
8.4 Attribute Shortening
It has become common practice in HTML to shorten a few of the attributes to save on typing and
on transfer times. This method has become almost a standard. As with other common practices in
HTML, this has been removed from the XHTML specification as it causes incompatibilities between
browsers and other devices.
An example of a tag with a commonly shortened attribute is:
<input type="checkbox" value="yes" name="agree" checked>
In this tag, it is the:
checked
part which is incorrect. In XHTML all shortened attributes must be given in their 'long' format. For
example:
checked="checked"
so the checkbox code earlier would now need to be written as:
<input type="checkbox" value="yes" name="agree" checked="checked" />
There are other attributes (such as noresize) which also must be given in full.
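Why the long form is required becomes clear when the tag is handed to an XML parser. A minimized attribute has no value, which XML does not allow. The Python sketch below uses the standard `xml.etree.ElementTree` to show that the short form is rejected while the long form parses. In the long form, what matters to a program is simply that the attribute is present.

```python
import xml.etree.ElementTree as ET

minimized = '<input type="checkbox" value="yes" name="agree" checked />'
full = '<input type="checkbox" value="yes" name="agree" checked="checked" />'

try:
    ET.fromstring(minimized)
except ET.ParseError as err:
    print("minimized form rejected:", err)

box = ET.fromstring(full)
print(box.get("checked"))        # the attribute value
print("checked" in box.attrib)   # presence of the attribute is what marks the box as checked
```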
8.5 The ID Attribute
Probably the biggest change from HTML to XHTML is a change to one tag attribute. All other
differences have just made tags more compatible; this is the only full change.
In HTML, the <img> tag has an attribute 'name'. This is usually used to refer to the image in
JavaScript for actions like image rollovers. This attribute has now been changed to the 'id'
attribute. So, the HTML code:
<img src="myimage.gif" name="my_image">
would need to be written in XHTML as:
<img src="myimage.gif" id="my_image" />
Of course, this would not be backward compatible with older browsers, so if you still want your site
to work fully in all old browsers (as XHTML is intended to do), you will need to include both id and
name attributes (this would also be correct XHTML):
<img src="myimage.gif" id="my_image" name="my_image" />
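Once an element carries an id, scripts can look it up directly. In browser JavaScript this is done with document.getElementById. The sketch below is a rough equivalent outside the browser. It uses the limited XPath support in Python's xml.etree.ElementTree to find the image by its id. The surrounding <body> and <p> markup is made up for the example.

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    '<body>'
    '<p>Logo:</p>'
    '<img src="myimage.gif" id="my_image" name="my_image" />'
    '</body>'
)

# ElementTree supports a small XPath subset, including [@attr='value'] predicates.
image = page.find(".//*[@id='my_image']")
print(image.tag, image.get("src"))  # img myimage.gif
```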
8.6 Conclusion
This tutorial has shown you most of the changes between HTML and XHTML. As you will have
seen, there are actually very few of them, so the next time you update your site, I recommend
making your code XHTML compatible. It will not only make your site 'future-proof' but will also
give you more correct code, and you should have fewer browser incompatibility
problems.
Part III
Document Type Definition
Chapter 9
Introduction to DTD
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the
document structure with a list of legal elements. A DTD can be declared inline in your XML
document, or as an external reference.
9.1 Internal DTD
This is an XML document with a Document Type Definition:
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
The DTD is interpreted like this:

!ELEMENT note (in line 2) defines the element "note" as having four child elements, in this order:
"to,from,heading,body".
!ELEMENT to (in line 3) defines the "to" element to be of the type "#PCDATA".
!ELEMENT from (in line 4) defines the "from" element to be of the type "#PCDATA",
and so on.
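Python's standard library has no validating parser, so it cannot check a document against a DTD. The sketch below is only an illustration within that limit. xml.etree.ElementTree reads the document, including its internal DTD, without validating it. The child sequence declared for note is then compared by hand. NOTE_SEQUENCE is a hypothetical name, and the heading and body values are sample data.

```python
import xml.etree.ElementTree as ET

NOTE = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""

# The sequence declared for "note", copied by hand from the DTD above.
NOTE_SEQUENCE = ["to", "from", "heading", "body"]

root = ET.fromstring(NOTE)  # reads the internal subset but does NOT validate
children = [child.tag for child in root]
print(children == NOTE_SEQUENCE)  # True
```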
9.2 External DTD
This is the same XML document with an external DTD:
Example Note.xml:
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
This is a copy of the file "note.dtd" containing the Document Type Definition:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
9.3 Why use a DTD?

XML provides an application-independent way of sharing data. With a DTD, independent groups of
people can agree to use a common DTD for interchanging data. Your application can use a standard
DTD to verify that the data you receive from the outside world is valid. You can also use a DTD to
verify your own data.
Many forums are emerging to define standard DTDs for almost every area of data exchange.
Chapter 10
The building blocks
10.1 The Building Blocks of DTD documents
XML documents (and HTML documents) are made up of the following building blocks: Elements,
Tags, Attributes, Entities, PCDATA, and CDATA. Here is a brief explanation of each of the building
blocks:
10.1.1 Elements
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements could be "note"
and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML
elements are "hr", "br" and "img".
10.1.2 Tags
Tags are used to markup elements.
A start tag like <element_name> marks up the beginning of an element, and an end tag like
</element_name> marks up the end of an element.
A body element: <body>body text in between</body>.
A message element: <message>some message in between</message>
10.1.3 Attributes
Attributes provide extra information about elements.
Attributes are placed inside the start tag of an element. Attributes come in name/value pairs. The
following "img" element carries additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it is closed by a " /".
10.1.4 PCDATA
PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and
entities will be expanded.
10.1.5 CDATA
CDATA also means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as
markup and entities will not be expanded.
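Any XML parser shows the difference between parsed and unparsed character data. In this sketch, which uses Python's xml.etree.ElementTree, the parser expands the &amp; in ordinary element text. Everything inside a CDATA section (written <![CDATA[ ... ]]>) passes through untouched, including text that looks like a tag.

```python
import xml.etree.ElementTree as ET

parsed = ET.fromstring("<message>Fish &amp; chips</message>")
print(parsed.text)  # Fish & chips  (the entity was expanded)

raw = ET.fromstring("<message><![CDATA[<b>Fish &amp; chips</b>]]></message>")
print(raw.text)     # <b>Fish &amp; chips</b>  (nothing inside was parsed)
print(len(raw))     # 0  (the <b> is plain text, not a child element)
```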
10.1.6 Entities
Entities are variables used to define common text. Entity references are references to entities.
Most of you will know the HTML entity reference "&nbsp;", which is used to insert an extra space in
an HTML document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
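A quick way to check the table is to let a parser expand the references. This sketch uses Python's xml.etree.ElementTree, and the sample paragraph is made up for the example. Serializing the element again shows which characters the library escapes in element text.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>&lt;b&gt; &amp; &quot;double&quot; &apos;single&apos;</p>")
print(para.text)  # <b> & "double" 'single'

# When writing element text, ElementTree escapes <, > and & again;
# quotes in text content need no escaping and are written back as-is.
print(ET.tostring(para, encoding="unicode"))
```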
10.2 DTD - Elements

10.2.1 Declaring an Element
In the DTD, XML elements are declared with an element declaration. An element declaration has the
following syntax:
<!ELEMENT element-name (element-content)>
10.2.2 Empty elements

Empty elements are declared with the keyword EMPTY, written without parentheses:
<!ELEMENT element-name EMPTY>
example: <!ELEMENT img EMPTY>
10.2.3 Elements with data

Elements that contain only parsed character data are declared with #PCDATA inside parentheses:
<!ELEMENT element-name (#PCDATA)>
example:
<!ELEMENT note (#PCDATA)>
Elements that can contain any combination of parsable data are declared with the keyword ANY,
written without parentheses:
<!ELEMENT element-name ANY>
#PCDATA means that the element contains data that IS going to be parsed by a parser.
The keyword ANY declares an element with any content. If such an element contains other elements,
these elements must also be declared.
Note that there is no #CDATA keyword for element content: in a DTD, CDATA is only used as an
attribute type. Text that must not be parsed is written inside a CDATA section (<![CDATA[ ... ]]>)
in the document itself.
10.2.4 Elements with children (sequences)
Elements with one or more children are defined with the name of the children elements inside the
parentheses:
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,.....)>
example:
<!ELEMENT note (to,from,heading,body)>
When children are declared in a sequence separated by commas, the children must appear in the
same sequence in the document. In a full declaration, the children must also be declared, and the
children can also have children. The full declaration of the note document will be:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
10.2.5 Wrapping
If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE
definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
example:
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
10.2.6 Declaring only one occurrence of the same element

<!ELEMENT element-name (child-name)>
Example
<!ELEMENT note (message)>
The example declaration above declares that the child element message must occur exactly once
inside the note element.
10.2.7 Declaring at least one occurrence of the same element
<!ELEMENT element-name (child-name+)>
example
<!ELEMENT note (message+)>
The + sign in the example above declares that the child element message must occur one or more
times inside the note element.
10.2.8 Declaring zero or more occurrences of the same element
<!ELEMENT element-name (child-name*)>
Example
<!ELEMENT note (message*)>
The * sign in the example above declares that the child element message can occur zero or more
times inside the note element.
10.2.9 Declaring zero or one occurrences of the same element
<!ELEMENT element-name (child-name?)>
Example
<!ELEMENT note (message?)>
The ? sign in the example above declares that the child element message can occur zero or one times
inside the note element.
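The occurrence indicators map closely onto regular-expression quantifiers, which suggests one way to check a simple content model. The sketch below uses a hypothetical helper named matches and does not reflect how any particular validating parser works. It turns a comma-separated sequence such as message+ or to,from,heading,body into a Python regular expression over the child element names. It does not handle choices (|), nested groups or mixed content.

```python
import re


def matches(content_model: str, children: list[str]) -> bool:
    """Check child element names against a simple DTD sequence.

    Handles comma-separated names, each optionally followed by ?, * or +
    (for example "to+,from,message*"). Choices (|), nested groups and
    mixed content are deliberately left out of this sketch.
    """
    pattern = ""
    for item in content_model.split(","):
        name, indicator = re.fullmatch(r"\s*([\w.-]+)([?*+]?)\s*", item).groups()
        # Each child name becomes a space-terminated token; the DTD
        # occurrence indicator is reused as the regex quantifier.
        pattern += f"(?:{re.escape(name)} ){indicator}"
    names = "".join(child + " " for child in children)
    return re.fullmatch(pattern, names) is not None


print(matches("message", ["message"]))              # True
print(matches("message+", []))                      # False
print(matches("message*", []))                      # True
print(matches("message?", ["message", "message"]))  # False
```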
10.2.10 Declaring mixed content
Example
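<!ELEMENT note (#PCDATA|to|from|header|message)*>
The example above declares that the element note can contain parsed character data mixed freely with
any number of to, from, header and message child elements, in any order. In a DTD, a mixed content
model must list #PCDATA first, separate the alternatives with | and end with *, so the order and the
number of the child elements cannot be constrained.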
Chapter 11
DTD Attributes and Entities
11.1 Declaring Attributes
In the DTD, XML element attributes are declared with an ATTLIST declaration. An attribute
declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
As you can see from the syntax above, the ATTLIST declaration defines the element which can have
the attribute, the name of the attribute, the type of the attribute, and the default attribute value.
The attribute-type can have the following values:
Value             Explanation
CDATA             The value is character data
(eval|eval|..)    The value must be one from an enumerated list
ID                The value is a unique id
IDREF             The value is the id of another element
IDREFS            The value is a list of other ids
NMTOKEN           The value is a valid XML name
NMTOKENS          The value is a list of valid XML names
ENTITY            The value is an entity
ENTITIES          The value is a list of entities
NOTATION          The value is a name of a notation
xml:              The value is a predefined xml value
The attribute-default-value can have the following values:

Value             Explanation
value             The default value of the attribute (written without a keyword)
#REQUIRED         The attribute value must be included in the element
#IMPLIED          The attribute does not have to be included
#FIXED value      The attribute value is fixed
11.1.1 Attribute declaration example

DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">
XML example:
<square width="100"></square>
In the above example the element square is defined to be an empty element with the attribute width
of type CDATA. The width attribute has a default value of 0.
11.1.2 Default attribute value
Syntax:
<!ATTLIST element-name attribute-name CDATA "default-value">
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check">
Specifying a default value for an attribute ensures that the attribute will get a value even if the author
of the XML document didn't include it.
11.1.3 Implied attribute
Syntax:
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD example:
<!ATTLIST contact fax CDATA #IMPLIED>
XML example:
<contact fax="555-667788">
Use an implied attribute if you don't want to force the author to include an attribute and you don't
have an option for a default value either.
11.1.4 Required attribute
Syntax:
<!ATTLIST element-name attribute_name attribute-type #REQUIRED>
DTD example:
<!ATTLIST person number CDATA #REQUIRED>
XML example:
<person number="5677">
Use a required attribute if you don't have an option for a default value, but still want to force the
attribute to be present.
11.1.5 Fixed attribute value
Syntax:
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
DTD example:
<!ATTLIST sender company CDATA #FIXED "Microsoft">
XML example:
<sender company="Microsoft">
Use a fixed attribute value when you want an attribute to have a fixed value without allowing the
author to change it. If an author includes another value, the XML parser will return an error.
11.1.6 Enumerated attribute values
Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
DTD example:
<!ATTLIST payment type (check|cash) "cash">
XML example:
<payment type="check">
Or
<payment type="cash">
Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal
values.
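The default-value rules above, together with enumerated values, can be summarized as a small decision procedure. The following Python function is a hypothetical sketch of that logic and is not the code of any real XML parser. It handles one attribute at a time. None stands for an attribute the author left out, and a keyword of None stands for a plain default value.

```python
from typing import Optional


def resolve_attribute(given: Optional[str], keyword: Optional[str],
                      value: Optional[str] = None,
                      allowed: Optional[set] = None) -> Optional[str]:
    """Apply the DTD attribute-default rules to one attribute (hypothetical helper).

    given   -- value written in the document, or None if it was left out
    keyword -- "#REQUIRED", "#IMPLIED", "#FIXED", or None for a plain default
    value   -- the declared default or fixed value, if any
    allowed -- legal values of an enumerated type, e.g. {"check", "cash"}
    """
    if given is not None and allowed is not None and given not in allowed:
        raise ValueError(f"{given!r} is not one of {sorted(allowed)}")
    if keyword == "#REQUIRED":
        if given is None:
            raise ValueError("required attribute is missing")
        return given
    if keyword == "#IMPLIED":
        return given  # may stay None: there is no default to fall back on
    if keyword == "#FIXED":
        if given is not None and given != value:
            raise ValueError(f"fixed attribute must be {value!r}")
        return value
    # Plain default value: used only when the author left the attribute out.
    return value if given is None else given


print(resolve_attribute(None, None, "check"))                     # check
print(resolve_attribute("cash", None, "cash", {"check", "cash"}))  # cash
print(resolve_attribute(None, "#IMPLIED"))                        # None
print(resolve_attribute(None, "#FIXED", "Microsoft"))             # Microsoft
```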
11.2 DTD - Entities
11.2.1 Entities
Entities are variables used to define shortcuts to common text.
Entity references are references to entities.
Entities can be declared internally.
Entities can be declared externally.
11.2.2 Internal Entity Declaration

Syntax:
<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
XML example:
<author>&writer;&copyright;</author>
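Non-validating parsers also expand internal entities. For example, Python's xml.etree.ElementTree, which is built on the expat parser, replaces &writer; and &copyright; with the text declared in the internal DTD subset:

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE author [
<!ENTITY writer "Jan Egil Refsnes.">
<!ENTITY copyright "Copyright XML101.">
]>
<author>&writer;&copyright;</author>"""

author = ET.fromstring(DOC)
print(author.text)  # Jan Egil Refsnes.Copyright XML101.
```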
11.2.3 External Entity Declaration

Syntax:
<!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY writer SYSTEM "http://www.xml101.com/entities/entities.xml">
<!ENTITY copyright SYSTEM "http://www.xml101.com/entities/entities.dtd">
XML example:
<author>&writer;&copyright;</author>
11.3 DTD Validation
Validating with the XML Parser
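If you try to open an XML document, the XML parser might generate an error. By accessing the
parseError object, you can retrieve the exact error code, the error text, and even the line that caused
the error. The script below uses Microsoft's MSXML parser through ActiveX, so it only runs in
Internet Explorer:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;           // load synchronously, so parseError is set when load() returns
xmlDoc.validateOnParse = true;  // validate against the DTD, not just check well-formedness
xmlDoc.load("note_dtd_error.xml");
// errorCode is 0 when no error was found
document.write("<br>Error Code: " + xmlDoc.parseError.errorCode);
document.write("<br>Error Reason: " + xmlDoc.parseError.reason);
document.write("<br>Error Line: " + xmlDoc.parseError.line);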
Turning Validation off

Validation can be turned off by setting the XML parser's validateOnParse property to false:
xmlDoc.validateOnParse = false;  // the document is still checked for well-formedness
xmlDoc.load("note_dtd_error.xml");
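The parseError object above belongs to MSXML. For comparison only, the sketch below shows similar error details from Python's xml.etree.ElementTree, whose ParseError exposes code and position attributes. This parser does not validate, so it can only report well-formedness errors and not violations of the DTD. The sample error here, a mismatched end tag, is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET

BROKEN = """<note>
<to>Tove</to>
<from>Jani</Ffrom>
</note>"""

error_line = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    error_line, error_column = err.position  # (line, column) of the problem
    print("Error Code:", err.code)            # expat's numeric error code
    print("Error Reason:", err)               # human-readable message from expat
    print("Error Line:", error_line)          # 3
```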
Part IV
XSD
Chapter 12
Introduction to XSD
The W3C XML Schema Definition Language is an XML language for describing and constraining
the content of XML documents. W3C XML Schema is a W3C Recommendation.
This article is an introduction to using W3C XML Schemas, and also includes a comprehensive
reference to the Schema datatypes and structures.
(Editor's note: this tutorial has been updated since its first publication in 2000, to reflect the finalization of W3C
XML Schema as a Recommendation.)
12.1 Introducing our First Schema
Let's start by having a look at this simple document which describes a book:
<book isbn="0836217462">
<title>
Being a Dog Is a Full-Time Job
</title>
<author>Charles M. Schulz</author>
<character>
<name>Snoopy</name>
<friend-of>Peppermint Patty</friend-of>
<since>1950-10-04</since>
<qualification>
extroverted beagle
</qualification>
</character>
<character>
<name>Peppermint Patty</name>
<qualification>bold, brash and tomboyish</qualification>
</character>
</book>
To write a schema for this document, we could simply follow its structure and define each element as
we find it. To start, we open an xs:schema element:
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
.../...
</xs:schema>
The schema element opens our schema. It can also hold the definition of the target namespace and
several default options, some of which we will see in the following sections.
To match the start tag for the book element, we define an element named book. This element has
attributes and non-text children, so we consider it a complexType (the other kind of datatype,
simpleType, is reserved for datatypes holding only values, with no element or attribute sub-nodes). The
list of children of the book element is described by a sequence element:
<xs:element name="book">
<xs:complexType>
<xs:sequence>
.../...
</xs:sequence>
.../...
</xs:complexType>
</xs:element>
The sequence is a "compositor" that defines an ordered sequence of sub-elements. We will see the
two other compositors, choice and all in the following sections.
Now we can define the title and author elements as simple types: they don't have attributes or non-text
children and can be described directly within an empty xs:element element. The type (xs:string)
is prefixed by the namespace prefix associated with XML Schema, indicating a predefined XML
Schema datatype:
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
Now, we must deal with the character element, a complex type. Note how its cardinality is defined:
<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
.../...
</xs:sequence>
</xs:complexType>
</xs:element>
Unlike other schema definition languages, W3C XML Schema lets us define the cardinality of an
element (i.e. the number of its possible occurrences) with some precision. We can specify both
minOccurs (the minimum number of occurrences) and maxOccurs (the maximum number of
occurrences). Here maxOccurs is set to unbounded, which means that there can be as many
occurrences of the character element as the author wishes. Both attributes have a default value of one.
We then specify the list of all its children in the same way:
<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string"
minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>
We then terminate its description by closing the sequence, complexType and element elements.
We can now declare the attributes of the document elements, which must always come last. There
appears to be no special reason for this, but the W3C XML Schema Working Group has considered
that it was simpler to impose a relative order to the definitions of the list of elements and attributes
within a complex type, and that it was more natural to define the attributes after the elements.
<xs:attribute name="isbn" type="xs:string"/>
And close all the remaining elements.

That's it! This first design, sometimes known as "Russian Doll Design", tightly follows the structure
of our example document.
One of the key features of such a design is to define each element and attribute within its context and
to allow multiple occurrences of the same element name to carry different definitions.
Complete listing of this first example:
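<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0"
                    maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string"
                          minOccurs="0"
                          maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>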
12.2 Slicing the Schema

While the previous design method is very simple, it can lead to deeply nested definitions, making the
schema hard to read and difficult to maintain when documents are complex. It also has the
drawback of being very different from a DTD structure, an obstacle for human or machine agents
wishing to transform DTDs into XML Schemas, or even just use the same design guides for both
technologies.
The second design is based on a flat catalog of all the elements available in the instance document
and, for each of them, lists of child elements and attributes. This effect is achieved by using
references to element and attribute definitions that need to be within the scope of the referencer,
leading to a flat design:

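<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="friend-of" type="xs:string"/>
  <xs:element name="since" type="xs:date"/>
  <xs:element name="qualification" type="xs:string"/>
  <xs:attribute name="isbn" type="xs:string"/>
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="since"/>
        <xs:element ref="qualification"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title"/>
        <xs:element ref="author"/>
        <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="isbn"/>
    </xs:complexType>
  </xs:element>
</xs:schema>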
Using a reference to an element or an attribute is somewhat comparable to cloning an object. The
element or attribute is defined first, and it can be duplicated at another place in the document
structure by the reference mechanism, in the same way an object can be cloned. The two elements
(or attributes) are then two instances of the same class.
Chapter 13
Defining Named Types
We have seen that we can define elements and attributes as we need them (Russian doll design), or
create them first and reference them (flat catalog). W3C XML Schema gives us a third mechanism,
which is to define data types (either simple types that will be used for PCDATA elements or
attributes or complex types that will be used only for elements) and to use these types to define our
attributes and elements.
This is achieved by giving a name to the simpleType and complexType elements, and locating them
outside of the definition of elements or attributes. We will also take the opportunity to show how we
can derive a datatype from another one by defining a restriction over the values of this datatype.
For instance, to define a datatype named nameType, which is a string with a maximum of 32
characters, we will write:
<xs:simpleType name="nameType">
<xs:restriction base="xs:string">
<xs:maxLength value="32"/>
</xs:restriction>
</xs:simpleType>
The simpleType element holds the name of the new datatype. The restriction element expresses the
fact that the datatype is derived from the string datatype of the W3C XML Schema namespace
(attribute base) by applying a restriction, i.e. by limiting the number of possible values. The
maxLength element, called a facet, states that this restriction limits the length of the value to a
maximum of 32 characters.
Another powerful facet is the pattern element, which defines a regular expression that must be
matched. For instance, if we do not care about the "-" signs, we can define an ISBN datatype as 10
digits thus:
<xs:simpleType name="isbnType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{10}"/>
</xs:restriction>
</xs:simpleType>
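The following Python functions are a rough equivalent of the checks these two facets impose. The function names are hypothetical. Note that an XML Schema pattern must match the whole value, so the closest Python equivalent is re.fullmatch rather than re.search. The XML Schema regular-expression dialect differs from Python's in places, but not for a pattern this simple.

```python
import re


def valid_name(value: str) -> bool:
    # nameType: xs:string restricted by <xs:maxLength value="32"/>.
    return len(value) <= 32


def valid_isbn(value: str) -> bool:
    # isbnType: xs:string restricted by <xs:pattern value="[0-9]{10}"/>.
    # Schema patterns are implicitly anchored, hence fullmatch, not search.
    return re.fullmatch(r"[0-9]{10}", value) is not None


print(valid_name("Charles M. Schulz"))  # True
print(valid_isbn("0836217462"))         # True
print(valid_isbn("0-8362-1746-2"))      # False: the "-" signs are not allowed
print(valid_isbn("08362174621"))        # False: 11 digits (re.search would have accepted it)
```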
Facets, and the two other ways to derive a datatype (list and union) are covered in the next sections.
Complex types are defined as we've seen before, but given a name.
Defining and using named datatypes is comparable to defining a class and using it to create an object.
A datatype is an abstract notion that can be used to define an attribute or an element. The datatype
then plays the same role with an attribute or an element that a class would play with an object.

Complete listing using named types:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="sinceType">
    <xs:restriction base="xs:date"/>
  </xs:simpleType>
  <xs:simpleType name="descType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:simpleType name="isbnType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{10}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="characterType">
    <xs:sequence>
      <xs:element name="name" type="nameType"/>
      <xs:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="since" type="sinceType"/>
      <xs:element name="qualification" type="descType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="nameType"/>
      <xs:element name="author" type="nameType"/>
      <xs:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="isbn" type="isbnType" use="required"/>
  </xs:complexType>
  <xs:element name="book" type="bookType"/>
</xs:schema>
Chapter 14
Groups, Compositors and Derivation
14.1 Groups
W3C XML Schema also allows the definition of groups of elements and attributes.

<xs:group name="mainBookElements">
<xs:sequence>
<xs:element name="title" type="nameType"/>
<xs:element name="author" type="nameType"/>
</xs:sequence>
</xs:group>

<xs:attributeGroup name="bookAttributes">
<xs:attribute name="available" type="xs:string"/>
</xs:attributeGroup>
These groups can be used in the definition of complex types, as shown below.
<xs:complexType name="bookType">
<xs:sequence>
<xs:group ref="mainBookElements"/>
<xs:element name="character" type="characterType"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attributeGroup ref="bookAttributes"/>
</xs:complexType>
These groups are not datatypes but containers holding a set of elements or attributes that can be used
to describe complex types.
14.2 Compositors
So far, we have seen the xs:sequence compositor, which defines ordered groups of elements (in fact,
it defines ordered groups of particles, which can also be groups or other compositors). W3C XML
Schema supports two additional compositors that can be mixed to allow various combinations. Each
of these compositors can have minOccurs and maxOccurs attributes to define their cardinality.
The xs:choice compositor describes a choice between several possible elements or groups of
elements. The following group (compositors can appear within groups, complex types or other
compositors) will accept either a single name element or a sequence of firstName, an optional
middleName and a lastName:
<xs:group name="nameTypes">
<xs:choice>
<xs:element name="name" type="xs:string"/>
<xs:sequence>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="middleName" type="xs:string" minOccurs="0"/>
<xs:element name="lastName" type="xs:string"/>
</xs:sequence>
</xs:choice>
</xs:group>
The xs:all compositor defines an unordered set of elements. The following complex type definition
allows its contained elements to appear in any order:
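<xs:complexType name="bookType">
<xs:all>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="character" type="characterType" minOccurs="0"/>
</xs:all>
</xs:complexType>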
In order to avoid combinations that could become ambiguous or too complex to be resolved by W3C
XML Schema tools, a set of restrictions has been placed on the xs:all particle:
it can appear only as the single child at the top of a content model,
and its children can only be xs:element definitions or references, none of which can have a
cardinality greater than one.
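The constraints on xs:all can be expressed as a small check on the list of child element names. This hypothetical Python sketch treats title and author as required and character as optional (minOccurs="0"), matching the xs:all definition above. It ignores whitespace and text, and it is not how a schema processor is actually implemented.

```python
from collections import Counter


def all_ok(children: list[str], required: set[str], optional: set[str]) -> bool:
    """Emulate an xs:all group: every child may appear at most once and in any
    order, required children must be present, and nothing else is allowed."""
    counts = Counter(children)
    if any(n > 1 for n in counts.values()):
        return False
    names = set(counts)
    return required <= names and names <= required | optional


REQUIRED = {"title", "author"}
OPTIONAL = {"character"}  # minOccurs="0"
print(all_ok(["author", "title"], REQUIRED, OPTIONAL))                            # True
print(all_ok(["character", "title", "author"], REQUIRED, OPTIONAL))               # True
print(all_ok(["title", "character", "character", "author"], REQUIRED, OPTIONAL))  # False
print(all_ok(["title"], REQUIRED, OPTIONAL))                                      # False
```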
14.3 Derivation of simple types

Simple datatypes are defined by derivation of other datatypes, either predefined and identified by the
W3C XML Schema namespace or defined elsewhere in your schema.
We have already seen examples of simple types derived by restriction (using xs:restriction elements).
The different kinds of restrictions that can be applied to a datatype are called facets. Beyond the
xs:pattern (using a regular expression syntax) and xs:maxLength facets shown already, many facets
allow constraints on the length of a value, an enumeration of the possible values, the minimal and
maximal values, its precision and scale, etc.
Two other derivation methods are available that allow defining whitespace-separated lists and unions
of datatypes. The following definition uses xs:union to extend the definition of our type for isbn to
accept the values TBD and NA:
<xs:simpleType name="isbnType">
<xs:union>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{10}"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="TBD"/>
<xs:enumeration value="NA"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
The union has been applied to the two embedded simple types to allow values from both datatypes:
in addition to 10-digit ISBNs, our new datatype will now accept the values from an enumeration with
two possible values (TBD and NA).
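A value is valid for a union when it is valid for at least one of the union's member types. Below is a minimal Python sketch of that rule for the isbn type above. The function name is hypothetical. The sketch also leaves out the whitespace normalization a schema processor applies to NMTOKEN values.

```python
import re


def valid_isbn_or_placeholder(value: str) -> bool:
    """Emulate the isbnType union: valid if any member type accepts the value."""
    # Member type 1: xs:string restricted by the pattern [0-9]{10}.
    if re.fullmatch(r"[0-9]{10}", value):
        return True
    # Member type 2: xs:NMTOKEN restricted to the enumeration TBD | NA.
    return value in {"TBD", "NA"}


for candidate in ["0836217462", "TBD", "NA", "TDB"]:
    print(candidate, valid_isbn_or_placeholder(candidate))
```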
The following example type (isbnTypes) uses xs:list to define a whitespace-separated list of ISBN
values. It also derives a type (isbnTypes10) using xs:restriction that accepts between 1 and 10 values,
separated by whitespace:
<xs:simpleType name="isbnTypes">
<xs:list itemType="isbnType"/>
</xs:simpleType>
<xs:simpleType name="isbnTypes10">
<xs:restriction base="isbnTypes">
<xs:minLength value="1"/>
<xs:maxLength value="10"/>
</xs:restriction>
</xs:simpleType>
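For list types, the length facets count list items, not characters. isbnTypes10 therefore limits the number of ISBNs rather than the length of the string. Below is a hypothetical Python sketch of the check. It assumes that isbnType is the union version defined above (ten digits, TBD or NA). Python's str.split() also treats a slightly broader set of characters as whitespace than XML Schema does.

```python
import re

ISBN_PATTERN = re.compile(r"[0-9]{10}")


def valid_isbn_item(item: str) -> bool:
    # Assumed item type: the isbnType union (ten digits, or TBD / NA).
    return ISBN_PATTERN.fullmatch(item) is not None or item in {"TBD", "NA"}


def valid_isbn_list(value: str, min_length: int = 1, max_length: int = 10) -> bool:
    """Emulate isbnTypes10: a whitespace-separated list of 1 to 10 ISBN items."""
    items = value.split()  # xs:list items are separated by whitespace
    # For list types, minLength/maxLength count items, not characters.
    if not min_length <= len(items) <= max_length:
        return False
    return all(valid_isbn_item(item) for item in items)


print(valid_isbn_list("0836217462 TBD"))               # True
print(valid_isbn_list(""))                             # False (no items)
print(valid_isbn_list(" ".join(["0836217462"] * 11)))  # False (11 items)
```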
14.4 Content Types

In the first part of this article, we examined the default content type behavior, modeled after data-oriented documents, where complex type elements contain only elements and attributes, and simple type
elements contain only character data without attributes.
The W3C XML Schema Definition Language also supports defining empty content elements and
simple content (those that contain only character data) with attributes.
Empty content elements are defined using a regular xs:complexType construct and deliberately
omitting any child element definition. The following construct defines an empty book element accepting
an isbn attribute:
<xs:element name="book">
<xs:complexType>
<xs:attribute name="isbn" type="isbnType"/>
</xs:complexType>
</xs:element>
Simple content elements, i.e. character data elements with attributes, can be derived from simple
types using xs:simpleContent. The book element defined above can thus be extended to accept a text
value by:
<xs:element name="book">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="isbn" type="isbnType"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Note the location of the attribute definition, showing that the extension is done through the addition
of the attribute. This definition will accept the following XML element:
<book isbn="0836217462">
Funny book by Charles M. Schulz.
Its title (Being a Dog Is a Full-Time Job) says it all !
</book>
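W3C XML Schema supports mixed content through the mixed attribute of the xs:complexType
element. Consider:
<xs:element name="book">
<xs:complexType mixed="true">
<xs:all>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:all>
<xs:attribute name="isbn" type="xs:string"/>
</xs:complexType>
</xs:element>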
which will validate an XML element such as:

<book isbn="0836217462">
Funny book by <author>Charles M. Schulz</author>.
Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all !
</book>
Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as in element-only content models. While this is a
significant improvement over XML 1.0 DTDs, note that the values of the character data and its
location relative to the child elements cannot be constrained.
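In a parsed tree, mixed content appears as text interleaved with child elements. Python's xml.etree.ElementTree, for example, stores the text before the first child in .text and the text after each child in that child's .tail. This sketch shows that layout for the book element above.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book isbn="0836217462">Funny book by <author>Charles M. Schulz</author>.'
    ' Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all !</book>'
)

print([child.tag for child in book])   # ['author', 'title']
print(repr(book.text))                 # 'Funny book by '
print(repr(book.find("author").tail))  # '. Its title ('
# itertext() yields the text and tails in document order, recovering the full sentence.
print("".join(book.itertext()))
```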
Chapter 15
Constraints
15.1 Unique
W3C XML Schema provides several flexible XPath-based features for describing uniqueness
constraints and corresponding reference constraints. The first of these, a simple uniqueness
declaration, is declared with the xs:unique element. The following declaration, placed under the
declaration of our book element, indicates that the character name must be unique:
<xs:unique name="charName">
<xs:selector xpath="character"/>
<xs:field xpath="name"/>
</xs:unique>
The location of the xs:unique element in the schema gives the context node in which the constraint
holds. By inserting xs:unique under our book element, we specify that the character name has to be unique
within the context of this book only.
The two XPaths defined in the uniqueness constraint are evaluated relative to the context node. The
first of these paths is defined by the selector element. Its purpose is to define the element which has
the uniqueness constraint; the node to which the selector points must be an element node.
The second path, specified in the xs:field element is evaluated relative to the element identified by the
xs:selector, and can be an element or an attribute node. This is the node whose value will be checked
for uniqueness. Combinations of values can be specified by adding other xs:field elements within
xs:unique.
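The simple paths supported by Python's xml.etree.ElementTree can imitate this selector/field evaluation. The hypothetical helper below mimics xs:unique in the context of one book element. It selects the character children, reads each name, and reports values that occur more than once. Like xs:unique, it ignores selected elements that have no name. It compares plain strings, whereas a schema processor compares typed values. The duplicate Snoopy entry is sample data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BOOK = """<book isbn="0836217462">
<character><name>Snoopy</name></character>
<character><name>Peppermint Patty</name></character>
<character><name>Snoopy</name></character>
</book>"""


def duplicate_values(context: ET.Element, selector: str, field: str) -> list:
    """Mimic xs:unique: evaluate selector relative to the context element,
    read field relative to each selected element, report repeated values."""
    values = [node.findtext(field) for node in context.findall(selector)]
    # Selected elements without the field are ignored, as xs:unique does.
    counts = Counter(value for value in values if value is not None)
    return [value for value, count in counts.items() if count > 1]


print(duplicate_values(ET.fromstring(BOOK), "character", "name"))  # ['Snoopy']
```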
15.2 Keys
The second construct, xs:key, is similar to xs:unique except that the field value must be present
(non-null) for every selected element (note that xs:unique and xs:key can both be referenced). To use
the character name as a key, we can just replace the xs:unique by xs:key:
<xs:key name="charName">
<xs:selector xpath="character"/>
<xs:field xpath="name"/>
</xs:key>
15.3 Keyref
The third construct, xs:keyref, allows us to define a reference to an xs:key or an xs:unique. To show its
usage, we will introduce the friend-of element, to be used against characters:
<character>
<name>Snoopy</name>
<friend-of>Peppermint Patty</friend-of>
<qualification>
extroverted beagle
</qualification>
</character>
To indicate that friend-of needs to refer to a character from this same book, we will write, at the
same level as we defined our key constraint, the following:
<xs:keyref name="charNameRef" refer="charName">
<xs:selector xpath="character"/>
<xs:field xpath="friend-of"/>
</xs:keyref>
These capabilities are almost independent of the other features in a schema. They are disconnected
from the definition of the datatypes. The only point anchoring them to the schema is the place where
they are defined, which establishes the scope of the uniqueness constraints.
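Taken together, the key and the keyref say two things. Every character must have a distinct name, and every friend-of must match one of those names. Below is a hypothetical Python sketch of both checks using xml.etree.ElementTree. The second friend-of value (Woodstock) is made up and has no matching character.

```python
import xml.etree.ElementTree as ET

BOOK = """<book isbn="0836217462">
<character>
<name>Snoopy</name>
<friend-of>Peppermint Patty</friend-of>
</character>
<character>
<name>Peppermint Patty</name>
<friend-of>Woodstock</friend-of>
</character>
</book>"""


def dangling_refs(book: ET.Element) -> list:
    """Mimic xs:key on character/name and xs:keyref on character/friend-of."""
    keys = set()
    for character in book.findall("character"):
        name = character.findtext("name")
        if name is None:  # unlike xs:unique, xs:key requires the field
            raise ValueError("character without a name")
        if name in keys:
            raise ValueError(f"duplicate key {name!r}")
        keys.add(name)
    # keyref: every friend-of value must match one of the collected keys
    return [ref.text for ref in book.iterfind("character/friend-of")
            if ref.text not in keys]


print(dangling_refs(ET.fromstring(BOOK)))  # ['Woodstock']
```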
15.4 Building Usable -- and Reusable -- Schemas
Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides
an alternative to XML comments (for humans) and processing instructions (for machines) that might
be easier to handle for supporting tools.
Human-readable documentation can be defined by xs:documentation elements, while information
targeted to applications should be included in xs:appinfo elements. Both elements need to be
included in an xs:annotation element and accept any content; xs:documentation accepts optional
xml:lang and source attributes, and xs:appinfo accepts an optional source attribute. The source
attribute is a URI reference that can be used to indicate the purpose of the comment documentation
or application information.
The xs:annotation elements can be added at the beginning of most schema constructions, as shown
in the example below. The appinfo section demonstrates how custom namespaces and schemes
might allow the binding of an element to a Java class from within the schema.
<xs:annotation>
<xs:documentation xml:lang="en">
Top level element.
</xs:documentation>
<xs:documentation xml:lang="fr">
Element racine.
</xs:documentation>
<xs:appinfo source="http://example.com/foo/">
<bind xmlns="http://example.com/bar/">
<class name="Book"/>
</bind>
</xs:appinfo>
</xs:annotation>
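Annotations are ordinary XML, so tools can read them with any XML API. The following minimal sketch uses Python's standard xml.etree.ElementTree module and hypothetical helper names. It picks the xs:documentation text for one language and reads the class binding from xs:appinfo. The http://example.com/bar/ vocabulary is the example's own placeholder, not a real binding format.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
BAR = "http://example.com/bar/"  # placeholder namespace from the example

ANNOTATION = """<xs:annotation xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:documentation xml:lang="en">Top level element.</xs:documentation>
  <xs:documentation xml:lang="fr">Element racine.</xs:documentation>
  <xs:appinfo source="http://example.com/foo/">
    <bind xmlns="http://example.com/bar/"><class name="Book"/></bind>
  </xs:appinfo>
</xs:annotation>"""


def documentation(annotation, lang):
    """Return the human-readable text for one language, or None."""
    for doc in annotation.findall(f"{{{XS}}}documentation"):
        if doc.get(XML_LANG) == lang:
            return doc.text.strip()
    return None


def bound_classes(annotation):
    """Return the class names an application could read from xs:appinfo."""
    return [c.get("name") for c in annotation.iter(f"{{{BAR}}}class")]


root = ET.fromstring(ANNOTATION)
print(documentation(root, "fr"))  # Element racine.
print(bound_classes(root))        # ['Book']
```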
15.5 Composing schemas from multiple files
For those who want to define a schema using several XML documents -- either to split a large
schema, or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for
including external schemas.
The first one, xs:include, is similar to a copy and paste of the definitions of the included schema: it's
an inclusion, and as such it doesn't allow you to override the definitions of the included schema. It
can be used this way:
<xs:include schemaLocation="character.xsd"/>
The second inclusion mechanism, xs:redefine, is similar to xs:include, except that it lets you redefine
declarations from the included schema.
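<xs:redefine schemaLocation="character12.xsd">
<xs:simpleType name="nameType">
<xs:restriction base="nameType">
<!-- new facets further restricting the original nameType go here -->
</xs:restriction>
</xs:simpleType>
</xs:redefine>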
Note that the declarations that are redefined must be placed in the xs:redefine element.
We've already seen many features that can be used together with xs:include and xs:redefine to create
libraries of schemas. We've seen how we can reference previously defined elements, how we can
define datatypes by derivation and use them, how we can define and use groups of attributes. We've
also seen the parallel between elements and objects, and datatypes and classes. There are other
features borrowed from object oriented designs that can be used to create reusable schemas.
15.6 Abstract types
The first of these features derived from object oriented design is the substitution group. Unlike the
features we've seen so far, a substitution group is not defined explicitly through a W3C XML Schema
element, but through referencing a common element (called the head) using a substitutionGroup
attribute.
The head element doesn't hold any specific declaration but must be global. All the elements within a
substitution group need to have a type that is either the same type as the head element or can be
derived from it. Then they can all be used in place of the head element. In the following example, the
element surname can be used anywhere an element name has been defined.
<xs:element name="surname" type="xs:string"
substitutionGroup="name" />
Now, we can also define a generic name-elt element, head of a substitution group, that shouldn't be
used directly, but in one of its derived forms. This is done through declaring the element as abstract,
analogous to abstract classes in object oriented languages. The following example defines name-elt as
an abstract element that should be replaced either by name or surname everywhere it is referenced.
<xs:element name="name-elt" type="xs:string" abstract="true"/>
<xs:element name="name" type="xs:string"
substitutionGroup="name-elt"/>
<xs:element name="surname" type="xs:string"
substitutionGroup="name-elt"/>
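The effect of a substitution group with an abstract head can be summarized as a small lookup. The following Python sketch works on a hypothetical in-memory table of the three declarations above. It lists which element names may appear where name-elt is referenced. It ignores the type-derivation check that a real processor also performs.

```python
# Hypothetical in-memory view of the declarations above:
# element name -> (substitutionGroup head or None, abstract flag)
DECLS = {
    "name-elt": (None, True),
    "name": ("name-elt", False),
    "surname": ("name-elt", False),
}


def allowed_in_place_of(head, decls):
    """Names that may appear where `head` is referenced: the head itself
    (unless abstract) plus every member of its substitution group,
    following chains of groups transitively."""
    allowed = set()
    pending = [head]
    while pending:
        current = pending.pop()
        _, abstract = decls[current]
        if not abstract:
            allowed.add(current)
        # Members are declarations whose substitutionGroup names `current`.
        pending.extend(n for n, (h, _abs) in decls.items() if h == current)
    return sorted(allowed)


print(allowed_in_place_of("name-elt", DECLS))  # ['name', 'surname']
```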
15.7 Final types

We could, on the other hand, wish to control derivation performed on a datatype. W3C XML
Schema supports this through the final attribute in an xs:complexType, xs:simpleType or xs:element
element. This attribute can take the values restriction, extension and #all to block derivation by
restriction, extension or any derivation. The following snippet would, for instance, forbid any
derivation of the characterType complex type.
<xs:complexType name="characterType" final="#all">
<xs:sequence>
<xs:element name="name" type="nameType"/>
<xs:element name="since" type="sinceType"/>
<xs:element name="qualification" type="descType"/>
</xs:sequence>
</xs:complexType>
In addition to final, a more fine-grained mechanism operates on individual facets to control the
derivation of simple types. Here, the attribute is called fixed, and when its value is set to true, the
facet cannot be further modified (but other facets can still be added or modified). The following
example prevents the size of our nameType simple type from being redefined:
<xs:simpleType name="nameType">
<xs:restriction base="xs:string">
<xs:maxLength value="32" fixed="true"/>
</xs:restriction>
</xs:simpleType>
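The rules for final and fixed are simple enough to state as code. The helpers below, may_derive and may_restate_facet, are hypothetical. They sketch complex types and single facets only and do not form a schema processor. For simple types, final also accepts list and union. A fixed facet may still be restated with its original value.

```python
def blocked_derivations(final):
    """Methods forbidden by a complexType's final attribute:
    "#all" or a whitespace-separated list of restriction/extension."""
    if final is None:
        return set()
    tokens = final.split()
    if tokens == ["#all"]:
        return {"restriction", "extension"}
    return set(tokens)


def may_derive(final, method):
    """True if a type with this final value may be derived by `method`."""
    return method not in blocked_derivations(final)


def may_restate_facet(base_facets, facet, new_value):
    """base_facets maps facet name -> (value, fixed). A fixed facet may
    only be restated with its existing value; other facets are free."""
    if facet not in base_facets:
        return True
    value, fixed = base_facets[facet]
    return not fixed or new_value == value


# characterType from the example: final="#all"
print(may_derive("#all", "extension"))         # False
print(may_derive("restriction", "extension"))  # True

# nameType from the example: maxLength 32 with fixed="true"
name_type = {"maxLength": ("32", True)}
print(may_restate_facet(name_type, "maxLength", "40"))  # False
print(may_restate_facet(name_type, "minLength", "1"))   # True
```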
15.8 Namespaces
Namespaces support in W3C XML Schema is flexible yet straightforward. It not only allows the use
of any prefix in instance documents (unlike DTDs) but also lets you open your schemas to accept
unknown elements and attributes from known or unknown namespaces.
Each W3C XML Schema document is bound to a specific namespace through the targetNamespace
attribute, or to the absence of namespace through the lack of such an attribute. We need at least one
schema document per namespace we want to define (elements and attributes without namespaces
can be defined in any schema, though).
Until now we have omitted the targetNamespace attribute, which means that we were working
without namespaces. To get into namespaces, let's first imagine that our example belongs to a single
namespace:
<book isbn="0836217462" xmlns="http://example.org/ns/books/">
.../...
</book>
The least intrusive way to adapt our schema is to add some more attributes to our xs:schema
element.
<xs:schema targetNamespace="http://example.org/ns/books/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:bk="http://example.org/ns/books/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
.../...
</xs:schema>
The namespace declarations play an important role. The first one
(xmlns:xs="http://www.w3.org/2001/XMLSchema") says not only that we've chosen to use the
prefix xs to identify the elements that will be W3C XML Schema instructions, but also that we will
prefix the W3C XML Schema predefined datatypes with xs, as we have done in all the examples
thus far. We could have chosen any prefix instead of xs. We could even make
http://www.w3.org/2001/XMLSchema our default namespace, in which case we wouldn't have
prefixed the W3C XML Schema elements or its datatypes.
Since we are working with the http://example.org/ns/books/ namespace, we define it (with a bk prefix).
This means that we will now prefix the references to "objects" (datatypes, elements, attributes, ...)
belonging to this namespace with bk:. Again, we could have chosen any prefix to identify this
namespace or even have made it our default namespace (note, though, that the XPath expressions used
in xs:unique, xs:key and xs:keyref do not use a default namespace).
The targetNamespace attribute lets you define, independently of the namespace declarations, which
namespace is described in this schema. If you need to reference objects belonging to this namespace,
which is usually the case except when using a pure "Russian doll" design, you need to provide a
namespace declaration in addition to the targetNamespace.
The final two attributes (elementFormDefault and attributeFormDefault) are a facility provided by
W3C XML Schema to control, within a single schema, whether attributes and elements are
considered by default to be qualified (in a namespace). This differentiation between qualified and
unqualified can be indicated by specifying the default values, as above, but also when defining the
elements and attributes, by adding a form attribute of value qualified or unqualified.
It is important to note that only local elements and attributes can be specified as unqualified. All
globally defined elements and attributes must always be qualified.
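Looking at what a parser sees helps explain elementFormDefault and attributeFormDefault. This Python sketch uses xml.etree.ElementTree, and the title child is a hypothetical addition to the book example. Elements in the default namespace come back with the namespace URI attached. The isbn attribute stays unqualified, because a default namespace declaration never applies to attributes. The second document shows that the prefix chosen in the instance is irrelevant.

```python
import xml.etree.ElementTree as ET

BK = "http://example.org/ns/books/"

# Trimmed instance documents; the title child and its text are hypothetical.
DEFAULT_NS = f'<book isbn="0836217462" xmlns="{BK}"><title>Example</title></book>'
PREFIXED = (f'<bk:book isbn="0836217462" xmlns:bk="{BK}">'
            '<bk:title>Example</bk:title></bk:book>')

root = ET.fromstring(DEFAULT_NS)
print(root.tag)     # {http://example.org/ns/books/}book
print(root[0].tag)  # {http://example.org/ns/books/}title
# A default namespace never applies to attributes, so isbn stays
# unqualified, which is what attributeFormDefault="unqualified" expects.
print(root.attrib)  # {'isbn': '0836217462'}

# The prefix is irrelevant: both documents yield the same expanded names.
other = ET.fromstring(PREFIXED)
print(other.tag == root.tag and other[0].tag == root[0].tag)  # True
```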
15.9 Importing definitions from external namespaces
W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some
attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used
this feature throughout our examples to identify the W3C XML Schema predefined datatypes. This
mechanism can be extended to import definitions from any other namespace and so reuse them in
our schemas.
Reusing definitions from other namespaces is done through a three-step process. This process needs
to be done even for the XML 1.0 namespace, in order to declare attributes such as xml:lang. First,
the namespace must be defined as usual.
<xs:schema targetNamespace="http://example.org/ns/books/"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:bk="http://example.org/ns/books/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="qualified">
.../...
</xs:schema>
Then W3C XML Schema needs to be informed of the location at which it can find the schema
corresponding to the namespace. This is done using an xs:import element.
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="myxml.xsd"/>
W3C XML Schema now knows that it should attempt to find any reference belonging to the XML
namespace in a schema located at myxml.xsd. We can now use the external definition.
<xs:element name="title">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute ref="xml:lang"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
You may wonder why we have chosen to reference the xml:lang attribute from the XML namespace,
rather than creating an attribute with a type xml:lang. We've done so because there is an important
difference between referencing an attribute (or an element) and referencing a datatype when
namespaces are concerned:
Referencing an element or an attribute imports the whole thing with its name and
namespace,
Referencing a datatype imports only its definition, leaving you with the task of giving a name
to the element and attribute you're defining and using the target namespace (or no
namespace if your attribute or element is unqualified).
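The difference between referencing xml:lang and defining a local attribute of the same name shows up directly in the expanded names a parser reports. This sketch uses a hypothetical title element that carries both attributes. ElementTree keys the referenced attribute by the XML namespace URI, while a locally defined, unqualified lang attribute has no namespace at all.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical title carrying the referenced xml:lang attribute and an
# unqualified local attribute that merely reuses the name "lang".
DOC = ('<title xmlns="http://example.org/ns/books/" '
       'xml:lang="en" lang="fr"/>')

title = ET.fromstring(DOC)
print(sorted(title.attrib))
# ['lang', '{http://www.w3.org/XML/1998/namespace}lang']
print(title.get(f"{{{XML_NS}}}lang"))  # en  (xml:lang, namespace kept)
print(title.get("lang"))               # fr  (local, no namespace)
```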
15.10 Including unknown elements

To finish this section about namespaces, we need to see how, as promised in our introduction, we
can open our schema to unknown elements, attributes and namespaces. This is done using xs:any
and xs:anyAttribute, which allow us to include any elements or attributes, respectively.
For instance, if we want to extend the definition of our description type to any XHTML tag, we
could declare:
<xs:complexType name="descType" mixed="true">
<xs:sequence>
<xs:any namespace="http://www.w3.org/1999/xhtml"
processContents="skip" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
The xs:anyAttribute gives the same functionality for attributes.

The type descType is now mixed content and accepts an unbounded number of elements from
the http://www.w3.org/1999/xhtml namespace. The processContents attribute is set to skip, telling a
W3C XML Schema processor that no validation of these elements should be attempted. The other
permissible values are strict, asking the processor to validate these elements, and lax, asking it to
validate them when possible. The namespace attribute accepts a whitespace-separated list of URIs
and the special values ##local (non-qualified elements) and ##targetNamespace (the target
namespace), which can be included in the list, and ##other (any namespace other than the target) or
##any (any namespace), which can replace the list. It is not possible to express "any namespace
except those in a list".
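The matching rules of the namespace attribute can be sketched as a small function. The wildcard_allows helper below is hypothetical and follows W3C XML Schema 1.0 as described above. ##any matches everything. ##other matches any namespace other than the target, and it does not match unqualified items. A list may mix URIs with ##local and ##targetNamespace. The helper only decides namespace membership; processContents is not modelled.

```python
XHTML = "http://www.w3.org/1999/xhtml"
BK = "http://example.org/ns/books/"  # hypothetical target namespace


def wildcard_allows(namespace_attr, element_ns, target_ns):
    """Emulate the namespace attribute of xs:any / xs:anyAttribute.

    element_ns is None for an unqualified (no-namespace) item.
    """
    tokens = namespace_attr.split()
    if tokens == ["##any"]:
        return True
    if tokens == ["##other"]:
        # Any namespace except the target; unqualified items do not match.
        return element_ns is not None and element_ns != target_ns
    allowed = set()
    for token in tokens:
        if token == "##local":
            allowed.add(None)
        elif token == "##targetNamespace":
            allowed.add(target_ns)
        else:
            allowed.add(token)
    return element_ns in allowed


# descType from the example: namespace="http://www.w3.org/1999/xhtml"
print(wildcard_allows(XHTML, XHTML, BK))     # True
print(wildcard_allows(XHTML, BK, BK))        # False
print(wildcard_allows("##other", None, BK))  # False
```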
15.11 W3C XML Schema and Instance Documents
We've now covered most of the features of W3C XML Schema, but we still need to have a glance at
some extensions that you can use within your instance documents. In order to differentiate these
other features, a separate namespace, http://www.w3.org/2001/XMLSchema-instance, is used; it is
usually associated with the prefix xsi.
The xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes allow you to tie a
document to its W3C XML Schema. This link is not mandatory, and other indications can be given
at validation time, but it does help W3C XML Schema-aware tools to locate a schema.
Depending on whether you use namespaces, the link will be either
<book isbn="0836217462"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="file:library.xsd">
Or, as below (noting the syntax with a URI for the namespace and the URI of the schema, separated
by whitespace in the same attribute):
<book isbn="0836217462"
xmlns="http://example.org/ns/books/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://example.org/ns/books/ file:library.xsd">
The other use of xsi attributes is to provide information about how an element corresponds to a
schema. These attributes are xsi:type, which lets you define the simple or complex type of an element,
and xsi:nil, which lets you specify a nil (null) value for an element (that has to be defined as nillable in
the schema using a nillable="true" attribute). You don't need to declare these attributes in your W3C
XML Schema to be able to use them in an instance document.
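xsi:schemaLocation holds whitespace-separated pairs, so a tool has to split the value into namespace/location pairs. The following Python sketch uses a hypothetical schema_hints helper built on xml.etree.ElementTree. It collects both kinds of hint from a root element.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = ('<book isbn="0836217462" xmlns="http://example.org/ns/books/" '
       f'xmlns:xsi="{XSI}" '
       'xsi:schemaLocation="http://example.org/ns/books/ file:library.xsd"/>')


def schema_hints(root):
    """Return {namespace: location} from xsi:schemaLocation, plus the
    no-namespace schema (key None) from xsi:noNamespaceSchemaLocation."""
    hints = {}
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # The attribute holds (namespace URI, schema location) pairs.
    for ns, location in zip(tokens[0::2], tokens[1::2]):
        hints[ns] = location
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns is not None:
        hints[None] = no_ns
    return hints


print(schema_hints(ET.fromstring(DOC)))
# {'http://example.org/ns/books/': 'file:library.xsd'}
```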
Part V
XSL
Chapter 16
Introduction to XSL
16.1 XSL - The Style Sheet of XML?
HTML pages use predefined tags, and the meaning of these tags is well understood: <p> means a
paragraph and <h1> means a heading, and the browser knows how to display these pages.
With XML we can use any tags we want, and the meaning of these tags is not automatically
understood by the browser: <table> could mean an HTML table or maybe a piece of furniture.
Because of the nature of XML, there is no standard way to display an XML document.
In order to display XML documents, it is necessary to have a mechanism to describe how the
document should be displayed. One of these mechanisms is Cascading Style Sheets (CSS), but XSL
(eXtensible Stylesheet Language) is the preferred style sheet language of XML, and XSL is far more
sophisticated than the CSS used by HTML.
16.2 XSL - More than a Style Sheet
XSL consists of two parts:
a method for transforming XML documents

a method for formatting XML documents
If you don't understand the meaning of this, think of XSL as a language that can transform XML
into HTML, a language that can filter and sort XML data and a language that can format XML data,
based on the data value, like displaying negative numbers in red.
16.3 XSL - What can it do?
XSL can be used to define how an XML file should be displayed by transforming the XML file into a
format that is recognizable to a browser. One such format is HTML. Normally XSL does this by
transforming each XML element into an HTML element.
XSL can also add completely new elements into the output file, or remove elements. It can rearrange
and sort the elements, test and make decisions about which elements to display, and a lot more.
16.4 A note about XSL in IE5
XSL in Internet Explorer 5.0 is not 100% compatible with the latest released W3C XSL standard.
That is because IE 5 was released before the standard was completely settled. Microsoft has
promised to solve this problem in the 5.5 release.
Chapter 17
XSL - Transformation
17.1 Transforming XML to HTML
What if you want to transform the following XML document into HTML?
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
Consider the following XSL document as an HTML template to populate an HTML document with
XML data:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
In the above file, the xsl:for-each element locates elements in the XML document and repeats a
template for each one. The select attribute describes the element in the source document. The
syntax for this attribute is called an XSL Pattern, and works like navigating a file system where a
forward slash (/) selects subdirectories. The xsl:value-of element selects a child in the hierarchy and
inserts the content of that child into the template.
Since an XSL style sheet is an XML file itself, the file begins with an xml declaration. The
xsl:stylesheet element indicates that this document is a style sheet. The template has also been
wrapped with xsl:template match="/" to indicate that this is a template that corresponds to the
root (/) of the XML source document.
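Python's standard library has no XSLT processor. As a rough sketch of what the template above produces, the following code builds the same table by hand with xml.etree.ElementTree. The catalog data is hypothetical, because the original cd_catalog.xml is only partly shown. findall and findtext stand in for xsl:for-each and xsl:value-of.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical catalog; the real cd_catalog.xml is only partly shown.
CATALOG = """<CATALOG>
  <CD><TITLE>Album One</TITLE><ARTIST>Bob Dylan</ARTIST><YEAR>1985</YEAR></CD>
  <CD><TITLE>Album Two</TITLE><ARTIST>Another Artist</ARTIST><YEAR>1988</YEAR></CD>
</CATALOG>"""


def render(catalog_xml):
    """Mimic the template: one header row, then one row per CATALOG/CD."""
    root = ET.fromstring(catalog_xml)  # root is the CATALOG element
    rows = ['<table border="2" bgcolor="yellow">',
            "<tr><th>Title</th><th>Artist</th></tr>"]
    # Relative to CATALOG, "CD" plays the role of select="CATALOG/CD".
    for cd in root.findall("CD"):
        # findtext plays the role of <xsl:value-of select="..."/>
        rows.append("<tr><td>%s</td><td>%s</td></tr>" % (
            escape(cd.findtext("TITLE", "")),
            escape(cd.findtext("ARTIST", ""))))
    rows.append("</table>")
    return "<html><body>" + "\n".join(rows) + "</body></html>"


print(render(CATALOG))
```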
If you add a reference to the above stylesheet to your original XML document (the xml-stylesheet
processing instruction at the top of the listing below), your browser will transform your XML
document into HTML:
<?xml-stylesheet type="text/xsl" href="cd_catalog.xsl"?>
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
Chapter 18
XSL on Different Machines
18.1 XSL - On the Client
18.1.1 A JavaScript Solution
In the previous chapter I explained how XSL can be used to transform a document from XML to
HTML. The trick was to add XSL stylesheet information to the XML file, and to let the browser
do the transformation.
Even if this works fine, it is not always desirable to include a stylesheet reference in the XML file,
and the solution will not work in a browser that is not XML-aware.
A much more versatile solution would be to use JavaScript to do the XML to HTML
transformation.
Using JavaScript opens up these possibilities:
Allowing the JavaScript to do browser-specific testing
Using different style sheets according to browser and/or user needs
That's the beauty of XSL. One of the design goals for XSL was to make it possible to transform data
from one format to another, supporting different browsers and different user needs.
XSL transformation on the client side is bound to be a major part of the browser's work in the
future, as we will see growth in the specialized browser market (think: Braille, Speaking Web, Web
Printers, Handheld PCs, Mobile Phones ...).
18.1.2 The XML file and the XSL file
Take a new look at the XML document that you saw in the previous chapter:
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
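And at the accompanying XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>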
The syntax of the above XSL document was explained in the previous chapter, so it will not be
explained here. But be sure to notice that the XML file does not have a reference to the XSL file, and
the XSL file does not have a reference to the XML file.
IMPORTANT: This means that an XML file could be transformed using many different XSL files.
18.1.3 Transforming XML to HTML on the client
Here is the simple source code needed to transform the XML file to HTML on the client:
<html>
<body>
<script language="javascript">
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM")
xml.async = false
xml.load("cd_catalog.xml")
// Load the XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load("cd_catalog.xsl")
// Transform
document.write(xml.transformNode(xsl))
</script>
</body>
</html>
The first block of code creates an instance of the Microsoft XML parser (XMLDOM), and loads the
XML document into memory. The second block of code creates another instance of the parser and
loads the XSL document into memory. The last line of code transforms the XML document using
the XSL document, and writes the result to the HTML document.
18.2 XSL - On the Server
18.2.1 A Cross Browser Solution
In the previous section I explained how XSL can be used to transform a document from XML to
HTML in the browser. The trick was to let JavaScript use an XML parser to do the
transformation.
This solution will not work with a browser that doesn't support an XML parser.
To make our XML data available to all kinds of browsers, we have to transform the XML document
on the SERVER and send it as pure HTML to the BROWSER.
That's another beauty of XSL. One of the design goals for XSL was to make it possible to
transform data from one format to another on a server, returning readable data to all kinds of future
browsers.
XSL transformation on the server is bound to be a major part of the Internet Information Server's
work in the future, as we will see growth in the specialized browser market (think: Braille,
Speaking Web, Web Printers, Handheld PCs, Mobile Phones ...).
18.2.2 The XML file and the XSL file
Take a new look at the XML document that you saw in the previous chapter:
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
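And at the accompanying XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>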
The syntax of the above XSL document was explained in the previous chapter, so it will not be
explained here. But be sure to notice that the XML file does not have a reference to the XSL file, and
the XSL file does not have a reference to the XML file.
IMPORTANT: The above sentence indicates that an XML file on the server could be transformed
using many different XSL files.
18.2.3 Transforming XML to HTML on the Server
Here is the simple source code needed to transform the XML file to HTML on the server:
<%
'Load the XML
set xml = Server.CreateObject("Microsoft.XMLDOM")
xml.async = false
xml.load(Server.MapPath("cd_catalog.xml"))
'Load the XSL
set xsl = Server.CreateObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(Server.MapPath("cd_catalog.xsl"))
Response.Write(xml.transformNode(xsl))%>
The first block of code creates an instance of the Microsoft XML parser (XMLDOM), and loads the
XML file into memory. The second block of code creates another instance of the parser and loads
the XSL document into memory. The last line of code transforms the XML document using the XSL
document, and returns the result to the browser.
Chapter 19
XSL Functions
19.1 XSL Sort
19.1.1 Where to put the Sort Information
Take a new look at the XML document that you have seen in almost every chapter:
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
To output this XML file as an ordinary HTML file, and sort it at the same time, simply add an order-by
attribute to your for-each element like this:
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
The order-by attribute takes a plus (+) or minus (-) sign to define an ascending or descending sort
order, and an element name to define the sort element.
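The order-by semantics can be mimicked in plain Python. This sketch uses hypothetical catalog data and a hypothetical order_by helper. The sign selects ascending or descending order, and the remaining text names the child element to sort on. For reference, the final XSLT 1.0 Recommendation expresses sorting with an xsl:sort element instead of order-by.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog data, deliberately out of order.
CATALOG = """<CATALOG>
  <CD><TITLE>Album C</TITLE><ARTIST>Carol</ARTIST></CD>
  <CD><TITLE>Album A</TITLE><ARTIST>Alice</ARTIST></CD>
  <CD><TITLE>Album B</TITLE><ARTIST>Bob</ARTIST></CD>
</CATALOG>"""


def order_by(catalog_xml, spec):
    """Sort CATALOG/CD like order-by="+ ARTIST" or "- ARTIST": the sign
    picks the direction, the rest names the element to sort on."""
    sign, field = spec[0], spec[1:].strip()
    cds = ET.fromstring(catalog_xml).findall("CD")
    return sorted(cds, key=lambda cd: cd.findtext(field, ""),
                  reverse=(sign == "-"))


print([cd.findtext("ARTIST") for cd in order_by(CATALOG, "+ ARTIST")])
# ['Alice', 'Bob', 'Carol']
```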
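Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD" order-by="+ ARTIST">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>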
19.1.2 Transforming it on the Client
The XML file can be transformed to HTML on the client using the same source code as shown in
the previous chapter.
19.2 XSL Filter Query
19.2.1 Where to put the Filter Information
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
To filter the XML file, simply add filter to the select attribute in your for-each element like this:
<xsl:for-each select="CATALOG/CD[ARTIST='Bob Dylan']">
Legal filter operators are:
= (equal)
!= (not equal)
&lt; (less than; a literal < is not allowed inside an attribute value)
&gt; (greater than)
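ElementTree's limited XPath support includes the same child-equals-text predicate, so the filter can be tried outside the browser. This Python sketch uses hypothetical catalog data. The path is relative to the CATALOG element returned by fromstring, and the not-equal case is written as an ordinary comprehension.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog data.
CATALOG = """<CATALOG>
  <CD><TITLE>Album One</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
  <CD><TITLE>Album Two</TITLE><ARTIST>Another Artist</ARTIST></CD>
  <CD><TITLE>Album Three</TITLE><ARTIST>Bob Dylan</ARTIST></CD>
</CATALOG>"""

root = ET.fromstring(CATALOG)
# Same predicate as select="CATALOG/CD[ARTIST='Bob Dylan']",
# written relative to the CATALOG element.
dylan = root.findall("CD[ARTIST='Bob Dylan']")
print([cd.findtext("TITLE") for cd in dylan])  # ['Album One', 'Album Three']

# The not-equal filter, expressed in plain Python.
others = [cd for cd in root.findall("CD")
          if cd.findtext("ARTIST") != "Bob Dylan"]
print([cd.findtext("TITLE") for cd in others])  # ['Album Two']
```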

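Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD[ARTIST='Bob Dylan']">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<td><xsl:value-of select="ARTIST"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>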

The XML file can be transformed to HTML on the client using the same source code as shown in
the previous chapter.
19.3 XSL Conditional Choose
19.3.1 Where to put the Choose Condition
<CATALOG>
<CD>
<YEAR>1985</YEAR>
</CD>
.
.
.
To insert a conditional choose test against the content of the file, simply add xsl:choose,
xsl:when and xsl:otherwise elements to your XSL document like this:
<xsl:choose>
<xsl:when match=".[ARTIST='Bob Dylan']">
... some code ...
</xsl:when>
<xsl:otherwise>
... some code ....
</xsl:otherwise>
</xsl:choose>

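Now take a look at your slightly adjusted XSL stylesheet:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template match="/">
<html>
<body>
<table border="2" bgcolor="yellow">
<tr>
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="CATALOG/CD">
<tr>
<td><xsl:value-of select="TITLE"/></td>
<xsl:choose>
<xsl:when match=".[ARTIST='Bob Dylan']">
<td bgcolor="#ff0000"><xsl:value-of select="ARTIST"/></td>
</xsl:when>
<xsl:otherwise>
<td><xsl:value-of select="ARTIST"/></td>
</xsl:otherwise>
</xsl:choose>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>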

The XML file can be transformed to HTML on the client using the same source code as shown in
the previous chapter.
Part VI
DOM
Chapter 20
The XML DOM
20.1 The Document Object Model
The DOM is a programming interface for HTML and XML documents. It defines the way a
document can be accessed and manipulated.
Using a DOM, a programmer can create a document, navigate its structure, and add, modify, or
delete its elements.
As a W3C specification, one important objective for the DOM has been to provide a standard
programming interface that can be used in a wide variety of environments and applications.
The W3C DOM has been designed to be used with any programming language.
20.2 The Node Interface
As you will see in the next section, a program called an XML parser can be used to load an XML
document into the memory of your computer. When the document is loaded, its information can be
retrieved and manipulated by accessing the Document Object Model (DOM).
The DOM represents a tree view of the XML document. The documentElement is the top-level
of the tree. This element has one or many childNodes that represent the branches of the tree.
A Node Interface is used to read and write (or access, if you like) the individual elements in the
XML node tree. The childNodes property of the documentElement can be accessed with a for/each
construct to enumerate each individual node.
The Microsoft XML parser used to demonstrate the DOM in this material supports all the necessary
functions to traverse the node tree, access the nodes and their attribute values, insert and delete
nodes, and convert the node tree back to XML.
All the demonstrated Microsoft XML parser functions are from the official W3C XML DOM
recommendation, apart from load, loadXML and transformNode, and properties such as async, text
and parseError. (Believe it or not, the official DOM does not include standard functions for loading
XML documents!)
A total of 13 node types are currently supported by the Microsoft XML parser. The following table
lists the most commonly used node types:
Node Type               Example
Document type           <!DOCTYPE food SYSTEM "food.dtd">
Processing instruction
Element                 <drink type="beer">Carlsberg</drink>
Attribute               type="beer"
Text                    Carlsberg
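The node types in the table can be observed with any W3C-style DOM. This sketch uses Python's xml.dom.minidom instead of the Microsoft parser. It parses a small hypothetical document with one node of each type and prints the type of every node it visits. The stylesheet processing instruction is an invented example.

```python
from xml.dom import minidom, Node

# Hypothetical document with one node of each type listed in the table.
DOC = ('<?xml-stylesheet type="text/xsl" href="food.xsl"?>'
       '<!DOCTYPE food SYSTEM "food.dtd">'
       '<food><drink type="beer">Carlsberg</drink></food>')

# Labels from the table, keyed by the standard DOM nodeType constants.
LABELS = {
    Node.DOCUMENT_TYPE_NODE: "Document type",
    Node.PROCESSING_INSTRUCTION_NODE: "Processing instruction",
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attribute",
    Node.TEXT_NODE: "Text",
}


def walk(node, out):
    """Depth-first walk collecting (label, nodeName) pairs. Attributes
    are not child nodes in the DOM, so they are read separately."""
    for child in node.childNodes:
        out.append((LABELS.get(child.nodeType, "Other"), child.nodeName))
        if child.nodeType == Node.ELEMENT_NODE:
            attrs = child.attributes
            for i in range(attrs.length):
                attr = attrs.item(i)
                out.append((LABELS[attr.nodeType], attr.nodeName))
        walk(child, out)
    return out


for label, name in walk(minidom.parseString(DOC), []):
    print(f"{label:<24}{name}")
```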
20.3 Parsing the DOM
20.3.1 Using the XML parser
To read and update - create and manipulate - an XML document, you need an XML parser. The
Microsoft XML parser is a COM component that comes with Microsoft Internet Explorer 5.0. Once
you have installed IE 5.0, the parser is available to scripts inside HTML documents and ASP files.
The Microsoft XMLDOM parser features a language-neutral programming model that:
Supports JavaScript, VBScript, Perl, VB, Java, C++ and more

Supports W3C XML 1.0 and XML DOM
Supports DTD and validation
If you are using JavaScript in IE 5.0, you can create an XML document object with the following
code:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
If you are using VBScript, you create the XML document object with the following code:
set xmlDoc = CreateObject("Microsoft.XMLDOM")
If you are using VBScript in an Active Server Page (ASP), you can use the following code:
set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
20.3.2 Loading an XML file into the parser
The following code loads an existing XML document (note.xml) into the XML parser:
<script language="JavaScript">
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async = false
xmlDoc.load("note.xml")
// ....... processing the document goes here
</script>
The first line of the script creates an instance of the Microsoft XML parser. The second line ensures
that the parser will halt execution until the document is fully loaded. The third line tells the parser
to load an XML document called note.xml.
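For comparison, here is a minimal Python sketch of the same step using xml.dom.minidom instead of the Microsoft parser. So that it runs anywhere, it first writes a sample note.xml to a temporary directory; the note content is taken from the next example. minidom.parse reads synchronously, which has the same effect as async = false.

```python
import os
import tempfile
from xml.dom import minidom

NOTE = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")

with tempfile.TemporaryDirectory() as folder:
    # Write a throwaway note.xml so the sketch is self-contained.
    path = os.path.join(folder, "note.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(NOTE)
    # parse() returns only after the whole file has been read.
    xml_doc = minidom.parse(path)

print(xml_doc.documentElement.tagName)  # note
```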
20.3.3 Loading pure XML text into the parser
The following code loads a text string into the XML parser:
<script language="JavaScript">
var text="<note>"
text=text+"<to>Tove</to><from>Jani</from>"
text=text+"<heading>Reminder</heading>"
text=text+"<body>Don't forget me this weekend!</body>"
text=text+"</note>"
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")
xmlDoc.async = false
xmlDoc.loadXML(text)
// ....... processing the document goes here
</script>
Note that the "loadXML
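In Python's xml.dom.minidom, the counterpart of loadXML is parseString. This sketch builds the same note string and reads one element back. It illustrates the standard library, not the Microsoft API.

```python
from xml.dom import minidom

# Build the same string as the script above, piece by piece.
text = "<note>"
text += "<to>Tove</to><from>Jani</from>"
text += "<heading>Reminder</heading>"
text += "<body>Don't forget me this weekend!</body>"
text += "</note>"

# parseString parses text in memory instead of reading a file.
xml_doc = minidom.parseString(text)
heading = xml_doc.getElementsByTagName("heading")[0]
print(heading.firstChild.data)  # Reminder
```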
Chapter 21
Parse Errors
21.1 The parseError Object
If you try to open an XML document, the XML Parser might generate an error. By accessing the
parseError object, the exact error code, the error text, and even the line that caused the error can be
retrieved:
21.1.1 File Error
In this example we let the XML parser try to load a non-existent file, and display some of its error
properties:
xmlDoc.load("ksdjf.xml")
21.1.2 XML Error
Now we let the parser load an XML document that is not well-formed. (If you don't know what
well-formed XML is, read the XML part of this material.)
xmlDoc.load("note_error.xml")
21.2 The parseError Properties
Property    Description
errorCode   Returns a long integer error code
reason      Returns a string explaining the reason for the error
line        Returns a long integer representing the line number for the error
linePos     Returns a long integer representing the line position for the error
srcText     Returns a string containing the line that caused the error
url         Returns the URL pointing to the loaded document
filePos     Returns a long integer file position of the error
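Python's ElementTree reports similar information through its ParseError exception. Its code and position attributes roughly correspond to errorCode, line and linePos. In this sketch the broken note document and the parse_error_info helper are hypothetical. The numeric codes come from expat, not from Microsoft's parser. A missing file, by contrast, surfaces as an ordinary FileNotFoundError when parsing from a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical broken note: <from> is closed with </to>.
BAD = "<note>\n<to>Tove</to>\n<from>Jani</to>\n</note>"


def parse_error_info(text):
    """Return parseError-like details for text, or None if well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # 1-based line, 0-based column
        return {
            "errorCode": err.code,  # expat's numeric error code
            "reason": str(err),     # human-readable message
            "line": line,
            "linePos": column,
            "srcText": text.splitlines()[line - 1],
        }
    return None


info = parse_error_info(BAD)
print(info["line"], info["srcText"])  # 3 <from>Jani</to>
print(info["reason"])
```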
21.3 Accessing the DOM
21.3.1 Traversing the node tree
One very common way to extract XML elements from an XML document is to traverse the node
tree and extract the text value of each element. A small snippet of programming code, like a
VBScript for/each construct, can be written to demonstrate this.
The following VBScript code traverses an XML node tree, and displays the XML elements in the
browser:
set xmlDoc=CreateObject("Microsoft.XMLDOM")
xmlDoc.async=False
xmlDoc.load("note.xml")
for each x in xmlDoc.documentElement.childNodes
document.write(x.nodeName)
document.write(": ")
document.write(x.text)
next
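Here is the same traversal in Python's xml.dom.minidom. The text property used above is a Microsoft extension rather than part of the W3C DOM, so this sketch concatenates the child text nodes instead. The note document is the one from the loadXML example.

```python
from xml.dom import minidom, Node

NOTE = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")

xml_doc = minidom.parseString(NOTE)
lines = []
for x in xml_doc.documentElement.childNodes:
    if x.nodeType == Node.ELEMENT_NODE:  # skip whitespace text nodes, if any
        # Standard-DOM stand-in for MSXML's x.text: join the child text nodes.
        text = "".join(t.data for t in x.childNodes
                       if t.nodeType == Node.TEXT_NODE)
        lines.append(f"{x.nodeName}: {text}")

print("\n".join(lines))
```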
21.3.2 Providing HTML content from XML files
One of the great promises of XML is the possibility to separate HTML documents from their data.
By using an XML parser inside the browser, an HTML page can be constructed as a static document,
with embedded JavaScript to provide dynamic data. When you add that these scripts can
access Active Server Pages from a Web server, the future looks very bright.
The following JavaScript reads XML data from an XML document and writes the XML data into
(waiting) HTML elements.
nodes = xmlDoc.documentElement.childNodes
to.innerText = nodes.item(0).text
from.innerText = nodes.item(1).text
header.innerText = nodes.item(2).text
body.innerText = nodes.item(3).text
21.3.3 Accessing XML elements by name
The following JavaScript finds the first from element by name with getElementsByTagName and
writes its text into the HTML document.
document.write(xmlDoc.getElementsByTagName("from").item(0).text)
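getElementsByTagName is part of the standard DOM, so it works the same way in Python's xml.dom.minidom. This sketch reads the first from element of the sample note. minidom's NodeList also provides item(0), and firstChild.data replaces the Microsoft-specific text property.

```python
from xml.dom import minidom

NOTE = ("<note><to>Tove</to><from>Jani</from>"
        "<heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")

xml_doc = minidom.parseString(NOTE)
# getElementsByTagName searches the whole tree; item(0) is the first match.
from_elem = xml_doc.getElementsByTagName("from").item(0)
print(from_elem.firstChild.data)  # Jani
```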
Part VII
XLink and XPath
Chapter 22
Introduction to XLink
The very nature of the success of the Web lies in its capability for linking resources. However, the
unidirectional, simple linking structures of the Web today are not enough for the growing needs of
an XML world. The official W3C solution for linking in XML is called XLink (XML Linking
Language). This article explains its structure and use according to the most recent Candidate
Recommendation (July 3, 2000).
22.1 Overview
Every developer is familiar with the linking capabilities of the Web today. However, as the use of
XML grows, we quickly realize that simple tags like <A HREF="elem_lessons.html">Freud</A> are
not going to be enough for many of our needs.
Consider, for example, the problem of creating an XML-based help system similar to ones used in
some PC applications. Among other things (such as displaying amusingly animated characters), the
system might be capable of performing the following actions when a user clicks on a topic:
Opening an explanatory text (with a link back to the main index)
Opening a window and simulating the actions to be taken (e.g., going to the "Edit" menu and
pressing "Include Image")
Opening up a relevant dialog (e.g., a file chooser for the image to include)
Trying to code something like this (links with multiple targets, directions, and roles) in XML while
having the old "<a href..." in mind is confusing, and leads people to questions like the following:
What is the "correct" tag for links in XML?
If there is such a magic element, how can I make it point to more than one resource?
What if I want links to have different meanings relevant to my data? E.g., the "motherhood"
and "friendship" relationships between two "person" elements.
In answer to these and many other linking questions, this chapter describes the structure and use of
XLink. It is composed of two parts: a brief example that illustrates the basics of the language,
followed by a complete review of the structure of XLink.
Before we start to dissect the structure of XLink, let's examine a concrete example.
22.2 The Artist/Influence problem
Suppose you want to express in XML the relationship between artists and their environment. This
includes making links from an artist to his/her influences, as well as links to descriptions of historical
events of their time. The data for each artist might be written in a file like the following:
<artistinfo>
<surname>Modigliani</surname>
<name>Amadeo</name>
<born>July 12, 1884</born><died>January 24, 1920</died>
<biography>
<p>In 1906, Modigliani settled in Paris, where ...</p>
</biography>
</artistinfo>
Also, brief descriptions of time periods are included in separate files such as:
<period>
<city>Paris</city>
<country>France</country>
<timeframe begin="1900" end="1920"/>
<title>Paris in the early 20th century (up to the twenties)</title>
<end>Amadeo</end>
<description>
<p>During this period, Russian, Italian, ...</p>
</description>
</period>
Fulfilling our requirement (i.e. creating a file that relates artists to their influences and periods) is a
task beyond a simple strategy like adding "a" or "img" links to the above documents, for several
reasons:
A single artist has many influences (a link points from one resource to many).
A single artist has associations with many periods.
The link itself must be semantically meaningful. (Having an influence is not the same as
belonging to a period, and we want to express that in our document!)
22.3 The XLink Solution

In XLink we have two types of linking elements: simple (like "a" and "img" in HTML) and extended.
Links are represented as elements. However, XLink does not impose any particular "correct" name
for your links; instead, it lets you decide which of your own elements are going to serve as links, by
means of the XLink type attribute. An example snippet will make this clearer:
<environment xmlns:xlink="http://www.w3.org/1999/xlink"
             xlink:type="extended">
  ...
</environment>
Now that we have our extended link, we must specify the resources involved. Since the artist and
period information is stored outside our own document (so we have no control over it), we
use XLink's locator elements to reference it. Again, the strategy is not to impose a tag name, but to
let you mark your elements as locators using XLink attributes:
<environment xmlns:xlink="http://www.w3.org/1999/xlink"
             xlink:type="extended">
  <artist xlink:type="locator" xlink:label="artist"
          xlink:href="modigliani.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="cezanne.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="lautrec.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="rouault.xml"/>
  <history xlink:type="locator" xlink:label="period"
           xlink:href="paris.xml"/>
  <history xlink:type="locator" xlink:label="period"
           xlink:href="kisling.xml"/>
</environment>
Only one thing is missing: We must specify how the resources relate to each other. We do this by
specifying arcs between them:
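<environment xmlns:xlink="http://www.w3.org/1999/xlink"
             xlink:type="extended">
  <artist xlink:type="locator" xlink:label="artist"
          xlink:href="modigliani.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="cezanne.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="lautrec.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration"
             xlink:href="rouault.xml"/>
  <history xlink:type="locator" xlink:label="period"
           xlink:href="paris.xml"/>
  <history xlink:type="locator" xlink:label="period"
           xlink:href="kisling.xml"/>
  <bind xlink:type="arc" xlink:from="artist"
        xlink:to="inspiration"/>
  <bind xlink:type="arc" xlink:from="artist"
        xlink:to="period"/>
</environment>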
As you can see, using XLink, our problem is reduced to creating an XML file full of elements like the
above, where all the resources and their relationships are clearly and elegantly specified.
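To make the arc semantics concrete, the following Python sketch (standard library only) expands each arc of the environment link into the individual traversals it stands for: every locator whose label matches xlink:from is connected to every locator whose label matches xlink:to. It is a simplified reading of XLink, not a conforming processor. The traversals function is a hypothetical helper, resource-type elements are ignored, and the XLink rule that an omitted from or to attribute stands for all labels is not implemented.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"  # namespace prefix as ElementTree spells it

ENVIRONMENT = """<environment xmlns:xlink="http://www.w3.org/1999/xlink"
             xlink:type="extended">
  <artist xlink:type="locator" xlink:label="artist" xlink:href="modigliani.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration" xlink:href="cezanne.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration" xlink:href="lautrec.xml"/>
  <influence xlink:type="locator" xlink:label="inspiration" xlink:href="rouault.xml"/>
  <history xlink:type="locator" xlink:label="period" xlink:href="paris.xml"/>
  <history xlink:type="locator" xlink:label="period" xlink:href="kisling.xml"/>
  <bind xlink:type="arc" xlink:from="artist" xlink:to="inspiration"/>
  <bind xlink:type="arc" xlink:from="artist" xlink:to="period"/>
</environment>"""


def traversals(xml_text):
    """Expand every arc of an extended link into (from_href, to_href) pairs."""
    root = ET.fromstring(xml_text)
    by_label = {}  # label -> hrefs of all locators carrying that label
    for child in root:
        if child.get(XLINK + "type") == "locator":
            by_label.setdefault(child.get(XLINK + "label"), []).append(
                child.get(XLINK + "href"))
    pairs = []
    for arc in root:
        if arc.get(XLINK + "type") != "arc":
            continue
        # One arc connects EVERY starting resource to EVERY ending resource.
        for start in by_label.get(arc.get(XLINK + "from"), []):
            for end in by_label.get(arc.get(XLINK + "to"), []):
                pairs.append((start, end))
    return pairs


for start, end in traversals(ENVIRONMENT):
    print(start, "->", end)
```

Because three locators share the label inspiration, the first arc alone yields three traversals, which is how a single arc expresses a one-to-many link.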
In this section we saw a small example of the use and syntax of XLink. In the next one, we will
examine in detail the constructs and rules of this linking mechanism.
22.4 XLink Reference

Now that we have a basic idea of how XLink looks, it's time to dive into the details. This section
presents all the constructs and rules contained in the XLink specification.
22.4.1 Basics
XLink works by providing you with global attributes you can use to mark your elements as linking
elements. In order to use linking elements, the declaration of the XLink namespace is required:
<my_element xmlns:xlink="http://www.w3.org/1999/xlink"> ...
Using the global attributes provided by XLink, one may specify whether a particular element is a
linking element, along with many of its properties (e.g., when to load the linked resources, how to
show them once they are loaded, etc.). The global attributes provided by XLink are the following:
Type definition attribute: type
Locator attribute: href
Semantic attributes: role, arcrole, title
Behavior attributes: show, actuate
Traversal attributes: label, from, to
The next sections explain each of these attributes, their possible values and the rules that govern their
use.
22.5 The XLink type attribute
The type attribute may have one of the following values:
simple: a simple link

extended: an extended, possibly multi-resource, link
locator: a pointer to an external resource
resource: an internal resource
arc: a traversal rule between resources
title: a descriptive title for another linking element
By convention, when an element includes the type attribute with a value V, we will refer to it as a
V-type element, no matter what its actual name is. For example, the following bookref element is a
locator-type element:
<bookref xlink:type="locator" ...
Two restrictions stem from the fact that an element belongs to a certain XLink type:
1. Given an element of a particular type, only elements of certain types are relevant as XLink
subelements:
<a xlink:type="simple" xlink:href="monet.html"> ... no other XLink element would make sense here ... </a>
2. Given an element of a particular type, only some XLink attributes apply:
<bookref xlink:type="locator" xlink:href="ficciones.xml"/>
The following two tables summarize the attribute and subelement restrictions of each type (they are
included here as a reference, but each element will be properly explained later on). In Table 1, "R"
indicates "required," and "O" indicates "optional." A blank space indicates an invalid combination.
Table 2 shows which XLink child element types are significant for each parent type.
Attribute   simple  extended  locator  arc  resource  title
type        R       R         R        R    R         R
href        O                 R
role        O       O         O             O
arcrole     O                          O
title       O       O         O        O    O
show        O                          O
actuate     O                          O
label                         O             O
from                                   O
to                                     O
Table 1 - Attribute usage (from the W3C specification)

Parent type   Significant child element types
simple        none
extended      locator, arc, resource, title
locator       title
arc           title
resource      none
title         none
Table 2 - Significant child types (from the W3C specification)
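Table 1 can also be read as data. The Python sketch below transcribes it into a dictionary and reports missing required attributes or invalid ones for a single element. check_xlink_attributes is a hypothetical helper: it expects attribute names without the xlink: prefix, and it checks only which attributes are present, not their values.

```python
# Attribute usage per XLink type, transcribed from Table 1:
# "R" = required, "O" = optional; attributes not listed are invalid for that type.
USAGE = {
    "simple":   {"type": "R", "href": "O", "role": "O", "arcrole": "O",
                 "title": "O", "show": "O", "actuate": "O"},
    "extended": {"type": "R", "role": "O", "title": "O"},
    "locator":  {"type": "R", "href": "R", "role": "O", "title": "O", "label": "O"},
    "arc":      {"type": "R", "arcrole": "O", "title": "O", "show": "O",
                 "actuate": "O", "from": "O", "to": "O"},
    "resource": {"type": "R", "role": "O", "title": "O", "label": "O"},
    "title":    {"type": "R"},
}


def check_xlink_attributes(attrs):
    """Return a list of problems for a dict of XLink attributes (names without prefix)."""
    xtype = attrs.get("type")
    if xtype not in USAGE:
        return [f"unknown or missing xlink:type: {xtype!r}"]
    rules = USAGE[xtype]
    problems = [f"xlink:{name} is not allowed on a {xtype}-type element"
                for name in attrs if name not in rules]
    problems += [f"xlink:{name} is required on a {xtype}-type element"
                 for name, use in rules.items() if use == "R" and name not in attrs]
    return problems


print(check_xlink_attributes({"type": "locator", "href": "ficciones.xml"}))  # []
print(check_xlink_attributes({"type": "locator", "show": "new"}))
```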

22.6 XLink Types: Use and Composition
Let's review each of the XLink types. To do this, we'll use an example of linking actresses and the
movies they played in.
Resources (resource-type and locator-type elements)
The resources involved in a link can be either local (resource-type elements) or remote (pointed to by
locator-type elements). For a rough equivalent in HTML, think of resource-type elements as "<a
name..>" and locator-type elements as "<a href...>". The following code shows a DTD declaration
of a resource element:
<!ELEMENT actress (first_name,surname)>
<!ATTLIST actress
    xlink:type   (resource)  #FIXED "resource"
    xlink:title  CDATA       #IMPLIED
    xlink:label  NMTOKEN     #IMPLIED
    xlink:role   CDATA       #IMPLIED>
Note that the element has three other XLink-based attributes besides xlink:type. The first one, "title,"
is a semantic attribute used to give a short description of the resource. The second one, "label," is a
traversal attribute, used to identify the element later, when we build arcs. The third attribute, "role," is
used for describing a property of the resource.
An actress element may look like the following:
<actress xlink:label="maria">
<first_name>Brigitte</first_name>
<surname>Helm</surname>
</actress>
It is important to note also that the subelements of resource-type elements (here, the first_name and
surname elements) have no significance for XLink (see Table 2).
As we mentioned before, remote resources are pointed to by locators. Here is the DTD for a locator-type element:
<!ELEMENT movie EMPTY>
<!ATTLIST movie
    xlink:type   (locator)  #FIXED "locator"
    xlink:title  CDATA      #IMPLIED
    xlink:role   CDATA      #IMPLIED
    xlink:label  NMTOKEN    #IMPLIED
    xlink:href   CDATA      #REQUIRED>
Locators can have the same attributes as resources (i.e., title, label, and role), plus a required href
attribute, which points to the remote resource. A locator movie element will look like the
following:
<movie xlink:label="metropolis" xlink:href="metropolis.xml"/>
Navigation rules (arc-type elements)

The relationships between resources involved in a link are specified using arcs. Arc-type elements (i.e.,
those with xlink:type="arc") use the "from" and "to" attributes, whose values are resource labels, to
designate the start and end points of an arc:
<acted xlink:type="arc" xlink:from="maria" xlink:to="metropolis"/>
Aside from the traversal attributes "to" and "from," arcs may include the following:
show: This attribute is used to determine the desired presentation of the ending resource. Its
possible values are "new" (open a new window), "replace" (load the referenced resource in
the same window), "embed" (embed the pointed resource -- a movie, for example), "none"
(unrestricted), and "other" (unrestricted by the XLink spec, but the processor should look
into the subelements for further information).
title: Just as with resources, this is simply a human-readable string with a short description for
the arc.
actuate: This attribute is used to determine the timing of traversal to the ending resource. Its
possible values are "onLoad" (load the ending resource as soon as the start resource is
found), "onRequest" (e.g., user clicks the link), "other," and "none."
arcrole: The advanced uses of arcrole (and its counterpart, the role attribute) are beyond the
scope of this chapter. (Please refer to section 5 of the XLink specification for a discussion of
linkbases.) For our discussion, suffice it to say that this attribute must be a URI reference for
some description of the arc role.
Note that XLinks permit both inbound and outbound links. Outbound links are akin to normal
HTML links, where a link is made from the current document to an external resource. An inbound
link is constituted by an arc from an external resource, located with a locator-type element, into an
internal resource.
The following DTD will illustrate the above attributes:
<!ELEMENT acted EMPTY>
<!ATTLIST acted
    xlink:type   (arc)      #FIXED "arc"
    xlink:title  CDATA      #IMPLIED
    xlink:show   (new | replace | embed | other | none)  #IMPLIED
    xlink:from   NMTOKEN    #IMPLIED
    xlink:to     NMTOKEN    #IMPLIED>
Putting together our resource and locator examples with this arc, we have the following snippet of an
XML instance:

<actress xlink:label="maria">
  <first_name>Brigitte</first_name>
  <surname>Helm</surname>
</actress>
<movie xlink:label="metropolis" xlink:href="metropolis.xml"/>
<acted xlink:type="arc" xlink:from="maria" xlink:to="metropolis"/>
In order to encapsulate relationships like the above, we need containers, that is, extended-type XLink
elements.
22.7 Extended links (extended-type elements)
Extended links are marked by the type "extended" and may contain locators (pointing to remote
resources), local resources, arcs, and a title. The diagram below illustrates the composition of an
extended link.
One can simply think of extended-type elements as meaningful wrappers that hold resources and arcs
together:
<!ELEMENT divas (actress | movie | acted)*>
<!ATTLIST divas
    xmlns:xlink  CDATA       #FIXED "http://www.w3.org/1999/xlink"
    xlink:type   (extended)  #FIXED "extended"
    xlink:title  CDATA       #IMPLIED>
Putting together all the previous elements, we finally have a complete and valid extended link. (Note
in particular the one-to-many link that has been generated, something previously not possible in
HTML.)
<divas xlink:title="German divas 1920s">
  <actress xlink:label="maria">
    <first_name>Brigitte</first_name>
    <surname>Helm</surname>
  </actress>
  <movie xlink:label="silent" xlink:title="Metropolis"
         xlink:href="metropolis.xml"/>
  <movie xlink:label="silent" xlink:title="Alraune"
         xlink:href="alraune.xml"/>
  <acted xlink:type="arc" xlink:from="maria" xlink:to="silent"/>
  ...
</divas>
Title elements
An alternative way to provide titles to extended, locator, and arc type elements is by using a title-type
subelement (xlink:type="title"). This was included in order to have a standard way for applications to
express complex titles that include more than a string. (For instance, one might use multiple titles in
different languages, to provide localization features.) The contents of title-type elements are not
constrained by XLink.
Simple links
Simple links are, conceptually, a subset of extended links. They exist as a notation for links where you
don't need the overhead of an entire extended link. All the XLink-related aspects of a simple link are
encapsulated in one element (i.e., XLink doesn't care about the subelements of a simple link).
The valid XLink attributes of a simple link are "href" (just like in HTML's "a" or "img"), "title,"
"role," "arcrole," "show," and "actuate," which keep the same semantics as when used in arc-type
elements.
The following shows a typical simple link element:

<!ELEMENT director (#PCDATA)>
<!ATTLIST director
    xmlns:xlink    CDATA        #FIXED "http://www.w3.org/1999/xlink"
    xlink:type     (simple)     #FIXED "simple"
    xlink:href     CDATA        #IMPLIED
    xlink:show     (new)        #FIXED "new"
    xlink:actuate  (onRequest)  #FIXED "onRequest">
...

<director xlink:href="fincher.xml">David Fincher</director>
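Reading a simple link programmatically means reading namespace-qualified attributes. Python's xml.etree.ElementTree keys such attributes as {namespace-URI}local-name. This is an illustration under stated assumptions: simple_link is a hypothetical helper, and the namespace declaration and the show and actuate values are written directly on the element because this sketch parses the element on its own, without the DTD above.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Namespace and behavior attributes written explicitly (no DTD is loaded here).
DIRECTOR = ('<director xmlns:xlink="http://www.w3.org/1999/xlink" '
            'xlink:type="simple" xlink:href="fincher.xml" '
            'xlink:show="new" xlink:actuate="onRequest">David Fincher</director>')


def simple_link(xml_text):
    """Return (link text, href, show, actuate) for a simple-type link element."""
    elem = ET.fromstring(xml_text)

    def xl(name):
        # ElementTree stores namespaced attributes as "{namespace-uri}name".
        return elem.get(f"{{{XLINK_NS}}}{name}")

    if xl("type") != "simple":
        raise ValueError("not a simple-type link")
    return elem.text, xl("href"), xl("show"), xl("actuate")


print(simple_link(DIRECTOR))
# prints: ('David Fincher', 'fincher.xml', 'new', 'onRequest')
```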
That's all there is to it. We have covered all the types and attributes of XLink. As you can see, this is a
powerful but compact specification that is bound to prove useful in future projects.
Chapter 23
Introduction to XPath
23.1 What is XPath?
XPath is a syntax for defining parts of an XML document
XPath uses paths to define XML elements
XPath defines a library of standard functions
XPath is a major element in XSLT
XPath is not written in XML
XPath is a W3C Standard
23.2 Like Traditional File Paths

XPath uses path expressions to identify nodes in an XML document. These path expressions look
very much like the expressions you see when you work with a computer file system:
w3schools/xpath/default.asp
XPath Example
Look at this simple XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd country="USA">
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<price>10.90</price>
</cd>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
<cd country="USA">
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.90</price>
</cd>
</catalog>
The XPath expression below selects the root element catalog:

/catalog
The XPath expression below selects all the cd elements of the catalog element:
/catalog/cd
The XPath expression below selects all the price elements of all the cd elements of the catalog
element:
/catalog/cd/price
Note: If the path starts with a slash ( / ) it represents an absolute path to an element!
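These paths can be tried with Python's xml.etree.ElementTree, which implements a limited subset of XPath. This is a minimal sketch, assuming the catalog above is held in a string (reflowed here to one line per cd). ElementTree does not accept absolute paths on an element, so each path is written relative to the catalog root element.

```python
import xml.etree.ElementTree as ET

# The catalog from the example above, one cd per line.
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)       # the root element: /catalog
cds = root.findall("cd")            # /catalog/cd
prices = root.findall("cd/price")   # /catalog/cd/price

print(root.tag, len(cds), [p.text for p in prices])
# prints: catalog 3 ['10.90', '9.90', '9.90']
```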
23.3 XPath Defines a Library of Standard Functions
XPath defines a library of standard functions for working with strings, numbers and Boolean
expressions.
The XPath expression below selects all the cd elements that have a price element with a value larger
than 10.80:
/catalog/cd[price>10.80]
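ElementTree's path subset has no relational operators such as >, so a predicate like price>10.80 has to be evaluated in ordinary code. The sketch below mimics XPath's rule that a node-set compared with a number is true if any node's value, converted to a number, satisfies the comparison. to_number and cds_priced_above are hypothetical helpers, and to_number is only a rough stand-in for XPath's number() conversion (Python's float accepts a few spellings, such as "inf", that XPath would not).

```python
import math
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><price>10.90</price></cd>
  <cd><title>Hide your heart</title><price>9.90</price></cd>
  <cd><title>Greatest Hits</title><price>9.90</price></cd>
</catalog>"""


def to_number(text):
    """Rough stand-in for XPath number(): NaN when the text is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):  # TypeError covers empty elements (text is None)
        return math.nan


def cds_priced_above(xml_text, limit):
    """Emulate /catalog/cd[price>limit]."""
    root = ET.fromstring(xml_text)
    # Keep a cd when ANY of its price children exceeds the limit; NaN never does.
    return [cd for cd in root.findall("cd")
            if any(to_number(p.text) > limit for p in cd.findall("price"))]


print([cd.find("title").text for cd in cds_priced_above(CATALOG, 10.80)])
# prints: ['Empire Burlesque']
```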
23.4 Locating Nodes

XML documents can be represented as a tree view of nodes (very similar to the tree view of folders
you can see on your computer).
XPath uses a pattern expression to identify nodes in an XML document. An XPath pattern is a
slash-separated list of child element names that describe a path through the XML document. The pattern
"selects" elements that match the path.
The following XPath expression selects all the price elements of all the cd elements of the catalog
element:
/catalog/cd/price
Note: If the path starts with a slash ( / ) it represents an absolute path to an element!
Note: If the path starts with two slashes ( // ) then all elements in the document that fulfill the
criteria will be selected (even if they are at different levels in the XML tree)!
The following XPath expression selects all the cd elements in the document:
//cd
23.5 Selecting Unknown Elements

Wildcards ( * ) can be used to select unknown XML elements.
The following XPath expression selects all the child elements of all the cd elements of the catalog
element:
/catalog/cd/*
The following XPath expression selects all the price elements that are grandchild elements of the
catalog element:
/catalog/*/price
The following XPath expression selects all price elements that have exactly two ancestor elements:
/*/*/price
The following XPath expression selects all elements in the document:

//*
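In ElementTree's XPath subset, * works as a wildcard step, a leading .// searches all descendants, and iter() walks every element. A short sketch against the catalog (the country attributes are omitted here). The /*/*/price case is written relative to the root element, which gives the same result for this document only because catalog is its single top-level element.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)
children = root.findall("cd/*")          # /catalog/cd/*
grand_prices = root.findall("*/price")   # /catalog/*/price (and /*/*/price here)
all_cds = root.findall(".//cd")          # //cd
everything = list(root.iter())           # //*  (iter() includes the root itself)

print(len(children), len(grand_prices), len(all_cds), len(everything))
# prints: 9 3 3 13
```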
23.6 Selecting Branches

By using square brackets in an XPath expression you can specify an element further.
The following XPath expression selects the first cd child element of the catalog element:
/catalog/cd[1]
The following XPath expression selects the last cd child element of the catalog element (Note: There
is no function named first()):
/catalog/cd[last()]
The following XPath expression selects all the cd elements of the catalog element that have a price
element:
/catalog/cd[price]
The following XPath expression selects all the cd elements of the catalog element that have a price
element with a value of 10.90:
/catalog/cd[price=10.90]
The following XPath expression selects all the price elements of all the cd elements of the catalog
element that have a price element with a value of 10.90:
/catalog/cd[price=10.90]/price
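ElementTree supports several of these predicate forms directly, and positions start at 1 as in XPath. One difference is flagged in the sketch: ElementTree's [price='10.90'] compares the child's text as a string, whereas XPath's price=10.90 compares numbers, so XPath would also match a price written as 10.9.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)
first = root.find("cd[1]")                              # /catalog/cd[1]
last = root.find("cd[last()]")                          # /catalog/cd[last()]
with_price = root.findall("cd[price]")                  # /catalog/cd[price]
at_1090 = root.findall("cd[price='10.90']")             # /catalog/cd[price=10.90], as a STRING test
prices_1090 = root.findall("cd[price='10.90']/price")   # /catalog/cd[price=10.90]/price

print(first.find("title").text, "|", last.find("title").text, "|",
      len(with_price), "|", [p.text for p in prices_1090])
# prints: Empire Burlesque | Greatest Hits | 3 | ['10.90']
```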
23.7 Selecting Several Paths
By using the | operator in an XPath expression you can select several paths.
The following XPath expression selects all the title and artist elements of the cd elements of the
catalog element:
/catalog/cd/title | /catalog/cd/artist
The following XPath expression selects all the title and artist elements in the document:
//title | //artist
The following XPath expression selects all the title, artist and price elements in the document:
//title | //artist | //price
The following XPath expression selects all the title elements of the cd elements of the catalog
element, and all the artist elements in the document:
/catalog/cd/title | //artist
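ElementTree has no | operator, but a union can be emulated by merging the results of several paths, dropping duplicates, and restoring document order. A minimal sketch: the union helper is hypothetical, and each path is written relative to the catalog root element.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""


def union(root, *paths):
    """Emulate path1 | path2 | ...: each selected element once, in document order."""
    position = {elem: i for i, elem in enumerate(root.iter())}
    selected = {elem for path in paths for elem in root.findall(path)}
    return sorted(selected, key=position.__getitem__)


root = ET.fromstring(CATALOG)
# Like /catalog/cd/title | //artist
print([e.text for e in union(root, "cd/title", ".//artist")])
```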
23.8 Selecting Attributes

In XPath all attributes are specified by the @ prefix.
This XPath expression selects all attributes named country:
//@country
This XPath expression selects all cd elements which have an attribute named country:
//cd[@country]
This XPath expression selects all cd elements which have any attribute:
//cd[@*]
This XPath expression selects all cd elements which have an attribute named country with a value of
'UK':
//cd[@country='UK']
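ElementTree supports [@name] and [@name='value'] predicates, but it does not expose attributes as nodes, so //@country and //cd[@*] are emulated here through each element's attrib dictionary. A minimal sketch using a shortened catalog with its country attributes.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title></cd>
  <cd country="UK"><title>Hide your heart</title></cd>
  <cd country="USA"><title>Greatest Hits</title></cd>
</catalog>"""

root = ET.fromstring(CATALOG)
# //@country: ElementTree has no attribute nodes, so collect the values instead.
countries = [e.get("country") for e in root.iter() if "country" in e.attrib]
with_country = root.findall(".//cd[@country]")                     # //cd[@country]
with_any_attribute = [cd for cd in root.iter("cd") if cd.attrib]   # //cd[@*]
from_uk = root.findall(".//cd[@country='UK']")                     # //cd[@country='UK']

print(countries, len(with_country), len(with_any_attribute),
      [cd.find("title").text for cd in from_uk])
# prints: ['USA', 'UK', 'USA'] 3 3 ['Hide your heart']
```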
23.9 Location Path Expression

A location path can be absolute or relative.
An absolute location path starts with a slash ( / ) and a relative location path does not. In both cases
the location path consists of one or more location steps, each separated by a slash:
An absolute location path:
/step/step/...
A relative location path:
step/step/...
The location steps are evaluated in order one at a time, from left to right. Each step is evaluated
against the nodes in the current node-set. If the location path is absolute, the current node-set
consists of the root node. If the location path is relative, the current node-set consists of the node
where the expression is being used. Location steps consist of:
an axis (specifies the tree relationship between the nodes selected by the location step and
the current node)
a node test (specifies the node type and expanded-name of the nodes selected by the
location step)
zero or more predicates (use expressions to further refine the set of nodes selected by the
location step)
The syntax for a location step is:

axisname::nodetest[predicate]
Example:
child::price[price=9.90]
23.10 Axes and Node Tests

An axis defines a node-set relative to the current node. A node test is used to identify a node within
an axis. We can perform a node test by name or by type.
AxisName            Description
ancestor            Contains all ancestors (parent, grandparent, etc.) of the current node
                    Note: This axis will always include the root node, unless the current node is the root node
ancestor-or-self    Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute           Contains all attributes of the current node
child               Contains all children of the current node
descendant          Contains all descendants (children, grandchildren, etc.) of the current node
                    Note: This axis never contains attribute or namespace nodes
descendant-or-self  Contains the current node plus all its descendants (children, grandchildren, etc.)
following           Contains everything in the document after the closing tag of the current node
following-sibling   Contains all siblings after the current node
                    Note: If the current node is an attribute node or namespace node, this axis will be empty
namespace           Contains all namespace nodes of the current node
parent              Contains the parent of the current node
preceding           Contains everything in the document that is before the starting tag of the current node
preceding-sibling   Contains all siblings before the current node
                    Note: If the current node is an attribute node or namespace node, this axis will be empty
self                Contains the current node
Examples
Example                 Result
child::cd               Selects all cd elements that are children of the current node (if the current node has no cd children, it will select an empty node-set)
attribute::src          Selects the src attribute of the current node (if the current node has no src attribute, it will select an empty node-set)
child::*                Selects all child elements of the current node
attribute::*            Selects all attributes of the current node
child::text()           Selects the text node children of the current node
child::node()           Selects all the children of the current node
descendant::cd          Selects all the cd element descendants of the current node
ancestor::cd            Selects all cd ancestors of the current node
ancestor-or-self::cd    Selects all cd ancestors of the current node and, if the current node is a cd element, the current node as well
child::*/child::price   Selects all price grandchildren of the current node
/                       Selects the document root
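ElementTree elements do not keep a reference to their parent, so the parent-based axes in the table above have to be reconstructed. The sketch below builds a child-to-parent map and derives the ancestor, following-sibling and preceding-sibling axes from it. It works under simplifying assumptions: only element nodes are modelled (no document root node, text, attribute or namespace nodes), the helper names are hypothetical, and ancestors are returned nearest first while siblings are returned in document order.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><price>10.90</price></cd>
  <cd><title>Hide your heart</title><price>9.90</price></cd>
  <cd><title>Greatest Hits</title><price>9.90</price></cd>
</catalog>"""


def parent_map(root):
    """Map every element below root to its parent element."""
    return {child: parent for parent in root.iter() for child in parent}


def ancestors(elem, parents):
    """ancestor axis (elements only): parent, grandparent, ... nearest first."""
    result = []
    while elem in parents:
        elem = parents[elem]
        result.append(elem)
    return result


def following_siblings(elem, parents):
    """following-sibling axis: the siblings after elem, in document order."""
    if elem not in parents:  # the root element has no siblings
        return []
    siblings = list(parents[elem])
    return siblings[siblings.index(elem) + 1:]


def preceding_siblings(elem, parents):
    """preceding-sibling axis: the siblings before elem, in document order."""
    if elem not in parents:
        return []
    siblings = list(parents[elem])
    return siblings[:siblings.index(elem)]


root = ET.fromstring(CATALOG)
parents = parent_map(root)
second_cd = root.findall("cd")[1]
price = second_cd.find("price")

print([a.tag for a in ancestors(price, parents)])                               # ['cd', 'catalog']
print([s.find("title").text for s in following_siblings(second_cd, parents)])  # ['Greatest Hits']
print([s.find("title").text for s in preceding_siblings(second_cd, parents)])  # ['Empire Burlesque']
```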
23.11 Predicates
A predicate filters a node-set into a new node-set. A predicate is placed inside square brackets ( [ ] ).
Examples
Example                               Result
child::price[price=9.90]              Selects all price elements that are children of the current node with a price element that equals 9.90
child::cd[position()=1]               Selects the first cd child of the current node
child::cd[position()=last()]          Selects the last cd child of the current node
child::cd[position()=last()-1]        Selects the last but one cd child of the current node
child::cd[position()<6]               Selects the first five cd children of the current node
/descendant::cd[position()=7]         Selects the seventh cd element in the document
child::cd[attribute::type="classic"]  Selects all cd children of the current node that have a type attribute with value classic
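Because XPath positions start at 1 and refer to the node-set produced so far, position predicates map naturally onto Python list slicing. The sketch below emulates the predicates in the table on hypothetical data: ten cd elements, with the odd-numbered ones marked type="classic". make_shelf and ids are helper names invented here. The last two lines show that predicates apply left to right, which is why the abbreviated forms cd[@type="classic"][5] and cd[5][@type="classic"], listed later in this chapter, select different elements.

```python
import xml.etree.ElementTree as ET


def make_shelf(n):
    """Hypothetical test data: n cd elements; the odd-numbered ones are classics."""
    parts = []
    for i in range(1, n + 1):
        attr = ' type="classic"' if i % 2 else ""
        parts.append(f'<cd id="{i}"{attr}/>')
    return "<shelf>" + "".join(parts) + "</shelf>"


def ids(elems):
    """The id attribute of each element, for compact printing."""
    return [e.get("id") for e in elems]


root = ET.fromstring(make_shelf(10))
cds = root.findall("cd")   # child::cd, in document order

print(ids(cds[:1]))        # child::cd[position()=1]        -> ['1']
print(ids(cds[-1:]))       # child::cd[position()=last()]   -> ['10']
print(ids(cds[-2:-1]))     # child::cd[position()=last()-1] -> ['9']
print(ids(cds[:5]))        # child::cd[position()<6]        -> ['1', '2', '3', '4', '5']

classics = [cd for cd in cds if cd.get("type") == "classic"]
print(ids(classics))       # child::cd[attribute::type="classic"] -> ['1', '3', '5', '7', '9']

# Each predicate filters the result of the previous one.
print(ids(classics[4:5]))  # cd[@type="classic"][5] -> ['9']
print(ids([cd for cd in cds[4:5] if cd.get("type") == "classic"]))  # cd[5][@type="classic"] -> ['5']
```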
23.12 Location Path Abbreviated Syntax

Abbreviations can be used when describing a location path.
The most important abbreviation is that child:: can be omitted from a location step.
Abbr  Meaning                        Example
none  child::                        cd is short for child::cd
@     attribute::                    cd[@type="classic"] is short for child::cd[attribute::type="classic"]
.     self::node()                   .//cd is short for self::node()/descendant-or-self::node()/child::cd
..    parent::node()                 ../cd is short for parent::node()/child::cd
//    /descendant-or-self::node()/   //cd is short for /descendant-or-self::node()/child::cd
Examples
Example                   Result
cd                        Selects all the cd elements that are children of the current node
*                         Selects all child elements of the current node
text()                    Selects all text node children of the current node
@src                      Selects the src attribute of the current node
@*                        Selects all the attributes of the current node
cd[1]                     Selects the first cd child of the current node
cd[last()]                Selects the last cd child of the current node
*/cd                      Selects all cd grandchildren of the current node
/book/chapter[3]/para[1]  Selects the first para of the third chapter of the book
//cd                      Selects all the cd descendants of the document root and thus selects all cd elements in the same document as the current node
.                         Selects the current node
.//cd                     Selects the cd element descendants of the current node
..                        Selects the parent of the current node
../@src                   Selects the src attribute of the parent of the current node
cd[@type="classic"]       Selects all cd children of the current node that have a type attribute with value classic
cd[@type="classic"][5]    Selects the fifth cd child of the current node that has a type attribute with value classic
cd[5][@type="classic"]    Selects the fifth cd child of the current node if that child has a type attribute with value classic
cd[@type and @country]    Selects all the cd children of the current node that have both a type attribute and a country attribute
23.13 Numerical Expressions

Numerical expressions are used to perform arithmetic operations on numbers.
Operator  Description                   Example  Result
+         Addition                      6+4      10
-         Subtraction                   6-4      2
*         Multiplication                6*4      24
div       Division                      8 div 4  2
mod       Modulus (division remainder)  5 mod 2  1
Note: XPath always converts each operand to a number before evaluating an arithmetic expression.
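XPath numbers are IEEE 754 double-precision values, and the XPath 1.0 specification defines mod as the remainder of a truncating division, so the result takes the sign of the dividend (-5 mod 2 is -1). Python's % operator follows the sign of the divisor instead, so a faithful emulation uses math.fmod. The helper names xpath_div and xpath_mod are hypothetical; this is a sketch of the semantics, not an XPath engine.

```python
import math


def xpath_div(a, b):
    """XPath div: floating-point division; division by zero gives Infinity or NaN."""
    try:
        return a / b
    except ZeroDivisionError:
        # Python raises here, whereas IEEE 754 arithmetic (used by XPath) does not.
        if a == 0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)


def xpath_mod(a, b):
    """XPath mod: remainder of a truncating division (sign follows the dividend)."""
    try:
        return math.fmod(a, b)
    except ValueError:
        # math.fmod raises for a zero divisor or an infinite dividend; XPath yields NaN.
        return math.nan


print(xpath_div(8, 4), xpath_mod(5, 2))  # 2.0 1.0
print(xpath_mod(-5, 2), -5 % 2)          # -1.0 1  (XPath mod vs Python's %)
```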
Equality Expressions
Equality expressions are used to test the equality between two values.
Operator  Description  Example      Result
=         Equal        price=9.80   true (if price is 9.80)
!=        Not equal    price!=9.80  false
23.14 Testing Against a Node-Set

If the test value is tested for equality against a node-set, the result is true if the node-set contains any
node with a value that matches the test value.
If the test value is tested for not equal against a node-set, the result is true if the node-set contains
any node with a value that is different from the test value.
As a result, a node-set can be both equal and not equal to the same value at the same time.
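The rule is easy to reproduce. This Python sketch treats a node-set as the list of its nodes' string values and compares it with a number, converting each value as XPath does. number, nodeset_eq and nodeset_ne are hypothetical helper names, and comparisons against strings or booleans, which XPath defines differently, are not covered.

```python
import math


def number(text):
    """Rough stand-in for XPath number(): NaN for non-numeric strings."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def nodeset_eq(values, x):
    """node-set = number: true if ANY node's numeric value equals x."""
    return any(number(v) == x for v in values)


def nodeset_ne(values, x):
    """node-set != number: true if ANY node's numeric value differs from x."""
    return any(number(v) != x for v in values)


prices = ["10.90", "9.90", "9.90"]  # string values of //price in the catalog example
print(nodeset_eq(prices, 9.90), nodeset_ne(prices, 9.90))  # True True
print(nodeset_eq([], 9.90), nodeset_ne([], 9.90))          # False False
```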
Relational Expressions
Relational expressions are used to compare two values.
Operator  Description       Example      Result
<         Less than         price<9.80   false (if price is 9.80)
<=        Less or equal     price<=9.80  true
>         Greater than      price>9.80   false
>=        Greater or equal  price>=9.80  true
Note: XPath always converts each operand to a number before performing the evaluation.
Boolean Expressions
Boolean expressions are used to combine two boolean (true/false) values.
Operator  Description  Example                     Result
or        Logical or   price=9.80 or price=9.70    true (if price is 9.80)
and       Logical and  price<=9.80 and price=9.70  false
APPENDIX
List of Tags and Events in XHTML

XHTML Tags
NN: indicates the earliest version of Netscape that supports the tag
IE: indicates the earliest version of Internet Explorer that supports the tag
DTD: indicates in which XHTML 1.0 DTD the tag is allowed. S=Strict, T=Transitional,
and F=Frameset
Tag           NN   IE   DTD  Description
<!--...-->    3.0  3.0  STF  Defines a comment
<!DOCTYPE>              STF  Defines the document type
<a>           3.0  3.0  STF  Defines an anchor
<abbr>        6.2       STF  Defines an abbreviation
<acronym>     6.2  4.0  STF  Defines an acronym
<address>     4.0  4.0  STF  Defines an address element
<applet>      2.0  3.0  TF   Defines an applet
<area />      3.0  3.0  STF  Defines an area inside an image map
<b>           3.0  3.0  STF  Defines bold text
<base />      3.0  3.0  STF  Defines a base URL for all the links in a page
<basefont />  3.0  3.0  TF   Defines a base font
<bdo>         6.2  5.0  STF  Defines the direction of text display
<big>         3.0  3.0  STF  Defines big text
<blockquote>  3.0  3.0  STF  Defines a long quotation
<body>        3.0  3.0  STF  Defines the body element
<br />        3.0  3.0  STF  Inserts a single line break
<button>      6.2  4.0  STF  Defines a push button
<caption>     3.0  3.0  STF  Defines a table caption
<center>      3.0  3.0  TF   Defines centered text
<cite>        3.0  3.0  STF  Defines a citation
<code>        3.0  3.0  STF  Defines computer code text
<col>              3.0  STF  Defines attributes for table columns
<colgroup>         3.0  STF  Defines groups of table columns
<dd>          3.0  3.0  STF  Defines a definition description
<del>         6.2  4.0  STF  Defines deleted text
<dir>         3.0  3.0  TF   Defines a directory list
<div>         3.0  3.0  STF  Defines a section in a document
<dfn>              3.0  STF  Defines a definition term
<dl>          3.0  3.0  STF  Defines a definition list
<dt>          3.0  3.0  STF  Defines a definition term
<em>          3.0  3.0  STF  Defines emphasized text
<fieldset>    6.2  4.0  STF  Defines a fieldset
<font>        3.0  3.0  TF   Defines the font face, size, and color of text
<form>        3.0  3.0  STF  Defines a form
<frame>       3.0  3.0  F    Defines a sub window (a frame)
<frameset>    3.0  3.0  F    Defines a set of frames
<h1> to <h6>  3.0  3.0  STF  Defines header 1 to header 6
<head>        3.0  3.0  STF  Defines information about the document
<hr />        3.0  3.0  STF  Defines a horizontal rule
<html>        3.0  3.0  STF  Defines an html document
<i>           3.0  3.0  STF  Defines italic text
<iframe>      6.0  4.0  TF   Defines an inline sub window (frame)
<img />       3.0  3.0  STF  Defines an image
<input />     3.0  3.0  STF  Defines an input field
<ins>         6.2  4.0  STF  Defines inserted text
<isindex>     3.0  3.0  TF   Deprecated. Defines a single-line input field. Use <input> instead
<kbd>         3.0  3.0  STF  Defines keyboard text
<label>       6.2  4.0  STF  Defines a label for a form control
<legend>      6.2  4.0  STF  Defines a title in a fieldset
<li>          3.0  3.0  STF  Defines a list item
<link>        4.0  3.0  STF  Defines a resource reference
<map>         3.0  3.0  STF  Defines an image map
<menu>        3.0  3.0  TF   Defines a menu list
<meta>        3.0  3.0  STF  Defines meta information
<noframes>    3.0  3.0  TF   Defines a noframe section
<noscript>    3.0  3.0  STF  Defines a noscript section
<object>           3.0  STF  Defines an embedded object
<ol>          3.0  3.0  STF  Defines an ordered list
<optgroup>    6.0  6.0  STF  Defines an option group
<option>      3.0  3.0  STF  Defines an option in a drop-down list
<p>           3.0  3.0  STF  Defines a paragraph
<param>       3.0  3.0  STF  Defines a parameter for an object
<pre>         3.0  3.0  STF  Defines preformatted text
<q>           6.2       STF  Defines a short quotation
<s>           3.0  3.0  TF   Defines strikethrough text
<samp>        3.0  3.0  STF  Defines sample computer code
<script>      3.0  3.0  STF  Defines a script
<select>      3.0  3.0  STF  Defines a selectable list
<small>       3.0  3.0  STF  Defines small text
<span>        4.0  3.0  STF  Defines a section in a document
<strike>      3.0  3.0  TF   Defines strikethrough text
<strong>      3.0  3.0  STF  Defines strong text
<style>       4.0  3.0  STF  Defines a style definition
<sub>         3.0  3.0  STF  Defines subscripted text
<sup>         3.0  3.0  STF  Defines superscripted text
<table>       3.0  3.0  STF  Defines a table
<tbody>            4.0  STF  Defines a table body
<td>          3.0  3.0  STF  Defines a table cell
<textarea>    3.0  3.0  STF  Defines a text area
<tfoot>            4.0  STF  Defines a table footer
<th>          3.0  3.0  STF  Defines a table header cell
<thead>            4.0  STF  Defines a table header group
<title>       3.0  3.0  STF  Defines the document title
<tr>          3.0  3.0  STF  Defines a table row
<tt>          3.0  3.0  STF  Defines teletype text
<u>           3.0  3.0  TF   Defines underlined text
<ul>          3.0  3.0  STF  Defines an unordered list
<var>         3.0  3.0  STF  Defines a variable
<xmp>         3.0  3.0       Deprecated. Defines preformatted text. Use <pre> instead
XHTML Events

Window Events
Attribute     Value    Description
onload        script   Script to be run when a document loads
onunload      script   Script to be run when a document unloads

Form Element Events
Attribute     Value    Description
onchange      script   Script to be run when the element's value changes
onsubmit      script   Script to be run when the form is submitted
onreset       script   Script to be run when the form is reset
onselect      script   Script to be run when text in the element is selected
onblur        script   Script to be run when the element loses focus
onfocus       script   Script to be run when the element gets focus

Keyboard Events
Attribute     Value    Description
onkeydown     script   Script to be run when a key is pressed
onkeypress    script   Script to be run when a key is pressed and released
onkeyup       script   Script to be run when a key is released
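The window, form and keyboard event attributes above are written directly on elements, with a short piece of JavaScript as the value. The page below is a minimal sketch, assuming an XHTML 1.0 Strict document; the showMessage function and the "status" paragraph are hypothetical names invented for this example. Returning false from onsubmit stops the browser from actually sending the form.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Event attribute sketch</title>
  <script type="text/javascript">
    // Hypothetical helper: replaces the text of the "status" paragraph
    function showMessage(text) {
      document.getElementById("status").firstChild.nodeValue = text;
    }
  </script>
</head>
<!-- window event: runs once the document has loaded -->
<body onload="showMessage('Document loaded')">
  <!-- form event: return false cancels the actual submission -->
  <form action="#" onsubmit="showMessage('Form submitted'); return false;">
    <p>
      <!-- form element and keyboard events on a text field -->
      <input type="text" name="city"
             onfocus="showMessage('Field has focus')"
             onblur="showMessage('Field lost focus')"
             onkeyup="showMessage('Key released')"
             onchange="showMessage('Value changed')" />
      <input type="submit" value="Send" />
    </p>
  </form>
  <p id="status">Waiting...</p>
</body>
</html>
```

All attribute names are in lower case and all values are quoted, as XHTML requires.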
Mouse Events
Attribute     Value    Description
onclick       script   Script to be run on a mouse click
ondblclick    script   Script to be run on a mouse double-click
onmousedown   script   Script to be run when a mouse button is pressed
onmousemove   script   Script to be run when the mouse pointer moves
onmouseover   script   Script to be run when the mouse pointer moves over an element
onmouseout    script   Script to be run when the mouse pointer moves out of an element
onmouseup     script   Script to be run when a mouse button is released
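The mouse event attributes above can be placed on almost any element in the body. Here is a small sketch, assuming it is placed inside the <body> of an ordinary XHTML 1.0 page: the paragraph changes its background colour when the pointer enters and leaves it, and replaces its text on a click or double-click.

```html
<!-- mouse events on a single paragraph; "this" refers to the <p> element -->
<p onmouseover="this.style.backgroundColor='yellow'"
   onmouseout="this.style.backgroundColor=''"
   onclick="this.firstChild.nodeValue='Clicked'"
   ondblclick="this.firstChild.nodeValue='Double-clicked'">
  Point at, click or double-click this paragraph
</p>
```

A double-click also fires two click events before ondblclick, so the final text shown is "Double-clicked".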