
CSCE 547

Windows Programming

XML Support
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29208
Why XML?
XML stands for eXtensible Markup Language.
XML, like HTML, derives from SGML; it is designed to express the
structure of data, while rendering is left to stylesheets such as XSL.
Some organizations have embarked on defining standards that use XML
to express the semantics of their domains (healthcare, automotive,
security, and the military).
Why XML? Because:

1. It is just text, readable by any OS (Linux, Mac OS, WinTel, etc.)
   and by humans.
2. It has become the de facto standard for anyone who wants to
   communicate data over the WWW.

This chapter discusses .NET support for XML.


CSCE 547 Fall 2002 Ch 13 - 2
Why XML?
XML encourages the separation of interface from structured data,
allowing the seamless integration of data from diverse sources, and
providing the infrastructure to create N-tier architectures.



XML Documents
XML documents can be described in terms of their logical and physical
structure.

The logical structure is a function of the XML elements and attributes
contained in the document.

The physical structure is the set of storage units in which the
document actually exists. These units, called entities, could be a
stream of characters or one or more files.

XML documents contain two parts, called the header and the content.

Typically, the header contains declarations or processing instructions
(commands for the XML processor).



XML Documents Can Contain
• Processing Instructions (aka PIs), delimited by <? . . . ?>
• Declarations, in the form <! aDeclaration >
• Elements
• Attributes
• Entities
• Comments

Typically, you will include declarations and/or processing
instructions in the header.



Processing Instructions and Declarations

Processing instructions
<?xml version="1.0"?>
<?xml-stylesheet href="XSL\DotNet.html.xsl" type="text/xsl"?>
<?xml-stylesheet href="XSL\DotNet.wml.xsl" type="text/xsl" media="wap"?>
<?cocoon-process type="xslt"?>
<?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>
<?xml version="1.0" encoding="UTF-16"?>

Declarations
<!DOCTYPE DotNetXML:Book SYSTEM "DTD\DotNetXML.dtd">
<!NOTATION PNG SYSTEM "program.exe">
<!ATTLIST . . . >
<!ENTITY AGRAPH SYSTEM "file.png" NDATA PNG>
<!ENTITY memoText "blablabla">
<memo>&memoText;</memo>     (&name; is the entity reference notation)



XML Elements
XML elements are made up of a start tag, an end tag, and data in
between. The start and end tags describe the data or value of the
elements:
<Student> Anita Donut </Student>
<CarDriver> Anita Donut </CarDriver>
<BloodDonor> Anita Donut </BloodDonor>

Elements can be empty, e.g.,

<memo></memo>

An empty element is mainly useful as a carrier for attributes. The
preferred way to write it is:

<memo />

Attributes define properties for an element. XML elements can
contain one or more attributes.
XML Elements
The XML tree in Figure 13-1 was produced by the code below:

<?xml version="1.0"?>
<Guitars>
  <Guitar>
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
  <Guitar>
    <Make>Fender</Make>
    <Model>Stratocaster</Model>
    <Year></Year>
    <Color>Black</Color>
    <Neck>Maple</Neck>
  </Guitar>
</Guitars>

Using attributes:

<Guitar Year="1977">
  <Make>Gibson</Make>
  <Model>SG</Model>
  <Color>Tobacco Sunburst</Color>
  <Neck>Rosewood</Neck>
</Guitar>

<Guitar Image="MySG.jpeg">
  <Make>Gibson</Make>
  <Model>SG</Model>
  <Year>1977</Year>
  <Color>Tobacco Sunburst</Color>
  <Neck>Rosewood</Neck>
</Guitar>



Name Spaces
XML uses namespaces to avoid name collisions, so that, e.g.,
gibson:Color and fender:Color may refer to different elements.

<?xml version="1.0"?>
<win:Guitars
xmlns:win="http://www.wintellect.com/classic-guitars"
xmlns:gibson="http://www.gibson.com/finishes"
xmlns:fender="http://www.fender.com/finishes">
<win:Guitar>
<win:Make>Gibson</win:Make>
<win:Model>SG</win:Model>
<win:Year>1977</win:Year>
<gibson:Color>Tobacco Sunburst</gibson:Color>
<win:Neck>Rosewood</win:Neck>
</win:Guitar>
<win:Guitar>
<win:Make>Fender</win:Make>
<win:Model>Stratocaster</win:Model>
<win:Year>1990</win:Year>
<fender:Color>Black</fender:Color>
<win:Neck>Maple</win:Neck>
</win:Guitar>
</win:Guitars>
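
To see how the prefixes above matter to a program, here is a minimal C# sketch that selects the gibson:Color element from a trimmed copy of the document. The class name NamespaceDemo and the short prefixes "w" and "g" are illustrative assumptions; only the namespace URIs have to match the document, not the prefixes.

```csharp
using System;
using System.Xml;

class NamespaceDemo
{
    // Trimmed copy of the slide's document (one guitar only).
    public const string Xml =
        "<win:Guitars xmlns:win='http://www.wintellect.com/classic-guitars' " +
        "xmlns:gibson='http://www.gibson.com/finishes'>" +
        "<win:Guitar><win:Make>Gibson</win:Make>" +
        "<gibson:Color>Tobacco Sunburst</gibson:Color></win:Guitar>" +
        "</win:Guitars>";

    // Returns the text of the first gibson-namespace Color element, or null.
    public static string FindColor(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath prefixes are resolved through this manager, not through
        // the prefixes used inside the document.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", "http://www.wintellect.com/classic-guitars");
        ns.AddNamespace("g", "http://www.gibson.com/finishes");

        XmlNode color = doc.SelectSingleNode("/w:Guitars/w:Guitar/g:Color", ns);
        return color == null ? null : color.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(FindColor(Xml)); // prints "Tobacco Sunburst"
    }
}
```

A query for "/Guitars/Guitar/Color" without the namespace manager would find nothing, because unprefixed XPath names match only elements in no namespace.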



Default Name Spaces
A default namespace is declared with no prefix (the plain xmlns
attribute below). The XML in the previous slide has the same content
as this one.

<?xml version="1.0"?>
<Guitars
xmlns="http://www.wintellect.com/classic-guitars"
xmlns:gibson="http://www.gibson.com/finishes"
xmlns:fender="http://www.fender.com/finishes">
<Guitar>
<Make>Gibson</Make>
<Model>SG</Model>
<Year>1977</Year>
<gibson:Color>Tobacco Sunburst</gibson:Color>
<Neck>Rosewood</Neck>
</Guitar>
<Guitar>
<Make>Fender</Make>
<Model>Stratocaster</Model>
<Year>1990</Year>
<fender:Color>Black</fender:Color>
<Neck>Maple</Neck>
</Guitar>
</Guitars>
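
The claim that the prefixed and default-namespace forms carry the same content can be checked with a few lines of C#. This is a sketch under the assumption that comparing the root element's local name and namespace URI is a fair proxy for "same content"; the class and method names are hypothetical.

```csharp
using System;
using System.Xml;

class DefaultNamespaceDemo
{
    public const string Uri = "http://www.wintellect.com/classic-guitars";

    // Returns "localName|namespaceUri" for the root element of the given XML.
    public static string DescribeRoot(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        return root.LocalName + "|" + root.NamespaceURI;
    }

    static void Main()
    {
        string prefixed = "<win:Guitars xmlns:win='" + Uri + "'/>";
        string defaulted = "<Guitars xmlns='" + Uri + "'/>";

        // Both lines print: Guitars|http://www.wintellect.com/classic-guitars
        Console.WriteLine(DescribeRoot(prefixed));
        Console.WriteLine(DescribeRoot(defaulted));
    }
}
```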



Document Validation
“Well-formed” documents satisfy XML syntactic rules. Well-formed documents may be
validated against schema documents, which define in great detail how elements in the
document must be written. The document below is itself a schema:

<?xml version="1.0"?>
<xsd:schema id="Guitars" xmlns=""
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Guitars">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element name="Guitar">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="Make" type="xsd:string" />
              <xsd:element name="Model" type="xsd:string" />
              <xsd:element name="Year" type="xsd:gYear"
                  minOccurs="0" />
              <xsd:element name="Color" type="xsd:string"
                  minOccurs="0" />
              <xsd:element name="Neck" type="xsd:string"
                  minOccurs="0" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The xsd namespace, http://www.w3.org/2001/XMLSchema, refers to the XMLSchema document;
as of 2001, this was the mother of all schemas. The xsd:-prefixed definitions come
from that document.
Parsing XML
There are two main APIs for XML parsers: DOM and SAX. The differences are significant.
DOM parsers assume that the entire document resides in memory, while SAX parsers do
their work under an event-driven model.
DOM offers the advantage of random access, while SAX offers the advantages of the
event-driven style of processing, such as a low memory footprint on large documents.
Microsoft offers a DOM-based parser, MSXML.dll, as part of IE on Windows.
The DOM tree of Figure 13-2 can be produced by:

<?xml version="1.0"?>
<Guitars>
  <Guitar Image="MySG.jpeg">
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
</Guitars>

<?xml version="1.0"?>
<Guitars>
  <Guitar Image="MySG.jpeg">
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
  <Guitar Image="MyStrat.jpeg"
      PreviousOwner="Eric Clapton">
    <Make>Fender</Make>
    <Model>Stratocaster</Model>
    <Year>1990</Year>
    <Color>Black</Color>
    <Neck>Maple</Neck>
  </Guitar>
</Guitars>



ReadXML.CPP
This sample code reads XML using MSXML.dll.
Although the code is great fun to decipher, not everyone enjoys doing so :(
The crucial code is:

// Create a COM object to host the parser in the memory of this process
hr = CoCreateInstance (CLSID_DOMDocument, NULL,
    CLSCTX_INPROC_SERVER, IID_IXMLDOMDocument, (void**) &pDoc);

// Use the parser to load the XML document from a file
hr = pDoc->load (var, &success);

// Get the elements with the given tag into pNodeList
hr = pDoc->getElementsByTagName (tag, &pNodeList);



ReadXML.CS
The code below also reads the Guitars.xml file and writes to the console the values
associated with each “Guitar” element.
The entire code is:

using System;
using System.Xml;

class MyApp
{
    static void Main ()
    {
        XmlDocument doc = new XmlDocument ();
        doc.Load ("Guitars.xml");
        XmlNodeList nodes = doc.GetElementsByTagName ("Guitar");
        foreach (XmlNode node in nodes) {
            Console.WriteLine ("{0} {1}", node["Make"].InnerText,
                node["Model"].InnerText);
        }
    }
}



XmlDocument Class
This class is compatible with DOM Level 2. Using it is straightforward, even to
discover the contents of the nodes in the document:

XmlDocument doc = new XmlDocument ();
doc.Load ("Guitars.xml");
// Once the document is loaded, DocumentElement points to the root
OutputNode (doc.DocumentElement);
.
.
.
// XmlNode is a class that contains type, name and value information
void OutputNode (XmlNode node)
{
    Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
        node.NodeType, node.Name, node.Value);

    if (node.HasChildNodes) {
        XmlNodeList children = node.ChildNodes;
        foreach (XmlNode child in children)
            OutputNode (child);
    }
}

XmlDocument, XmlNode and XmlNodeList are defined in the System.Xml namespace.



Inspecting Attributes
A node may have a collection named Attributes, which may contain XmlAttribute items,
which in turn expose type, name and value.

void OutputNode (XmlNode node)
{
    Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
        node.NodeType, node.Name, node.Value);

    // Attributes and XmlAttribute: Attributes is null for nodes that cannot have any
    if (node.Attributes != null) {
        foreach (XmlAttribute attr in node.Attributes)
            Console.WriteLine ("Type={0}\tName={1}\tValue={2}",
                attr.NodeType, attr.Name, attr.Value);
    }

    // HasChildNodes and ChildNodes: recurse into the children
    if (node.HasChildNodes) {
        foreach (XmlNode child in node.ChildNodes)
            OutputNode (child);
    }
}



XmlTextReader
This class is a forward-only reader which, like the ADO.NET DataReader
class, provides a fast mechanism for traversing an XML document.

XmlTextReader reader = null;
try {
    reader = new XmlTextReader ("Guitars.xml");
    reader.WhitespaceHandling = WhitespaceHandling.None;
    while (reader.Read ()) {
        if (reader.NodeType == XmlNodeType.Element &&
            reader.Name == "Guitar" &&
            reader.AttributeCount > 0) {
            while (reader.MoveToNextAttribute ()) {
                if (reader.Name == "Image") {
                    Console.WriteLine (reader.Value);
                    break;
                }
            }
        }
    }
}
finally {
    if (reader != null)
        reader.Close ();
}
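
The same forward-only traversal can be sketched with XmlReader.Create and a using block, which later versions of .NET favor over constructing XmlTextReader directly. This is an illustrative variant, not the chapter's code: it reads from an in-memory string instead of Guitars.xml, and the names ReaderDemo and ReadImages are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

class ReaderDemo
{
    // Two guitars; only the first one has an Image attribute.
    public const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make></Guitar>" +
        "<Guitar><Make>Fender</Make></Guitar>" +
        "</Guitars>";

    // Collects the Image attribute of every Guitar element, in document order.
    public static List<string> ReadImages(string xml)
    {
        List<string> images = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Guitar")
                {
                    // GetAttribute returns null when the attribute is absent.
                    string image = reader.GetAttribute("Image");
                    if (image != null)
                        images.Add(image);
                }
            }
        }
        return images;
    }

    static void Main()
    {
        foreach (string image in ReadImages(Xml))
            Console.WriteLine(image); // prints "MySG.jpeg"
    }
}
```

The using block disposes the reader even when Read throws, which is what the try/finally in the slide does by hand.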



XmlValidatingReader
Hopefully you guessed it: this class performs validation while
reading. Validation can be against DTD, XSD, or XDR schemas.

using System;
using System.Xml;
using System.Xml.Schema;

class MyApp {
    static void Main (string[] args) {
        if (args.Length < 2) {
            Console.WriteLine ("Syntax: VALIDATE xmldoc schemadoc");
            return;
        }
        XmlValidatingReader reader = null;
        try {
            XmlTextReader nvr = new XmlTextReader (args[0]);
            nvr.WhitespaceHandling = WhitespaceHandling.None;
            reader = new XmlValidatingReader (nvr);
            reader.Schemas.Add (GetTargetNamespace (args[1]), args[1]);
            reader.ValidationEventHandler +=
                new ValidationEventHandler (OnValidationError);
            while (reader.Read ());
        }
        // Exception thrown if invalid elements are found
        catch (Exception ex) {
            Console.WriteLine (ex.Message);
        }
        finally {
            if (reader != null)
                reader.Close ();
        }
    }
    // GetTargetNamespace and OnValidationError are helper methods not shown on this slide
}
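
XmlValidatingReader was later marked obsolete in the .NET Framework. As a hedged sketch of the same idea, the example below validates with XmlReaderSettings and a schema set instead. It uses a cut-down schema and in-memory strings rather than the chapter's files; the class name ValidateDemo, the method Validate, and both sample documents are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class ValidateDemo
{
    // Cut-down version of the Guitars schema: Make is required, Year is an optional gYear.
    public const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
        "<xsd:element name='Guitars'><xsd:complexType><xsd:sequence>" +
        "<xsd:element name='Guitar' maxOccurs='unbounded'><xsd:complexType><xsd:sequence>" +
        "<xsd:element name='Make' type='xsd:string'/>" +
        "<xsd:element name='Year' type='xsd:gYear' minOccurs='0'/>" +
        "</xsd:sequence></xsd:complexType></xsd:element>" +
        "</xsd:sequence></xsd:complexType></xsd:element>" +
        "</xsd:schema>";

    // Reads the whole document and returns the validation error messages.
    public static List<string> Validate(string xml, string xsd)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // A null target namespace means "use the one declared in the schema".
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        // With a handler attached, errors are reported here instead of thrown.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    static void Main()
    {
        string good = "<Guitars><Guitar><Make>Gibson</Make><Year>1977</Year></Guitar></Guitars>";
        string bad = "<Guitars><Guitar><Make>Gibson</Make><Year>seventy-seven</Year></Guitar></Guitars>";
        Console.WriteLine("good: {0} error(s)", Validate(good, Xsd).Count);
        foreach (string message in Validate(bad, Xsd))
            Console.WriteLine("bad: " + message);
    }
}
```

Collecting messages in the handler lets the reader continue to the end of the document, so one pass can report every problem rather than stopping at the first.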
XmlTextWriter
This class has methods for writing elements, attributes,
comments, etc., to an XML document.

XmlTextWriter writer = null;
try {
    writer = new XmlTextWriter
        ("Guitars.xml", System.Text.Encoding.Unicode);
    writer.Formatting = Formatting.Indented;

    writer.WriteStartDocument ();
    writer.WriteStartElement ("Guitars");
    writer.WriteStartElement ("Guitar");
    writer.WriteAttributeString ("Image", "MySG.jpeg");
    writer.WriteElementString ("Make", "Gibson");
    writer.WriteElementString ("Model", "SG");
    writer.WriteElementString ("Year", "1977");
    writer.WriteElementString ("Color", "Tobacco Sunburst");
    writer.WriteElementString ("Neck", "Rosewood");
    writer.WriteEndElement ();
    writer.WriteEndElement ();
}
finally {
    if (writer != null)
        writer.Close ();
}

The resulting Guitars.xml:

<?xml version="1.0" encoding="utf-16"?>
<Guitars>
  <Guitar Image="MySG.jpeg">
    <Make>Gibson</Make>
    <Model>SG</Model>
    <Year>1977</Year>
    <Color>Tobacco Sunburst</Color>
    <Neck>Rosewood</Neck>
  </Guitar>
</Guitars>
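
When the XML is needed as a string rather than a file, the same Write calls can target a StringBuilder. The sketch below uses XmlWriter.Create from later .NET versions; the class name WriterDemo is an assumption, and the XML declaration is omitted on purpose to keep the output short.

```csharp
using System;
using System.Text;
using System.Xml;

class WriterDemo
{
    // Builds a one-guitar document in memory and returns it as a string.
    public static string BuildGuitars()
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // no <?xml ...?> line in the result

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Guitars");
            writer.WriteStartElement("Guitar");
            writer.WriteAttributeString("Image", "MySG.jpeg");
            writer.WriteElementString("Make", "Gibson");
            writer.WriteElementString("Model", "SG");
            writer.WriteEndElement(); // </Guitar>
            writer.WriteEndElement(); // </Guitars>
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildGuitars());
    }
}
```

The writer escapes special characters in element and attribute values, which is the main reason to prefer it over assembling XML with string concatenation.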



XPath
XPath is a query language that can be used to get elements or
attributes from an XML document, using “path expressions.” Since
these expressions are a bit arcane, the W3C is working
on a more SQL-like query language (XQuery) that builds on XPath.
In the meantime, .NET offers XPath support via a class named
XPathNavigator, which contains a number of features (methods,
properties, etc.) that make querying a document quite simple, as seen in
XPathDemo.cs:

using System;
using System.Xml.XPath;

class MyApp {
    static void Main () {
        XPathDocument doc = new XPathDocument ("Guitars.xml");
        XPathNavigator nav = doc.CreateNavigator ();
        // "/Guitars/Guitar" is the query expression
        XPathNodeIterator iterator = nav.Select ("/Guitars/Guitar");
        while (iterator.MoveNext ()) {
            XPathNodeIterator it = iterator.Current.Select ("Make");
            it.MoveNext ();
            string make = it.Current.Value;
            it = iterator.Current.Select ("Model");
            it.MoveNext ();
            string model = it.Current.Value;
            Console.WriteLine ("{0} {1}", make, model);
        }
    }
}
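
Path expressions can do more than walk down the tree. The sketch below shows two common forms against an in-memory copy of the guitar data: a predicate in square brackets that filters elements, and the @ syntax that selects attributes. The class name XPathPredicateDemo and the helper Query are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

class XPathPredicateDemo
{
    public const string Xml =
        "<Guitars>" +
        "<Guitar Image='MySG.jpeg'><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar Image='MyStrat.jpeg'><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Runs an XPath query and returns the string value of each selected node.
    public static List<string> Query(string xml, string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        List<string> results = new List<string>();
        XPathNodeIterator it = nav.Select(xpath);
        while (it.MoveNext())
            results.Add(it.Current.Value);
        return results;
    }

    static void Main()
    {
        // Predicate: models of guitars whose Make is Fender.
        foreach (string model in Query(Xml, "/Guitars/Guitar[Make='Fender']/Model"))
            Console.WriteLine(model); // prints "Stratocaster"

        // Attribute axis: the Image attribute of every guitar.
        foreach (string image in Query(Xml, "/Guitars/Guitar/@Image"))
            Console.WriteLine(image); // prints "MySG.jpeg" then "MyStrat.jpeg"
    }
}
```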



Expressalyzer.cs
This application, shown in Figure 13-12, illustrates the power of XPath.
You can load a document and make queries dynamically (provided
that you are familiar with XPath expressions).
The crucial methods in this application are OnExecuteExpression,
where a navigator is built, and AddNodeAndChildren, where, depending
on the type of item found, nodes are added to the TreeView.



XSL Transformations
XSL is a language that can be used to transform a document from one
format into another. XSL stands for eXtensible Stylesheet
Language, and was probably the main reason XML became so popular,
as it was a crucial factor in the early success of EDI (Electronic Data
Interchange).
Organizations use XSL to exchange documents with other
organizations, e.g., just in the healthcare sector:
Humana <-> Kaiser Permanente
BlueCrossBlueShield <-> HCA

XSLT is at the heart of MS BizTalk Server, a set of B2B tools that
facilitate converting all kinds of business forms (invoices, paychecks,
purchase orders, etc.) from one format to another.
Figure 13-13 illustrates this concept.



XML -> HTML
1. Copy Figure 13-16’s Guitars.xml and Guitars.xsl into a directory.
2. Comment out the following statement in Guitars.xml:
   <?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>
3. Open Guitars.xml in IE (Figure 13-14).
4. Uncomment the statement.
5. Open Guitars.xml again in IE (Figure 13-15).

The stylesheet referenced by
<?xml-stylesheet type="text/xsl" href="Guitars.xsl"?>
contains instructions to transform the XML file into an
HTML table on the client side.



Guitars.XSL
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<html>
<body>
<h1>My Guitars</h1>
<hr />
<table width="100%" border="1">
<tr bgcolor="gainsboro">
<td><b>Make</b></td>
<td><b>Model</b></td>
<td><b>Year</b></td>
<td><b>Color</b></td>
<td><b>Neck</b></td>
</tr>
<xsl:for-each select="Guitars/Guitar">
<tr>
<td><xsl:value-of select="Make" /></td>
<td><xsl:value-of select="Model" /></td>
<td><xsl:value-of select="Year" /></td>
<td><xsl:value-of select="Color" /></td>
<td><xsl:value-of select="Neck" /></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>



XSLT at the server
.NET provides a class named XslTransform that can convert a document
from one format to another on the server side, using ASP.NET.
The chapter illustrates how this can be done in three files:

Quotes.aspx Quotes.xml Quotes.xsl

The result is shown in Figure 13-17.

Note that the key to getting this done is a good understanding of XSL
specifics.



XslTransform in CS
The code below shows how easy it is to work with XslTransform.
Again, as long as you know the details of XSL, transforming a document to
another format is quite easy.

using System;
using System.Xml.XPath;
using System.Xml.Xsl;

class MyApp {
    static void Main (string[] args) {
        if (args.Length < 2) {
            Console.WriteLine ("Syntax: TRANSFORM xmldoc xsldoc");
            return;
        }
        try {
            XPathDocument doc = new XPathDocument (args[0]);
            XslTransform xsl = new XslTransform ();
            xsl.Load (args[1]);
            xsl.Transform (doc, null, Console.Out);
        }
        catch (Exception ex) {
            Console.WriteLine (ex.Message);
        }
    }
}
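
XslTransform was later superseded by XslCompiledTransform in the .NET Framework. The sketch below applies a small text-producing stylesheet to in-memory guitar data with that newer class. The stylesheet is a simplified stand-in for Guitars.xsl, and the names TransformDemo and Transform are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

class TransformDemo
{
    // Emits "Make Model;" for every guitar as plain text.
    public const string Xsl =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:for-each select='Guitars/Guitar'>" +
        "<xsl:value-of select='Make'/><xsl:text> </xsl:text>" +
        "<xsl:value-of select='Model'/><xsl:text>;</xsl:text>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public const string Xml =
        "<Guitars>" +
        "<Guitar><Make>Gibson</Make><Model>SG</Model></Guitar>" +
        "<Guitar><Make>Fender</Make><Model>Stratocaster</Model></Guitar>" +
        "</Guitars>";

    // Applies the stylesheet to the document and returns the result as a string.
    public static string Transform(string xml, string xsl)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(xsl)))
            transform.Load(stylesheet);

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
            transform.Transform(input, null, output);
        return output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Transform(Xml, Xsl)); // prints "Gibson SG;Fender Stratocaster;"
    }
}
```

Loading the stylesheet once and reusing the XslCompiledTransform instance for many documents avoids recompiling it on every request, which matters for the server-side scenario described earlier.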
