Better Code with C# and XML

Introduction
Who is this book for?
What you will not find in this book
Download the project from the web
Job file
XmlReader and XmlWriter
Introduction to the class hierarchy of the DOM
How to read documents with XmlReader
How to write an XML file using XmlWriter
Validation with XmlReader/XmlWriter
Conclusion
XmlDocument
How to Read a XML file with XmlDocument
How to Write XML with XmlDocument
Validation of an XmlDocument
Conclusion
XPath
How to read XML with XPath
A few thoughts about XPath
XPath query Survival Kit
How to write XML with XPath
Conclusion
Dataset
How to read XML with a Dataset
How to write XML with Dataset
Conclusion
Serialization
How to write a XML file using serialization
What is serializable and what is not
Serialization of a collection
A few attributes to customize serialization
Formatting properties
How to manage derived classes
How to read XML with serialization
Validation with serialization
Performances
Conclusion
Linq To Xml
How to read a XML file with LinqToXml
How to write a XML file with LinqToXml
Automatically generate code with the PasteXmlAsXelement addin
How to navigate in the XML tree
Validation with LinqToXml
Benchmark
General Conclusion
Introduction

London, August 8th


"Hey Martin, our client has given me very negative feedback on your application RandomStuff.
It takes forever to load the XML file, and even longer to write it! Can you fix it quickly,
please?"
What has just happened to Martin can happen to any C# developer. He was given a fairly simple
task: read and display data from an XML file. The software should also allow the data to be
modified.
This is not a very complex task, and Martin obviously tested his work with small files.
Unfortunately, even though his code works, it is unusable as it stands because the
method he chose is not suitable for files of 1 MB or more.
XML (Extensible Markup Language), a language for transmitting and storing data, has become
unavoidable on the Web and in software in general.
However, there are very few resources that explain how to work with XML files in C#. Most
books (even the best) simply do not cover it. When there is a small chapter on XML, it sits at the
end of the book, so few developers read it, and it often presents only one or two techniques,
XmlDocument or XmlTextReader, which are largely obsolete!

These are far from the best ways to work efficiently with XML. So how do you read an XML
file quickly, with code that is easy to maintain?
What are the best practices for writing an XML file?
Which solution should you choose, based on the size of the files to be processed, to get the best performance?
How do you extract part of the data from a file?
This ebook will answer these questions.
After reading this ebook you will know the current best practices for dealing with XML in C#
(Framework 5.0).
Who is this book for?

This book is not a detailed description of all the ways to deal with XML files in .NET.
The goal is to provide:
• A concise overview of the available solutions
• The keys to using these solutions, with fairly simple examples
• Code that shows how to use each of these solutions
• A comparison of the pros and cons of these solutions
• Their time performance depending on the size of the file being processed
• Some tools to boost productivity with XML
This book is for all .NET developers who, like me, have had to deal with XML files and got
drowned in the solutions on the web.
After reading this book you will shine in front of your colleagues when the coffee-break discussion
eventually drifts towards how to manage XML.
What you will not find in this book

• Paragraphs describing endless fields and the 15,000 methods for each solution to help you sleep
at night.
Download the project from the web
All the code snippets can be found in a Visual Studio solution available at this URL:
https://app.box.com/s/kqimm6wqhrehuhejz1z5
To run the project:
1) Unzip the archive
2) Open the XMLReadWriteDemo.sln file (Visual Studio 2010)
3) Go to program.cs and comment or uncomment the different methods to read
or write XML.
4) Run the project. The generated files are located in bin/Debug or bin/Release,
depending on the selected configuration.
No book is ever finished. Please send me your suggestions or feedback to improve this
ebook at thomas.blotiere@gmail.com.
Job file

Throughout this book we will work with a single XML file:


<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
</Library>

As you can see, it is not a very complex file: 3 books contained in a root tag, Library.
However, this example is not ultra-basic, as:
- The file contains a collection of elements
- There is a hierarchy of 2 levels
- There are also a few attributes.
Processing this file is a bit trickier than the most basic examples usually found in books,
which contain only one node or one hierarchical level.
XmlReader and XmlWriter
Introduction to the class hierarchy of the DOM
Before we begin to describe the different ways to work with XML files in .NET, it is important
to briefly discuss the namespaces and classes of the DOM.
This model was used in the earlier versions of the .NET Framework. Since version 3.5 of the
Framework, LinqToXml has been an alternative to this model.
However, for historical reasons, the DOM is still widely used in applications and it is important
to understand how it works, at least at a basic level.
System.Xml
This namespace contains the major classes for reading and writing XML files, including
XmlReader, XmlTextReader, XmlTextWriter and XmlWriter.
XmlReader is an abstract class that contains methods and
properties for reading a document. The Read method reads a node from a stream. In addition, the
class contains methods to move between the current node and its attributes (MoveToAttribute,
MoveToFirstAttribute, MoveToNextAttribute, MoveToContent and MoveToElement).

The XmlNode class is central to working with XML in .NET. It represents a single node in the tree. It
is used by many other classes to insert, delete, or browse nodes in the XML tree.
XmlDocument is derived from XmlNode. This class represents an XML document and contains
methods for loading (Load, for example) and saving (Save).
How to read documents with XmlReader
The oldest way to read XML documents is to use an XmlReader.
In the following example we will read a file containing three books and return a list of Book
objects:
List<Book> listBooks = new List<Book>();
using (XmlReader xtr = XmlReader.Create("./SampleLibrary.xml"))
{
    Book book = new Book();
    // loop on each node
    while (xtr.Read())
    {
        switch (xtr.NodeType)
        {
            case XmlNodeType.Element: // the node is an element.
                while (xtr.MoveToNextAttribute()) // read its attributes.
                {
                    if (xtr.Name == "Id")
                        book.Id = xtr.Value;

                    if (xtr.Name == "Category")
                        book.Category = xtr.Value;
                }
                if (xtr.Name == "Name")
                {
                    // read the Name element and go to the next node
                    book.Name = xtr.ReadElementString();
                }
                if (xtr.Name == "Author")
                {
                    // read the Author element and go to the next node
                    book.Author = xtr.ReadElementString();
                }
                break;

            case XmlNodeType.EndElement:
                if (xtr.Name == "Book")
                {
                    // add the book when the </Book> tag is encountered
                    listBooks.Add(book);
                    book = new Book();
                }
                break;
        }
    } // end loop
} // end using
Let’s explain what is done above:


The first thing to do to read an XML file is to create an XmlReader object:
using (XmlReader xtr = XmlReader.Create ("./SampleLibrary.xml"))
{
We use the using keyword so that the reader is closed automatically.
Nodes are read by calling the Read method.
while (xtr.Read())
{
switch (xtr.NodeType)
{

}
}
Processing differs depending on the node type, hence the use of the NodeType property.
When the node is an element, we use a loop to retrieve its attributes, if there are any.
while (xtr.MoveToNextAttribute()) // read attributes.
{
    if (xtr.Name == "Id")
        book.Id = xtr.Value;

    if (xtr.Name == "Category")
        book.Category = xtr.Value;
}
Then we read the value of the node.
if (xtr.Name == "Name")
{
//read the Name element and get the next node
book.Name = xtr.ReadElementString();
}
if (xtr.Name == "Author")
{
// read the Author element and get the next node
book.Author = xtr.ReadElementString();
}
The ReadElementString method retrieves the text of the element and moves the reader to the next node.
Finally, we must add the Book object to the book list for further processing. This is done when
the </Book> closing tag is processed.
case XmlNodeType.EndElement:
{
if (xtr.Name == "Book")
{// add the book when the tag </book> is processed
listBooks.Add(book);
book = new Book();
}
}
As you can see, reading XML with XmlReader is tricky and involves a high degree of coupling
between the details of the XML file and the object model (here the Book class).
In a nutshell, the way to go is to iterate node by node through the file by calling the Read method
and to extract data via ReadElementString.
XmlReader operates at a very low level of abstraction and is practical only for simple files.
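The node-by-node loop above can be shortened a little. The following is only a sketch, not the code from the sample project: it assumes a hypothetical Book class with four string properties (the book does not show its own), reads from a string instead of SampleLibrary.xml, and assumes that every <Book> contains exactly one <Name> followed by one <Author>. It uses ReadToFollowing to jump from one <Book> to the next and GetAttribute to read attributes without moving the reader.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical model class; the book's own Book class is not shown.
public class Book
{
    public string Id { get; set; }
    public string Category { get; set; }
    public string Name { get; set; }
    public string Author { get; set; }
}

public static class XmlReaderSketch
{
    // Assumes each <Book> holds exactly <Name> then <Author>, as in the sample file.
    public static List<Book> ReadBooks(string xml)
    {
        var books = new List<Book>();
        // IgnoreWhitespace hides indentation nodes, so the node that follows
        // <Book> is <Name> whether or not the file is indented.
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Jump from one <Book> start tag to the next.
            while (reader.ReadToFollowing("Book"))
            {
                var book = new Book
                {
                    Id = reader.GetAttribute("Id"),
                    Category = reader.GetAttribute("Category")
                };
                reader.ReadStartElement("Book"); // step inside <Book>
                book.Name = reader.ReadElementContentAsString("Name", "");
                book.Author = reader.ReadElementContentAsString("Author", "");
                books.Add(book);
            }
        }
        return books;
    }
}
```

Calling XmlReaderSketch.ReadBooks(File.ReadAllText("./SampleLibrary.xml")) would return one Book per <Book> element. The coupling with the file layout is still there: if the order of <Name> and <Author> changes, the reader throws an XmlException.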
Why not use XmlTextReader?
The XmlTextReader class is derived from XmlReader. If you look for tutorials on XML and C#
on the web, you will come across many solutions based on XmlTextReader and
XmlTextWriter.
However, XmlTextReader is now (2016) largely obsolete and Microsoft does not recommend
using it. Many bugs have been found in it. For more information, refer to the link below:
http://blogs.msdn.com/b/xmlteam/archive/2011/10/08/the-world-has-moved-on-have-you-xml-apis-you-should-avoid-using.aspx
The class has not been marked as obsolete because it is part of ECMA-335 (Common
Language Infrastructure). Besides, fixing these bugs would break compatibility with previous
versions, which is not desirable.
How to write an XML file using XmlWriter

The full code to write an XML file is below. (Note that the AddBooks method is omitted to
keep the listing short.)
try
{
    List<Book> listBooks = AddBooks();
    // Indent = true puts each element on its own, indented line
    var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
    using (XmlWriter writer = XmlWriter.Create(@"./Generated_XmlWriter.xml", settings))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement("Library");
        foreach (Book b in listBooks)
        {
            writer.WriteStartElement("Book");
            writer.WriteAttributeString("Id", b.Id);
            writer.WriteAttributeString("Category", b.Category);
            writer.WriteElementString("Name", b.Name);
            writer.WriteElementString("Author", b.Author);
            writer.WriteEndElement(); // </Book>
        }
        writer.WriteEndElement(); // </Library>
        writer.WriteEndDocument();
    }
}
catch (Exception ex)
{
    _log.Error(ex.Message);
}

The steps to write an XML document with XmlWriter are:


var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
using (XmlWriter writer = XmlWriter.Create(@"./Generated_XmlWriter.xml", settings))
{
The XmlWriter.Create factory method needs the path where the file will be generated. The second
argument is optional: it is an XmlWriterSettings object in which you can set options such as
encoding or indentation.
Then comes the writing of the root element, here Library:
writer.WriteStartDocument();
writer.WriteStartElement("Library");

Once this is done, we write the other elements of the file, then close the root tag and the document.
foreach (Book b in listBooks){
writer.WriteStartElement("Book");
writer.WriteAttributeString("Id", b.Id);
writer.WriteAttributeString("Category", b.Category);
writer.WriteElementString("Name", b.Name);
writer.WriteElementString("Author", b.Author);
writer.WriteEndElement();//</Book>
}
writer.WriteEndElement();//</Library>
writer.WriteEndDocument();

The result is as follows:


<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
</Library>

As you can see, to write a file with XmlWriter we mainly use the following methods:
WriteStartElement, WriteEndElement: these methods write the opening and closing tags.
Ex: <Book> and </Book>
WriteElementString: writes a start tag, text and a closing tag.
Ex: <Name>the 4h work week</Name>
WriteAttributeString: adds an attribute to the current element.
Ex: <Book Id="2">
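To see these methods in isolation, here is a minimal sketch that writes a single book into a string rather than a file. The class and method names are ours, not part of the sample project. One thing it illustrates is ordering: WriteAttributeString has to be called right after WriteStartElement, before any child element or text is written for that element.

```csharp
using System.Text;
using System.Xml;

public static class XmlWriterSketch
{
    // Builds a tiny <Library> document in memory and returns it as text.
    public static string WriteOneBook()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Library");
            writer.WriteStartElement("Book");                         // <Book ...
            writer.WriteAttributeString("Id", "2");                   // attributes first
            writer.WriteAttributeString("Category", "non-fiction");
            writer.WriteElementString("Name", "My non fiction book"); // <Name>...</Name>
            writer.WriteElementString("Author", "Someone");
            writer.WriteEndElement();                                 // </Book>
            writer.WriteEndElement();                                 // </Library>
        } // disposing the writer flushes the output into sb
        return sb.ToString();
    }
}
```

The string is only complete once the writer has been disposed, which is why it is read after the using block.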
Validation with XmlReader/XmlWriter
Working with a Schema file

It can be tempting to validate the generated XML file. Doing so detects errors and spares you
the embarrassment of sending an invalid file to a client.
Validating an XML file directly with XmlWriter is simply not possible: XmlWriter does not
provide a method to do this.
The solution is to use an XmlReader to read the newly created file and do the validation at that
level.
To do so we first create the schema set using the XmlSchemaSet class:
XmlSchemaSet sc = new XmlSchemaSet();
sc.Add("", "SampleLibraryXsd.xsd");

The XmlSchemaSet class holds one or more XSD schemas. DTD files are not supported by this class.
In the example above, we add the XSD file SampleLibraryXsd.xsd. The first argument is the target
namespace of the schema. As we have specified no namespace, we pass an empty string.
We then create an XmlReaderSettings object:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas = sc;
settings.ValidationEventHandler +=
new ValidationEventHandler(ValidationCallBack);

settings.ValidationType lets you choose the type of validation, Schema here. Other values
include None and DTD. The Schemas property is used to assign our XmlSchemaSet to this settings object.
Finally we define a callback that will be called when a validation error occurs:
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
Console.WriteLine("Validation Error: {0}", e.Message);
}

The settings object is then passed to XmlReader.Create when the reader is created:


using (XmlReader xtr = XmlReader.Create("./SampleLibrary.xml", settings))

Caution
Not all errors show up in the callback. The callback receives only schema validation errors, which are
reported while the file is being read successfully. Errors that stop the reader, such as malformed XML,
throw an exception and have to be dealt with elsewhere. We recommend surrounding the code that reads the
document with a try-catch block.
Let’s consider the following example:
<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="3" Category="novel">
<Info>
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
<Author> test validation</Author>
</Info>
</Book>
</Library>

This file is well-formed, but it contains a second <Author> tag. Assuming the schema allows only
one <Author> per book, reading this file raises no exception and the error is reported in
the validation callback.

Now compare with this file:

<?xml version="1.0" encoding="utf-8" ?>


<Library>
<Book Id="3" Category="novel">
<Info>
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry<Author>
</Info>
</Book>
</Library>

The <Author> tag is not closed. An exception is thrown when the reader reaches the mismatched
</Info> tag, and reading stops at this point. We therefore never enter the validation callback.
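The difference between the two kinds of errors can be checked with a small, self-contained sketch. It is not taken from the sample project: the schema below is a hypothetical stand-in for SampleLibraryXsd.xsd (which the book does not show), it has no <Info> level, and both the schema and the XML are read from strings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Hypothetical schema: each Book has exactly one Name and one Author.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Library'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Book' maxOccurs='unbounded'>
          <xs:complexType>
            <xs:sequence>
              <xs:element name='Name' type='xs:string'/>
              <xs:element name='Author' type='xs:string'/>
            </xs:sequence>
            <xs:attribute name='Id' type='xs:string'/>
            <xs:attribute name='Category' type='xs:string'/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the messages received by the validation callback.
    // Throws XmlException if the XML is not well-formed.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();
        var schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as nodes are read
        }
        return errors;
    }
}
```

With this sketch, a document with two <Author> elements in one book comes back with at least one message in the list, while a document with an unclosed <Author> tag never returns: the XmlException has to be caught by the caller.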

A few words about working with a DTD file

Working with a DTD file is not very different from what we have described in the previous
paragraph.
The only things that change are:
- You need to have your DTD file before starting the validation.
- XmlReaderSettings is configured as shown below:
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationType = ValidationType.DTD;
settings.ValidationEventHandler +=
new ValidationEventHandler(ValidationCallBack);
Conclusion

As you can see, reading XML with XmlReader is tricky and involves a high degree of coupling
between the details of the XML file and the object model (here the Book class).
Similarly, writing an XML file with XmlWriter is not a pleasant experience! The example
discussed in this chapter is a simple example, but for complex XML files, writing code this way
takes a long time and produces code with low maintainability.
However, XmlReader and XmlWriter are the fastest way to read or write XML with
.NET. If speed is critical to your application, these are the two classes to use.
(See the benchmark at the end of this book.)
XmlDocument
How to Read a XML file with XmlDocument

The XmlDocument class, from the System.Xml namespace, is based on the parent-child relationships between
XML nodes. Instead of going sequentially through the file, you select a group of nodes with
SelectNodes or a single node with SelectSingleNode. It is then possible to navigate using the
ChildNodes property to get the child nodes and the ParentNode property to get the parent node.
Code snippet 1 :
internal List<Book> ReadData2(string fullpath = "./SampleLibrary.xml")
{
List<Book> listBooks = new List<Book>();
XmlDocument xd = new XmlDocument();
xd.Load(fullpath);
XmlNodeList nodelist = xd.SelectNodes("/Library/Book");
foreach (XmlNode node in nodelist)
{
Book book = new Book();
book.Id = node.Attributes.GetNamedItem("Id").Value;
book.Category = node.Attributes.GetNamedItem("Category").Value;

book.Name = node.SelectSingleNode("Name").InnerText;
book.Author = node.SelectSingleNode("Author").InnerText;
listBooks.Add(book);
}
return listBooks;
}
We start by initializing a new XmlDocument and loading the file with the Load method:
XmlDocument xd = new XmlDocument();
xd.Load(fullpath);

Then we select the nodes we are interested in, the <Book> nodes :
XmlNodeList nodelist = xd.SelectNodes("/Library/Book");

Once the list of nodes is retrieved, we iterate over each of them:
foreach (XmlNode node in nodelist)
{
Book book = new Book();
book.Id = node.Attributes.GetNamedItem("Id").Value;
book.Category = node.Attributes.GetNamedItem("Category").Value;

book.Name = node.SelectSingleNode("Name").InnerText;
book.Author = node.SelectSingleNode("Author").InnerText;
listBooks.Add(book);
}
Attributes are not child nodes of an element. To retrieve their values you can use the
Attributes.GetNamedItem method. The SelectSingleNode method retrieves a node
by its name. We use it to retrieve the values of the <Author> and <Name> nodes.
This code snippet works decently, but there is another way to do the job:
Code snippet 2 :
internal List<Book> ReadData(string fullpath = "./SampleLibrary.xml")
{
    List<Book> listBooks = new List<Book>();
    XmlDocument xd = new XmlDocument();
    xd.Load(fullpath);

    XmlNodeList nodelist = xd.GetElementsByTagName("Book"); // get all <Book> nodes
    XmlNodeList names = xd.GetElementsByTagName("Name");
    XmlNodeList authors = xd.GetElementsByTagName("Author");

    for (int i = 0; i < nodelist.Count; i++) // for each <Book> node
    {
        Book book = new Book();
        book.Id = nodelist[i].Attributes.GetNamedItem("Id").Value;
        book.Category = nodelist[i].Attributes.GetNamedItem("Category").Value;
        book.Name = names[i].InnerText;
        book.Author = authors[i].InnerText;

        listBooks.Add(book);
    }
    return listBooks;
}
In this case we use the GetElementsByTagName method, which is an alternative way to get the <Book> nodes.
This method returns a collection of <Book> nodes. We get the <Author> and <Name> nodes
the same way. We end up with 3 node collections: <Book>, <Author> and <Name>.
To retrieve all the information needed to create the Book objects we just iterate over all three in parallel,
which assumes that every <Book> has exactly one <Name> and one <Author>.
As written above, you are free to use either SelectNodes (code snippet 1)
or GetElementsByTagName (code snippet 2) to read the XML file.
Using the XmlDocument class is a good choice if you want to extract data non-sequentially,
or if you already use XmlDocument objects elsewhere in the code and want to keep it
consistent.
How to Write XML with XmlDocument

To create a file with XmlDocument, it is necessary to build the entire XML tree by hand.
Calling the Save method then physically creates the XML file.
The desired result file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<Library>
<Book ID="01" Category="novel">
<Name>Comment je suis devenu stupide</Name>
<Author>Martin Page</Author>
</Book>
</Library>

It is not a big XML file. Now for the code required to create this small file:
XmlDocument xmlDoc = new XmlDocument();
// Write the XML declaration
XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "utf-8", null);

// Creation of the root element
XmlElement rootNode = xmlDoc.CreateElement("Library");
xmlDoc.InsertBefore(xmlDeclaration, xmlDoc.DocumentElement);
xmlDoc.AppendChild(rootNode);

foreach (Book b in listBooks)
{
    // Create a new <Book> element
    XmlElement parentNode = xmlDoc.CreateElement("Book");

    // Add the Book attributes and their values
    parentNode.SetAttribute("ID", b.Id);
    parentNode.SetAttribute("Category", b.Category);
    // Attach the <Book> element under <Library>; AppendChild keeps the list order
    xmlDoc.DocumentElement.AppendChild(parentNode);

    // Create the Name and Author nodes
    XmlElement nameNode = xmlDoc.CreateElement("Name");
    XmlElement authorNode = xmlDoc.CreateElement("Author");

    // Create the text nodes
    XmlText nameText = xmlDoc.CreateTextNode(b.Name);
    XmlText authorText = xmlDoc.CreateTextNode(b.Author);

    parentNode.AppendChild(nameNode);
    parentNode.AppendChild(authorNode);

    // Put the field values into the nodes
    nameNode.AppendChild(nameText);
    authorNode.AppendChild(authorText);
}

// Save to the XML file
xmlDoc.Save(@"./XmlDocumentBenchmark_" + listBooks.Count + ".xml");

No fewer than 20 lines of code are needed to create a 7-line XML file! Creating a file this way
is not only time-consuming for the developer, it also makes the structure of the resulting XML
hard to read in the code.
If you can use version 3.5 or later of the Framework, we will see that creating a complete tree
is much easier, and much more readable, with LinqToXml.
Validation of an XmlDocument

It is quite possible to validate an XmlDocument.
To do this, add an XmlSchemaSet to the Schemas property of the document, then pass a
callback to the Validate() method.
Note that the Validate method also accepts an XmlNode as a second argument: in this case, only that
node is validated and not the entire document.
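As a minimal sketch of these two steps (the class name and the tiny inline schema are hypothetical, used only to keep the example self-contained; in the sample project the schema would come from an .xsd file):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XmlDocumentValidationSketch
{
    // Hypothetical schema: a Library that contains only Book elements holding text.
    public const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Library'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='Book' type='xs:string' maxOccurs='unbounded'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Loads the XML into an XmlDocument and returns the validation messages.
    public static List<string> Validate(string xml)
    {
        var errors = new List<string>();
        var doc = new XmlDocument();

        // Step 1: attach the schema to the document.
        doc.Schemas.Add("", XmlReader.Create(new StringReader(Xsd)));
        doc.LoadXml(xml);

        // Step 2: validate; each problem is reported to the callback.
        doc.Validate((sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

Validate("<Library><Book>A</Book></Library>") returns an empty list, while a document containing an element the schema does not know, such as <Magazine>, returns at least one message.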
Conclusion

While XmlDocument has been widely used in the past, it is no longer a solution worthy of a
self-respecting developer:
- Lots of code to generate a file
- Low maintainability
- OK performance for small files, bad for files over 1 MB.
XPath

XPath is a language for selecting parts of an XML document. XPath has been rapidly adopted by
developers as an easy-to-use query language (source: Wikipedia).
Unlike the other solutions described in this ebook, which can read and write XML, XPath only allows
reading.
The main consequence is that the technology makes sense if you are only reading a file. You have to
use another solution to write XML.
How to read XML with XPath

We are going to see how to use XPath with the example we have been using in the previous
chapters:
internal List<Book> ReadData(string fullpath="./SampleLibrary.xml")
{
List<Book> listBooks = new List<Book>();
try
{
XPathDocument doc = new XPathDocument(fullpath);
XPathNavigator nav = doc.CreateNavigator();

XPathExpression expr = nav.Compile("//Book");


XPathNodeIterator nodes = nav.Select(expr);
while (nodes.MoveNext())
{
Book newBook = new Book();
newBook.Id = nodes.Current.GetAttribute("Id","");
newBook.Category = nodes.Current.GetAttribute("Category", "");
nodes.Current.MoveToFirstChild();// name tag
newBook.Name = nodes.Current.Value;
nodes.Current.MoveToNext();//author tag
newBook.Author = nodes.Current.Value;
listBooks.Add(newBook);
}
}
catch(Exception ex)
{
//log error
}
return listBooks;
}
The first thing to do is to load the document by passing the full path as an argument to the
constructor:
XPathDocument doc = new XPathDocument(fullpath);

We then create an XPathNavigator object from this XPathDocument:


XPathNavigator nav = doc.CreateNavigator();

It is on this XPathNavigator object that we specify the query. The query takes the form of an
XPathExpression that is declared like this:
XPathExpression expr = nav.Compile("//Book");

Our query //Book simply retrieves the collection of all Book nodes in the
document.
After having specified the query we still have to execute it:
XPathNodeIterator nodes = nav.Select(expr);

The nodes object contains the Book nodes of the document. For each node in this collection we
retrieve the values of Id, Category, Name and Author.
while (nodes.MoveNext())
{
Book newBook = new Book();
newBook.Id = nodes.Current.GetAttribute("Id","");
newBook.Category = nodes.Current.GetAttribute("Category", "");
nodes.Current.MoveToFirstChild();// name tag
newBook.Name = nodes.Current.Value;
nodes.Current.MoveToNext();//author tag
newBook.Author = nodes.Current.Value;
listBooks.Add(newBook);
}

The Category and Id attributes belong to the <Book> element itself, so we can retrieve
them directly using the GetAttribute method.
To retrieve the Name tag we have to move down the XML tree using the MoveToFirstChild method,
which moves to the first child node.
The value of the node is retrieved using the Value property.
To retrieve the value of <Author>, which is at the same level as <Name>, we use the
MoveToNext method.
There are other methods to navigate the XML tree (MoveToParent and MoveToPrevious, for example).
Their names are self-explanatory, so they are not detailed here.
A few thoughts about XPath

The strength of XPath, in addition to its excellent read performance, lies in the
complexity of the queries that can be performed. In our example, the query is basic (//Book).
With a much richer data file, one could imagine a query that lists the names of the
books published after 01/01/2013, priced between $8 and $20, in the
animal books category, all in one line.
It is also possible to use functions such as count() to filter the results returned by queries.
We will briefly see how to write the most important queries:
XPath query Survival Kit

In this section we will briefly discuss the query syntax.


XPath expression           Result
/                          selects the root of the document, i.e. the whole document except <?xml version="1.0"?>
//Book                     selects all "Book" elements, wherever they are in the document
/Library/Book              selects all "Book" elements that are direct children of Library
//Book[@Id='2']            selects all "Book" elements with an "Id" attribute whose value is "2"
//Book[Name='Toto']        selects all "Book" elements with a "Name" tag whose value is "Toto"
//Book[count(Name) > 1]    selects all "Book" elements with more than one "Name" tag

The purpose of this book is not to describe XPath in detail, so that is all we will say about XPath
syntax. If you need a more complex condition to find data in an XML file, you can find the
corresponding XPath query with a little searching on the Internet.
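The queries of this survival kit are easy to try out. The following sketch is only a test bench, with names of our own choosing: it loads a document from a string and reports how many nodes a query selects, the text of the first node selected, or the result of a numeric expression such as count().

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathSketch
{
    // Same content as the sample file used throughout the book, in one string.
    public const string Sample =
        "<Library>" +
        "<Book Id=\"1\" Category=\"novel\"><Name>For whom the bell tolls</Name><Author>Ernest Hemingway</Author></Book>" +
        "<Book Id=\"2\" Category=\"non-fiction\"><Name>My non fiction book</Name><Author>Someone</Author></Book>" +
        "<Book Id=\"3\" Category=\"novel\"><Name>The little prince</Name><Author>Antoine de Saint-Exupéry</Author></Book>" +
        "</Library>";

    static XPathNavigator Load(string xml)
    {
        return new XPathDocument(new StringReader(xml)).CreateNavigator();
    }

    // Number of nodes selected by the query.
    public static int CountMatches(string xml, string query)
    {
        return Load(xml).Select(query).Count;
    }

    // Text of the first node selected by the query, or null if nothing matches.
    public static string FirstValue(string xml, string query)
    {
        XPathNavigator node = Load(xml).SelectSingleNode(query);
        return node == null ? null : node.Value;
    }

    // Evaluates an expression that returns a number, such as count(//Book).
    public static int EvaluateNumber(string xml, string expression)
    {
        return Convert.ToInt32(Load(xml).Evaluate(expression));
    }
}
```

For example, XPathSketch.CountMatches(XPathSketch.Sample, "//Book[@Category='novel']") selects the two novels, and XPathSketch.FirstValue(XPathSketch.Sample, "//Book[@Id='2']/Name") gives the name of the second book.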
How to write XML with XPath

XPath is a read-only solution.
However, it is possible to combine XPath with other solutions to write data.
The most common case is to use XPath in conjunction with an XmlDocument. To insert a node at
a given location, the code is as follows:
try
{
Book newBook = new Book{ Id = "4", Category ="Educational", Name ="Xml : Life and Death",Author ="Whatever"};

XmlDocument xmlDoc = new XmlDocument();


xmlDoc.Load(fullpath);
XPathNavigator nav = xmlDoc.CreateNavigator();

XPathExpression expr = nav.Compile("//Book[@Id='2']");


XPathNodeIterator nodes = nav.Select(expr);
if (nodes.Count != 0 && nodes.MoveNext())
{
nodes.Current.InsertElementAfter("","Book","","");
nodes.Current.MoveToNext(XPathNodeType.Element);
nodes.Current.CreateAttribute("", "Id", "", newBook.Id);
nodes.Current.CreateAttribute("", "Category", "", newBook.Category);
nodes.Current.AppendChildElement("", "Name", "", newBook.Name);
nodes.Current.AppendChildElement("", "Author", "", newBook.Author);
xmlDoc.Save(fullpath);
}
}
catch (Exception ex)
{
//log error
}
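Unlike the reading example, in which we began by creating an XPathDocument,
here we create an XmlDocument, because the navigator of an XPathDocument is read-only
whereas the navigator of an XmlDocument can edit the tree.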
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(fullpath);
XPathNavigator nav = xmlDoc.CreateNavigator();

XPathExpression expr = nav.Compile("//Book[@Id='2']");


XPathNodeIterator nodes = nav.Select(expr);

The query is also changed to "//Book[@Id='2']".
The goal is to insert a new node after the existing Book node whose Id is equal to 2.
As we saw in the previous section, [] lets you set a condition on the value.
Use [@Id='2'] and not [Id='2'] because Id is an attribute.
Once the query is executed, we insert a new node.
if (nodes.Count != 0 && nodes.MoveNext())
{
nodes.Current.InsertElementAfter("","Book","","");
nodes.Current.MoveToNext(XPathNodeType.Element);
nodes.Current.CreateAttribute("", "Id", "", newBook.Id);
nodes.Current.CreateAttribute("", "Category", "", newBook.Category);
nodes.Current.AppendChildElement("", "Name", "", newBook.Name);
nodes.Current.AppendChildElement("", "Author", "", newBook.Author);
xmlDoc.Save(fullpath);
}
To do this, we create a new <Book> element with:
nodes.Current.InsertElementAfter("","Book","","");

We then move to this new element and create its attributes:
nodes.Current.MoveToNext(XPathNodeType.Element);
nodes.Current.CreateAttribute("", "Id", "", newBook.Id);
nodes.Current.CreateAttribute("", "Category", "", newBook.Category);

The first and the third arguments of CreateAttribute are the prefix and the namespace. In our example
they are set to "". We then add the <Name> and <Author> tags and save the document:

nodes.Current.AppendChildElement("", "Name", "", newBook.Name);


nodes.Current.AppendChildElement("", "Author", "", newBook.Author);
xmlDoc.Save(fullpath);

The result is below :


<?xml version="1.0" encoding="utf-8"?>
<Library>
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="4" Category="Educational">
<Name>Xml : Life and Death</Name>
<Author>Whatever</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
</Library>

We have successfully inserted an element after the Book with Id = 2.
It is also possible to combine XPath with LinqToXml. However, this takes more time than
using LinqToXml alone, which allows the same kind of operations as XPath.
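To give an idea of what the combination looks like, here is a short sketch (the class and method names are ours). It runs the same selection twice on an XDocument: once with an XPath query, through the XPathSelectElements extension method of the System.Xml.XPath namespace, and once with LinqToXml alone.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings in the XPathSelectElements extension method

public static class XPathWithLinqSketch
{
    // Names of the novels, selected with an XPath query on an XDocument.
    public static string[] NovelNamesWithXPath(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.XPathSelectElements("//Book[@Category='novel']/Name")
                  .Select(e => e.Value)
                  .ToArray();
    }

    // Same result with LinqToXml only.
    public static string[] NovelNamesWithLinq(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("Book")
                  .Where(b => (string)b.Attribute("Category") == "novel")
                  .Select(b => (string)b.Element("Name"))
                  .ToArray();
    }
}
```

Both methods return the same names in document order. The XPath version is more compact, while the LinqToXml version is checked by the compiler apart from the element and attribute names.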
Conclusion
XPath is starting to show its age. Before LinqToXml, XPath allowed two
things:
• Performing complex queries on XML concisely
• Reading XML data very quickly (versus XmlDocument or Dataset)
XPath requires some learning to become familiar with the syntax of the queries, which may
appear slightly esoteric to the uninitiated.
If the version of the .NET Framework is older than 3.5, XPath is a good solution to read or
retrieve data quickly.
Otherwise it is better to use LinqToXml.
Dataset

ADO.NET, which has been part of the .NET Framework since its first version, is a technology used
primarily to interact with databases. Its key component is a container called DataSet, which holds
the result of an SQL query in tabular form. A single DataSet can store records from one or more
tables.
It is also possible to use a DataSet to read XML. After all, an XML file contains data and is
sometimes considered an alternative to a database.
In this chapter we use a slightly different test file:
<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="1" Category="novel">
<Info>
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Info>
</Book>
<Book Id="2" Category="non-fiction">
<Info>
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Info>
</Book>
<Book Id="3" Category="novel">
<Info>
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Info>
</Book>
</Library>
It is more or less the same as before, but we have added another hierarchical level: the <Info> tag.
The reason for this change is that, from three hierarchical levels onwards, the logic to read XML data from a Dataset is a bit … peculiar. You will see for yourself soon enough.
How to read XML with a Dataset

Reading XML is quite simple. First we initialize the Dataset:

DataSet ds = new DataSet();

Then we call the ReadXml method:

ds.ReadXml("./SampleLibrary.xml");

Data is now contained in the DataSet. To access it we can proceed as follows:


foreach (DataRow row in ds.Tables["book"].Rows)
{
    Book book = new Book();
    book.Id = row["Id"].ToString();
    // the attribute is named Category in the test file
    book.Category = row["Category"].ToString();

    DataRow[] children = row.GetChildRows("book_info"); // relation name

    // there is only 1 row in children
    book.Name = (children[0]["Name"]).ToString();
    book.Author = (children[0]["Author"]).ToString();

    listBooks.Add(book);
}
Getting data out of a Dataset is similar to reading an array. Each record is held in a DataRow, and a value is accessed by giving the column name in square brackets:
row["Id"]
The attributes of the <Book> tag are exposed as columns of the same row.
Now let’s read the values of the properties Name and Author. Here is the structure of the XML file:
<Book Id="1" Category="novel">
<Info>
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Info>
</Book>

The data is not directly in the <Book> tag but in a nested tag, <Info>.
Likewise, in the Dataset we cannot read fields like Author and Name directly from the Book row: we first have to reach the child rows of Book, which the GetChildRows method returns.
Note the argument of GetChildRows, “book_info”. It is the name of the relation between a parent tag and a nested tag: the name of the parent tag, an underscore, then the name of the child tag:
<ParentTag>_<ChildTag>
This method returns an array of DataRow that reflects the structure of the <Info> tag.
It is then easy to access the values of Name and Author:
book.Name = (children[0]["Name"]).ToString();
book.Author = (children[0]["Author"]).ToString();

As you can see, using a Dataset to read XML is pretty simple.
There is less work for the developer than with an XmlReader or XmlDocument.
However, this way of reading XML is still tied to the XML structure: a change in the structure of the XML file means a change in the code that reads it.
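The reading steps above can be put together in a small self-contained sketch. It assumes the XML is supplied as a string (hence the StringReader) and that ReadXml infers the tables "Book" and "Info" and the relation "Book_Info" from the sample structure; the helper name ReadTitles is hypothetical.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;

public static class DataSetReadSketch
{
    // Hypothetical helper: returns "Id: Name" for every book in the XML.
    public static List<string> ReadTitles(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml)); // schema is inferred from the XML

        var titles = new List<string>();
        foreach (DataRow book in ds.Tables["Book"].Rows)
        {
            // Looping over the child rows avoids assuming exactly one <Info>.
            foreach (DataRow info in book.GetChildRows("Book_Info"))
            {
                titles.Add(book["Id"] + ": " + info["Name"]);
            }
        }
        return titles;
    }
}
```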
How to write XML with Dataset

There are 2 ways to generate XML with a Dataset:

GetXml()
This method returns a string that contains the data from every DataTable of the DataSet.
WriteXml(string filepath)
Like GetXml, the WriteXml method retrieves the data from every DataTable but, rather than returning it as a string, writes it directly to the file whose path is supplied as an argument.
It is therefore quite easy to transform data from a non-XML source, for example a database or a flat file, into an XML file.
Note that LinqToXml also offers this type of transformation in a relatively easy way. If you are working with version 3.5 or later of the .NET Framework, it is often preferable to use LinqToXml.
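To illustrate GetXml, here is a minimal sketch with a single table. The table layout and the helper name BuildXml are assumptions chosen for brevity, not the book's benchmark code.

```csharp
using System.Data;

public static class DataSetGetXmlSketch
{
    // Hypothetical helper: builds a one-row DataSet and returns it as XML text.
    public static string BuildXml()
    {
        var ds = new DataSet("Library");          // DataSet name = root tag
        DataTable books = ds.Tables.Add("Book");  // table name = row tag

        // Id is written as an attribute, Name as a nested element.
        books.Columns.Add("Id", typeof(int)).ColumnMapping = MappingType.Attribute;
        books.Columns.Add("Name", typeof(string));

        books.Rows.Add(1, "The little prince");
        return ds.GetXml();
    }
}
```

Calling ds.WriteXml(path) instead of ds.GetXml() would write the same content to a file.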
The code below saves the data of the Dataset as XML:
internal void WriteData2()
{
    List<Book> books = AddBooks();
    DataSet ds2 = new DataSet("Library");
    DataTable table1 = ds2.Tables.Add("Book");
    table1.Columns.Add("Id", typeof(int));
    table1.Columns.Add("Category", typeof(string));

    DataTable table2 = ds2.Tables.Add("Info");
    table2.Columns.Add("Id", typeof(int));
    table2.Columns.Add("Name", typeof(string));
    table2.Columns.Add("Author", typeof(string));

    table1.Columns[0].ColumnMapping = MappingType.Attribute;
    table1.Columns[1].ColumnMapping = MappingType.Attribute;
    table2.Columns[0].ColumnMapping = MappingType.Hidden;

    DataRelation relationBookInfo = ds2.Relations.Add("rel1",
        ds2.Tables["Book"].Columns["Id"],
        ds2.Tables["Info"].Columns["Id"]);
    relationBookInfo.Nested = true;

    foreach (Book b in books)
    {
        table1.Rows.Add(b.Id, b.Category);
        table2.Rows.Add(b.Id, b.Name, b.Author);
    }
    ds2.WriteXml(@"C:\temp\DatasetBenchmark_" + books.Count + ".xml");
}
First we need to create the Dataset, define the columns and format them. Nothing particularly difficult for the Book table:
DataSet ds2 = new DataSet("Library");
DataTable table1 = ds2.Tables.Add("Book");
table1.Columns.Add("Id", typeof(int));
table1.Columns.Add("Category", typeof(string));

Notice that the columns Id and Category are written as attributes, hence the following code:
table1.Columns[0].ColumnMapping = MappingType.Attribute;
table1.Columns[1].ColumnMapping = MappingType.Attribute;

We then create a second table that contains the data of the <Info> tag. A separate table is necessary, otherwise the hierarchy is poorly reconstructed, for example:
<Book Id="3" Category="Fiction">
<Info/>
<Name> For whom the bell tolls </Name>
<Author> Ernest Hemingway </Author>
</Book>

To make the connection between the Info table (table2) and the Book table (table1), we add an Id column that is used only as a foreign key:
DataTable table2 = ds2.Tables.Add("Info");
table2.Columns.Add("Id", typeof(int));
table2.Columns.Add("Name", typeof(string));
table2.Columns.Add("Author", typeof(string));
table2.Columns[0].ColumnMapping = MappingType.Hidden;

This column is not displayed in the XML thanks to the value Hidden of the enum MappingType.
Then comes the creation of the foreign key relationship linking the two tables:
DataRelation relationBookInfo = ds2.Relations.Add("rel1", ds2.Tables["Book"].Columns["Id"],
ds2.Tables["Info"].Columns["Id"]);
relationBookInfo.Nested = true;

relationBookInfo.Nested = true is very important. Without it we would get the following XML:
<?xml version="1.0" standalone="yes"?>
<Library>
<Book Id="1" Category="NonFiction" />
<Book Id="2" Category="NonFiction" />
<Info>
<Name>Rich dad Poor Dad</Name>
<Author>Kiyosaki</Author>
</Info>
<Info>
<Name>the 4h work week</Name>
<Author>Feriss</Author>
</Info>
</Library>
You will notice that the <Info> tags (and their nested tags) are not included in the <Book> tags.
From here, all that remains is to populate the DataSet and create the XML file with the WriteXml method:
foreach (Book b in books)
{
table1.Rows.Add(b.Id,b.Category);
table2.Rows.Add(b.Id,b.Name,b.Author);
}
ds2.WriteXml(@"C:\temp\DatasetBenchmark_"+books.Count+".xml");

The generated file is as expected:

<?xml version="1.0" standalone="yes"?>
<Library>
<Book Id="1" Category="NonFiction">
<Info>
<Name>Rich dad Poor Dad</Name>
<Author>Kiyosaki</Author>
</Info>
</Book>
<Book Id="2" Category="NonFiction">
<Info>
<Name>the 4h work week</Name>
<Author>Feriss</Author>
</Info>
</Book>
</Library>
Conclusion

Using a Dataset is yet another way to read or generate XML. In this case, we do not work directly with XML tags: the architecture of the Dataset and its properties define the shape of the output XML file.
Note that the hierarchy between XML tags is managed with foreign keys, which requires creating dummy columns and foreign key relationships in the Dataset.
Generating XML with XmlWriter or XmlDocument is a tedious job, but using a Dataset can quickly become very complex as well. Imagine a file with ten levels of hierarchy...
In terms of performance, using a Dataset to read XML is the worst of all the solutions analyzed in this document. It is therefore only wise to use it if the rest of the application already uses ADO.NET and the files are small. If performance matters in your application, look for something else.
Serialization

The namespace System.Xml.Serialization contains classes used to serialize objects into XML and vice versa. Serialization is an interesting alternative to the DOM model or the Dataset described in the previous chapters. In this chapter, we will start by generating an XML file, and then we will see how to read it.
How to write a XML file using serialization

Writing an XML file with serialization is a very different approach from the DOM. As with a Dataset, the data model of the XML file is contained in a .NET class. However this model is much simpler to use than a Dataset.
Recall the file that we want to generate:
<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
</Library>

To do so, we create the following Book and Library classes and a WriteData method. We will see in detail how they work.
public class Book
{
[XmlAttribute]
public string Id { get; set; }
[XmlAttribute]
public string Category { get; set; }
public string Name { get; set; }
public string Author { get; set; }
}
[XmlRoot("Library")]
public class Library : List<Book> { }

internal void WriteData()
{
    try
    {
        List<Book> listBooks = AddBooks();
        Library lib = new Library();
        lib.AddRange(listBooks);

        string fullPath = @"./Generated_Serialization.xml";

        var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };
        using (Stream fs = new FileStream(fullPath, FileMode.Create))
        {
            using (var xmlWriter = XmlWriter.Create(fs, settings))
            {
                XmlSerializer serializer = new XmlSerializer(typeof(Library));
                serializer.Serialize(xmlWriter, lib);
            }
        }
    }
    catch (Exception ex)
    {
        _log.Error(ex.Message);
    }
}
Let’s have a look at how it works.
First we create an XmlWriterSettings object that will be one of the arguments provided to the
XmlWriter. Notice that serialization also uses XmlWriter.
var settings = new XmlWriterSettings { Encoding = Encoding.UTF8, Indent = true };

Since the expected output is an XML file, we provide a stream to the XmlWriter object. If we had wanted to get the XML as a string, it would have been enough to provide a StringBuilder to the XmlWriter instead.
using (Stream fs = new FileStream(fullPath, FileMode.Create))

Finally, the serialization code itself:


using (var xmlWriter = XmlWriter.Create(fs, settings))
{
XmlSerializer serializer = new XmlSerializer(typeof(Library));
serializer.Serialize(xmlWriter, lib);
}
The Serialize method of the XmlSerializer object serializes all the Books contained in “lib”. It takes an XmlWriter as its first argument and the object to serialize as its second.

As you can see, the code is very short! Serialization itself takes 3 lines.
In addition, the developer does not have to manage the tags or their hierarchy: this is the purpose of the Book class. The complexity of the XML hierarchical structure lives in the object model, which is much easier to maintain. This is the main advantage of using serialization to read/write XML.
The example above is a simple illustration of what can be achieved with serialization.
As this is an acceptable solution to manage XML files, let's dig a little deeper into what serialization can do.
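The StringBuilder variant mentioned earlier in this section can be sketched as follows. The Book and Library classes are repeated so that the block is self-contained; the helper name ToXml and the choice to omit the XML declaration (which would otherwise announce utf-16 for an in-memory string) are assumptions of this sketch.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Book
{
    [XmlAttribute] public string Id { get; set; }
    [XmlAttribute] public string Category { get; set; }
    public string Name { get; set; }
    public string Author { get; set; }
}

[XmlRoot("Library")]
public class Library : List<Book> { }

public static class SerializeToStringSketch
{
    // Hypothetical helper: serializes the library to a string instead of a file.
    public static string ToXml(Library lib)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };

        // The XmlWriter appends to the StringBuilder rather than to a stream.
        using (var xmlWriter = XmlWriter.Create(output, settings))
        {
            new XmlSerializer(typeof(Library)).Serialize(xmlWriter, lib);
        }
        return output.ToString();
    }
}
```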
What is serializable and what is not

Classes, structures, and enumerations are serializable. In contrast, interfaces are not serializable.
All types to serialize must be public. Only public fields and properties are serialized.
Properties to serialize must not be read-only, with the exception of collections.
The type to be serialized must have a default constructor (public and without parameters), which deserialization needs to create an instance of this type.
Types that XmlSerializer cannot handle by itself must implement the IXmlSerializable interface.
Warning! Dictionary is not natively serializable. If you want to serialize a Dictionary, you will have to extend the class to implement IXmlSerializable. Implementations of a serializable Dictionary class are easy to find on the Internet, for example here:
http://www.codeproject.com/Questions/454134/Serialize-Dictionary-in-csharp
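A lighter alternative, swapped in here instead of extending Dictionary, is to hide the dictionary with XmlIgnore and expose its content as an array of entries. This is only a sketch: the Stock and StockEntry classes and the RoundTrip helper are hypothetical names. An array is used for the proxy property because the serializer assigns arrays through the setter when deserializing.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public class StockEntry
{
    [XmlAttribute] public string BookId { get; set; }
    [XmlAttribute] public int Copies { get; set; }
}

public class Stock
{
    public Stock() { CopiesByBookId = new Dictionary<string, int>(); }

    // The dictionary itself is hidden from the serializer.
    [XmlIgnore]
    public Dictionary<string, int> CopiesByBookId { get; set; }

    // Proxy property: the dictionary content as a flat list of <Entry> tags.
    [XmlElement("Entry")]
    public StockEntry[] Entries
    {
        get
        {
            return CopiesByBookId
                .Select(p => new StockEntry { BookId = p.Key, Copies = p.Value })
                .ToArray();
        }
        set
        {
            // A missing list may arrive as null, hence the fallback.
            CopiesByBookId = (value ?? new StockEntry[0])
                .ToDictionary(e => e.BookId, e => e.Copies);
        }
    }
}

public static class DictionarySketch
{
    // Serializes to a string and deserializes it again.
    public static Stock RoundTrip(Stock stock)
    {
        var serializer = new XmlSerializer(typeof(Stock));
        var writer = new StringWriter();
        serializer.Serialize(writer, stock);
        return (Stock)serializer.Deserialize(new StringReader(writer.ToString()));
    }
}
```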
Serialization of a collection

The example above serializes a collection, which is relatively common. Here we will see what happens without the Library class.
[XmlRoot("Library")]
public class Library : List<Book> { }

Using the Library class is optional: it would have been quite possible to serialize a List<Book> directly.
The code would have looked like this:
using (var xmlWriter = XmlWriter.Create(fs, settings))
{
XmlSerializer serializer = new XmlSerializer(typeof(List<Book>));
serializer.Serialize(xmlWriter, listBooks);
}

Here the List<Book> type is passed to the XmlSerializer constructor, and the list instance to the Serialize method.
The result is as follows:
<?xml version="1.0" encoding="utf-8" ?>
<ArrayOfBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
</ArrayOfBook>

Only one thing has changed between the result of the first example and this one: the name of the root tag, which went from Library to ArrayOfBook.
We might expect something like ListOfBook, but serialization does not differentiate between types of collection. Whether we pass an array or a list, the root is always an <ArrayOf…> tag.
A few attributes to customize serialization
There are a few attributes to customize the serialization. We will focus on them in this section.
XmlRoot
The XmlRoot attribute applies to a class, not to a property. It defines the root element of the XML file and sets its name:
[XmlRoot("Library")]
public class Library : List<Book> { }

The root of the generated file in the example above is the tag <Library>.
XmlIgnore
In case you do not want to serialize some properties, apply the attribute XmlIgnore.
[XmlIgnore]
public int NbItems { get; set; }

XmlElement

Sometimes you want the tag to have a name different from the property name. XmlElement changes the tag name in the generated file without changing the name of the property.
[XmlElement("Titre")]
public string Name { get; set; }

The generated XML will be as follows (excerpt):

<Book Id="1" Category="NonFiction">
<Titre>Rich dad Poor Dad</Titre>
<Author>Kiyosaki</Author>
</Book>

XmlArray and XmlArrayItem

XmlArray changes the name of the tag that wraps a collection. Here we change the collection name from Books to Livres, its counterpart in French.
[XmlArray("Livres")]
public List<Book> Books { get; set; }

However, this does not change the tag name of the elements of the list:
<Livres>
<Book>

</Book>
</Livres>

To change the name of the list items as well, add XmlArrayItem:
[XmlArray("Livres")]
[XmlArrayItem("Livre")]
public List<Book> Books { get; set; }
This results in:
<Livres>
<Livre>

</Livre>
</Livres>

XmlAttribute
Until now, all properties were transformed into elements. However you may wish to write a property not as an element, but as an attribute of its parent element.
This is the purpose of the XmlAttribute attribute, which we already used in the example earlier in this chapter.
public class Book
{
[XmlAttribute]
public string Id { get; set; }
[XmlAttribute]
public string Category { get; set; }
public string Name { get; set; }
public string Author { get; set; }
}

With of course Category and Id as attributes in the resulting file.


<Book Id="1" Category="NonFiction">
Formatting properties
Suppose our class contains a DateTime property and you want to display only the date, not the time.
No serialization attribute lets you choose an arbitrary display format. You must handle this in the get (for serialization) and set (for deserialization) accessors, or create a new property dedicated to it.
[XmlIgnore]
public DateTime PublicationDate { get; set; }

public string PublicationDateFormatted
{
    get { return PublicationDate.ToShortDateString(); }
    set { PublicationDate = DateTime.Parse(value); }
}

We have created a new property, PublicationDateFormatted, which simply reads the PublicationDate value and applies the expected format.
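One caveat with the dedicated property: ToShortDateString and DateTime.Parse depend on the culture of the machine, so a file written on one computer may not parse on another. A possible variant, sketched here under the assumption that a fixed yyyy-MM-dd format is acceptable, uses the invariant culture. The DatedBook class and the ToXml / FromXml helpers are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

public class DatedBook
{
    [XmlIgnore]
    public DateTime PublicationDate { get; set; }

    // Proxy property written to the file under the tag <PublicationDate>.
    [XmlElement("PublicationDate")]
    public string PublicationDateFormatted
    {
        get { return PublicationDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture); }
        set { PublicationDate = DateTime.ParseExact(value, "yyyy-MM-dd", CultureInfo.InvariantCulture); }
    }
}

public static class DateFormatSketch
{
    public static string ToXml(DatedBook book)
    {
        var writer = new StringWriter();
        new XmlSerializer(typeof(DatedBook)).Serialize(writer, book);
        return writer.ToString();
    }

    public static DatedBook FromXml(string xml)
    {
        return (DatedBook)new XmlSerializer(typeof(DatedBook)).Deserialize(new StringReader(xml));
    }
}
```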
How to manage derived classes
Serialization does not support derived classes without a little extra work.
Let’s suppose that we need the following specialization:
public class EBook : Book
{
public string Format { get; set; }
}
And the following code:
EBook myebook = new EBook();
myebook.Format = "MOBI";
myebook.Name = "MyEbook";
lib.Add(myebook);

The “lib” collection contains both Book objects and an EBook object. As EBook is a child class of Book, it is fine to add an EBook to a collection of Book.
If we try to serialize this we get:

InvalidOperationException: The type EBook was not expected. Use the XmlInclude or SoapInclude attribute to specify types that are not known statically.

XmlSerializer does not know how to serialize the derived type by itself: you must add the XmlInclude attribute to the base class.
[XmlInclude(typeof(EBook))]
public class Book
{

}

In this case the serialization works and returns:

<?xml version="1.0" encoding="utf-8" ?>
<Library>
<Book Id="1" Category="novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="non-fiction">
<Name>My non fiction book</Name>
<Author>Someone</Author>
</Book>
<Book Id="3" Category="novel">
<Name>The little prince</Name>
<Author>Antoine de Saint-Exupéry</Author>
</Book>
<Book xsi:type="EBook">
<Name>MyEbook</Name>
<Format>MOBI</Format>
</Book>
</Library>

Note that the actual type of the book is given by the xsi:type attribute.
Advanced Customization with IXmlSerializable

If the attributes described above are not enough to achieve your goals, all is not lost! There is a way around this: implement the IXmlSerializable interface. When the serialization process starts, .NET first checks whether the type to serialize implements this interface.
IXmlSerializable is composed of three methods:
• GetSchema
• ReadXml
• WriteXml
We will not detail how to implement these methods. There are many resources on the Internet describing the code, which somewhat resembles the code written with XmlReader and XmlWriter, since ReadXml receives an XmlReader and WriteXml an XmlWriter.
How to read XML with serialization

Reading an XML file is not that different from writing: instead of serializing, we deserialize.
internal Library ReadData(string path = @"./SampleLibrary.xml")
{
    XmlSerializer xs = new XmlSerializer(typeof(Library));
    Library res = null;

    using (StreamReader sr = new StreamReader(path))
    {
        using (var xmlReader = XmlReader.Create(sr))
        {
            res = xs.Deserialize(xmlReader) as Library;
        }
    }
    return res;
}

The differences between writing an XML file (serialization) and reading one (deserialization) come down to 2 points:
- Reading uses a StreamReader wrapped in an XmlReader, whereas writing used a stream wrapped in an XmlWriter
- To deserialize, the Deserialize method is called, whereas to serialize it is the Serialize method
Nothing extraordinary…
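For a quick experiment without a file on disk, the same deserialization can be run on a string. This is a minimal sketch: the Book and Library classes are repeated to keep the block self-contained, and the helper name FromXml is an assumption.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Book
{
    [XmlAttribute] public string Id { get; set; }
    [XmlAttribute] public string Category { get; set; }
    public string Name { get; set; }
    public string Author { get; set; }
}

[XmlRoot("Library")]
public class Library : List<Book> { }

public static class DeserializeSketch
{
    // Hypothetical helper: turns XML text into a Library object.
    public static Library FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Library));
        using (var xmlReader = XmlReader.Create(new StringReader(xml)))
        {
            return (Library)serializer.Deserialize(xmlReader);
        }
    }
}
```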
Note however that, to deserialize, you need the classes describing the model. In our example these are the Book and Library classes.
If you did not create these classes during application development and you just need to deserialize, you will have to create them. There are 2 methods to do so:
- If you are shipwrecked on an island in the Pacific Ocean, knowing that the next ship won’t come for a few centuries, then you can create them by hand.
- Otherwise, a second approach is to use xsd.exe, which automatically generates the C# class(es) required for the deserialization from the XSD file.
XSD.exe
The xsd.exe tool is provided with the Visual Studio SDK. For example, on my computer it is located here:
C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin
It may not be the case on your machine, but I'm sure you will eventually find it, even if you have less than 200 years to accomplish this task.
If you do not have an XSD file, you can generate one from the XML file with the command:
xsd C:\temp\<XMLFile>.xml /out:C:\temp

This assumes that your XML file is located in C:\temp and that you have made a “cd” to the location of xsd.exe.
Then, to generate the classes from the XSD file, simply write this command:
xsd /classes C:\temp\<XsdFile>.xsd /out:C:\temp

The generated code can seem quite complex but it contains everything we need to deserialize the
XML file (extract):
[System.CodeDom.Compiler.GeneratedCodeAttribute("xsd", "2.0.50727.3038")]
[System.SerializableAttribute()]
[System.Diagnostics.DebuggerStepThroughAttribute()]
[System.ComponentModel.DesignerCategoryAttribute("code")]
[System.Xml.Serialization.XmlTypeAttribute(AnonymousType=true)]
public partial class LibraryBook {

private LibraryBookInfo[] infoField;

private string idField;

private string categoryField;

/// <remarks/>
[System.Xml.Serialization.XmlElementAttribute("Info", Form=System.Xml.Schema.XmlSchemaForm.Unqualified)]
public LibraryBookInfo[] Info {
get {
return this.infoField;
}
set {
this.infoField = value;
}
}

/// <remarks/>
[System.Xml.Serialization.XmlAttributeAttribute()]
public string Id {
get {
return this.idField;
}
set {
this.idField = value;
}
}

/// <remarks/>
[System.Xml.Serialization.XmlAttributeAttribute()]
public string Category {
get {
return this.categoryField;
}
set {
this.categoryField = value;
}
}
}
Validation with serialization
Before deserializing a file it is useful to validate the document.
The approach is the same as validating an XML file with XmlReader, since serialization relies on such an object and validation takes place at that level anyway.
XmlSchemaSet sc = new XmlSchemaSet();
sc.Add("", "SampleLibraryXsd.xsd");

XmlReaderSettings settings = new XmlReaderSettings();


settings.ValidationType = ValidationType.Schema;
settings.Schemas = sc;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

using (var xmlReader = XmlReader.Create(sr, settings))


{
….
As seen above, the XmlReaderSettings object contains the configuration options of the XmlReader. The schema is assigned to this object, and the “settings” object is then passed to the XmlReader.Create method.
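The validation setup can be tried end to end with the following sketch. It makes several assumptions: the schema is a small inline stand-in for SampleLibraryXsd.xsd (whose real content is not shown in this book extract), the XML comes from a string, and the errors are collected in a list instead of being handled by ValidationCallBack.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ReaderValidationSketch
{
    // Hypothetical inline schema standing in for SampleLibraryXsd.xsd.
    private const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Library'>
    <xs:complexType><xs:sequence>
      <xs:element name='Book' maxOccurs='unbounded'>
        <xs:complexType>
          <xs:sequence>
            <xs:element name='Name' type='xs:string'/>
            <xs:element name='Author' type='xs:string'/>
          </xs:sequence>
          <xs:attribute name='Id' type='xs:int' use='required'/>
          <xs:attribute name='Category' type='xs:string'/>
        </xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns the validation error messages (empty list = valid document).
    public static List<string> Validate(string xml)
    {
        var sc = new XmlSchemaSet();
        sc.Add("", XmlReader.Create(new StringReader(Xsd)));

        var errors = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = sc;
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (var xmlReader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Validation happens while the reader moves through the document.
            while (xmlReader.Read()) { }
        }
        return errors;
    }
}
```

In the deserialization scenario, the validating xmlReader would be handed to XmlSerializer.Deserialize instead of being read in a loop.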
Performances

Serialization performs very respectably on files larger than 1MB: generating a file with lots of data takes an acceptable time.
On small files, it is unfortunately the least efficient solution.
Of course, working directly with XmlReader / XmlWriter will always be faster, because serialization uses these classes internally.
Conclusion

Serialization is a great way to generate XML files. The developer does not need to write low-level code for the tags: everything is configured in the class model with XML attributes applied to properties. These properties correspond to the tags that will be generated.
The hierarchy is also handled automatically: there is no need to build it by hand as is the case with XmlReader / XmlWriter, Dataset and XmlDocument.
Customization can be pushed further by implementing IXmlSerializable, to handle non-serializable types or to get a behavior that is not possible with attributes.
There is however a downside: reading an XML file always processes the entire file. To search for a specific tag or value in the document, it is quite possible to use XPath, for example.
Coupling serialization with XPath (or another query API) proves to be a powerful and maintainable solution.
Linq To Xml

Technologies that predate LinqToXml for manipulating XML documents, like XmlDocument or XPath, were relatively hard to use and required motivation and time to master.
Microsoft introduced Linq in version 3.5 of the .NET Framework. LinqToXml is a modern, simple and complete API to process XML files. It includes most of the features provided by the other APIs, all in a relatively simple programming style.
We will dedicate a substantial portion of this book to LinqToXml since it is currently the best solution in many contexts.
Before we see how to use LinqToXml, it is interesting to have a look at the class hierarchy:
XObject
The top class is XObject. It is the base of most LinqToXml classes and provides the functionality to add / delete annotations via the methods AddAnnotation() / RemoveAnnotations().
XNode
Deriving directly from XObject, XNode is the base class for nodes such as elements. This is the level that implements the methods to add sibling nodes, AddAfterSelf and AddBeforeSelf, and their counterpart for deletion, Remove.
XContainer
One notch below comes the XContainer class, the base class for nodes that can contain other XNode objects. XContainer has methods like Add, AddFirst, ReplaceNodes and RemoveNodes.
XElement
This class is the most important of LinqToXml. It represents an XML node that can contain other XML nodes. XElement provides a Load method to load XML from several types of data sources, and a Parse method that creates a node from a string containing XML. XElement also offers the Save method to generate a file.
XDocument
XDocument represents an XML document. However with LinqToXml, it is possible to create an XML document without using this class: XElement alone is enough to create a complete XML tree. XDocument is useful to add an XML declaration (XDeclaration), a document type or processing instructions (XProcessingInstruction).
How to read a XML file with LinqToXml
Reading a file with LinqToXml is very simple. Look at the following lines:
var books = from b
in XElement.Load("SampleLibrary.xml")
.Elements("Book")
select b;

To load an XML file, we use the Load method of XElement. The call Elements("Book") indicates that we only want the <Book> tags of the XML file.
The “books” output variable is of type IEnumerable<XElement>. It is therefore possible to iterate over the collection to retrieve the values of the fields:
foreach (var b in books)
{
    Book newBook = new Book();
    newBook.Id = b.Attribute("Id").Value;
    // the attribute is named Category in the test file
    newBook.Category = b.Attribute("Category").Value;
    newBook.Name = b.Descendants().First().Element("Name").Value;
    newBook.Author = b.Descendants().First().Element("Author").Value;
    listBooks.Add(newBook);
}
The code above is pretty simple to understand. However, what happens if the Value property is not used? That is to say:
newBook.Name = b.Descendants().First().Element("Name").ToString();

In this case, the returned string contains the whole tag, for instance: “<Name>For whom the bell tolls</Name>”
With the Value property, the returned string is: “For whom the bell tolls”.
Note that with Linq it is possible to rewrite the above code like this:
var listbooks2 =
(from b in XElement.Load("SampleLibrary.xml").Elements("Book")
select new Book
{
Id =b.Attribute("Id").Value,
Category = b.Attribute("Category").Value,
Name = b.Descendants().First().Element("Name").Value,
Author = b.Descendants().First().Element("Author").Value
}).ToList<Book>();

Instead of retrieving a collection of XElement, we use these XElement objects to create Book objects directly and return a list containing them.
It is interesting to note that the Load method also accepts a URL, so it is possible to read the contents of an RSS feed this way.
It is also possible to use an external data source to inject data into the XML file to create. This is a good way to avoid having to write the XML by hand.
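The reading code in this section throws a NullReferenceException as soon as an attribute or a tag is missing, because Value is called on null. A more tolerant variant, sketched here with XElement.Parse on a string so that it can run without a file, uses the explicit conversions to string that XAttribute and XElement provide: they return null when the node is absent. The helper name Read is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Book
{
    public string Id { get; set; }
    public string Category { get; set; }
    public string Name { get; set; }
    public string Author { get; set; }
}

public static class LinqReadSketch
{
    // Hypothetical helper: missing attributes or tags yield null properties.
    public static List<Book> Read(string xml)
    {
        return XElement.Parse(xml)
            .Elements("Book")
            .Select(b => new Book
            {
                Id = (string)b.Attribute("Id"),
                Category = (string)b.Attribute("Category"),
                // Elements(...) on a sequence simply returns nothing
                // when <Info> is absent, so FirstOrDefault gives null.
                Name = (string)b.Elements("Info").Elements("Name").FirstOrDefault(),
                Author = (string)b.Elements("Info").Elements("Author").FirstOrDefault()
            })
            .ToList();
    }
}
```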
How to write a XML file with LinqToXml

There are several ways to create a file with LinqToXml.
The first approach is called the functional approach: the developer writes the entire file structure by hand.
XElement xml = new XElement("Library",
    new XElement("Book",
        new XAttribute("Id", 1),
        new XAttribute("Category", "Novel"),
        new XElement("Name", "For whom the bell tolls"),
        new XElement("Author", "Ernest Hemingway")),
    new XElement("Book",
        new XAttribute("Id", 2),
        new XAttribute("Category", "Novel"),
        new XElement("Name", "War and Peace"),
        new XElement("Author", "Tolstoi")));

xml.Save(@"C:\temp\GeneratedXmlFile.xml");

This piece of code will create the following file:


<?xml version="1.0" encoding="utf-8"?>
<Library>
<Book Id="1" Category="Novel">
<Name>For whom the bell tolls</Name>
<Author>Ernest Hemingway</Author>
</Book>
<Book Id="2" Category="Novel">
<Name>War and Peace</Name>
<Author>Tolstoi</Author>
</Book>
</Library>

This approach is called functional because you can create a complete XML tree in a single statement. The code mirrors the shape of the generated tree much more closely than the imperative approach, which looks like the code below:
XElement xml = new XElement("Library");
xml.Add(new XElement("Book"));
...
Automatically generate code with the PasteXmlAsXelement addin

Writing this tree by hand is a long and tedious task. Fortunately there is an add-in that converts an XML file into a tree of XElement (see the code snippet above) via a simple copy and paste.
It works like this:
- Select the XML code.
- Copy it.
- Go to Edit / Paste Xml As XElement in Visual Studio.
- The pasted code is now represented as a tree of XElement.
To install the PasteXmlAsLinq add-in, download it from:
http://code.msdn.microsoft.com/windowsdesktop/PasteXmlAsLinq-fe6d0540
1. Open the Visual Studio solution PasteXmlAsLinq.sln
2. Compile the project (after conversion if you are using Visual Studio 2010 or 2012)
3. Get the 2 files PasteXmlAsLinq.dll and PasteXmlAsLinq.AddIn
4. Copy them to: Documents\Visual Studio 2010\AddIns
5. Restart Visual Studio and open the solution on which you are working.
In the Edit menu, you should see a new entry "Paste Xml As XElement".
Now that we have seen how to read and write basic XML files with LinqToXml, it is time to go a
little deeper into the available classes and how they work.
XElement
The XElement class has 3 main constructors:
• XElement (XName name)
• XElement (XName name, object content)
• XElement (XName name, params object [] content)
The objects passed as « content » can be of the following types:
String
For example:
new XElement("Name", "War and Peace")

LinqToXml takes care of creating the internal XText node that contains, in the example above, "War and Peace".
XText
An XText object holds a string; its subclass XCData holds a CDATA value. We create such objects explicitly for CDATA only: for plain text it is more readable to pass the string directly.
XAttribute
The object is added as an attribute of the element.
IEnumerable
The enumerable is iterated and each object it yields is processed in turn.
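The IEnumerable case is what makes it possible to build a tree from a data source in one statement. A minimal sketch, assuming the source is a simple array of titles and that Ids are just positions in that array (the helper name Build is hypothetical):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class EnumerableContentSketch
{
    // Hypothetical helper: one <Book> per title, numbered from 1.
    public static XElement Build(string[] titles)
    {
        // The Select call returns an IEnumerable<XElement>; XElement
        // iterates it and adds each item as a child of <Library>.
        return new XElement("Library",
            titles.Select((title, index) =>
                new XElement("Book",
                    new XAttribute("Id", index + 1),
                    new XElement("Name", title))));
    }
}
```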

Namespace
When creating a XML file in real life, it is common to have to use namespaces.
With LinqToXml we use them as follows:
XNamespace ns = "http://gogo.com";
XElement elt = new XElement(ns + "book");
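One point that often surprises: the namespace is part of the name, so it has to be repeated on every nested element and in every query. The sketch below illustrates this; the class and helper names are assumptions, the URL is the one used above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceSketch
{
    public static readonly XNamespace Ns = "http://gogo.com";

    public static XElement Build()
    {
        // Every element that belongs to the namespace is created with Ns + name.
        return new XElement(Ns + "Library",
            new XElement(Ns + "book",
                new XElement(Ns + "Name", "War and Peace")));
    }

    // Counts the direct children of root that have exactly this name.
    public static int CountChildren(XElement root, XName name)
    {
        return root.Elements(name).Count();
    }
}
```

With this tree, CountChildren(root, "book") finds nothing, whereas CountChildren(root, NamespaceSketch.Ns + "book") finds the book.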

How to add content to the XML file?

The main method to add nodes to a XElement is the Add method that accepts these two signatures:
public void Add( object content)
public void Add( params object[] content)

We can add to a XElement one or more nodes. For example:


XElement elt = new XElement("Library");
elt.Add(new XElement("book",
new XAttribute("Id",1),
new XElement("Name","For Whom the bell tolls")));

This code adds to the Library node a book node with an attribute Id = 1 and a Name tag whose value is "For Whom the bell tolls". Removing nodes is just as simple:
// delete the first book
books.Element("book").Remove();
// delete all child books
books.Elements("book").Remove();
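
The two operations can be tried together on a small tree. This is a sketch under assumptions: the three books are sample data, and the null-conditional operator (?.) is an addition that guards against the case where Element finds nothing and returns null.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement books = new XElement("Library");

// Add accepts several nodes at once (params object[] overload)
books.Add(
    new XElement("book", new XAttribute("Id", 1), new XElement("Name", "For Whom the Bell Tolls")),
    new XElement("book", new XAttribute("Id", 2), new XElement("Name", "War and Peace")),
    new XElement("book", new XAttribute("Id", 3), new XElement("Name", "Dubliners")));

// Delete the first book; ?. skips the call if there is no book at all
books.Element("book")?.Remove();
int afterFirstRemove = books.Elements("book").Count();

// Delete all remaining child books
books.Elements("book").Remove();
int afterRemoveAll = books.Elements("book").Count();

// Safe even though no book is left: Element returns null and Remove is not called
books.Element("book")?.Remove();

Console.WriteLine("{0} then {1}", afterFirstRemove, afterRemoveAll);
```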
How to navigate in the XML tree
To navigate inside the XML tree there are several methods:
Element() & Attribute()
The Element method is used to select a single XML node from a name. The element returned is the
first to satisfy the condition.
// delete the first book
books.Element("book").Remove();

Elements() & Attributes()


Unlike Element, which returns a single node, the Elements method returns all the direct children
whose name matches the given argument.
// delete all child books
books.Elements("book").Remove();

Warning: the Elements method only works with direct children of the current node.
Descendants() & Ancestors()
The Descendants method works the same way as Elements, but is not limited to direct child
nodes. It returns all matching nodes located below the current node, at any depth.
books.Descendants("book").Remove();

Special case: the Descendants method does not include the current node in the results. If the
current node must be included, use DescendantsAndSelf.
The Ancestors method is similar to Descendants, but where Descendants works down the tree,
Ancestors works up, from the parent of the current node to the root.
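
The differences between these methods are easier to see side by side. The sketch below uses a small made-up tree (the Shelf element is hypothetical) in which one book is a direct child of Library and another is nested one level deeper.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement library = XElement.Parse(
    @"<Library>
        <book Id=""1""><Name>War and Peace</Name></book>
        <Shelf>
          <book Id=""2""><Name>For Whom the Bell Tolls</Name></book>
        </Shelf>
      </Library>");

// Elements: direct children only, so the nested book is not seen
int directChildren = library.Elements("book").Count();

// Descendants: any depth below the current node
int allLevels = library.Descendants("book").Count();

// Descendants never returns the current node; DescendantsAndSelf does
int selfExcluded = library.Descendants("Library").Count();
int selfIncluded = library.DescendantsAndSelf("Library").Count();

// Ancestors: walks upward, nearest parent first
XElement nested = library.Descendants("book").Last();
string path = string.Join(" > ", nested.Ancestors().Select(a => a.Name.LocalName));

// Element + Attribute: first matching child, then one of its attributes
var firstId = (string)library.Element("book")?.Attribute("Id");

Console.WriteLine("{0} {1} {2} {3} [{4}] {5}",
    directChildren, allLevels, selfExcluded, selfIncluded, path, firstId);
```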
Modifying existing content
To edit the content of a node, the SetElementValue method can be used.
For example:
listBooks.Element("book").SetElementValue("author", "Unknown");

The above code updates the <author> child of the first book node found. If that child does not
exist yet, it is created.
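
SetElementValue covers three cases: update, create and delete. The following sketch illustrates them on made-up data (the year element and the Id attribute are hypothetical). SetAttributeValue, shown on the last line, is the equivalent method for attributes.

```csharp
using System;
using System.Xml.Linq;

XElement listBooks = XElement.Parse(
    @"<Library>
        <book><Name>War and Peace</Name><author>Tolstoy</author></book>
        <book><Name>Dubliners</Name></book>
      </Library>");

XElement first = listBooks.Element("book");

first.SetElementValue("author", "Unknown"); // child exists: its value is replaced
first.SetElementValue("year", 1869);        // child does not exist: it is created
first.SetElementValue("Name", null);        // null value: the child is removed
first.SetAttributeValue("Id", 1);           // same logic, applied to an attribute

Console.WriteLine(listBooks);
```

Only the first book is touched; the second one keeps its Name and gets no author.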
Validation with LinqToXml
Validating an XML file is also possible and relatively simple to implement. It is once again
XmlSchemaSet that is used to encapsulate the schema.
So we must begin by creating this object:
XmlSchemaSet sc = new XmlSchemaSet();
sc.Add("", "SampleLibraryXsd.xsd");

We then pass this object to the Validate extension method (defined in the System.Xml.Schema
namespace) to start validating the XML.
XDocument doc1 = new XDocument( … creation of the tree );
bool errors = false;
// The callback is invoked once for each validation problem found
doc1.Validate(sc, (o, e) =>
{
    Console.WriteLine("{0}", e.Message);
    errors = true;
});
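
For a version that runs without any external file, the schema can be loaded from a string. This is a sketch under assumptions: the inline XSD below is a simplified stand-in and not the book's SampleLibraryXsd.xsd, and CountErrors is a hypothetical helper introduced for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

// Simplified stand-in schema: a Library of books, each with a required Id and a Name
string xsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""Library"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""book"" maxOccurs=""unbounded"">
          <xs:complexType>
            <xs:sequence>
              <xs:element name=""Name"" type=""xs:string""/>
            </xs:sequence>
            <xs:attribute name=""Id"" type=""xs:int"" use=""required""/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add("", XmlReader.Create(new StringReader(xsd)));

// Hypothetical helper: validates a document and returns the number of problems reported
int CountErrors(XDocument doc)
{
    int count = 0;
    doc.Validate(schemas, (sender, e) =>
    {
        Console.WriteLine("{0}: {1}", e.Severity, e.Message);
        count++;
    });
    return count;
}

XDocument valid = new XDocument(
    new XElement("Library",
        new XElement("book", new XAttribute("Id", 1), new XElement("Name", "War and Peace"))));

// Missing Id attribute and an unexpected child element
XDocument invalid = new XDocument(
    new XElement("Library",
        new XElement("book", new XElement("Title", "No Id, wrong child"))));

int validErrors = CountErrors(valid);
int invalidErrors = CountErrors(invalid);
Console.WriteLine("valid: {0} error(s), invalid: {1} error(s)", validErrors, invalidErrors);
```

Because a callback is supplied, Validate reports every problem instead of throwing on the first one.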
Benchmark

Warning

The results of the benchmark below should not be read as absolute values. Running the same test
twice can produce variations of up to +/- 15%, and the figures depend greatly on your
computer hardware.
It is more useful as a comparative analysis, to find out which solution is the
fastest depending on the size of the processed XML.
Benchmark description

The XML writing benchmark works as follows. Once the program is started, it creates an
XML document containing a number of nodes provided as an argument. The program is shut down
after each test. The generated file is never the same, as the node content is random.
In the reading benchmark, however, the same file (for a given size) is used for every test.
Log4net is used to measure how long it takes to create the file or to read it.
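
To reproduce this kind of comparison on a small scale, the sketch below times two of the reading approaches. It is not the benchmark program described above: it uses System.Diagnostics.Stopwatch instead of Log4net, works on an in-memory string instead of a file, runs both tests in the same process, and nodeCount is an arbitrary value.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

int nodeCount = 1000;            // arbitrary size chosen for this sketch
Random random = new Random();

// Generate a document with random node content, as the benchmark description suggests
XElement generated = new XElement("Library",
    Enumerable.Range(1, nodeCount).Select(i => new XElement("book",
        new XAttribute("Id", i),
        new XElement("Name", "Title " + random.Next()))));
string xml = generated.ToString();

// Reading with XmlReader: forward-only, node by node
Stopwatch readerWatch = Stopwatch.StartNew();
int readerCount = 0;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
            readerCount++;
    }
}
readerWatch.Stop();

// Reading with LinqToXml: the whole tree is loaded, then queried
Stopwatch linqWatch = Stopwatch.StartNew();
int linqCount = XElement.Parse(xml).Elements("book").Count();
linqWatch.Stop();

Console.WriteLine("XmlReader: {0} ms, LinqToXml: {1} ms",
    readerWatch.ElapsedMilliseconds, linqWatch.ElapsedMilliseconds);
```

Timings from a single run like this one are noisy, so only the relative order over several runs and sizes is meaningful.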
Results

Below are the results when reading:

Reading time (ms)   10 KB   100 KB    1 MB    5 MB   10 MB   100 MB
XmlReader              16       20      45     145     386     3800
XmlDocument            20       24      90     566    1000    10000
Serialization         300      300     380     413     600     6000
Dataset                50      110     500    2070    4230    35000
LinqToXml              25       27      69     413     935     7200
XPath                  20       24      73     250     535     9000

And those when writing:

Writing time (ms)   10 KB   100 KB    1 MB    5 MB   10 MB   100 MB
XmlWriter              17       30     160     225    1350     5200
XmlDocument            20       40     260     705    2100    12500
Serialization         266      266     351     500    1000     8000
Dataset                31      159     288     600    2040    11500
LinqToXml              27       40     330     800    1800     9000

Result analysis

Not surprisingly, XmlReader is the fastest way to read XML and XmlWriter is the fastest way to
write it. LinqToXml and the serialization classes use these objects to read and write XML, so
they can only be slower than XmlReader / XmlWriter used directly.
For small documents, whether reading or writing, XmlDocument, XPath (read only) and
LinqToXml show good results. Serialization, however, is expensive for this type of
operation: it costs roughly 300 ms even for a 10 KB file. It is a solution to avoid for files
smaller than 1 MB.
From about 5 MB upward, the different technical solutions exhibit a relatively consistent
behavior: the fastest solution after XmlReader / XmlWriter is serialization. LinqToXml is not far
behind. It seems quite clear that to deal with large files it is necessary to use XmlReader /
XmlWriter, LinqToXml or serialization.
One exception: XPath. XPath offers very respectable performance when reading files of up to
10 MB, but seems to lose its advantage on larger files such as 100 MB.
General Conclusion

In this book we have detailed different ways to work with XML files in .NET.
XmlReader / XmlWriter
These two classes are the fastest of all the proposed solutions. However, the code needed to
read or write with them is tedious to implement and hard to maintain.
XmlTextWriter and XmlTextReader
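These two classes are effectively obsolete and their use should be avoided: since .NET 2.0 the
recommended way to obtain a reader or a writer is XmlReader.Create or XmlWriter.Create. Many bugs
have been found, and if they are still in the package it is only so as not to break compatibility
with previous versions of the Framework.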
XmlDocument
XmlDocument is a class that was introduced in the early days of .NET Framework. Even though it
is not officially obsolete, there is really no reason to use it today.
Dataset
The Dataset is one of the worst solutions in terms of performance for managing XML in general.
A developer may wish to use it only to keep a certain homogeneity in code that already uses
ADO.NET. However, it is better to turn to other solutions.

Serialization
Serialization is an alternative to read and write XML. It works quite differently from the other
solutions we have analyzed here. The main advantage is that the developer does not work with
XML tags directly but with the object model which is much more maintainable. Serialization also
has very good performance when dealing with files larger than 5 MB.

LinqToXml
LinqToXml is the newest of the .NET technologies for processing XML, and it is quite cool indeed.
LinqToXml can read and write files with a syntax similar to SQL. This syntax is consistent with
other Linq technologies (LinqToSql, LinqToObjects).
This is a very powerful, maintainable and efficient technology. Unless you are working with an
older version of the Framework, this is definitely the solution to consider for processing XML
files.
XPath
XPath is a good way to read and parse XML files if you are working with a version of the
framework older than 3.5. XPath can read or retrieve data from an XML file very quickly. This
technology does not allow writing to files and is limited to read-only access.
The syntax of XPath queries is quite esoteric and can be confusing to many.
That is why, for reasons of maintainability, I recommend using LinqToXml rather than XPath when
possible.
