XML Revisited
From a data modelling viewpoint, what does XML offer?
  Entities (ER!)
  Attributes: single-valued, atomic
Roadmap
XPath XQuery
a node test: node type and expanded-name of nodes selected by location step
zero or more predicates: further refine set of nodes selected by location step
Pattern Expressions
Locating nodes in an XML document
  pattern expression to identify nodes in document
  path through the XML document: .../node1/node2/...
  pattern "selects" elements that match path, result is a (sub)tree
  all price elements of all cd elements of the catalog element:
    /catalog/cd/price

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>
320302 Databases & Web Applications (P. Baumann) 6
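The path /catalog/cd/price can be tried out with a minimal sketch in Python. It assumes the standard-library module xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element, so the absolute path is written relative to the catalog root here. The names CATALOG, root and prices are illustrative.

```python
import xml.etree.ElementTree as ET

# The catalog from the slide (XML declaration omitted for brevity).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# /catalog/cd/price, expressed relative to the root element
prices = [p.text for p in root.findall("./cd/price")]
print(prices)
```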
Paths
Absolute vs. relative vs. fitting:
  path starts with a slash ( / ): absolute path
  path starts with two slashes ( // ): all fitting elements, even if at different levels in tree
  otherwise: path relative to current position

Relative addressing via an axis:
  defines a node set relative to current node
  all children of parent, child, self, ancestor, descendant, attribute, ...
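The three kinds of path can be compared in a small sketch, again assuming Python's xml.etree.ElementTree. The document DOC is a hypothetical variant of the catalog with an extra title directly below catalog, added only so that // finds a match at a different tree level.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the catalog: one <title> sits directly under <catalog>.
DOC = """<catalog>
  <title>My collection</title>
  <cd country="USA"><title>Empire Burlesque</title></cd>
  <cd country="UK"><title>Hide your heart</title></cd>
</catalog>"""

root = ET.fromstring(DOC)

# /catalog/cd/title: fixed path from the root
from_root = [t.text for t in root.findall("./cd/title")]

# //title: all fitting elements, at any level
anywhere = [t.text for t in root.findall(".//title")]

# title: relative to the current node (here: the first cd)
current = root.find("cd")
relative = [t.text for t in current.findall("title")]

print(from_root, anywhere, relative)
```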
Examples
Examples
More Examples
<1>
  <2>
    <3/>
    <4/>
  </2>
  <5/>
</1>

self({2}) = {2}
child({1}) = {2,5}
parent({3}) = {2}
descendant({1}) = {2,3,4,5}
descendant-or-self({1}) = {1,2,3,4,5}
ancestor({4}) = {1,2}
ancestor-or-self({4}) = {1,2,4}
following({3}) = {4,5}
preceding({4}) = {3}
following-sibling({4}) = {}
preceding-sibling({5}) = {2}
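The axis results above can be reproduced with a minimal sketch. It assumes Python's xml.etree.ElementTree, which has no axis support of its own, so the axes are written out as small helper functions. The element names n1 to n5 stand in for the nodes 1 to 5, since plain numbers are not valid XML names. All function names are illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<n1><n2><n3/><n4/></n2><n5/></n1>")
order = list(root.iter())                   # all elements in document order
parent = {c: p for p in order for c in p}   # child -> parent map

def ancestors(n):
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out

def descendants(n):
    return list(n.iter())[1:]               # iter() yields n itself first

def following(n):
    # nodes after n in document order, excluding n's descendants
    desc = descendants(n)
    return [m for m in order[order.index(n) + 1:] if m not in desc]

def preceding(n):
    # nodes before n in document order, excluding n's ancestors
    anc = ancestors(n)
    return [m for m in order[:order.index(n)] if m not in anc]

def following_siblings(n):
    sibs = list(parent[n]) if n in parent else []
    return sibs[sibs.index(n) + 1:] if sibs else []

def preceding_siblings(n):
    sibs = list(parent[n]) if n in parent else []
    return sibs[:sibs.index(n)] if sibs else []

def names(nodes):
    return [m.tag for m in nodes]

n3, n4, n5 = (root.find(".//" + t) for t in ("n3", "n4", "n5"))
print(names(descendants(root)), names(following(n3)), names(preceding(n4)))
```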
Wildcards
Use * to select unknown elements
  all child elements of all cd of catalog:
    /catalog/cd/*
  all price elements that are grandchildren of catalog:
    /catalog/*/price
  all price elements which have 2 ancestors:
    /*/*/price
  all elements:
    //*
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>
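A short sketch of the wildcard patterns, assuming Python's xml.etree.ElementTree and paths written relative to the catalog root. One difference is worth noting: in this library ".//*" yields the descendants of the starting element only, whereas the XPath expression //* also includes the catalog element itself.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""
root = ET.fromstring(CATALOG)

cd_children = root.findall("./cd/*")        # /catalog/cd/*
grandchild_prices = root.findall("./*/price")  # /catalog/*/price
below_root = root.findall(".//*")           # descendants of catalog only
all_elements = list(root.iter())            # catalog plus all descendants, like //*

print(len(cd_children), len(grandchild_prices), len(below_root), len(all_elements))
```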
Abbreviations
a/b/c
./child::a/child::b/child::c
a//@id
./child::a/descendant-or-self::node()/attribute::id
//a
root(.)/descendant-or-self::node()/child::a
a/text()
./child::a/child::text()
Branch Selection
Selecting branches from subtree: "[...]"
  first cd child of catalog:
    /catalog/cd[1]
    /catalog/cd[ position() = 1 ]
  all cd elements of catalog that have a price element:
    /catalog/cd[ price ]
  all cd elements of catalog that have a price with value of 10.90:
    /catalog/cd[ price=10.90 ]
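The branch selections can be sketched as follows, assuming Python's xml.etree.ElementTree. Its predicate subset covers [position], [tag] and [tag='text']. The last form compares the element text as a string, so the value is written '10.90' exactly as it appears in the document, unlike the numeric comparison price=10.90 in XPath.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""
root = ET.fromstring(CATALOG)

def titles(cds):
    return [cd.findtext("title") for cd in cds]

first_cd = titles(root.findall("./cd[1]"))                 # /catalog/cd[1]
with_price = titles(root.findall("./cd[price]"))           # /catalog/cd[ price ]
price_1090 = titles(root.findall("./cd[price='10.90']"))   # string match on the price text

print(first_cd, with_price, price_1090)
```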
Multiple Paths
Selecting several paths: | operator
  all title, artist elements:
    /catalog/cd/title | /catalog/cd/artist
  all the title and artist elements in the document:
    //title | //artist
  all title, artist, price elements:
    //title | //artist | //price
  all title elements of cd of catalog, and all artist elements:
    /catalog/cd/title | //artist
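Python's xml.etree.ElementTree does not implement the | operator, so the following sketch emulates //title | //artist by walking the tree once and keeping the matching element names. This is an assumption-laden stand-in rather than a real union, but it preserves the property that an XPath union is returned in document order.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""
root = ET.fromstring(CATALOG)

# Emulates //title | //artist: one pass in document order, filtered by name.
wanted = {"title", "artist"}
union = [e.text for e in root.iter() if e.tag in wanted]
print(union)
```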
Attributes
Selecting attributes: prefix attributes with @
  all attributes named country:
    //@country
  all cd elements which have an attribute named country:
    //cd[@country]
  all cd elements with attribute named country with value 'UK':
    //cd[@country='UK']
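A minimal sketch of the attribute patterns, assuming Python's xml.etree.ElementTree. The library supports attribute predicates such as [@country] but cannot return attribute nodes, so //@country is approximated by reading the attribute value from each cd element.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""
root = ET.fromstring(CATALOG)

# Approximation of //@country: collect the attribute values.
countries = [cd.get("country") for cd in root.iter("cd")]

has_country = root.findall(".//cd[@country]")                # //cd[@country]
uk_titles = [cd.findtext("title")
             for cd in root.findall(".//cd[@country='UK']")]  # //cd[@country='UK']

print(countries, len(has_country), uk_titles)
```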
Predicates
Predicates, operators, functions as usual
  all CDs with price below 10.0:
    /catalog/cd[ price<10.0 ]
  all CDs with country "UK" and price below 10.0:
    /catalog/cd[ @country="UK" ][ price<10.0 ]
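Numeric comparisons such as price<10.0 are outside the XPath subset of Python's xml.etree.ElementTree, so this sketch applies the attribute predicate in the path and the price test in plain Python. Chaining the two filters mirrors two predicates on the same step.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""
root = ET.fromstring(CATALOG)

def price_below(cd, limit):
    # numeric test on the text of the price child
    return float(cd.findtext("price")) < limit

# /catalog/cd[ price<10.0 ]
cheap = [cd.findtext("title") for cd in root.findall("./cd") if price_below(cd, 10.0)]

# /catalog/cd[ @country="UK" ][ price<10.0 ]
uk_cheap = [cd.findtext("title")
            for cd in root.findall("./cd[@country='UK']") if price_below(cd, 10.0)]

print(cheap, uk_cheap)
```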
Document Order
(incomplete!)
path ::= step slash ... slash finalstep | slash step slash ... slash finalstep
axis ::= child | descendant | parent | ancestor | ...
node test (step) ::= node-name | *
node test (finalstep) ::= node-name | * | @attr-name
predicate ::= some boolean expression over nodes and attributes
Roadmap
XPath XQuery
XQuery
XQuery: retrieving information from XML data
XQuery = XML Query, built on XPath
XQuery is to XML what SQL is to tables
Allows extracting information from XML structures, stored in a file or in a database
Major DBMS vendors support XQuery
Result: <title> abc </title> <title> def </title> <title> ghi </title>
let $x := expr
  binds $x to the entire list expr
  Defines variable; binds collection variables to one value

let $x := doc("bib.xml")/bib/book
return <result>{ $x }</result>

Returns:
<result>
  <book>...</book>
  <book>...</book>
  ...
</result>
Aggregates
count = (aggregate) function that returns the number of elements

<big_publishers>{
  for $p in distinct-values(doc("bib.xml")//publisher)
  let $b := doc("bib.xml")//book[publisher = $p]
  where count($b) > 100
  return <publisher>{ $p }</publisher>
}</big_publishers>

Returns:
<big_publishers>
  <publisher>Morgan Kaufmann</publisher>
  <publisher>Wiley</publisher>
</big_publishers>
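The logic of the aggregate query can be traced in a Python sketch: iterate over the distinct publishers (for), bind the whole list of matching books (let), and keep those whose count exceeds a threshold (where). The BIB document is invented sample data, not the contents of bib.xml, and the threshold is lowered from 100 to 1 so the tiny sample yields a result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml.
BIB = """<bib>
  <book><title>A</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>B</title><publisher>Wiley</publisher></book>
  <book><title>C</title><publisher>Morgan Kaufmann</publisher></book>
</bib>"""
bib = ET.fromstring(BIB)

THRESHOLD = 1  # assumption: the slide uses 100

# for $p in distinct-values(//publisher): distinct values, first-seen order
publishers = []
for p in bib.iter("publisher"):
    if p.text not in publishers:
        publishers.append(p.text)

big_publishers = []
for p in publishers:
    # let $b := //book[publisher = $p]: $b is the whole list of books
    books = [b for b in bib.iter("book") if b.findtext("publisher") == p]
    # where count($b) > THRESHOLD
    if len(books) > THRESHOLD:
        big_publishers.append(p)

print(big_publishers)
```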
Sorting
<publisher_list>{
  for $p in distinct-values(doc("bib.xml")//publisher)
  order by $p
  return
    <publisher>
      <name>{ $p }</name>
      {
        for $b in doc("bib.xml")//book[publisher = $p]
        order by number($b/price) descending
        return <book>{ $b/title, $b/price }</book>
      }
    </publisher>
}</publisher_list>
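The nested sort can be sketched in Python: publishers in ascending order, and within each publisher the books by descending price. The BIB document is invented sample data. Prices are converted to numbers before sorting, since a comparison of the raw text would order them as strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml.
BIB = """<bib>
  <book><title>A</title><publisher>Morgan Kaufmann</publisher><price>30.00</price></book>
  <book><title>B</title><publisher>Wiley</publisher><price>20.00</price></book>
  <book><title>C</title><publisher>Morgan Kaufmann</publisher><price>45.50</price></book>
</bib>"""
bib = ET.fromstring(BIB)

publisher_list = []
# outer loop: distinct publishers, ordered by name
for p in sorted({b.findtext("publisher") for b in bib.iter("book")}):
    books = [b for b in bib.iter("book") if b.findtext("publisher") == p]
    # inner loop: books of this publisher, ordered by price descending
    books.sort(key=lambda b: float(b.findtext("price")), reverse=True)
    publisher_list.append((p, [(b.findtext("title"), b.findtext("price")) for b in books]))

print(publisher_list)
```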
If-Then-Else
for $h in //holding
order by $h/title
return
  <holding>{
    $h/title,
    if ($h/@type = "Journal") then $h/editor else $h/author
  }</holding>
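The conditional can be followed in a Python sketch: holdings are sorted by title, and each result carries the editor for journals and the author otherwise. The HOLDINGS document and its element names other than holding, title, editor and author are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
HOLDINGS = """<library>
  <holding type="Journal"><title>Z Journal</title><editor>Ed</editor></holding>
  <holding type="Book"><title>A Book</title><author>Au</author></holding>
</library>"""
lib = ET.fromstring(HOLDINGS)

result = []
# for $h in //holding order by $h/title
for h in sorted(lib.iter("holding"), key=lambda h: h.findtext("title")):
    # if ($h/@type = "Journal") then $h/editor else $h/author
    who = h.find("editor") if h.get("type") == "Journal" else h.find("author")
    result.append((h.findtext("title"), who.tag, who.text))

print(result)
```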
for $b in //book
where every $p in $b//para satisfies contains($p, "sailing")
return $b/title
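The universal quantifier corresponds to Python's all(), as this sketch on invented sample data suggests. One consequence is easy to overlook: a book without any para elements satisfies the condition, because a quantifier over an empty sequence is true, in XQuery as well as in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BOOKS = """<books>
  <book><title>Sailing Basics</title><para>sailing upwind</para><para>more sailing</para></book>
  <book><title>Mixed Topics</title><para>sailing</para><para>cooking</para></book>
  <book><title>Empty Book</title></book>
</books>"""
doc = ET.fromstring(BOOKS)

# for $b in //book where every $p in $b//para satisfies contains($p, "sailing")
titles = [
    b.findtext("title")
    for b in doc.iter("book")
    if all("sailing" in "".join(p.itertext()) for p in b.iter("para"))
]
print(titles)
```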