
XML Study Guide

Coalition 5 (Bobby Tables)


Sean Bashaw, Colin Carlson, Zachary Miller

Learning Objectives
1. Understand what XML is and the features that surround it.
2. Correctly identify the main differences and similarities between XML and SQL.
3. Be able to convert a SQL result set to an XML document.
4. Learn about the differences and similarities between JSON and XML.
5. Learn how to create an XML document from a SQL query.
6. Understand the different libraries available for use, and have some idea as to how
you can use specific features.
7. Have an idea of how to use JDBC, and how its uses compare to those of SAX.
8. Understand the advantages of JSON, and how it compares to XML style documents.

Introduction to XML
XML stands for Extensible Markup Language. It is a semi-structured markup language.
It’s a powerful tool for transferring data because it supports nested tags, comments,
attributes, and singular (self-closing) tags, giving you a broad feature set. It also serves
as the syntax for other languages and tools, such as XSLT (a declarative transformation
language) and Apache Ant build files. XML documents can be handled through the
Document Object Model (DOM), a programming interface that treats HTML, XHTML,
and XML documents as a tree structure. XML 1.0 was published in 1998 as a simplified
successor to SGML (Standard Generalized Markup Language). XML is largely based
around the tag system, meaning that each object is represented by either a start/end tag
pair or a single self-closing tag. If you wanted to store a rectangle object, for example,
you would store it like this:
<rectangle>
    ...
</rectangle>
And you would store each of the variables in the rectangle object as nested tags in the
rectangle one.
<rectangle>
    <x>490</x>
    <y>281</y>
    <width>100</width>
    <height>37</height>
</rectangle>
If I wanted to attach another piece of information that states the name of this rectangle,
I could use an attribute.
<rectangle name="Boxy">
    <x>490</x>
    <y>281</y>
    <width>100</width>
    <height>37</height>
</rectangle>
SGML was similar to XML. In XML, every end tag must repeat the element name:
<menu>
    <food>
        <name>Ice Cream</name>
    </food>
</menu>
SGML, however, also allowed tags to be closed with an empty end tag that omits the
name, like this:
<menu>
    <food>
        <name>Ice Cream</>
    </>
</>
This could lead to ambiguity when you had complicated tree structures, as well as
problems when establishing a singular closing tag.
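
The end-tag rules above can be checked mechanically. The following is a minimal sketch, assuming the standard Java XML parser in javax.xml.parsers; the class name WellFormedCheck and the method isWellFormed are made up for illustration and are not part of any library. It reports whether a snippet parses as well-formed XML, so an SGML-style empty end tag should be rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not from the study guide): checks tag structure only.
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. a mismatched, unnamed, or missing end tag.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the snippet.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling WellFormedCheck.isWellFormed on the XML menu above should return true, while the version closed with </> should return false.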

Conversion of SQL table to XML document


One of the uses of XML that we are trying to convey in this study guide is the use of
XML to represent information extracted from a SQL table. Here are several things to
keep in mind when translating a database:

1. The primary key is unknown.

• Because we are just collecting the data, and not using this for any specific
task, we do not need to store the primary key. If we must use it for reasons like
references to foreign keys, we will add it as an attribute.

2. Everything is stored as a string.

• Plain XML stores every value as text. Without a schema, there is no way to tell
the interpreter that a specific tag holds an integer, boolean, or floating point
value. We instead leave this up to the parser, and trust its judgement.

3. Don’t worry about transferring over formatting whitespace.

• White space that is mainly used for formatting, such as tabs and newlines,
should be dropped. While we want the XML document to be understood
by both computers and humans, we want to drop the extra characters and let
the parser handle the readability.

4. Keep foreign keys.

• Some people decide that it’s better to include the object a foreign key refers
to as a nested element. This is not an optimal idea, because it can lead
to redundancy by including the same object multiple times in the same XML
document. It can also lead to extra parsing time. What we recommend you do
instead is keep primary keys by attaching an “entry_id” attribute to the entry
tag pair.

Table 1: Saw
entry_id  name                date_in  date_out  reason
1         Jonathan            11/1     11/5      Bird-House Class
2         Tim's House Repair  11/7     11/23     Reconstruction

Let’s say we want to represent an SQL table for a saw in a workshop that records when
someone wants to use it. The first thing we would do to convert this table to an XML
DOM element is take the name of the table and convert it into the root tag.
<saw>
...
</saw>
Now that we have our root element, we need to add 2 children to this parent, representing
the extension (the rows) of the database table. One good way to name your child tags is
after the primary key, so entry_id gives us entry.

<saw>
    <entry></entry>
    <entry></entry>
</saw>
Now, each column in the table is added as a child element of each entry.
<saw>
    <entry>
        <name>Jonathan</name>
        <date_in>11/1</date_in>
        <date_out>11/5</date_out>
        <reason>Bird-House Class</reason>
    </entry>
    <entry>
        <name>Tim's House Repair</name>
        <date_in>11/7</date_in>
        <date_out>11/23</date_out>
        <reason>Reconstruction</reason>
    </entry>
</saw>
This is how many libraries and programs map data onto an XML document. Note that
the nesting ability of XML is not used, as we are primarily concerned
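
The mapping walked through in this section can be sketched in code. This is only an illustration under stated assumptions: the rows are given as in-memory maps of column name to string value rather than read from a live database, and the names TableToXml and toXml are hypothetical. It uses the standard Java DOM and Transformer APIs to build the root tag from the table name, one child per row, one nested tag per column, and the key column as an attribute.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: turns rows (column name -> string value) into XML text.
final class TableToXml {
    static String toXml(String table, String rowTag, String keyColumn,
                        List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement(table); // table name -> root tag
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element entry = doc.createElement(rowTag); // one child per row
                root.appendChild(entry);
                for (Map.Entry<String, String> col : row.entrySet()) {
                    if (col.getKey().equals(keyColumn)) {
                        // The primary key is kept as an attribute.
                        entry.setAttribute(col.getKey(), col.getValue());
                    } else {
                        // Every other column becomes a nested tag holding text.
                        Element child = doc.createElement(col.getKey());
                        child.setTextContent(col.getValue());
                        entry.appendChild(child);
                    }
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a JDBC result set, the same loop could be driven by the result set's rows and column names instead of the maps assumed here. The output is written without formatting whitespace, in line with the guideline above.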

Parsing with SAX


SAX stands for Simple API for XML. It provides forward-only (single direction) parsing,
and is very quick and uses little memory because it doesn’t retain the structure or
data. This makes it useful for finding tiny bits of information quickly, rather than
retaining entire trees. The steps for using SAX are as follows:

1. Move down the stream until you reach the target tag.

2. Process the data inside of that tag as needed.

3. Return and exit from SAX, or continue parsing for more instances.

Now let’s look at an example on using SAX, with the table we’ve been using.
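
The steps above can be sketched as follows. This is a minimal example, assuming the standard Java SAX parser (javax.xml.parsers.SAXParserFactory) and an XML string shaped like the saw document; the class SawSaxExample and the method collect are hypothetical names. The handler waits for the target tag, gathers the text inside it, and keeps going for further instances without ever building a tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every <targetTag> element.
final class SawSaxExample {
    static List<String> collect(String xml, String targetTag) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only inside the target tag

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(targetTag)) {
                    text = new StringBuilder(); // step 1: target tag reached
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // step 2: use the data
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(targetTag) && text != null) {
                    found.add(text.toString());
                    text = null; // step 3: keep parsing for more instances
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }
}
```

For the saw document, collect(xml, "name") should return the two names in document order. The sketch assumes the target tag is never nested inside itself.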

Examples with SAX


See the following links for some examples of SAX being used to parse information
from XML:
https://www.journaldev.com/1198/java-sax-parser-example
https://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/

Parsing with DOM


DOM stands for Document Object Model, and it’s useful in cases where you want to
store all of the data and retain the structure. This is slower than SAX, but it’s easier to
reference data and allows you to reference specific tags. Here are the steps for parsing
an XML document using DOM:

1. Parse the opening tag.

2. While the next sequence of characters is not the associated end tag:

(a) If it is data, then parse the data.
(b) Otherwise, hand the text stream to the associated method.
(c) Manage returned data from the method.

3. Return the location of the tag.

The DOM method is recursive, and takes up more space in memory than SAX, because
the whole tree is retained.
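
Because DOM retains the whole document as a tree, any tag can be looked up after parsing. The following is a small sketch, assuming the standard Java DOM parser and a document shaped like the saw example, where every entry has a name and a reason tag; SawDomExample and reasonFor are hypothetical names used only here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: looks up the <reason> of the entry with a given <name>.
final class SawDomExample {
    static String reasonFor(String xml, String name) {
        try {
            // The whole document is parsed into an in-memory tree first.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Assumes each entry contains exactly one name and one reason.
                String entryName =
                    entry.getElementsByTagName("name").item(0).getTextContent();
                if (entryName.equals(name)) {
                    return entry.getElementsByTagName("reason")
                        .item(0).getTextContent();
                }
            }
            return null; // no entry with that name
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike the SAX approach, the tree stays available after parsing, so further lookups need no second pass over the text.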

Examples with DOM


Click on either of the following links to see some solid examples of parsing XML with
DOM:
https://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
https://www.tutorialspoint.com/java_xml/java_dom_parse_document.htm

Interacting Directly with Databases


Another way to get accurate data is to interact with the database directly, instead of
requesting information in the form of an XML document and then updating the data
shown by parsing that document. Below is a table containing the most popular library
for a particular database in a particular language. You can find links to download these
libraries in the References section. We are primarily going to talk about JDBC in this
example:

Table 2: Database-Interaction libraries by Database and Language
Language  PostgreSQL                  MySQL     Oracle             Microsoft SQL Server
Java      JDBC                        JDBC      JDBC               JDBC
Python    psycopg2                    MySQLdb   cx_Oracle, pyodbc  pyodbc
C++       libpqxx, libpq++, SQLAPI++  SQLAPI++  SQLAPI++           SQLAPI++

JDBC
try {
    Class.forName("com.mysql.jdbc.Driver");
    // sonoo is the DB name
    Connection con = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/sonoo", "root", "pass");
    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery("select * from emp");
    while (rs.next()) {
        System.out.println(rs.getString(1) + " " + rs.getString(2)
            + " " + rs.getString(3) + " " + rs.getString(4));
    }
    con.close();
} catch (Exception e) {
    System.out.println(e);
}
As you can see, it’s much easier to directly take the information using JDBC. You can
execute three different types of statements in JDBC:
Statement: These statements are simple strings and are difficult to sanitize. They are
typically used for fixed SQL that takes no parameters, such as this statement, which
does not need a return value:
Statement stmt = con.createStatement();
stmt.executeUpdate(
    "CREATE TABLE STUDENT(ID NUMBER NOT NULL, NAME VARCHAR)");

PreparedStatement: This interface extends Statement, and has a pre-defined format
with ? placeholders. It’s meant to help with sanitization and security issues. The
placeholders resemble the format specifiers of a printf statement in C, except that the
values are bound as parameters rather than pasted into the SQL text. This would be
ideal for creation, updating, and deletion of an element in a table.
PreparedStatement pstmt;
pstmt = con.prepareStatement("update STUDENT set NAME = ? where ID = ?");
pstmt.setString(1, "MyName"); // Assigns "MyName" to first placeholder
pstmt.setInt(2, 111);         // Assigns 111 to second placeholder
pstmt.executeUpdate();        // or execute(), executeQuery()

CallableStatement: This interface extends PreparedStatement, and allows you to call
functions stored inside of the database. This is not quite as injection proof, but it will
help you get the job done.
CallableStatement callableStatement;
callableStatement = con.prepareCall("{call calculateStatistics(?, ?)}");
callableStatement.setString(1, "param1");
callableStatement.setInt(2, 123);
callableStatement.executeQuery(); // or execute(), executeUpdate()
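
To make the sanitization point concrete, here is a hedged sketch contrasting SQL built by string concatenation with a bound parameter. The class StatementSafety and both method names are invented for illustration, and the STUDENT table is the one assumed in the listings above. The first method only builds the SQL text so the problem is visible without a database; the second needs a live Connection to run.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper contrasting the two approaches; not from the study guide.
final class StatementSafety {
    // Unsafe: the input becomes part of the SQL text itself, so a quote
    // character in the input can change the meaning of the statement.
    static String concatenatedSql(String name) {
        return "update STUDENT set NAME = '" + name + "' where ID = 111";
    }

    // Safer: the SQL text is fixed and the input is bound as a parameter.
    static int renameStudent(Connection con, int id, String name)
            throws SQLException {
        try (PreparedStatement pstmt = con.prepareStatement(
                "update STUDENT set NAME = ? where ID = ?")) {
            pstmt.setString(1, name);
            pstmt.setInt(2, id);
            return pstmt.executeUpdate(); // number of rows updated
        }
    }
}
```

Passing the input x' where 1=1 -- to concatenatedSql yields a statement whose where clause no longer restricts the update to one ID, which is the kind of change a bound parameter is meant to prevent.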

XML vs Object Notation


Since XML acts as a type of object notation under the rules we’ve defined for mapping
SQL to XML, we can use other types of object notation in its place. A popular one is
JSON.

JSON
JSON stands for JavaScript Object Notation. It is primarily used as a data format for
JavaScript applications, acting as a semi-structured file type. While you may want to
deal with XML when storing data for offline applications, JSON is very useful when
working with online applications, such as pulling data and sorting it. JSON is shorter
to write than XML, but XML has attribute support and method support as well. They
both have support for databases and AJAX.

Structure of JSON: Here is an example of the structure of JSON using the same
information we used above from the Saw table. Note the structure of JSON and how
information is stored.
{
    "saw": {
        "entry": [
            {
                "name": "Jonathan",
                "date_in": "11/1",
                "date_out": "11/5",
                "reason": "Bird-House Class"
            },
            {
                "name": "Tim's House Repair",
                "date_in": "11/7",
                "date_out": "11/23",
                "reason": "Reconstruction"
            }
        ]
    }
}
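
The same rows can be written out as JSON with a few lines of code. This is a deliberately small sketch: the Java standard library has no JSON writer, so the text is assembled by hand, and the names TableToJson, quote, and toJson are hypothetical. It escapes only quotes and backslashes and treats every value as a string, so a real project would more likely use a JSON library.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: writes rows (column name -> string value) as JSON text.
final class TableToJson {
    // Simplified escaping: handles only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(String table, String rowKey,
                         List<Map<String, String>> rows) {
        StringJoiner entries = new StringJoiner(",", "[", "]");
        for (Map<String, String> row : rows) {
            // Each row becomes one JSON object inside the array.
            StringJoiner fields = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> col : row.entrySet()) {
                fields.add(quote(col.getKey()) + ":" + quote(col.getValue()));
            }
            entries.add(fields.toString());
        }
        // Same shape as the example above: { table: { rowKey: [ rows ] } }
        return "{" + quote(table) + ":{" + quote(rowKey) + ":" + entries + "}}";
    }
}
```

Note that the rows sit in an array of objects, which plays the role that repeated entry tags play in the XML version.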

Questions
1. What problem with SGML was XML designed to fix?

2. What data interchange syntax is most common in working with databases?

3. What are the basic steps when using any database API?

4. What is the purpose of keeping the “connection” and the “statement/query/cursor”
separated?

5. What is one of the most important factors in deciding on a data interchange syntax?

6. What do quotes in JSON values signify?

7. How does XML deal with data types?

8. Which of the following are valid XML tag structures?

(a) <tag><tag>

(b) <tag></tag>

(c) <tag/>

(d) <tag><tag/>

9. True or false: whitespace is significant in an XML document.

Use the following sample database tables for questions 10 – 13

Table 3: dog
name    age  breed                weight
Fido    6    Golden Retriever     60
Maxine  8    Newfoundland         170
Pip     3    Chihuahua            8
Scout   11   Australian Shepherd  55
Max     4    Newfoundland         140

Table 4: breed
breed         lifespan  max_weight
Chihuahua     20        6
Newfoundland  10        150

Table 5: dog_showing
name  date
Fido  Nov 5, 2017
Pip   Nov 8, 2017

Use the following SQL statement for questions 10 – 13


SELECT * FROM dog WHERE age < 5;

10. Write XML using nested elements to represent the output of the given SQL
statement.
11. Write XML using attributes and self-closing elements where possible to represent
the output of the given SQL statement.
12. Write JSON to represent the output of the given SQL statement.
13. Write a query that could be used to generate the results shown in the following
XML snippet.
<overweight_dogs>
    <dog>
        <name>Pip</name>
        <age>3</age>
        <breed>Chihuahua</breed>
        <weight>8</weight>
    </dog>
    <dog>
        <name>Maxine</name>
        <age>8</age>
        <breed>Newfoundland</breed>
        <weight>170</weight>
    </dog>
</overweight_dogs>

14. What are the tradeoffs involved with using XML attributes? Do the advantages
outweigh the disadvantages for data representation?

Answers
1. End tag ambiguities.

2. JSON.

3. 1) Create a connection to the server
   2) Build a statement/query tool from that connection
   3) Execute that query
   4) Commit any changes when done

4. The connection functions as a transaction initialization. The query object is a
reusable object that actually executes the query. Closing the connection (or
committing the queries) will finalize the transaction.

5. Character length or parsing speed.

6. String values.

7. It doesn’t natively, but you could embed it as an attribute.

8. (b) and (c)

9. False

10. <dogs>
        <dog>
            <name>Pip</name>
            <age>3</age>
            <breed>Chihuahua</breed>
            <weight>8</weight>
        </dog>
        <dog>
            <name>Max</name>
            <age>4</age>
            <breed>Newfoundland</breed>
            <weight>140</weight>
        </dog>
    </dogs>

11. <dogs>
        <dog name="Pip" age="3" breed="Chihuahua" weight="8" />
        <dog name="Max" age="4" breed="Newfoundland" weight="140" />
    </dogs>

12. { "dogs": [
        {
            "name": "Pip",
            "age": 3,
            "breed": "Chihuahua",
            "weight": 8
        },
        {
            "name": "Max",
            "age": 4,
            "breed": "Newfoundland",
            "weight": 140
        }
    ]}

13. SELECT dog.*
    FROM dog, breed
    WHERE dog.breed = breed.breed AND weight > max_weight;

14. Using XML attributes can significantly shorten the data representation, but they
cannot be used to create nested data and they cannot contain multiple values. As
modern storage and bandwidth capacities are quite large, the space savings are
not worth the structure limitations. Note that this does not preclude other uses of
attributes, for example to store metadata.

References
JDBC: https://docs.oracle.com/cd/E19226-01/820-7688/gawms/index.html
psycopg2: http://initd.org/psycopg/
MySQLdb: https://pypi.python.org/pypi/MySQL-python/1.2.5
cx_Oracle: https://oracle.github.io/python-cx_Oracle/index.html
libpqxx: http://pqxx.org/development/libpqxx/
libpq++: http://www.postgresql.org/docs/7.2/static/libpqplusplus.html
SQLAPI++: http://www.sqlapi.com/
XML Attributes: https://www.w3schools.com/xml/xml_attributes.asp
JSON Specification: https://www.json.org
More Examples on JSON: http://json.org/example.html
