
PIMPRI CHINCHWAD COLLEGE OF ENGINEERING

NIGDI, PUNE-411044

ADVANCED DATABASES LABORATORY

CLASS BE COMP
(2010 - 2011)
Semester - I

LAB MANUAL

Subject In Charge
Mr. Prashant G. Ahire

HOD
Prof. A. M. Kurkure

LIST OF ASSIGNMENTS

SR. NO   TITLE OF THE EXPERIMENT

1    Study of SQL Commands
2    Implementation of Object Oriented features in Oracle 9i
3    Implementation of Data Mining algorithm for Decision Tree classification
4    Implementation of Apriori Algorithm
5    Keyword Search
6    Relevance Ranking
7    Case Study
     a) Open Source MySQL
     b) Oracle
8    To create an XML document with instances of the elements defined and
     create an XML schema with CSS style sheet
9    Implementation of a Web Based Online System
10   To build Data Cubes and OLAP Analysis using MS SQL Server 2000
11   To set up and configure an LDAP server on Linux, load data, and
     search the LDAP database for email addresses

ASSIGNMENT NO: 1
PROBLEM STATEMENT:
To study the SQL commands and understand how they are applied.
OBJECTIVE:
Study of SQL commands
Understand the difference between DDL, DML and DCL.
Develop a model database using the various SQL commands.
THEORY:
SQL:

Structured Query Language ( SQL ) is a language that provides an
interface to relational database systems.
SQL was developed at IBM in the 1970s.
SQL encompasses Data Manipulation Language ( DML ) for insert,
delete and update operations, and Data Definition Language ( DDL )
for creating and modifying tables.
It is a command language for communicating with the Oracle 9i
server from any tool or application.

Features of SQL:

SQL is a non-procedural language.
SQL can be used by a range of users, including those with little
or no programming experience.
SQL reduces the amount of time required for creating and
maintaining systems.

Components of SQL:
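(i) DDL ( Data Definition Language )
A set of commands used to create and delete database structures, not
data. These are used mainly by the DBA or the database designer.
Eg : create : To create an object in the database
     drop : To delete an object from the database
(ii) DML ( Data Manipulation Language )
It allows changing data within the database.
Eg : insert : inserts data into a table
     update : updates existing data in a table
(iii) DCL ( Data Control Language )
It allows controlled access to the database.
Eg : grant : gives a user privileges on an object
     revoke : takes those privileges back
     commit : saves the work done
     rollback : restores the database to its state at the last commit
Strictly, commit and rollback are transaction control statements, which
are often grouped with DCL.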
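The command groups above can be tried out with a short script. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for an Oracle server, and the table name, column names and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for an Oracle server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a table (table and column names are hypothetical).
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DML: insert and update rows.
cur.execute("INSERT INTO employee VALUES (1, 'Asha', 30000)")
cur.execute("INSERT INTO employee VALUES (2, 'Ravi', 25000)")
cur.execute("UPDATE employee SET salary = 32000 WHERE emp_id = 1")
conn.commit()  # commit: save the work done so far

# rollback: the delete below is undone because it was never committed.
cur.execute("DELETE FROM employee WHERE emp_id = 2")
conn.rollback()

rows = cur.execute("SELECT emp_id, name, salary FROM employee ORDER BY emp_id").fetchall()
print(rows)

# DDL: drop the table again.
cur.execute("DROP TABLE employee")
```

Both rows are still present after the rollback, because the delete was issued after the last commit.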
Example :
Employee, Payroll tables
Student, Staff details
Bank, Customer tables etc
Input : One master and one child database table with constraints set
Output : A database is chosen as per the choice of the user. New tuples are added,
modified and sorted. Old records are deleted. Columns are added and removed. The
table is renamed and printed.
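A master table and a child table with constraints could look like the sketch below. It again assumes SQLite through Python's sqlite3 module rather than Oracle, and the customer and account tables are hypothetical examples in the spirit of the Bank, Customer case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Master table and child table; all names are hypothetical.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE account (
    acc_no  INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
    balance REAL CHECK (balance >= 0))""")

conn.execute("INSERT INTO customer VALUES (1, 'Meera')")
conn.execute("INSERT INTO account VALUES (100, 1, 5000)")

# A child row that points at a missing master row violates the constraint.
try:
    conn.execute("INSERT INTO account VALUES (101, 99, 10)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Add a column, then rename the table.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")
conn.execute("ALTER TABLE account RENAME TO savings_account")

columns = [row[1] for row in conn.execute("PRAGMA table_info(savings_account)")]
print(fk_rejected, columns)
```

The exact ALTER TABLE syntax differs between database products, so the statements should be checked against the Oracle documentation before use in the lab.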
CONCLUSION :
Thus the SQL commands are used successfully to create a database.

ASSIGNMENT NO: 2
PROBLEM STATEMENT:
To study Object Oriented features in Oracle 9i, and implement them to create a
database.
OBJECTIVE:
Study of Object Oriented features in Oracle 9i.
Understand the need of Object Oriented features.
Implementation of Object Oriented features by creating a database.
THEORY:

After the rise of object oriented languages like C++, Oracle Corporation
extended the Oracle database with object oriented features for its customers.
These include features like creating a table as an object, defining methods
for it, invoking them and using inheritance.

Table 1: Categories of DBMSs

                  Without queries     With queries
Simple data       File Systems        RDBMSs
Complex data      OODBMSs             ORDBMSs

Table 2: A Comparison of Database Management Systems

Criteria: Defining standard
  RDBMS  : SQL2
  ODBMS  : ODMG-2.0
  ORDBMS : SQL3 (in process)

Criteria: Support for object-oriented features
  RDBMS  : Does not support; it is difficult to map program objects to the
           database
  ODBMS  : Supports extensively
  ORDBMS : Limited support; mostly to new data types

Criteria: Usage
  RDBMS  : Easy to use
  ODBMS  : OK for programmers; some SQL access for end users
  ORDBMS : Easy to use except for some extensions

Criteria: Support for complex relationships
  RDBMS  : Does not support abstract datatypes
  ODBMS  : Supports a wide variety of datatypes and data with complex
           interrelationships
  ORDBMS : Supports abstract datatypes and complex relationships

Criteria: Performance
  RDBMS  : Very good performance
  ODBMS  : Relatively less performance
  ORDBMS : Expected to perform very well

Criteria: Product maturity
  RDBMS  : Relatively old and so very mature
  ODBMS  : This concept is a few years old and so relatively mature
  ORDBMS : Still in development stage so immature

Criteria: The use of SQL
  RDBMS  : Extensively supports SQL
  ODBMS  : OQL is similar to SQL, but with additional features like complex
           objects and object-oriented features
  ORDBMS : SQL3 is being developed with OO features incorporated in it

Criteria: Advantages
  RDBMS  : Its dependence on SQL, relatively simple query optimization hence
           good performance
  ODBMS  : It can handle all types of complex applications, reusability of
           code, less coding
  ORDBMS : Ability to query complex applications and ability to handle large
           and complex applications

Criteria: Disadvantages
  RDBMS  : Inability to handle complex applications
  ODBMS  : Low performance due to complex query optimization, inability to
           support large-scale systems
  ORDBMS : Low performance in web applications

Criteria: Support from vendors
  RDBMS  : It is considered to be highly successful so the market size is
           very large, but many vendors are moving towards ORDBMS
  ODBMS  : Presently lacking vendor support due to the vast size of the
           RDBMS market
  ORDBMS : All major RDBMS vendors are after this so it has a very good
           future

INPUT:
A simple object relational database is created. An object may then be used as a member in
another database object. Values are inserted, an index is created, member functions are created, and
one object inherits from another.
OUTPUT:
The object oriented relational database table after all the above specified operations
is displayed.
CONCLUSION
Thus the Object relational database is successfully created.

ASSIGNMENT NO: 3
PROBLEM STATEMENT:
To design and implement a Data mining Algorithm for decision tree classification.
OBJECTIVE:
Study of Data mining.
Study of classification methods.
Implementation of Datamining algorithm for decision tree classification
THEORY:
Data mining (knowledge discovery from data) :
Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) patterns or knowledge from huge amount of
data
Different Types of Data :

Relational database
Data warehouse
Transactional database
Advanced database and information repository
Object-relational database
Spatial and temporal data
Time-series data
Stream data
Multimedia database
Heterogeneous and legacy database
Text databases & WWW

Classification :

Supervised Learning Technique.


Given a collection of records (training set )
Each record contains a set of attributes, one of the attributes is the
class.
Find a model for class attribute as a function of the values of other
attributes.
Goal: previously unseen records should be assigned a class as accurately as
possible.
A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set
used to build the model and test set used to validate it.

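Decision Tree :
A decision tree is a flow chart like tree structure. The decision tree algorithm helps in
choosing the root for classification. The attribute with the highest gain is chosen as
the root. Information gain is the expected reduction in the information needed to
classify a given sample once the value of the attribute is known.

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

$$p_i = \frac{s_i}{s}$$

where Gain(A) is the information gain of attribute A, E(A) is the entropy after
partitioning the samples on the v values of A, I is the expected information needed
to classify a sample, s_i is the number of samples of class C_i out of s samples, and
p_i is the probability that a sample belongs to class C_i.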
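The choice of the root can be sketched in a few lines of Python. This is an illustrative sketch only: it takes I as the negative sum of p_i log2 p_i, E(A) as the weighted sum of I over the partitions of A, and Gain(A) = I - E(A); the four training records and the attribute names are made up.

```python
from collections import Counter
from math import log2

def expected_information(labels):
    """I(s1, ..., sm) = -sum(p_i * log2(p_i)) over the class counts."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(records, attribute, class_attribute):
    """Gain(A) = I(s1, ..., sm) - E(A) for one candidate attribute A."""
    labels = [r[class_attribute] for r in records]
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[class_attribute])
    # E(A): information of each partition, weighted by the partition size.
    entropy = sum(len(part) / len(records) * expected_information(part)
                  for part in partitions.values())
    return expected_information(labels) - entropy

# Hypothetical training set; "play" is the class attribute.
records = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]
gains = {a: gain(records, a, "play") for a in ("outlook", "windy")}
root = max(gains, key=gains.get)  # attribute with the highest gain
print(gains, root)
```

In this toy data the attribute outlook separates the classes completely, so it has the highest gain and would be chosen as the root.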

INPUT :
A Database is created with various attributes.
OUTPUT:
The decision tree classification gives the best root for classification of the database.
CONCLUSION
Thus the decision tree classification datamining algorithm is successfully
implemented.

ASSIGNMENT NO: 4
PROBLEM STATEMENT:
To design and implement Apriori Algorithm for market basket analysis.
OBJECTIVE:

Study of Market Basket Analysis
Study of the Apriori Algorithm
Study of Support and Confidence measures
Implementation of the Apriori algorithm for market basket analysis

THEORY:
Market Basket Analysis :

A large set of items, e.g., things sold in a supermarket.


A large set of baskets, each of which is a small set of the items, e.g., the
things one customer buys on one day.

Support, Confidence and Frequent Item Sets:

Support for itemset I = the number of baskets containing all items in I.

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B)$$

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)}$$

Given a support threshold s, sets of items that appear in at least s baskets are
called frequent itemsets.
Example :
Items={milk, coke, pepsi, beer, juice}.
Support threshold = 3 baskets.
B1 = {m, c, b}
B2 = {m, p, j}
B3 = {m, b}
B4 = {c, j}
B5 = {m, p, b}
B6 = {m, c, b, j}
B7 = {c, b, j}
B8 = {b, c}
Frequent itemsets: {m}, {c}, {b}, {j},

{m, b}, {c, b}, {j, c}.
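The support and confidence measures can be checked on the eight baskets of the example with a small script. This is a sketch only; the function names are chosen for illustration.

```python
# The eight baskets B1..B8 from the example (m=milk, c=coke, p=pepsi, b=beer, j=juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset):
    """Number of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if set(itemset) <= basket)

def confidence(a, b):
    """Confidence(A => B) = support(A u B) / support(A)."""
    return support(set(a) | set(b)) / support(a)

print(support({"m", "b"}))       # baskets containing both milk and beer
print(confidence({"m"}, {"b"}))  # share of milk baskets that also contain beer
```

Milk and beer occur together in four baskets and milk occurs in five, so the confidence of milk => beer is 4/5.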

Applications :
Real market baskets: chain stores keep terabytes of information about
what customers buy together.
Baskets = documents; items = words in those documents.
This lets us find words that appear together unusually frequently.
Baskets = sentences; items = documents containing those sentences.
Items that appear together too often could represent plagiarism.
Apriori Algorithm:
The Apriori algorithm finds the frequent itemsets L in database D.
Find the frequent set L_{k-1}.
Join Step.
o C_k is generated by joining L_{k-1} with itself.
Prune Step.
o Any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset, hence candidates containing it should be removed.

where
(C_k: Candidate itemset of size k)
(L_k: frequent itemset of size k)
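The join and prune steps can be sketched as follows. This is a simplified illustration, not a tuned implementation: the join is done by taking unions of pairs of frequent (k-1)-itemsets rather than the usual prefix-based join, and the baskets are the eight baskets of the milk, coke, pepsi, beer, juice example with a threshold of 3.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return {itemset: support count} for itemsets found in at least min_support baskets."""
    def count(candidates):
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for basket in baskets for item in basket}
    current = count({frozenset([item]) for item in items})  # L1
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)  # L(k-1)
        # Join step: unite pairs of frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = count(candidates)  # Lk
        frequent.update(current)
        k += 1
    return frequent

# Each string is one basket, one letter per item.
baskets = [frozenset(b) for b in ("mcb", "mpj", "mb", "cj", "mpb", "mcbj", "cbj", "bc")]
result = apriori(baskets, 3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

For this data no candidate of size 3 survives the prune step, so the result contains only the four frequent single items and three frequent pairs.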

INPUT :
A Super Market Database is collected with various attributes like customer id, date
of purchase, item id purchased, age of customer and income.
OUTPUT:
The Apriori algorithm gives the strongest associations between products according to age
and income.
CONCLUSION
Thus the Apriori Algorithm in Data mining is successfully implemented for Market
Basket Analysis.


ASSIGNMENT NO: 5
PROBLEM STATEMENT:
To search a keyword (k) from a given document (d).
OBJECTIVE:
Study about Information Retrieval

THEORY:
In full text retrieval, all the words in each document are considered to be
keywords. We use the word term to refer to the words in a document.
Information-retrieval systems typically allow query expressions formed using
keywords and the logical connectives and, or, and not.
Ands are implicit, even if not explicitly specified.
Synonyms
E.g. document: motorcycle repair, query: motorcycle maintenance
The system needs to realize that maintenance and repair are synonyms.
The system can extend the query as motorcycle and (repair or maintenance).

PROCEDURE :
In full text retrieval, all the words in each document are considered to be
keywords. Given a search keyword as a query, the document is searched and all the occurrences
are printed on the screen. Occurrences of synonyms are also printed.
INPUT :
Data

OUTPUT:
Data occurred 5 times in the document.
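A keyword search with synonym expansion might be sketched as below. The synonym table and the sample document are hypothetical; a real system would take the synonyms from a thesaurus and the text from a file.

```python
import re

# Hypothetical synonym table; a real system would use a thesaurus.
SYNONYMS = {"repair": {"maintenance"}, "maintenance": {"repair"}}

def search(document, keyword):
    """Count occurrences of the keyword and of its synonyms (case-insensitive)."""
    terms = re.findall(r"[a-z0-9]+", document.lower())
    wanted = {keyword.lower()} | SYNONYMS.get(keyword.lower(), set())
    return {term: terms.count(term) for term in sorted(wanted)}

document = "Motorcycle repair manual. Regular maintenance reduces repair costs."
hits = search(document, "repair")
for term, n in hits.items():
    print(f"{term} occurred {n} times in the document.")
```

The query for repair is extended to (repair or maintenance), so both terms are counted.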

CONCLUSION
Thus keyword search in Information retrieval is successfully implemented.

ASSIGNMENT NO: 6
PROBLEM STATEMENT:
To find the relevance of a document d to a set of terms Q
OBJECTIVE:
Study about Information Retrieval
Study about Relevance Ranking.
THEORY:

Ranking of documents on the basis of estimated relevance to a query is
critical.
Relevance ranking is based on factors such as
Term frequency
    Frequency of occurrence of query keyword in document
Inverse document frequency
    How many documents the query keyword occurs in
    Fewer give more importance to keyword
TF-IDF (Term frequency/Inverse Document frequency) ranking:
Let n(d) = number of terms in the document d
n(d, t) = number of occurrences of term t in the document d.
Relevance of a document d to a term t

$$TF(d, t) = \log\left(1 + \frac{n(d, t)}{n(d)}\right)$$

The log factor is to avoid excessive weight to frequent terms.

Relevance of document to query Q

$$r(d, Q) = \sum_{t \in Q} \frac{TF(d, t)}{n(t)}$$

where n(t) denotes the number of documents that contain the term t.
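The ranking can be sketched in Python as follows. The sketch assumes the textbook form TF(d, t) = log(1 + n(d, t)/n(d)) and r(d, Q) as the sum over the query terms of TF(d, t)/n(t), with n(t) taken as the number of documents that contain t. The three small documents and their file names are made up.

```python
from math import log

def tf(doc_terms, term):
    """TF(d, t) = log(1 + n(d, t) / n(d))."""
    return log(1 + doc_terms.count(term) / len(doc_terms))

def relevance(doc_terms, query, corpus):
    """r(d, Q) = sum over t in Q of TF(d, t) / n(t)."""
    score = 0.0
    for term in query:
        n_t = sum(1 for d in corpus.values() if term in d)  # documents containing t
        if n_t:
            score += tf(doc_terms, term) / n_t
    return score

# Hypothetical corpus: file name -> list of terms.
corpus = {
    "a.txt": "data mining finds patterns in data".split(),
    "b.txt": "information retrieval ranks documents".split(),
    "c.txt": "data and information systems".split(),
}
query = ["data", "information"]
scores = {name: relevance(terms, query, corpus) for name, terms in corpus.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Only c.txt contains both query terms, so it receives the highest score even though a.txt mentions data twice.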

INPUT :
Data Information
OUTPUT:
The contents of the text file that has the best relevance are displayed.

CONCLUSION
Thus the relevance ranking in Information retrieval is successfully implemented.
ASSIGNMENT NO: 7
PROBLEM STATEMENT:
Case Study Oracle
Case Study - MYSQL
OBJECTIVE:
Study of Oracle
Study of MYSQL
Develop a model database using the various MYSQL commands.
THEORY:
ORACLE:
Oracle is the first commercial relational database product to reach the
market. Since then Oracle has grown beyond the database server. In addition to the tools
directly related to database management, it provides business intelligence tools, query and
analysis tools, data mining products and an application server.
I ) Database Design Tools:
Most of Oracle's design tools are included in the Oracle Internet
Development Suite. The components are :
1) ORACLE DESIGNER
2) APPLICATION DEVELOPER
3) WAREHOUSE BUILDER
II ) Database Querying Tools:
Oracle provides a tool for ad hoc querying, report generation and data analysis
using OLAP. This is a web based tool called Oracle Discoverer.
Oracle Express is a multidimensional database server. It supports a wide
variety of analytical queries as well as forecasting and modeling of management
scenarios.
III ) Storage and Indices:
A database contains one or more logical storage units called tablespaces. Each
tablespace consists of one or more physical structures called data files.
Indices used :
B-tree index
Function based index
Bitmap index
Join index
Domain index
MYSQL:

MySQL is a multithreaded, multi-user SQL database management
system that runs as a server.
MySQL is an open source relational database management system
(RDBMS) that uses Structured Query Language (SQL), the most
popular language for adding, accessing, and processing data in a
database.

HISTORY:
MySQL was first released internally on May 23, 1995.
The Windows version was released on June 8, 1998.
Version 3.23: beta from June 2000.
Version 5.1: the current version, from Nov 2005.
Uses:

MySQL is popular for web applications and acts as the database component
of the LAMP, BAMP, MAMP, and WAMP platforms
(Linux/BSD/Mac/Windows-Apache-MySQL-PHP/Perl/Python), and for
open-source bug tracking tools like Bugzilla.
Its popularity for use with web applications is closely tied to the popularity of
PHP and Ruby on Rails, which are often combined with MySQL.

PHP and MySQL are essential components for running popular content
management systems such as Expression Engine, Drupal, e107, Joomla!,
WordPress and some BitTorrent trackers.

Wikipedia runs on MediaWiki software, which is written in PHP and uses a
MySQL database.

Several high-traffic web sites use MySQL for data storage and logging of
user data, including Flickr, Facebook, Wikipedia and YouTube.

Example :
Employee, Payroll tables
Student, Staff details
Bank, Customer tables etc

Input : One master and one child database table with constraints set
Output : A database is chosen as per the choice of the user. New tuples are added,
modified and sorted. Old records are deleted. Columns are added and removed. The
table is renamed and printed.
CONCLUSION :
Thus Oracle and MySQL databases are learnt.
The MySQL commands are used successfully to create a database.

ASSIGNMENT NO: 8
PROBLEM STATEMENT:
To create an XML document with instances of the elements defined and create an XML
schema and CSS style sheet for the same.
OBJECTIVE:
Study of XML
Study of CSS
Develop an XML document
THEORY:
XML:

The Extensible Markup Language (XML) is a general-purpose specification
for creating custom markup languages.
It is classified as an extensible language, because it allows the user to define
the mark-up elements.

XML's purpose is to help information systems share structured data,
especially via the Internet, to encode documents, and to serialize data; in the
last context, it compares with text-based serialization languages such as
JSON and YAML.

XML began as a simplified subset of the Standard Generalized Markup
Language (SGML), meant to be readable by people via semantic constraints;
application languages can be implemented in XML.

These include XHTML, RSS, MathML, GraphML, Scalable Vector
Graphics, MusicXML, and others. Moreover, XML is sometimes used as the
specification language for such application languages.

XML is recommended by the World Wide Web Consortium (W3C). It is a
fee-free open standard. The recommendation specifies lexical grammar and
parsing requirements.

Well-formed and Valid XML documents :
An XML document has two correctness levels:

Well-formed. A well-formed document conforms to the XML syntax rules;
e.g. if a start-tag (< >) appears without a corresponding end-tag (</>), it is
not well-formed. A document that is not well-formed is not XML; a conforming
parser is disallowed from processing it.
Valid. A valid document additionally conforms to semantic rules, either
user-defined or in an XML schema, especially DTD; e.g. if a document contains an
undefined element, then it is not valid; a validating parser is disallowed from
processing it.

Syntax :
All XML Elements Must Have a Closing Tag
In HTML, you will often see elements that don't have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Note: The XML declaration (for example <?xml version="1.0"?>) does not have a
closing tag. This is not an error. The declaration is not a part of the XML
document itself, and it has no closing tag.

XML Tags are Case Sensitive


XML elements are defined using XML tags.
XML tags are case sensitive. With XML, the tag <Letter> is different from the tag
<letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
Note: "Opening and closing tags" are often referred to as "Start and end tags". Use
whatever you prefer. It is exactly the same thing.


XML Elements Must be Properly Nested


In HTML, you will often see improperly nested elements:
<b><i>This text is bold and italic</b></i>
In XML, all elements must be properly nested within each other:
<b><i>This text is bold and italic</i></b>
In the example above, "Properly nested" simply means that since the <i> element is
opened inside the <b> element, it must be closed inside the <b> element.

XML Documents Must Have a Root Element


XML documents must contain one element that is the parent of all other elements. This
element is called the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>

XML Attribute Values Must be Quoted


XML elements can have attributes in name/value pairs just like in HTML.
In XML the attribute value must always be quoted. Study the two XML documents
below. The first one is incorrect, the second is correct:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
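The syntax rules above can be checked mechanically with any XML parser. The sketch below uses the xml.etree.ElementTree module from the Python standard library and only tests well-formedness, not validity against a schema; the helper name is_well_formed is chosen for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The incorrect and correct fragments discussed in the syntax rules above.
samples = {
    "missing end tag": "<p>This is a paragraph",
    "case mismatch": "<Message>This is incorrect</message>",
    "bad nesting": "<b><i>This text is bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2007><to>Tove</to></note>",
    "correct": '<note date="12/11/2007"><to>Tove</to><from>Jani</from></note>',
}
results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "not well-formed")
```

Only the last sample is accepted; each of the others breaks one of the rules and is rejected by the parser.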
Example : Website Development
Shopping Carts, College Websites, Railway reservation
Output : As per the choice of the user a website is created.
CONCLUSION :
An XML document is created successfully according to the user's choice.

ASSIGNMENT NO: 9
PROBLEM STATEMENT:
To implement a Web Based System using ASP.
OBJECTIVE:
Study of ASP
Develop an ASP web based system
THEORY:
ASP:

Active Server Pages (ASP) is Microsoft's first server-side script engine for
dynamically-generated web pages.
It was initially marketed as an add-on to Internet Information Services (IIS)
via the Windows NT 4.0 Option Pack, but has been included as a free
component of Windows Server since the initial release of Windows 2000
Server.
Most ASP pages are written in VBScript, but any other Active Scripting
engine can be selected.

Versions:
ASP has gone through three major releases:
ASP version 1.0 (distributed with IIS 3.0) in December 1996
ASP version 2.0 (distributed with IIS 4.0) in September 1997
ASP version 3.0 (distributed with IIS 5.0) in November 2000

ASP 3.0 is currently available in IIS 6.0 on Windows Server 2003 and IIS 7.0 on
Windows Server 2008.

Example :
The default scripting language (in classic ASP) is VBScript:
<html>
<body>
<% Response.Write "Hello World!" %>
</body>
</html>
Or in a simpler format
<html>
<body>
<%= "Hello World!" %>
</body>
</html>
The examples above print "Hello World!" into the body of an HTML document.
Connecting to an Access Database:
<%
' Open an ADO connection to the Access file DB.mdb in the site folder
Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open "DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" & Server.MapPath("DB.mdb")
' Read all rows of the Users table into a recordset
Set rsUsers = Server.CreateObject("ADODB.Recordset")
rsUsers.Open "SELECT * FROM Users", oConn
%>

Output:
A website according to the user's interest is created.
Conclusion:
Thus a website is successfully designed using ASP.


ASSIGNMENT NO: 10
PROBLEM STATEMENT:
Building Cubes & OLAP analysis for food mart database using MS-SQL server
2000.
OBJECTIVE:
Study of OLAP.
Study of Data Cubes and operations
Develop a data cube and perform operations like slicing, dicing, roll up, drill
down and pivoting.
THEORY:
OLAP :
OLAP is Online Analytical Processing.
A data warehouse provides OLAP tools for interactive analysis of
multidimensional data of various granularities, which facilitates
effective data generalization and data mining.
Many data mining functions like association, prediction and
clustering can be implemented with OLAP operations.
Data warehouse systems serve users in the role of data analysis and
decision making. Such systems can organize and present data in
various formats in order to accommodate diverse needs of different
users. These systems are known as OLAP systems.
Multidimensional Database Schemas:
1) STAR SCHEMA :
a. The most common modeling paradigm is the star schema, in which the data
warehouse contains a large central table (fact table) containing the bulk of the data with
no redundancy.
b. A set of smaller attendant tables (dimension tables), one for each
dimension.
c. The schema graph resembles a starburst, with the dimension tables
displayed in a radial pattern around the central fact table.
2) SNOWFLAKE SCHEMA :
a. This schema is a variant of the star schema model where some dimension
tables are normalized, thereby further splitting the data into
additional tables.
b. The resulting schema graph forms a shape similar to a snowflake.

OLAP OPERATIONS:
1) ROLL UP : The roll up operation performs aggregation on a data cube, either by
climbing up a concept hierarchy for a dimension or by dimension reduction.
When roll up is performed by dimension reduction, one or more
dimensions are removed from the given cube.
2) DRILL DOWN: Drill down is the reverse of roll up.
It navigates from less detailed data to more detailed data.
3) SLICE & DICE: The slice operation performs a selection on one dimension of the given
cube, resulting in a sub cube.
The dice operation defines a sub cube by performing a selection
on 2 or more dimensions.
4) PIVOT: Pivot is a visualization operation that rotates the data axes in view in
order to provide an alternative representation of the data.
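The operations above can be illustrated without an OLAP server. The following sketch keeps a tiny cube as a list of fact rows in plain Python; the dimensions, the sales figures and the function names roll_up and dice are all made up for illustration and are not part of MS SQL Server.

```python
from collections import defaultdict

# Fact rows: (year, quarter, city, product, sales); the figures are made up.
facts = [
    (2000, "Q1", "Pune", "Milk", 100), (2000, "Q1", "Pune", "Bread", 40),
    (2000, "Q2", "Pune", "Milk", 120), (2000, "Q1", "Mumbai", "Milk", 200),
    (2000, "Q2", "Mumbai", "Bread", 90),
]
DIMS = ("year", "quarter", "city", "product")

def roll_up(rows, keep):
    """Aggregate sales over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def dice(rows, **selection):
    """Select rows whose dimension values lie in the given sets; a slice fixes one dimension."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in allowed for d, allowed in selection.items())]

by_city = roll_up(facts, ["city"])                       # roll up by dimension reduction
by_city_quarter = roll_up(facts, ["city", "quarter"])    # drill down from by_city
pune_slice = dice(facts, city={"Pune"})                  # slice on one dimension
milk_q1 = dice(facts, product={"Milk"}, quarter={"Q1"})  # dice on two dimensions
pivoted = {(q, c): v for (c, q), v in by_city_quarter.items()}  # pivot: swap the axes
print(by_city, by_city_quarter, pivoted)
```

Going from by_city to by_city_quarter adds detail and so corresponds to a drill down, while the reverse direction is a roll up.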
Output:
Various operations are performed on the food mart cubes, and OLAP analysis
of the food mart database using MS-SQL server 2000 is done.
Conclusion:
Thus using MS SQL Server 2000 slicing, dicing, pivoting, drill down and roll
up are successfully performed.


ASSIGNMENT NO: 11
PROBLEM STATEMENT:
Set up and configure an LDAP server on Linux. The data has to be
loaded and searched. The database holds e-mail addresses of persons.
OBJECTIVE:
Study of directory systems.
Study of LDAP
Develop a database with e-mail addresses and perform search.
THEORY:
DIRECTORY SYSTEMS :
A directory is a listing of information about some class of objects such
as persons.
Directories can be used to find information about a specific object or, in
the reverse direction, to find objects that meet a certain requirement.
DIRECTORY ACCESS PROTOCOLS:
Directory Access Protocol (DAP) is a computer networking standard promulgated by ITU-T and ISO in 1988
for accessing an X.500 directory service. DAP was intended to be used by
client computer systems, but was not popular as there were few
implementations of the full OSI protocol stack for desktop computers
available to be run on the hardware and operating systems typical of that
time.
The basic operations of DAP: Bind, Read, List, Search, Compare, Modify, Add,
Delete and ModifyRDN, were adapted for the Novell Directory Services
(NDS) and the Internet Lightweight Directory Access Protocol (LDAP).
LDAP:

LDAP was designed at the University of Michigan to adapt a complex
enterprise directory system (called X.500) to the modern Internet.
The Lightweight Directory Access Protocol, or LDAP, is an application
protocol for querying and modifying directory services running over TCP/IP.

A directory is a set of objects with similar attributes organised in a logical
and hierarchical manner. The most common example is the telephone
directory, which consists of a series of names (either of persons or
organizations) organized alphabetically, with each name having an address
and phone number attached.
An LDAP directory tree often reflects various political, geographic, and/or
organizational boundaries, depending on the model chosen.
LDAP deployments today tend to use Domain Name System (DNS) names for
structuring the topmost levels of the hierarchy.

Deeper inside the directory might appear entries representing people,
organizational units, printers, documents, groups of people or anything else
that represents a given tree entry (or multiple entries).
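The idea of searching a hierarchical directory for an e-mail address can be illustrated with a toy model. The sketch below is not an LDAP client and does not talk to a server: it only mimics a directory tree with a Python dictionary keyed by distinguished name (DN), and every name and address in it is invented.

```python
# Entries keyed by distinguished name (DN); all names and addresses are invented.
directory = {
    "dc=example,dc=com": {"objectClass": "domain"},
    "ou=people,dc=example,dc=com": {"objectClass": "organizationalUnit"},
    "cn=Asha Rao,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "Asha Rao", "mail": "asha@example.com"},
    "cn=Ravi Kale,ou=people,dc=example,dc=com": {
        "objectClass": "person", "cn": "Ravi Kale", "mail": "ravi@example.com"},
}

def search(base, attribute, value):
    """Return (dn, entry) pairs at or below base whose attribute equals value."""
    return [(dn, entry) for dn, entry in directory.items()
            if (dn == base or dn.endswith("," + base)) and entry.get(attribute) == value]

matches = search("ou=people,dc=example,dc=com", "cn", "Asha Rao")
mails = [entry["mail"] for _, entry in matches]
print(mails)
```

On a real server the same lookup would be expressed as a search request with a base DN and a filter on the cn attribute, returning the mail attribute of the matching entry.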

ADVANTAGES OF LDAP
Centralized or distributed white pages.
ISP online subscriber directory.
INTERNET APPLICATION
White pages, certificate distribution
System/network management database.
Conclusion:
Thus an LDAP server is set up and configured on Linux, the data is
loaded, and the database of e-mail addresses of persons is searched.
