
ADBMS ORAL QUESTIONS (UNIT 1)

Q1 -> Identify the architecture and explain it.

Q2 -> Identify the architecture and explain it.

Q3 -> Identify the architecture and explain it.

Q4 -> Identify the architecture and explain it.

Q5 -> What are interconnection networks?

Q6 -> What do you mean by shared-nothing architecture?

Q7 -> What do you mean by hierarchical architecture?

Q8 -> What are the advantages and disadvantages of shared-memory architecture?

Q9 -> What are the advantages and disadvantages of shared-disk architecture?

Q10 -> What are the advantages and disadvantages of shared-nothing architecture?

Q11 -> What are the advantages and disadvantages of hierarchical architecture?

12) WHAT IS MEANT BY I/O PARALLELISM?

13) WHAT ARE THE DIFFERENT PARTITIONING TECHNIQUES?

14) WHAT IS HORIZONTAL PARTITIONING?

15) WHICH PARTITIONING TECHNIQUE RESULTS IN HIGHER THROUGHPUT WHILE MAINTAINING GOOD RESPONSE TIME?

16) WHICH PARTITIONING TECHNIQUE IS SUITED FOR POINT QUERIES BUT NOT FOR RANGE QUERIES?

Q17 -> How do parallel systems improve processing and I/O speed?

Q18 -> List the needs for a parallel database.

Q19 -> List the types of interconnection networks.

Q20 -> What is shared-memory architecture?

Q21 -> What is shared-disk architecture?

Q22- WHAT IS PARALLEL JOIN?

Q23- WHAT ARE THE DIFFERENT WAYS OF PARTITIONING IN A PARTITIONED JOIN?

Q24- DEFINE RANGE PARTITIONING AND HASH PARTITIONING.

Q25- WHEN DO WE USE THE FRAGMENT-AND-REPLICATE TECHNIQUE?

Q26- EXPLAIN ASYMMETRIC FRAGMENT-AND-REPLICATE.

Q27 > What is a parallel database system?

Q28 > List the issues that should be considered while designing a parallel database system.

Q29 > Why is fault tolerance of the system required?

Q30 > Why should the system allow online changes?

Q31 > What is "online index creation"?

32 > What are the types of parallelism in parallel systems? Define them.

33 > What are the performance measures of a parallel system?

34 > Describe the speedup measure among the parallelism performance measures.

35 > Describe the scaleup measure among the parallelism performance measures.

36 > Define all interconnection networks in terms of their scale and communication capacity.

37 > Types of parallel sort?

38 > Explain range-partitioning sort.

39 > Explain parallel external sort-merge.

41 > What is data parallelism?

42 > What is intra-operation parallelism? Give an example.

43 > What is parallel sort?

44 > What are the two steps in range-partitioning parallel sort?

45 > Give one method that is an alternative to range partitioning.

46 > What do you mean by data parallelism?

Q47) WHAT IS INTER-QUERY PARALLELISM?

Q48) WHAT IS THE USE OF INTER-QUERY PARALLELISM?

Q49) DESCRIBE A PROTOCOL FOR A SHARED-DISK SYSTEM.

Q50) "THE SHARED-DISK PROTOCOLS CAN BE EXTENDED TO SHARED-NOTHING ARCHITECTURE." EXPLAIN.

Q51) WHY IS INTER-QUERY PARALLELISM COMPLICATED IN SHARED-DISK OR SHARED-NOTHING ARCHITECTURE?

Topic Name: Distributed Database System

Q1: What do you mean by Distributed Database Management System?
Ans: A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (distributed DBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users. The term "distributed database system" (DDBS) is typically used to refer to the combination of the DDB and the distributed DBMS.

Q2: Which architecture is it?

Ans: Client-server database architecture.

Q3: What is the difference between a distributed file system and a distributed database system?
Ans: Distributed file systems simply allow users to access files that are located on machines other than their own. These files have no explicit structure (i.e., they are flat), and the relationships among data in different files are not managed by the system; they are the users' responsibility. In a distributed database, by contrast, the data is organized according to a schema that defines both the structure of the distributed data and the relationships among the data.
Q4: What are the advantages and disadvantages of a distributed database system?
Ans:
Advantages:
- Users can be geographically separate.
This is important for large corporations, where business decisions must be made by people in different locations, but those decisions must be based on company-wide data.
- Multiple machines can improve performance and scalability.
Because a client-server system is distributed over several machines, you can improve performance and scalability in several ways. There might be multiple replicas of a server running on separate machines, so each handles only a fraction of the total number of clients.
- Heterogeneous systems can use the best tools for each task.
Different components of an application can run on hardware that is optimized for a specific task.
- Distributed systems can reduce maintenance costs.
For example, by upgrading an application image on a single server, it is possible to upgrade thousands of clients.
Disadvantages:
Software: it is difficult to develop software for distributed systems.
Network: saturation, lossy transmissions.
Security: easy access also applies to secret data.

Q5: What are the goals of a distributed database system?

Ans: 1) Transparency:
Access: hides differences in data representation and invocation mechanisms.
Location: hides where an object resides.
Migration: hides from an object the ability of a system to change that object's location.
Relocation: hides from a client the ability of a system to change the location of an object to which the client is bound.
Replication: hides the fact that an object or its state may be replicated and that replicas reside at different locations.
2) Openness:
Be able to interact with services from other open systems, irrespective of the underlying environment.

3) Scalability:
Number of users and/or processes (size scalability)
Maximum distance between nodes (geographical scalability)
Number of administrative domains (administrative scalability)

4) Replication:
Make copies of data available at different machines:
Replicated file servers (mainly for fault tolerance)
Replicated databases
Mirrored web sites
Large-scale distributed shared memory systems

Topic Name: Distributed Data storage

Q11. Give the different approaches to storing data in distributed data storage.
Ans: There are two main approaches to storing a relation in a distributed database, together with a related requirement:
i. Replication: the system maintains several identical replicas of the relation and stores each replica at a different site.
ii. Fragmentation: the system partitions the relation into several fragments and stores each fragment at a different site.
iii. Transparency: the user should not be required to know where the data is physically located or how the data can be accessed at the specific local site.

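Q12. Give the advantages of data replication in distributed data storage.
Ans: i. Availability: if one site fails, the data can be found at another site, so the system can continue to work.
ii. Increased parallelism: when most accesses to a relation only read it, several sites can process queries on that relation in parallel, and having a local replica reduces data movement between sites.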

Q13. Give the disadvantages of data replication in distributed data storage.
Ans: Increased overhead on update: the system must keep all replicas of a relation consistent, so an update made at one site must be propagated to every site that holds a replica.
Q14. What is fragmentation?
Ans: Fragmentation consists of breaking a relation into smaller relations, or fragments, and storing the fragments (instead of the relation itself), possibly at different sites.

Q15. Distinguish between horizontal and vertical fragmentation in distributed data storage.
Ans:
Horizontal fragmentation:
1. Each fragment consists of a subset of the rows of the original relation.
2. Horizontal fragments are identified by a selection query.
Vertical fragmentation:
1. Each fragment consists of a subset of the columns of the original relation.
2. Vertical fragments are identified by a projection query.
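The difference is easy to see on a toy relation. In this minimal sketch (plain Python; the account relation and all helper names are hypothetical), a horizontal fragment is produced by a selection and a vertical fragment by a projection. The original relation is recovered by a union of the horizontal fragments, or by a join of the vertical fragments on a key that every vertical fragment keeps.

```python
# Hypothetical "account" relation stored as a list of rows (dicts).
ACCOUNT = [
    {"account_no": "A-101", "branch": "Hillside", "balance": 500},
    {"account_no": "A-215", "branch": "Valleyview", "balance": 700},
    {"account_no": "A-305", "branch": "Hillside", "balance": 350},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep every column of the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, columns):
    """Projection: keep only the listed columns of every row."""
    return [{c: row[c] for c in columns} for row in relation]


def join_on_key(left, right, key):
    """Natural join on the shared key: rebuilds a vertically fragmented relation."""
    by_key = {row[key]: row for row in right}
    return [{**row, **by_key[row[key]]} for row in left if row[key] in by_key]


# Horizontal fragments, one per branch site; their union is the original relation.
hillside = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "Hillside")
valleyview = horizontal_fragment(ACCOUNT, lambda r: r["branch"] == "Valleyview")

# Vertical fragments; both keep the key so the relation can be rebuilt by a join.
names = vertical_fragment(ACCOUNT, ["account_no", "branch"])
balances = vertical_fragment(ACCOUNT, ["account_no", "balance"])

print(len(hillside), len(valleyview))                         # 2 1
print(join_on_key(names, balances, "account_no") == ACCOUNT)  # True
```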

Topic Name:

1. Characteristics of a distributed database.

- One of the goals in using a distributed database is high availability; that is, the database must function almost all the time.
- For the distributed system to be robust, it must detect failures.

2. What is the difference in function between the coordinator and its backup?
- The difference is that the backup does not take any action that affects other sites.

3. What is the function of the election algorithm?

- An election algorithm enables the sites to choose the site for the new coordinator in a decentralized manner.
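One common election algorithm is the bully algorithm. This is a minimal single-process simulation of it, assuming every site has a unique numeric id and that the live site with the highest id should become the new coordinator. Messages and timeouts are reduced to a set of live ids, so it shows only the decision logic; `bully_election` is a hypothetical name.

```python
def bully_election(initiator, site_ids, alive):
    """Simulate a bully-style election and return the new coordinator's id.

    initiator -- id of the site that detected the coordinator failure
    site_ids  -- ids of all sites in the system
    alive     -- set of ids of the sites that are currently up
    """
    candidate = initiator
    while True:
        # The candidate sends an election message to every site with a higher id;
        # only the live ones can answer.
        answered = [s for s in site_ids if s > candidate and s in alive]
        if not answered:
            # No higher site answered: the candidate elects itself and would now
            # announce itself as the new coordinator to all other sites.
            return candidate
        # A higher live site takes over and runs its own election
        # (simplified here to a sequential hand-off).
        candidate = min(answered)


sites = [1, 2, 3, 4, 5]
# Site 5 (the old coordinator) has failed; site 2 notices and starts an election.
print(bully_election(2, sites, alive={1, 2, 3, 4}))   # 4
```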

4. What are the advantages of the backup coordinator approach to coordinator selection?

- Ability to continue processing immediately.
- The backup coordinator approach avoids a substantial amount of delay while the distributed system recovers from a coordinator failure.

5. What are the disadvantages of the backup coordinator approach to coordinator selection?

- There is the overhead of duplicate execution of the coordinator's tasks.
- A coordinator and its backup need to communicate regularly to ensure that their activities are synchronised.

Topic Name: Distributed Query Processing

Q1: What do you mean by query processing?

Ans: Query processing is the process by which a declarative query is translated into low-level data manipulation operations.

Q2: What is the objective of query processing in distributed systems?

Ans: To ensure that a user query, which is posed as if the database were centralized (i.e., logically integrated), executes correctly and efficiently over data that is distributed, so that the data can be retrieved easily.

Q3: Discuss the various steps in query processing.

Ans: The main steps are parsing and translation, optimization, and evaluation.
Parsing and translation: the query parser parses a given high-level language query and translates it into an intermediate form, such as a relational algebra expression. The parser needs to check the syntax of the query and also its semantics (i.e., verify that the relation names and attribute names in the query are names of relations and attributes in the database). A parse tree of the query is constructed and then translated into a relational algebra expression.

Q4: In a distributed system, which issues must be taken into account when choosing a strategy for query processing?
Ans: 1) The cost of data transmission over the network, whose importance depends on the speed of the disks relative to the network and on the type of network. 2) The potential gain in performance from having several sites process parts of the query in parallel.

Q5: What are the general approaches to query optimization?

Heuristic-based query optimization:
Given a query expression, perform selections and projections as early as possible.
Eliminate duplicate computations.
Cost-based query optimization:
Estimate the cost of different equivalent query expressions (generated using heuristics and algebraic manipulation) and choose the execution plan with the lowest estimated cost.
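The effect of performing selection early shows up even on a toy example. This sketch assumes two hypothetical relations, employee(eid, dept) and dept(dept, city), and compares a plan that joins first and selects afterwards with one where the selection is pushed below the join. Both plans give the same answer, but the pushed-down plan never builds the large join result.

```python
from itertools import product

# Hypothetical relations: employee(eid, dept) and dept(dept, city).
EMPLOYEE = [(eid, eid % 10) for eid in range(1000)]            # 1000 employees in 10 depts
DEPT = [(d, "Pune" if d == 3 else "Mumbai") for d in range(10)]


def join(r, s):
    """Nested-loop join of employee and dept on the dept number."""
    return [(eid, d, city) for (eid, d), (d2, city) in product(r, s) if d == d2]


# Plan A: join first, then apply the selection city = 'Pune'.
joined = join(EMPLOYEE, DEPT)
plan_a = [t for t in joined if t[2] == "Pune"]

# Plan B (heuristic): perform the selection early, before the join.
pune_depts = [t for t in DEPT if t[1] == "Pune"]
plan_b = join(EMPLOYEE, pune_depts)

print(len(joined), len(plan_b))            # 1000 100  (size of each join result)
print(sorted(plan_a) == sorted(plan_b))    # True      (same final answer)
```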

Topic Name: DIRECTORY SYSTEM

Q.1 What is a directory?

Ans: A directory is a listing of information about some class of objects. Directories are also used to store other information, e.g. a web browser stores personal bookmarks.

Q.2 What is the use of a directory?

Ans: Directories can be used to find information about a specific object, or to find objects that meet certain requirements. They also store the necessary information.

Q.3 What are the ways of accessing directory information?

Ans: Directory information can be made available through web interfaces, which people use to look it up. Programs, too, sometimes need to access directory information.

Q.4 What are the reasons for having protocols for accessing directory information?
Ans:
1. Directory access protocols are simplified protocols that cater to a limited type of access to data.
2. They can be implemented with database access protocols.
3. They provide a simple mechanism for naming objects in a hierarchical fashion.

Q.5 Where is the DAP protocol used?

Ans: The DAP protocol is used in a distributed directory system to specify what information is stored in each of the directory servers.

Topic Name: LDAP


1> What is LDAP?
LDAP (Lightweight Directory Access Protocol) is a protocol for accessing online directories. It is used for communication between LDAP servers and LDAP clients: LDAP servers store "directories", which are accessed by LDAP clients.

2> Why is LDAP called lightweight?

LDAP is called lightweight because it is a smaller and simpler protocol, derived from the X.500 DAP (Directory Access Protocol) defined in the OSI network protocol stack.

3> What is the use of LDIF?

LDIF is the LDAP Data Interchange Format, a text format used for storing and exchanging directory information.

4> How does communication between an LDAP server and a client take place?
A client starts an LDAP session by connecting to an LDAP
server, called a Directory System Agent (DSA), by default
on TCP port 389. The client then sends an operation request
to the server, and the server sends responses in return. With
some exceptions, the client need not wait for a response
before sending the next request, and the server may send
the responses in any order.

5> What operation requests can a client make when it is connected to an LDAP server?
The client may request the following operations:
Start TLS — use the LDAPv3 Transport Layer Security (TLS)
extension for a secure connection
Bind — authenticate and specify LDAP protocol version
Search — search for and/or retrieve directory entries
Compare — test if a named entry contains a given attribute
value
Add a new entry
Delete an entry
Modify an entry
Unbind — close the connection (not the inverse of Bind)

6> What is the difference between LDAP (a directory) and a database?

The largest general difference between directories and databases is complexity. Databases are capable of storing almost any arbitrary set of information and can be greatly customized for a specific purpose. They also provide a complex query interface, allowing for flexible searches returning customized results.
Directories, on the other hand, tend to have very specific
implementations that follow a strict pattern or schema. This
allows them to be extremely fast, and allows for easy
organization and comprehension of the data they store.

7> What is DIT?
Directories are viewed as a tree, like a computer's file system. This overall tree structure is called the Directory Information Tree (DIT).

8> What are objects? What are the different objects of the DIT?
Each entry in a directory is called an object. These objects are of two types, containers and leaves. A container is like a folder: it contains other containers or leaves. A leaf is simply an object at the end of a tree. A tree cannot contain an arbitrary set of containers and leaves; it must match the schema defined for the directory.

9> What are the applications of LDAP?

Internet applications:
Centralized or distributed white pages
ISP online subscriber directory
Intranet applications:
Internal white pages
Certificate and CRL (certificate revocation list) distribution
System/network management database
10> What are the contents of an LDAP query?
Base: a node within the DIT, given by its distinguished name
Search condition: a combination of Boolean conditions on individual attributes
Scope: just the base, the base and its children, or the entire subtree beneath the base
Attributes: names of the attributes to be returned
Limits on the number of results and on resource consumption
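The five parts of a query map naturally onto the parameters of a search function. This minimal sketch runs such a search over a small hypothetical in-memory DIT stored as a dict from DN to attributes. The entries, the `search` helper and the scope names "base", "one" and "sub" (borrowed from LDAP URL syntax) are assumptions for illustration, and DNs are split naively on commas. Note that in standard LDAP the one-level scope returns only the immediate children of the base, not the base itself.

```python
# Hypothetical in-memory DIT: each distinguished name (DN) maps to its attributes.
DIT = {
    "dc=example,dc=com": {"objectClass": "domain", "dc": "example"},
    "ou=people,dc=example,dc=com": {"objectClass": "organizationalUnit", "ou": "people"},
    "cn=Asha,ou=people,dc=example,dc=com": {"objectClass": "person", "cn": "Asha", "mail": "asha@example.com"},
    "cn=Ravi,ou=people,dc=example,dc=com": {"objectClass": "person", "cn": "Ravi", "mail": "ravi@example.com"},
}


def parent(dn):
    """DN of the parent entry (naive split: assumes no escaped commas)."""
    return dn.split(",", 1)[1] if "," in dn else None


def in_scope(dn, base, scope):
    if scope == "base":                  # just the base entry
        return dn == base
    if scope == "one":                   # immediate children of the base
        return parent(dn) == base
    if scope == "sub":                   # the base and its whole subtree
        return dn == base or dn.endswith("," + base)
    raise ValueError(f"unknown scope: {scope}")


def search(base, scope, condition, attributes, size_limit=None):
    """Return (dn, selected attributes) for entries in scope that satisfy condition."""
    results = []
    for dn, entry in DIT.items():
        if in_scope(dn, base, scope) and condition(entry):
            results.append((dn, {a: entry[a] for a in attributes if a in entry}))
            if size_limit is not None and len(results) >= size_limit:
                break                    # limit on the number of results
    return results


hits = search("dc=example,dc=com", "sub",
              lambda e: e.get("objectClass") == "person", ["cn", "mail"])
for dn, attrs in hits:
    print(dn, attrs)
```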

Topic Name: Commit Protocol

1: What are the types of commit protocol?

Ans: The two-phase commit protocol (2PC) and the three-phase commit protocol (3PC).

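2: Explain the two-phase commit protocol.

Ans: When transaction T completes its execution, that is, when all the sites at which T has executed inform the transaction coordinator Ci that T has completed, Ci starts the 2PC protocol.
Phase 1: Ci adds the record <prepare T> to its log and sends a prepare T message to all sites at which T executed. Each site replies ready T if it is willing to commit (after forcing its log records to stable storage), or abort T otherwise.
Phase 2: if all sites reply ready T, Ci decides to commit T and logs <commit T>; if any site replies abort T or fails to reply, Ci aborts T and logs <abort T>. Ci then sends the decision to all participating sites.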
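The coordinator's decision rule can be sketched in a few lines. This is a single-process simulation under simplifying assumptions: each participating site is modelled as a hypothetical callable that returns its vote, the coordinator's log is a plain Python list, and timeouts, crashes and the participants' own logging are left out.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


def two_phase_commit(participants, log):
    """Coordinator side of 2PC for one transaction T (no timeouts or crashes).

    participants -- dict mapping a site name to a callable returning that site's Vote
    log          -- list standing in for the coordinator's stable log
    """
    # Phase 1: log <prepare T>, send "prepare T" to every site and collect the votes.
    log.append("<prepare T>")
    votes = [prepare() for prepare in participants.values()]

    # Phase 2: commit only if every site voted READY; a single ABORT aborts T.
    decision = "commit" if all(v is Vote.READY for v in votes) else "abort"
    log.append(f"<{decision} T>")
    # A real coordinator would now send the decision to every participating site.
    return decision


log = []
ok = two_phase_commit({"S1": lambda: Vote.READY, "S2": lambda: Vote.READY}, log)
bad = two_phase_commit({"S1": lambda: Vote.READY, "S2": lambda: Vote.ABORT}, log)
print(ok, bad)   # commit abort
```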

3: What is the main disadvantage of the two-phase commit protocol?
Ans: Coordinator failure may result in blocking, where a decision either to commit or to abort transaction T may have to be postponed until Ci recovers.

4: Explain the three-phase commit protocol.

Ans: This protocol avoids the blocking problem under certain assumptions: that no network partition occurs and that not more than K sites fail, where K is a predetermined number. Under these assumptions, the protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit.

5: What are the assumptions of three-phase commit, and what is its disadvantage?
Ans: Assumptions: no network partition occurs, and not more than K sites fail, where K is a predetermined number. Under these assumptions the protocol avoids blocking by introducing an extra third phase in which multiple sites are involved in the decision to commit.
Disadvantage: the protocol has to be carefully implemented to ensure that network partitioning does not result in inconsistencies, where a transaction is committed in one partition and aborted in another. For this reason the 3PC protocol is not widely used.

Topic Name: Distributed Directory Trees

1) Why is a directory object used?

Ans:- The directory object is used to store and retrieve information about objects.

2) Why is the naming tree called the Directory Information Tree (DIT)?
Ans:- Because a directory entry is associated with each vertex of this tree, and the entry holds information about the object that has the corresponding name.

3) What is a Directory System Agent (DSA)?

Ans:- A system that maintains and communicates directory information is called a Directory System Agent.

4) What is a Relative Distinguished Name (RDN)?

Ans:- The name component added as we move one step down the naming tree is called the Relative Distinguished Name for the corresponding entry.
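Distinguished names are built from RDNs exactly as described: each step down the tree contributes one RDN, and the full DN lists them from the entry up to the root. This sketch uses LDAP's comma-separated string form with hypothetical entry names and helper functions; it ignores escaping of special characters inside attribute values, which real DN parsers must handle.

```python
def distinguished_name(rdns_from_root):
    """Join the RDNs met while walking down the tree into a DN (entry first, root last)."""
    return ",".join(reversed(rdns_from_root))


def relative_distinguished_name(dn):
    """The RDN is the component added at the last step down the tree: the leftmost one."""
    return dn.split(",", 1)[0]


# Walking down a hypothetical naming tree: country, organization, unit, person.
dn = distinguished_name(["c=IN", "o=Example University", "ou=Computer Science", "cn=Asha"])
print(dn)                               # cn=Asha,ou=Computer Science,o=Example University,c=IN
print(relative_distinguished_name(dn))  # cn=Asha
```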

5) All directory information will be part of one "global directory". True or false?
Ans:- True. All directory information will be part of one "global directory": global in the sense that it is worldwide, and global in the sense that it will be common for all directory uses.

6) How is an object represented?

Ans:- In an X.500 directory, an object always has a so-called distinguished name, which is structured as a sequence of RDNs.
Q: Draw LDAP architecture.

ADBMS ORAL QUESTIONS

TOPIC - CLIENT SERVER ARCHITECTURE (UNIT III).

1. WHAT DO YOU MEAN BY CLIENT-SERVER ARCHITECTURE?

ANS: CLIENT/SERVER ARCHITECTURE DESCRIBES THE RELATIONSHIP BETWEEN TWO COMPUTER PROGRAMS, WHERE ONE PROGRAM (THE CLIENT) MAKES A REQUEST TO ANOTHER PROGRAM (THE SERVER), WHICH FULFILLS THE REQUEST.

2. WHAT ARE THE DIFFERENT TYPES OF CLIENT-SERVER ARCHITECTURES?

ANS: 1. MAINFRAME ARCHITECTURE

2. FILE SHARING ARCHITECTURE

3. SINGLE-TIER ARCHITECTURE

4. TWO-TIER ARCHITECTURE

5. THREE-TIER ARCHITECTURE

3. WHAT IS MAINFRAME ARCHITECTURE?

ANS: WITH MAINFRAME SOFTWARE ARCHITECTURE, ALL INTELLIGENCE IS WITHIN THE CENTRAL HOST COMPUTER. USERS INTERACT WITH THE HOST THROUGH A TERMINAL THAT CAPTURES KEYSTROKES AND SENDS THE INFORMATION TO THE HOST.

4. WHAT IS THE LIMITATION OF MAINFRAME ARCHITECTURE?

ANS: THE LIMITATION OF MAINFRAME ARCHITECTURES IS THAT THEY DO NOT EASILY SUPPORT GRAPHICAL USER INTERFACES (GUIs) OR ACCESS TO MULTIPLE DATABASES.

5. WHAT IS THE ADVANTAGE OF MAINFRAME ARCHITECTURE?

ANS: THE ADVANTAGE OF MAINFRAME ARCHITECTURE IS THAT IT IS NOT TIED TO THE HARDWARE PLATFORM: USERS CAN INTERACT THROUGH PCs AND UNIX WORKSTATIONS.

6) What are the advantages and disadvantages of client-server architecture?

Ans: Advantages:

1. Processing of the entire database system is spread over the clients and the server.

2. It is possible to keep control over all clients through a single system; in a DBMS this is required for transaction control and management.

Disadvantages:
1. Implementation is more complex because it includes network management.

2. There is an additional burden on the DBMS server to handle concurrency.

7) What are the basic components of database system architecture? What is 1-tier architecture?

Ans: There are 3 basic components of database system architecture:

1. Presentation logic: the user interface, displaying data to the user and accepting input from the user.

2. Business logic: data validation, ensuring the data is correct before adding it to the database.

3. Data access logic: database communication, accessing tables and indices, packing and unpacking data.

1-tier architecture: in this architecture, all three components of the application are handled in a single layer.

8) What is 2-tier architecture?

Ans: 2-tier architecture: in this architecture, the three components of the application are distributed over two layers.

1st layer / primary layer: consists of the presentation logic and the business logic.

2nd layer / secondary layer: consists of the data access logic.

9) What are the limitations of 2-tier architecture?

Ans: Limitations:

1. Implementing business logic in stored procedures can limit scalability.

2. This architecture is not effective for batch processing.

3. Using stored procedures to implement complex processing logic limits interoperability, because stored procedures are written in the DBMS's proprietary language.

4. This architecture is difficult to administer and maintain, because when the application resides on the client, every upgrade must be delivered, installed and tested on each client.

10) Identify which client-server (tier) architecture is shown below, and explain its components.

Ans: 3-tier architecture.

Topic: Web Fundamentals

11. What are web-based systems?

Ans: To access data from databases, some application programs are required. Nowadays the internet is very popular, so web tools are the most widely used user interface. Web-based systems are systems used on the internet to exchange information through a user interface. Web-based systems are the heart of e-commerce.

12. What is the need for a web interface to a database?

Ans: With the growth of information services and e-commerce on the web, databases used for information services, DSS (decision support systems) and transaction processing must be linked with the web. Such a connection requires a bridge that links the application to the database; this bridge is the web interface.

13. What are the web fundamental components?

Ans: URL, HTML, client-side scripting, applets, servlets and server-side scripting.

14. What is a servlet?

Ans: In a 2-tier client-server architecture, the application runs as part of the web server. So, to complete a user request, the web server has to load and run the Java application program; that function is provided by servlets.

Servlets reside on the server side, and the servlet interface defines the communication between the web server and the application program.

15. What is a web server?

Ans: A web server is a program running on the server side that accepts requests from the web browser and sends back the result in the form of an HTML document.

16. What is a web applet?

Ans: Java code can be compiled into platform-independent byte code that can be executed in any browser with Java support. Java applets are downloaded as part of a web page and are used to provide a better GUI.

17. What is HTML?

Ans: HTML (HyperText Markup Language) allows formatting of text. HTML is used for designing web pages; it can display tables, forms and style sheets as well as other display attributes.

18. What is a URL?

Ans: URL stands for Uniform Resource Locator. The URL field in a web browser lets the user enter a web address or web site name; the browser then connects to that system using some protocol.

Parts of a URL are:

HTTP: the HyperText Transfer Protocol, i.e. the protocol used.

Web address: the name/IP address of the machine that has the web server.
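Python's standard `urllib.parse.urlparse` splits a URL into these pieces, plus the path of the document on the web server and any query parameters. The address below is a hypothetical example.

```python
from urllib.parse import urlparse

# Hypothetical address typed into the browser's URL field.
url = urlparse("http://www.example.com/library/search.html?title=dbms")

print(url.scheme)   # http                  -> protocol (HTTP)
print(url.netloc)   # www.example.com       -> machine that runs the web server
print(url.path)     # /library/search.html  -> document on that machine
print(url.query)    # title=dbms            -> parameters sent to the server
```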


Topic: XML Domain Specific DTDs

19. What is XML?

Ans: XML (eXtensible Markup Language) was mainly intended for document management. It is derived from SGML (Standard Generalized Markup Language). XML can represent database data, as well as many other kinds of structured data used in applications.

20. What is the standard query language for XML called?

Ans: The W3C (World Wide Web Consortium) standard query language for XML is called XQuery.

21. What is XPath?

Ans: XPath is a language for path expressions. A path expression is a sequence of location steps separated by "/". Using XPath, we can select data from an XML document.
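Python's standard `xml.etree.ElementTree` supports a limited subset of XPath, which is enough to show location steps and a simple predicate. The university document below is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical university document.
doc = ET.fromstring("""
<university>
  <department><dept_name>Comp. Sci.</dept_name><building>Taylor</building></department>
  <department><dept_name>Physics</dept_name><building>Watson</building></department>
  <course><course_id>CS-101</course_id><dept_name>Comp. Sci.</dept_name><credits>4</credits></course>
</university>
""".strip())

# Location steps separated by "/": each step moves one level down the tree.
names = [e.text for e in doc.findall("./department/dept_name")]
print(names)                   # ['Comp. Sci.', 'Physics']

# A predicate on a child's text keeps only the matching course elements.
cs = doc.findall("./course[dept_name='Comp. Sci.']/course_id")
print([e.text for e in cs])    # ['CS-101']
```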

22. What is XSLT?

Ans: XSLT is a transformation language that can generate XML output from XML input.

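23. Which is the current approved version of XML?

Ans: XML version 1.0 (declared as version="1.0"), which is by far the most widely used version; XML 1.1 is also a W3C Recommendation.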

24. What is a reference?

Ans: A reference allows you to include additional text or markup in an XML document. References always begin with "&" and end with ";".

25. What is an entity reference?

Ans: An entity reference, like "&amp;", contains a name (e.g., amp) between the start and end delimiters. The name refers to predefined text or markup, like a macro in the C/C++ programming languages.
26. What is a character reference?
Ans: A character reference, like "&#65;", contains a hash mark followed by a number (or by "x" and a hexadecimal number, as in "&#x41;"). The number always refers to the Unicode code of a single character, such as 65 for A.
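An XML parser replaces both kinds of references with the text they stand for. This sketch uses Python's standard `xml.etree.ElementTree` on a hypothetical `<note>` element, and also shows the reverse direction, where the serializer escapes "&" back into "&amp;".

```python
import xml.etree.ElementTree as ET

# Entity references (&amp;, &lt;) and character references (decimal &#65; and
# hexadecimal &#x42;) are replaced by the parser with the characters they denote.
elem = ET.fromstring("<note>Tom &amp; Jerry &lt;3 &#65;&#x42;</note>")
print(elem.text)          # Tom & Jerry <3 AB

# Serialization goes the other way and escapes "&" back into an entity reference.
out = ET.Element("note")
out.text = "Tom & Jerry"
serialized = ET.tostring(out, encoding="unicode")
print(serialized)         # <note>Tom &amp; Jerry</note>
```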

27. What is a FLWOR expression?

Ans: A FLWOR expression (pronounced "flower") is a query construct composed of for, let, where, order by and return clauses.
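Python's standard library has no XQuery engine, so this sketch does not run XQuery. It mirrors each FLWOR clause of the hypothetical query shown in the comments with ElementTree and plain Python, to make the role of each clause visible. The document and element names are assumptions.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<university>
  <course><course_id>CS-347</course_id><credits>4</credits></course>
  <course><course_id>BIO-101</course_id><credits>2</credits></course>
  <course><course_id>CS-101</course_id><credits>4</credits></course>
</university>
""".strip())

# Hypothetical XQuery being mirrored (it is NOT executed here):
#   for $c in /university/course
#   let $cr := $c/credits
#   where $cr > 3
#   order by $c/course_id
#   return <big>{ $c/course_id/text() }</big>
bindings = []
for c in doc.findall("course"):               # for: iterate over course elements
    cr = int(c.findtext("credits"))           # let: bind a value derived from c
    if cr > 3:                                # where: filter the bindings
        bindings.append(c)
bindings.sort(key=lambda c: c.findtext("course_id"))   # order by

result = []
for c in bindings:                            # return: construct one element per binding
    big = ET.Element("big")
    big.text = c.findtext("course_id")
    result.append(ET.tostring(big, encoding="unicode"))
print(result)   # ['<big>CS-101</big>', '<big>CS-347</big>']
```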

28. What is MathML?

Ans: MathML is intended to facilitate the use and reuse of mathematical and scientific content on the web and in other applications such as computer algebra systems, print typesetting and voice synthesis.

SOAP

29. What is SOAP?

Ans- SOAP is the Simple Object Access Protocol. It is a protocol specification for exchanging structured information in the implementation of web services in computer networks. It is an XML-based messaging protocol.

30. SOAP relies on which language as its message format?

Ans-XML

31. What are the common protocols SOAP relies on?

Ans- It relies on application-layer protocols (HTTP, HTTPS, SMTP, etc.) for message negotiation and transmission; HTTP is the most commonly used. SOAP is also widely used to carry RPC (remote procedure call) requests.

32. Why was XML chosen as the standard message format for SOAP?
Ans- Because of its widespread use by major corporations and open-source development efforts.

Hardware appliances are also available to accelerate the processing of XML messages.

33. Advantages of SOAP?

Ans-

i) Using SOAP over HTTP allows easier communication through proxies and firewalls than previous remote execution technology.

ii) Versatile enough to allow the use of different transport protocols.

iii) Platform independent.

iv) Language independent.

34. Disadvantages of SOAP?

Ans-

i) Because of the verbose XML format, SOAP can be considerably slower. This may not be an issue when only small messages are sent.

ii) Although SOAP is an open standard, not all languages offer appropriate support. Java, Delphi, .NET and Flex offer excellent SOAP integration and/or IDE support; Python and PHP support is much weaker.

35. State one competing middleware technology with SOAP.

Ans-CORBA(Common Object Request Broker Architecture).

36. What does the SOAP specification contain?

Ans- SOAP is a specification for using XML documents as messages. The SOAP specification contains:

i) A syntax for defining messages as XML documents (SOAP messages).

ii) A model for exchanging SOAP messages.

iii) A set of rules for representing data within SOAP messages (SOAP encoding).

iv) A guideline for transporting SOAP messages over HTTP.

v) A convention for performing RPC using SOAP messages.
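A SOAP message is just an XML document with an Envelope root, an optional Header and a Body. This sketch builds and parses one with Python's standard `xml.etree.ElementTree`, using the SOAP 1.1 envelope namespace. The banking namespace, the `getBalance` operation and the transaction-ID header are hypothetical, and no transport (HTTP or otherwise) is involved.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/bank"                       # hypothetical service namespace
ET.register_namespace("s", SOAP_NS)
ET.register_namespace("b", APP_NS)


def build_request(txn_id, account):
    """Wrap a hypothetical getBalance request in a SOAP envelope."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}TransactionID").text = txn_id
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}getBalance")
    ET.SubElement(request, f"{{{APP_NS}}}account").text = account
    return ET.tostring(env, encoding="unicode")


message = build_request("T-42", "A-101")
print(message)

# The receiver parses the envelope; the first child of Body names the requested service.
parsed = ET.fromstring(message)
operation = parsed.find(f"{{{SOAP_NS}}}Body")[0]
print(operation.tag)   # {http://example.com/bank}getBalance
```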

37. Where will you place SOAP in 2-tier and 3-tier architectures?

Ans-

2-tier -> In the first tier, where the presentation layer and business logic are combined in a single layer.

3-tier -> At the middle layer.

38. What are Active Server Pages?

Active Server Pages (ASPs) are web pages that contain server-side scripts in addition to the usual mixture of text and HTML tags.

Server-side scripts are special commands you put in web pages that are processed before the pages are sent from the server to the web browser of someone who's visiting your website.

39. What are the requirements to run ASP?

Since the server must do additional processing on the ASP scripts, it must have the ability to do so.

The only servers that support this facility are Microsoft Internet Information Services and Microsoft Personal Web Server.


40. What are the basic rules for XML element names?

1: Element names can contain letters, numbers, hyphens, underscores and periods, and colons when namespaces are used.

2: Element names cannot contain spaces; underscores are usually used to replace spaces.

3: Element names can start with a letter, underscore or colon, but cannot start with other non-alphabetic characters or a number, or with the letters xml (which are reserved).
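These rules can be checked mechanically. This is a simplified, ASCII-only checker written with Python's `re` module; the real XML name production also allows many non-ASCII letters, and `is_valid_element_name` is a hypothetical helper, not part of any XML library.

```python
import re

# Simplified ASCII version of the rules: first character is a letter, "_" or ":";
# later characters may also be digits, "." or "-"; no spaces anywhere.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_element_name(name):
    if not NAME_RE.fullmatch(name):      # rules 1-3: allowed characters and first character
        return False
    if name.lower().startswith("xml"):   # names beginning with "xml" are reserved
        return False
    return True


for candidate in ["first_name", "price.usd", "2nd_line", "first name", "xmlData", "_id"]:
    print(candidate, is_valid_element_name(candidate))
```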

41. What are the formats to represent XML elements?

Elements look like this and always have an opening and a closing tag (an empty element may also be written as a single self-closing tag, <element/>):

<element></element>

42. What is Internet Information Services?

This is Microsoft's web server designed for the Windows NT platform.

It could only run on Microsoft Windows NT 4.0, Windows 2000 Professional and Windows 2000 Server.

At the time this was written, the current version was 5.0, which shipped as part of the Windows 2000 operating system.
43. What is a DTD?

A Document Type Definition (DTD) is the original way to validate XML document structure and enforce specific formatting of select text, and probably still the most prevalent. Although the placement of the XML declaration at the top of a DTD would lead one to believe that it is an XML document, DTDs are in fact not well-formed XML documents, because they follow DTD syntax rules rather than XML document syntax.

In the following line, the reference to the DTD is located in the first element under the XML document declaration:

44. What are XML attributes?

Attributes contain values that are associated with an element and are always part of an element's opening tag:

<element attribute="value"></element>

The attribute name must follow the element name, then an equals sign (=), then the attribute value, in single or double quotes.

The attribute value can contain quotes; if it does, one type of quote must be used in the value and the other type around the value.

45. State true or false: is it necessary to close a tag in XML?

Ans: True. (Explanation: every element must be closed, either with an end tag or by writing it as an empty-element tag such as <br/>.)

46. What is the difference between HTML and ASP?

Ans:

In plain HTML we can't change web pages dynamically, but in ASP we can, because server-side scripts generate the page when it is requested.

47. Define the following: i) Thin client ii) Thick client

Ans:

Thin client: in an architecture where the client implements only the GUI and the server implements both the business logic and data management, the clients are called thin clients.

Thick client: clients that implement the user interface and part of the business logic, with the remaining part implemented at the server level, are called thick clients.

48. Identify the following diagram.

Ans: Thick client.

49. Explain the following Diagram.

50. Explain – entity references.


Ans:

It's a standard XML document: the root element is the Envelope, which uses the namespace prefix "s". The envelope contains a Header, which indicates the transaction and its ID, and a Body, which indicates the requested service.

1. WHAT IS THE DIFFERENCE BETWEEN OLTP AND OLAP? (3 POINTS)

A=

OLTP ==> SIMPLE TRANSACTIONS, READ/WRITE QUERIES, SMALL DB

OLAP ==> COMPLEX QUERIES, READ-ONLY QUERIES, HUGE DB

2. WHAT ARE THE 3 TIERS IN DATA WAREHOUSE ARCHITECTURE?

A=

BOTTOM TIER - DATA MARTS, DATA WAREHOUSE, METADATA REPOSITORY

MIDDLE TIER - OLAP SERVER

TOP TIER - DATA MINING TOOLS, ANALYSIS TOOLS, ETC.

3. WHAT ARE THE ADVANTAGES OF A DATA WAREHOUSE?

A=

MARKETING WEAPON
CUSTOMER SUPPORT

BUSINESS INTELLIGENCE

DECISION SUPPORT

4. WHAT ARE THE APPLICATIONS OF A DATA WAREHOUSE?

A=

INFORMATION PROCESSING

ANALYTICAL PROCESSING

DATA MINING

5. WHAT IS THE USE OF THE METADATA REPOSITORY?

A=

STORES DATA ABOUT DATA MARTS AND DEFINITIONS OF THE DATA WAREHOUSE

6. TYPES OF OLAP SERVERS

A=

ROLAP -- RELATIONAL DBMS

MOLAP -- MULTIDIMENSIONAL DATA STRUCTURES (E.G. ARRAYS)

HOLAP -- COMBINES ROLAP AND MOLAP

7. WHAT IS A DATA MART?

A=

CONTAINS A SUBSET OF CORPORATE-WIDE DATA THAT IS OF VALUE TO A SPECIFIC GROUP OF USERS


8. WHAT ARE THE TYPES OF DATA WAREHOUSE DESIGN?

A=

TOP DOWN -- STARTS WITH OVERALL DESIGN AND PLANNING

BOTTOM UP -- STARTS WITH EXPERIMENTS AND PROTOTYPES

COMBINED -- COMBINES BOTH OF THE ABOVE

9. USE OF DATA PREPROCESSING

A=

IMPROVE QUALITY OF DATA

IMPROVE QUALITY OF MINING RESULTS

IMPROVE EFFICIENCY AND EASE OF MINING

10. DATA PREPROCESSING TECHNIQUES

A=

DATA CLEANING

DATA INTEGRATION

DATA TRANSFORMATION

DATA REDUCTION

Q1. What is a data warehouse? What are the features of a data warehouse?

Q2. What are the steps in the design and construction of a data warehouse? What are its components?

Q3. Explain the three-tier data warehouse architecture.

Q4. What are the applications of a data warehouse?

Q5. What are the differences between a data warehouse and data marts?

1. What is Online Analytical Processing (OLAP)?

OLAP is an interactive system that permits an analyst to view different summaries of multidimensional data. OLAP tools support interactive analysis of summary information.

2. Explain OLAP implementation in brief.

OLAP is implemented on multidimensional models. In MOLAP servers, data warehouses directly store multidimensional data in special data structures (e.g., arrays) and implement the OLAP operations over these special data structures.
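A minimal sketch of the MOLAP idea, assuming a tiny hypothetical 2-D cube (items x regions) held in a plain Python nested list. Reading a cell is direct array indexing, and an OLAP aggregation is a sum along one axis. The dimension members and figures are invented for illustration; real MOLAP engines use compressed, chunked array storage.

```python
# Hypothetical dimension members; each member's list position is its array index.
ITEMS = ["shirt", "skirt"]
REGIONS = ["north", "south", "east"]

# sales[i][r] = units of ITEMS[i] sold in REGIONS[r]: a 2-D "cube" stored as an array.
sales = [
    [10, 20, 5],
    [3, 7, 8],
]

def cell(item, region):
    """Read one cell by direct array indexing (no join or index scan)."""
    return sales[ITEMS.index(item)][REGIONS.index(region)]

def total_by_item():
    """Aggregate away the region dimension (sum along the second axis)."""
    return {item: sum(row) for item, row in zip(ITEMS, sales)}

def total_by_region():
    """Aggregate away the item dimension (sum along the first axis)."""
    return {region: sum(row[r] for row in sales) for r, region in enumerate(REGIONS)}

print(cell("shirt", "south"))  # 20
print(total_by_item())         # {'shirt': 35, 'skirt': 18}
```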

3. What is a Relational OLAP (ROLAP) system?

A ROLAP system performs OLAP on top of a relational DBMS, using:

Special schema design: star, snowflake.

Special indexes: bitmap, multi-table join.

Special tuning: maximize query throughput.

It is proven technology and tends to outperform specialized multidimensional databases (MDDB), especially on large data sets.

4. What is a Hybrid OLAP (HOLAP) system?

A system that stores some summaries in memory, and stores the base data and other summaries in a relational database, is called a HOLAP system.

5. Give the OLAP components of SQL.

Extended aggregation:

The SQL:1999 standard defines a rich set of aggregate functions. The new aggregate functions on single attributes are standard deviation and variance. SQL:1999 also supports a new class of binary aggregate functions, which compute statistical results on pairs of attributes; they include correlation, covariance, and regression curves, which give a line approximating the relation between the values of the pair of attributes.

Q1. What is a data cube?

----- A data cube is used to represent data along some measure of interest. Although called a “data cube”, it can be 2-dimensional, 3-dimensional, or higher-dimensional.

Q2. What are the various operations on a data cube?

----- Summarization or roll-up

Drill-down

Iceberg cubes
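A data cube can be viewed as the set of all group-by aggregates over its dimensions (2^n of them for n dimensions). The sketch below computes that set in plain Python for a tiny hypothetical sales relation, and adds an iceberg filter that keeps only cells whose total reaches a threshold. Going from the (item, color) group-by to the (item) group-by is a roll-up; going the other way is a drill-down. The data, threshold and function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical sales facts: (item, color, size, quantity).
FACTS = [
    ("shirt", "dark", "small", 2),
    ("shirt", "pastel", "small", 4),
    ("skirt", "dark", "large", 5),
    ("skirt", "pastel", "small", 1),
]
DIMS = ("item", "color", "size")

def data_cube(facts, dims):
    """Compute every group-by over dims: {grouping attrs: {cell key: total quantity}}."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(range(len(dims)), k):
            cells = {}
            for row in facts:
                key = tuple(row[i] for i in group)
                cells[key] = cells.get(key, 0) + row[-1]  # last field is the measure
            cube[tuple(dims[i] for i in group)] = cells
    return cube

def iceberg(cube, min_total):
    """Keep only the cells whose aggregate reaches min_total."""
    return {g: {key: v for key, v in cells.items() if v >= min_total}
            for g, cells in cube.items()}

cube = data_cube(FACTS, DIMS)
print(cube[("item", "color")])  # finer group-by
print(cube[("item",)])          # rolled up to item only
print(cube[()])                 # grand total over all dimensions
```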

Q3. What is a cross tab?

----- A cross tab is a table in which the values of one attribute (say A) form the row headers, the values of another attribute (say B) form the column headers, and each cell is identified by (ai, bj), where ai is a value of A and bj is a value of B.
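A small sketch of building a cross tab in plain Python, assuming hypothetical sales rows where A = item_name and B = color. Each cell (ai, bj) holds the total quantity for that pair (0 when no row matches), and the printout adds a per-row total column.

```python
# Hypothetical sales rows; attribute A = item_name, attribute B = color.
SALES = [
    {"item_name": "shirt", "color": "dark", "quantity": 8},
    {"item_name": "shirt", "color": "pastel", "quantity": 27},
    {"item_name": "skirt", "color": "dark", "quantity": 35},
    {"item_name": "skirt", "color": "white", "quantity": 10},
]

def cross_tab(rows, a, b, measure):
    """Return row headers, column headers and cells {(ai, bj): total measure}."""
    a_vals = sorted({r[a] for r in rows})  # values of A form the row headers
    b_vals = sorted({r[b] for r in rows})  # values of B form the column headers
    cells = {(ai, bj): 0 for ai in a_vals for bj in b_vals}
    for r in rows:
        cells[(r[a], r[b])] += r[measure]
    return a_vals, b_vals, cells

def show(a_vals, b_vals, cells):
    """Print the cross tab with a 'total' column for each row."""
    print("".ljust(8) + "".join(bj.ljust(8) for bj in b_vals) + "total")
    for ai in a_vals:
        row = [cells[(ai, bj)] for bj in b_vals]
        print(ai.ljust(8) + "".join(str(v).ljust(8) for v in row) + str(sum(row)))

show(*cross_tab(SALES, "item_name", "color", "quantity"))
```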

Q4. What are the different data preprocessing techniques?

----- Data Integration

Data Cleaning

Data normalization

Data reduction.

Q5. What are the problems with data?

----- Missing attributes and missing attribute values

Improper types

Q6. What is the need for data preprocessing?

---- Data quality is a key issue in data mining.

To increase the accuracy of mining, we have to perform data preprocessing.

Otherwise: garbage in => garbage out.

• What is semantic integration?

Ans:- A collection of views that gives a group of users a uniform presentation of relevant data from multiple databases is called semantic integration.

• What is data integration?

Ans:- Consolidating different sources into one repository, usually a data warehouse (schema reconciliation). It is done:

a) using metadata (for schema integration).
b) using correlation analysis (to detect redundant attributes).

• What are the different strategies of data reduction?

Ans:-

1) Data cube aggregation.
2) Attribute subset selection.
3) Dimensionality reduction.
4) Numerosity reduction.
5) Concept hierarchy generation.

• What is data cleaning?

Ans:- Real-world data tend to be incomplete, noisy, and inconsistent. Data cleaning routines attempt to fill in missing values, smooth out noise, and correct inconsistencies in the data.

• Which methods are used for data cleaning?

Ans:-

1) Look for missing values.
2) Ignore the tuples.
3) Fill in missing values manually.
4) Use a global constant to fill in missing values.
5) Use the most probable value to fill in missing values.
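Methods 2 and 4 above, plus a simple attribute-mean fill, can be sketched as follows. This assumes hypothetical tuples stored as Python dicts with None marking a missing value. The mean fill is only a stand-in: the "most probable value" is usually inferred with a model such as regression or a decision tree, which is not shown here. Each function returns new rows and leaves the input unchanged.

```python
from statistics import mean

# Hypothetical tuples; None marks a missing value of the "income" attribute.
ROWS = [
    {"id": 1, "income": 30000},
    {"id": 2, "income": None},
    {"id": 3, "income": 50000},
    {"id": 4, "income": 40000},
]

def ignore_tuples(rows, attr):
    """Method 2: drop every tuple whose value for attr is missing."""
    return [r for r in rows if r[attr] is not None]

def fill_constant(rows, attr, constant):
    """Method 4: replace each missing value with one global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]

def fill_mean(rows, attr):
    """Simple stand-in for method 5: use the mean of the known values of attr."""
    m = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: m if r[attr] is None else r[attr]} for r in rows]

print(len(ignore_tuples(ROWS, "income")))  # 3
print(fill_mean(ROWS, "income")[1])
```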

Q: Which sub-processes are generally involved in data transformation?

Ans: Smoothing, aggregation, generalization, normalization, attribute construction.

Q: Name the different strategies for data reduction.

Ans: Data cube aggregation, attribute subset selection, dimensionality reduction, numerosity reduction, discretization and concept hierarchy generation.

Q: What is the use of data reduction?

Ans: To obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.

Q: What is the aim behind data transformation?

Ans: To transform or consolidate data into forms appropriate for mining.

Q: What is meant by smoothing?

Ans: Removing noise from the data is called smoothing.

Q: Compare ROLAP and MOLAP.

Q: Name any two operations on a data cube that you have performed in your practicals.

Q: What is hybrid OLAP? Give its benefits.


Q: Explain data cleaning.

A: Real-world data tend to be incomplete, noisy, and inconsistent. Data cleaning routines attempt to fill in missing values, smooth out noise, and correct inconsistencies in the data.

1. Fill in missing values (attribute or class value):

* Ignore the tuple: usually done when class label is missing.

* Use the attribute mean (or majority nominal value) to fill in the
missing value.

* Use the attribute mean (or majority nominal value) for all samples
belonging to the same class.

* Predict the missing value by using a learning algorithm: consider the attribute with the missing value as a dependent (class) variable and run a learning algorithm (usually Bayes or decision tree) to predict the missing value.

2. Identify outliers and smooth out noisy data:

* Binning

o Sort the attribute values and partition them into bins;

o Then smooth by bin means, bin median, or bin boundaries.

* Clustering: group values in clusters and then detect and remove outliers (automatically or manually).

* Regression: smooth by fitting the data into regression functions.

3. Correct inconsistent data: use domain knowledge or expert decision.
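The binning step can be sketched as follows, assuming equal-frequency (equal-depth) bins of 3 over a small list of hypothetical prices. It shows smoothing by bin means, bin medians and bin boundaries. In the boundary version, a value equidistant from both boundaries is mapped to the lower one, which is an arbitrary choice made for this sketch.

```python
from statistics import mean, median

# Hypothetical attribute values (e.g. prices).
PRICES = [4, 8, 15, 21, 21, 24, 25, 28, 34]

def equal_frequency_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` values each."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = equal_frequency_bins(PRICES, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```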

Q: Explain data transformation.

A: In data transformation, the data are transformed or consolidated into forms appropriate for mining. Data transformation involves the following:

1. Normalization:

* Scaling attribute values to fall within a specified range.

o Example: to transform V in [Min, Max] to V' in [0, 1], apply V' = (V - Min)/(Max - Min)

* Scaling by using mean and standard deviation (useful when min and
max are unknown or when there are outliers): V'=(V-Mean)/StDev

2. Aggregation: moving up in the concept hierarchy on numeric attributes.

3. Generalization: moving up in the concept hierarchy on nominal attributes.

4. Attribute construction: replacing or adding new attributes inferred from existing attributes.
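The two normalization formulas can be sketched in plain Python as below. The sketch assumes the population standard deviation for StDev (the text does not say which one) and returns a constant result when an attribute has no spread, to avoid dividing by zero. The target range [new_min, new_max] defaults to [0, 1], matching the example above.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """V' = (V - Min) / (Max - Min), then stretched to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

def z_score(values):
    """V' = (V - Mean) / StDev, using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

print(min_max([10, 20, 30, 40, 50]))      # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, population stdev 2
```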

Q: Explain data reduction.

A: Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data. That is, mining on the reduced data set should be more efficient yet produce the same (or almost the same) analytical results.

1. Reducing the number of attributes

* Data cube aggregation: applying roll-up, slice or dice operations.

* Removing irrelevant attributes: attribute selection (filtering and wrapper methods), searching the attribute space (see Lecture 5: Attribute-oriented analysis).

* Principal component analysis (numeric attributes only): searching for a lower-dimensional space that can best represent the data.

2. Reducing the number of attribute values

* Binning (histograms): reducing the number of attribute values by grouping them into intervals (bins).

* Clustering: grouping values in clusters.

* Aggregation or generalization

3. Reducing the number of tuples

* Sampling
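Sampling reduces the number of tuples. The sketch below shows three common schemes using Python's random module: simple random sampling without replacement, with replacement, and stratified sampling that draws the same fraction from each stratum. The function names, the at-least-one-tuple rule per stratum, and the optional seed are assumptions for illustration.

```python
import random
from collections import defaultdict

def srswor(data, n, seed=None):
    """Simple random sample WITHOUT replacement: each tuple is drawn at most once."""
    return random.Random(seed).sample(data, n)

def srswr(data, n, seed=None):
    """Simple random sample WITH replacement: a tuple may be drawn more than once."""
    return random.Random(seed).choices(data, k=n)

def stratified(data, stratum_of, fraction, seed=None):
    """Draw the same fraction (at least one tuple) from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in data:
        strata[stratum_of(row)].append(row)
    sample = []
    for rows in strata.values():
        k = max(1, round(fraction * len(rows)))
        sample.extend(rng.sample(rows, k))
    return sample

# Hypothetical tuples: (age group, customer number).
customers = [("young", i) for i in range(10)] + [("old", i) for i in range(30)]
print(stratified(customers, lambda r: r[0], 0.1, seed=7))
```

Stratified sampling keeps small groups (here the "young" stratum) represented, which a plain random sample of the same size might miss.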

Q 1> What is the difference between an OLTP query and an OLAP query?

Ans=> OLTP query: 1. Used to modify data.

2. Requires a fully up-to-date database.

OLAP query: 1. Doesn’t modify data.

2. Doesn’t require a fully up-to-date database.

Q 2> What is OLAP?

Ans=> It is online analytical processing.

Q 3> What is OLTP?

Ans=> It is online transaction processing. OLTP requires that the data be completely up to date.

Q 4> What operations do OLAP tools support?

Ans=> They support: 1. Slice operation

2. Dice operation

3. Roll-up operation

4. Drill-down operation

5. Visualization operation
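Slice, dice and roll-up can be sketched over a small list of hypothetical fact rows in plain Python. Slice fixes one dimension to a single value, dice restricts several dimensions to chosen subsets, and roll-up here aggregates away the dimensions not kept (rolling up along a concept hierarchy, such as city to state, works the same way on a coarser attribute). Drill-down is the reverse: calling roll_up with more dimensions. All names and figures are assumptions.

```python
# Hypothetical fact rows for a small sales cube.
FACTS = [
    {"item": "shirt", "city": "north", "quarter": "Q1", "units": 10},
    {"item": "shirt", "city": "south", "quarter": "Q2", "units": 4},
    {"item": "skirt", "city": "north", "quarter": "Q1", "units": 7},
    {"item": "skirt", "city": "south", "quarter": "Q1", "units": 3},
]

def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]

def dice_cube(facts, criteria):
    """Dice: keep rows whose values lie in a chosen subset for each listed dimension."""
    return [f for f in facts if all(f[d] in allowed for d, allowed in criteria.items())]

def roll_up(facts, keep_dims, measure="units"):
    """Roll-up: sum the measure, aggregating away every dimension not in keep_dims."""
    totals = {}
    for f in facts:
        key = tuple(f[d] for d in keep_dims)
        totals[key] = totals.get(key, 0) + f[measure]
    return totals

print(slice_cube(FACTS, "quarter", "Q1"))
print(roll_up(FACTS, ("item",)))             # coarser view
print(roll_up(FACTS, ("item", "quarter")))   # drill-down to a finer view
```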

Q 5> What are the different kinds of OLAP tools used?

Ans=> ROLAP, MOLAP, HOLAP

Q1: What is noisy data?

Ans: Noise is a random error or variance in a measured variable, so it is necessary to smooth out the data to remove the noise.

Q2: What is data integration?

Ans: Data mining often requires data integration, which combines data from multiple sources into a coherent data store.

Q3: How are data transformed?

Ans: Data are transformed or consolidated into forms appropriate for mining. The methods are:

Smoothing

Aggregation

Generalization

Normalization

Attribute construction

Q4: What are back-end tools?

Ans: Data extraction

Data cleaning

Data transformation

Load

Refresh

Q5: What are data cube measures?

Ans: A data cube measure is a numerical function that can be evaluated at each point in the data cube space.

Unit 6

1. What are the different techniques of document indexing?

2. Compare data retrieval and information retrieval.

3. What are an inverted index and a signature file?

4. Write a note on indexing of documents.

5. Why is an effective index structure important?

6. What is a web crawler?

Ans: Web crawlers are programs that locate and gather information on the Web. They recursively follow hyperlinks present in known documents to find other documents. A crawler retrieves each document and adds the information found in it to a combined index; the document is generally not stored, although some search engines do cache a copy of the document to give clients faster access.
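The crawling loop described above can be sketched as a breadth-first traversal. To stay self-contained and runnable, this sketch replaces real HTTP fetching with a hypothetical in-memory WEB dictionary and a stand-in fetch() function; a real crawler would also need politeness delays, robots.txt handling and URL normalization, none of which are shown. Only index entries (word to URLs) are kept, not the pages themselves.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing hyperlinks).
WEB = {
    "a.html": ("database systems", ["b.html", "c.html"]),
    "b.html": ("parallel databases", ["c.html", "d.html"]),
    "c.html": ("data warehouse", ["a.html"]),
    "d.html": ("web crawler", ["gone.html"]),  # gone.html no longer exists
}

def fetch(url):
    """Stand-in for an HTTP GET: returns (text, links), or None for a dead link."""
    return WEB.get(url)

def crawl(seeds, max_pages=100):
    """Breadth-first crawl from seeds, building a combined word -> URLs index."""
    index = {}
    frontier = deque(seeds)  # links waiting to be crawled
    seen = set(seeds)        # links already queued, so no page is fetched twice
    crawled = 0
    while frontier and crawled < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:     # dead link: nothing to index
            continue
        crawled += 1
        text, links = page
        for word in text.split():
            index.setdefault(word, set()).add(url)  # keep index entries, not the page
        for link in links:   # follow hyperlinks found in the fetched page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index

print(sorted(crawl(["a.html"])["crawler"]))
```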

7. Describe a web search engine.

Ans: Since the number of documents on the Web is very large, it is not possible to crawl the whole Web in a short period of time; in fact, all search engines cover only some portions of the Web, not all of it, and their crawlers may take weeks or months to perform a single crawl of all the pages they cover. There are usually many processes, running on multiple machines, involved in crawling. A database stores a set of links (or sites) to be crawled; it assigns links from this set to each crawler process. New links found during a crawl are added to the database, and may be crawled later if they are not crawled immediately. Pages found during a crawl are also handed over to an indexing system, which may be running on a different machine. Pages have to be refetched (that is, links recrawled) periodically to obtain updated information and to discard sites that no longer exist, so that the information in the search index is kept reasonably up to date.
The indexing system itself runs on multiple machines in parallel. It is not a good idea to add pages to the same index that is being used for queries, since doing so would require concurrency control on the index and would affect query and update performance. Instead, one copy of the index is used to answer queries while another copy is updated with newly crawled pages. At periodic intervals the copies switch over, with the old one being updated while the new copy is being used for queries.
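The two-copy index scheme can be sketched as a small class: queries read only the live copy, newly crawled pages go into the shadow copy, and switch_over() makes the updated copy live. This is a single-threaded simplification with hypothetical names; in particular, the old live copy is simply replaced by a fresh copy of the new one, and no locking or distribution across machines is shown.

```python
class DoubleBufferedIndex:
    """Two copies of an inverted index: one answers queries, the other takes updates."""

    def __init__(self):
        self.live = {}    # word -> set of URLs; read by queries
        self.shadow = {}  # word -> set of URLs; receives newly crawled pages

    def query(self, word):
        """Answer a query from the live copy only."""
        return set(self.live.get(word, set()))

    def add_page(self, url, text):
        """Index a newly crawled page into the shadow copy."""
        for word in text.split():
            self.shadow.setdefault(word, set()).add(url)

    def switch_over(self):
        """Periodic switch: the updated copy goes live, and a fresh copy of it
        becomes the new shadow so updates can keep accumulating."""
        self.live = self.shadow
        self.shadow = {word: set(urls) for word, urls in self.live.items()}

idx = DoubleBufferedIndex()
idx.add_page("a.html", "parallel database")
print(idx.query("database"))  # empty: the page is not live yet
idx.switch_over()
print(idx.query("database"))  # {'a.html'}
```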

8. What is a question answering system?

Ans: Question answering systems attempt to provide direct answers to questions posed by users. Systems targeted at information on the Web typically generate one or more keyword queries from a submitted question, execute the keyword queries against Web search engines, and parse the returned documents to find segments that answer the question.

9. Describe the distinct ways a user can find information on the Web.
Ans: 1) Information Extraction.
2) Querying Structured data
3) Question Answering.

10) What do Web search engines do? Describe in one line.

Ans: Web search engines crawl the Web to find pages, analyze them to compute prestige measures, and index them.

Q1:- WHAT IS A SYNONYM?
ANS1:- Synonyms are words that have the same meaning but different representations.

Q2:- WHAT ARE HOMONYMS?
ANS2:- Homonyms are words that have the same spelling (and usually the same pronunciation) but different meanings.

Q3:- WHAT ARE ONTOLOGIES?
ANS3:- Ontologies are hierarchical structures that reflect relationships between concepts; they are used to overcome the limitations of keyword-based search.

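Q4:- EXPLAIN SYNONYMS WITH THE HELP OF AN EXAMPLE.

ANS4:- Synonyms are words that have the same meaning but different representations. For example, a document may contain "motorcycle repair" while a user searches for "motorcycle maintenance"; a pure keyword search would miss that document, so synonyms must be taken into account.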

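Q5:- EXPLAIN HOMONYMS WITH THE HELP OF AN EXAMPLE.

ANS5:- Homonyms are words that have the same spelling but different meanings. For example, "object" can be a noun meaning a thing, or a verb meaning to express disagreement, so a keyword query for "object" may return documents about either meaning.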

Q.1 How is relevance ranking calculated using TF?

A. We use the frequency of occurrence of the term in the document (that is, how many times that particular term has occurred) as a measure of its relevance.
One way of measuring TF(d,t), i.e. the term frequency, or the relevance of a document d to a term t, is
TF(d,t) = log(1 + n(d,t)/n(d))
where n(d,t) is the number of occurrences of term t in document d and n(d) is the total number of terms in d.
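The TF formula translates directly into code. This sketch assumes a document is already tokenized into a list of terms and uses the natural logarithm (the base is not stated above; changing it only rescales the scores). An empty document is given TF 0 to avoid dividing by zero.

```python
import math

def tf(doc_terms, term):
    """TF(d, t) = log(1 + n(d, t) / n(d)), with the natural log assumed."""
    n_d = len(doc_terms)            # n(d): total number of terms in d
    if n_d == 0:
        return 0.0
    n_dt = doc_terms.count(term)    # n(d, t): occurrences of t in d
    return math.log(1 + n_dt / n_d)

# Hypothetical document.
doc = "motorcycle repair manual for motorcycle owners".split()
print(tf(doc, "motorcycle"))  # log(1 + 2/6)
print(tf(doc, "database"))    # 0.0: the term does not occur
```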

Q.2 What is the use of an information retrieval system?

A. An information retrieval system is intended to support people who are actively seeking or searching for information, as in Internet searching. Information retrieval typically assumes a static or relatively static database against which people search.

Q.3 Explain similarity-based retrieval systems.

A. Similarity-based retrieval relies on best match rather than exact match and uses techniques to compute the similarity between the query and information items. As users' information needs are also fuzzy, an important characteristic of this class of retrieval techniques is its support for the iterative process of retrieval.

Q.4 What is cosine similarity?

Q.5 What is the use of a similarity-based retrieval system?

Q.1. What is the difference between information retrieval and data retrieval?
A.
1. A data retrieval system gives an exact match of the search elements, whereas an information retrieval system gives partial or best-match results.
2. The query language in a data retrieval system is artificial, whereas natural language is used in an information retrieval system.
3. Complete query specification is required in a data retrieval system, whereas partial query specification works in an information retrieval system.

Q.2. Explain the components of an information retrieval system.
A.
The typical components of an information retrieval system are:
1. Input
2. Processor
3. Output
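Q.3. What is relevance? How is it calculated?
A. Relevance can be calculated as the cosine of the angle between the document vector and the query vector, i.e. their dot (inner) product divided by the product of their lengths, where the length of a vector is the square root of the sum of the squares of its components. Since term weights are non-negative, this measure varies between 0 and 1.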
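The cosine measure can be sketched in a few lines, assuming documents and queries are already represented as equal-length term-weight vectors over a shared hypothetical vocabulary. A zero vector is treated as matching nothing, to avoid dividing by zero.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    len_u = math.sqrt(sum(a * a for a in u))
    len_v = math.sqrt(sum(b * b for b in v))
    if len_u == 0 or len_v == 0:  # an empty vector matches nothing
        return 0.0
    return dot / (len_u * len_v)

# Hypothetical term-weight vectors over the vocabulary [database, parallel, web].
doc = [3, 1, 0]
query = [1, 1, 0]
print(round(cosine_similarity(doc, query), 3))
```

With non-negative weights the result lies between 0 (no shared terms) and 1 (vectors pointing in the same direction), which is why it works as a best-match score.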

Q.4. What is TF-IDF (Term Frequency - Inverse Document Frequency)?
A. A measure of the frequency of occurrence of a particular term in a particular document as well as how often that term occurs in the entire collection of interest.

Q.5. How is TF-IDF used? Why is it needed?

A. If a term occurs frequently in one document but also occurs frequently in every other document in the collection, then it is not a very important word, and the TF-IDF measure reduces the weight placed on it. A common term is considered less important than rare terms.
If a term occurs in every document, the inverse document frequency is zero; if it occurs in half of the documents, it is about 0.3; and if it occurs in 20 of 10,000 documents, it is about 2.7. These figures correspond to IDF(t) = log10(N / n(t)), where N is the number of documents in the collection and n(t) is the number of documents containing t.
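The IDF figures above can be reproduced with a base-10 logarithm, as in the sketch below. Combining IDF with a TF score by multiplication is one common way to form a TF-IDF weight; the tf_idf helper and its parameters are illustrative assumptions, not a fixed definition.

```python
import math

def idf(n_docs, n_docs_with_term):
    """IDF(t) = log10(N / n(t)): 0 for a term in every document, larger for rare terms."""
    return math.log10(n_docs / n_docs_with_term)

def tf_idf(tf_value, n_docs, n_docs_with_term):
    """Weight a term-frequency score by how rare the term is in the collection."""
    return tf_value * idf(n_docs, n_docs_with_term)

# The three cases from the answer, for a collection of 10,000 documents.
for n_t in (10000, 5000, 20):
    print(n_t, round(idf(10000, n_t), 2))
```

Rounded to two places, the three cases give 0.0, 0.3 and 2.7.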

Q.6. Illustrate the components of an information retrieval system using a diagram.

Q.7. An information retrieval system is best match or partial match, whereas a data retrieval system is exact match. Expand.