
UNIT 1
1. What are the different distributed DBMS issues related to query processing?

1. Converting user transactions into data manipulation instructions.
2. Optimization problem.
3. min{cost = data transmission + local processing}
4. The general formulation is NP-hard.
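
The objective in point 3 can be sketched as below. This is only a rough illustration: the plan names and cost figures are made up, and a real optimizer cannot simply list every plan, because the general problem is NP-hard.

```python
# Hypothetical candidate plans: (name, data transmission cost, local processing cost).
plans = [
    ("ship R to site 2", 120.0, 30.0),
    ("ship S to site 1", 80.0, 45.0),
    ("semijoin then ship", 40.0, 60.0),
]

def total_cost(plan):
    """cost = data transmission + local processing"""
    _, transmission, local = plan
    return transmission + local

# Pick the plan with the minimum total cost.
best = min(plans, key=total_cost)
print(best[0], total_cost(best))
```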

2. Explain the data logical multi-DBMS distributed architecture.

1. Distributed DBMS Architectures

DDBMS architectures are generally developed depending on three parameters −

• Distribution − It states the physical distribution of data across the different
sites.
• Autonomy − It indicates the distribution of control of the database system
and the degree to which each constituent DBMS can operate
independently.
• Heterogeneity − It refers to the uniformity or dissimilarity of the data
models, system components and databases.

2. Multi-DBMS Architectures

• Multi-database View Level − Depicts multiple user views comprising
subsets of the integrated distributed database.
• Multi-database Conceptual Level − Depicts the integrated multi-database, which
comprises global logical multi-database structure definitions.
• Multi-database Internal Level − Depicts the data distribution across
different sites and the multi-database to local data mapping.
• Local database View Level − Depicts the public view of local data.
• Local database Conceptual Level − Depicts local data organization at each
site.
• Local database Internal Level − Depicts physical data organization at each
site.


3. Explain full and partial replication of relations.

1. Full Replication

In a full replication scheme, the database is available to almost every location or
user in the communication network.

Advantages of full replication

• High availability of data, as the database is available to almost every location.
• Faster execution of queries.

Disadvantages of full replication

• Concurrency control is difficult to achieve in full replication.
• Update operations are slower.


2. Partial replication

Partial replication means only some fragments of the database are
replicated.

Advantages of partial replication

• The number of replicas created for a fragment depends on the importance
of the data in that fragment.


4. Short note on distributed query processing methodology.

1. Query Decomposition -
Decomposing a high-level query (relational calculus) into an algebraic query
(relational algebra) on global relations.

2. Data Localization -
Input: Algebraic query on distributed relations
Purpose:
• Apply data distribution information to the algebra operations and
determine which fragments are involved
• Substitute global query with queries on fragments
• Optimize the global query

3. Global Query Optimization -
Input: Fragment query
• Which relation to ship where?
• Ship-whole vs. ship-as-needed
• Decide on the use of semijoins

4. Local Optimization -
Input: Best global execution schedule
• Use the centralized optimization techniques


5. Define distributed database and give the advantages of a distributed database.

A distributed database is a collection of multiple, logically interrelated databases
distributed over a computer network.
Advantages -
1. Increased reliability and availability
2. Local control
3. Faster response
4. Robustness
5. Compliance with the ACID properties
6. Modular growth

6. Write a short note on …..

1. Location transparency

Location transparency is the middle level of distribution transparency. With location
transparency, the user must know how the data has been fragmented but still
does not have to know the location of the data.

2. Fragmentation transparency

Fragmentation transparency is the highest level of distribution transparency. If fragmentation
transparency is provided by the DDBMS, then the user does not need to know
that the data is fragmented. As a result, database accesses are based on the
global schema, so the user does not need to specify fragment names or data
locations.

3. Network transparency

Network transparency is one of the properties of a distributed database.
It means that the user must be unaware of the operational
details of the network.


7. State the importance of query optimization in DBMS.

Importance: The goal of query optimization is to reduce the system resources
required to fulfill a query, and ultimately provide the user with the correct result
set faster.

1. First, it provides the user with faster results, which makes the application seem
faster to the user.

2. Secondly, it allows the system to service more queries in the same amount of
time, because each request takes less time than an unoptimized query.

3. Thirdly, query optimization ultimately reduces the amount of wear on the
hardware (e.g. disk drives), and allows the server to run more efficiently (e.g.
lower power consumption, less memory usage).


8. “Distributed query processing is more complex” - justify the statement with an
example.

Distributed query processing is more complex because of:

– Fragmentation/replication of relations
– Additional communication costs
– Parallel execution


UNIT 2
1. Explain the need for a commit protocol in a distributed DBMS. Describe the
two-phase commit protocol.

Commit Protocols

Commit protocols are used to ensure atomicity across sites.

1. A transaction which executes at multiple sites must either be committed at all
the sites, or aborted at all the sites.

2. It is not acceptable to have a transaction committed at one site and aborted at
another.

3. The two-phase commit (2PC) protocol is widely used.

2PC has two phases: the voting phase and the commit phase.

1. Voting phase
1) The coordinator sends a Prepare message along with the transaction to all
participants and asks each of them to cast its vote for commit or abort.

2) If a participant can commit the transaction, it sends Vote commit to the
coordinator; if it cannot commit, it sends Vote abort to the
coordinator.

2. Commit phase
3) The coordinator decides whether to commit or abort in this phase. If Vote
commit is received from all the participants, then Global commit is sent to all the
participants; if at least one Vote abort is received, then the coordinator sends
Global abort to all participants that voted for commit.

4) The coordinator asks for an acknowledgement (Ack) from the participants. If a participant
receives Global commit, it commits the transaction and sends an Ack to the
coordinator. If a participant receives Global abort, it aborts the transaction.
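
The two phases can be sketched in a few lines of Python. This is a simplified in-memory model, not a full implementation: the `Participant` class, the `can_commit` flag and the message strings are assumptions made for illustration, and logging, timeouts and failures are left out.

```python
from enum import Enum

class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"

class Participant:
    """Hypothetical participant; can_commit stands in for its local checks."""
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "initial"

    def prepare(self):
        # Phase 1: answer the coordinator's Prepare message with a vote.
        if self.can_commit:
            self.state = "ready"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ABORT

    def decide(self, decision):
        # Phase 2: apply the global decision and acknowledge it.
        self.state = "committed" if decision == "global-commit" else "aborted"
        return "ack"

def two_phase_commit(participants):
    # Voting phase: collect one vote from every participant.
    votes = {p.name: p.prepare() for p in participants}
    # Commit phase: commit only if every vote is a commit vote.
    if all(v is Vote.COMMIT for v in votes.values()):
        decision, targets = "global-commit", participants
    else:
        decision = "global-abort"
        # Participants that voted abort have already aborted locally.
        targets = [p for p in participants if votes[p.name] is Vote.COMMIT]
    acks = [p.decide(decision) for p in targets]
    return decision, len(acks)

sites = [Participant("site1", True), Participant("site2", False)]
print(two_phase_commit(sites))
```

A single abort vote is enough to turn the global decision into an abort, which is what keeps the transaction atomic across sites.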


2. Explain distributed deadlock management.

Deadlock is a permanent phenomenon. If one exists in a system, it will not go
away unless outside intervention takes place. This outside intervention may come
from the user, the system operator, or the software system.

A deadlock can occur because transactions wait for one another; it occurs when
the wait-for graph (WFG) contains a cycle.

For a deadlock to occur, each of the following four conditions must hold.

i. Mutual Exclusion

ii. Hold and wait

iii. No preemption

iv. Circular wait

WFG (Wait-for graph)

A useful tool in analyzing deadlocks is a wait-for graph. A WFG is a directed graph
that represents the wait-for relationship among transactions. The nodes of this graph
represent the concurrent transactions in the system. An edge Ti -> Tj exists in the
WFG if transaction Ti is waiting for Tj to release a lock on some entity.

In distributed systems, it is not sufficient that each local DBMS
forms a local wait-for graph (LWFG) at its site; it is also necessary to form a global
wait-for graph, which is the union of all the LWFGs.
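
A small sketch of why the global graph is needed: each local graph below has no cycle, but their union does. The dictionary representation (a transaction mapped to the set of transactions it waits for) and the function names are assumptions for illustration.

```python
def global_wfg(local_wfgs):
    """Union of local wait-for graphs; each maps a transaction to the set it waits for."""
    merged = {}
    for wfg in local_wfgs:
        for waiter, holders in wfg.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged

def has_cycle(wfg):
    """Depth-first search; reaching a node already on the current path means deadlock."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in wfg.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(wfg))

site1 = {"T1": {"T2"}}   # at site 1, T1 waits for T2
site2 = {"T2": {"T1"}}   # at site 2, T2 waits for T1
print(has_cycle(site1), has_cycle(site2), has_cycle(global_wfg([site1, site2])))
```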

Methods of handling Deadlock

There are three known methods for handling deadlocks: prevention, avoidance,
and detection and resolution.


3. Define transaction. State and explain the states of a transaction. Consider the
following transaction T and give its formal representation:

Read(x)
Read(y)
x<-x+y
Write(x)
commit

Definition-
A transaction is a single logical unit of work which accesses and possibly modifies
the contents of a database. Transactions access data using read and write
operations.

States of Transaction
A transaction must be in one of the following states:

• Active: the initial state; the transaction stays in this state while it is
executing.
• Partially committed: after the final statement has been executed.
• Failed: when normal execution can no longer proceed.
• Aborted: after the transaction has been rolled back and the database has
been restored to its state prior to the start of the transaction.
• Committed: after successful completion.
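
The states above can be written as a small state machine. The transition table is an assumption taken from the usual textbook state diagram (the notes list only the states, not the arrows), and the function name `run` is made up for this sketch.

```python
# Allowed transitions between the five states (assumed from the standard diagram).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),
    "aborted": set(),
}

def run(events, state="active"):
    """Follow a sequence of target states, rejecting any illegal transition."""
    for nxt in events:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition: {state} -> {nxt}")
        state = nxt
    return state

print(run(["partially committed", "committed"]))
print(run(["failed", "aborted"]))
```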


4. What is isolation?

Transaction isolation is an important part of any transactional system. It deals
with the consistency and completeness of data retrieved by queries, so that one user's
data is not affected by other users' actions. A database acquires locks on data to maintain a high
level of isolation.

5. Write a short note on

1. In-place update
2. Out-of-place update


6. State and explain parallel database architectures.

1. Shared memory system

• A shared memory system uses multiple processors which are attached to a
global shared memory via an intercommunication channel or communication
bus.
• Shared memory systems have a large amount of cache memory at each
processor, so referencing of the shared memory is avoided.
• If a processor performs a write operation to a memory location, the data
at that location should be updated or removed from the other processors' caches.


2. Shared disk system

• A shared disk system uses multiple processors which can access
multiple disks via an intercommunication channel, and every processor has
local memory.
• Each processor has its own memory, so the data sharing is efficient.
• Systems built around this architecture are called clusters.


3. Shared-nothing system

• Each processor in the shared-nothing system has its own local memory and
local disk.
• Processors can communicate with each other through an intercommunication
channel.
• Any processor can act as a server to serve the data which is stored on its local
disk.


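7. Why are ACK messages required in 2PC?

Because the log record describing a message is forced to stable storage before the message is sent.
The ACK messages tell the coordinator that every participant has received the global decision, so the
coordinator can end the transaction.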


8. Describe and differentiate, with examples, pipelined parallelism and data
partitioned parallelism. Discuss round-robin and hash techniques for partitioning
the data.

Partitioning subdivides the transactions to improve performance.

- Increasing the number of partitions enables the Informatica Server to create
multiple connections to various sources.

- The following are the partition types:

1. Round-Robin Partitioning:
- Data is distributed evenly by Informatica among all partitions.
- This partitioning is used where the number of rows to process in each partition
is approximately the same.

2. Hash Partitioning:
- The Informatica server applies a hash function to the partitioning keys to
group data among partitions.
- It is used when groups of rows with the same
partitioning key must be processed in the same partition.
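
The two techniques can be compared with a short generic sketch. This is not Informatica's implementation: the function names, the sample rows and the choice of `zlib.crc32` as the hash function are assumptions made for illustration.

```python
import zlib

def round_robin_partition(rows, n):
    """Row i goes to partition i mod n, so partition sizes differ by at most one."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Rows with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is used instead of hash() because it is stable across runs.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n
        parts[bucket].append(row)
    return parts

# Hypothetical input rows.
rows = [{"id": i, "dept": d}
        for i, d in enumerate(["hr", "it", "hr", "ops", "it", "hr"])]
rr = round_robin_partition(rows, 3)
hp = hash_partition(rows, 3, "dept")
print([len(p) for p in rr])
```

Round-robin balances the load but scatters rows of one department; hash partitioning keeps each department together but may give uneven partition sizes.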


9. Define complete schedule. (Concurrency control: types of schedules)

1. Serial Schedules –
Schedules in which the transactions are executed non-interleaved are called
serial schedules, i.e., no transaction starts until the running transaction has
ended.

Example: Consider the following schedule involving two transactions T1 and T2.

T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
R(B)

where R(A) denotes that a read operation is performed on some data item ‘A’.
This is a serial schedule since the transactions are performed serially in the order
T1 -> T2.

2. Complete Schedules –

Schedules in which the last operation of each transaction is either abort (or)
commit are called complete schedules.

Example: Consider the following schedule involving three transactions T1, T2 and
T3.

T1 T2 T3
R(A)
W(A)
R(B)
W(B)
commit
commit
abort

This is a complete schedule since the last operation performed under every
transaction is either “commit” or “abort”.

3. Recoverable Schedules –
Schedules in which transactions commit only after all transactions whose changes
they read commit are called recoverable schedules. In other words, if some
transaction Tj is reading value updated or written by some other transaction Ti,
then the commit of Tj must occur after the commit of Ti.

Example – Consider the following schedule involving two transactions T1 and T2.

T1 T2
R(A)
W(A)
W(A)
R(A)
Commit
commit

This is a recoverable schedule since T1 commits before T2, so the value
read by T2 is a committed value.
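
The recoverability rule can be checked mechanically. The sketch below assumes a schedule is given as a list of (transaction, operation, item) tuples with operations 'R', 'W' and 'C' (commit); this encoding and the function name are made up for illustration, and aborts are not handled.

```python
def is_recoverable(schedule):
    """schedule: list of (transaction, operation, item); operation is 'R', 'W' or 'C'."""
    last_writer = {}      # item -> transaction that wrote it most recently
    reads_from = set()    # (reader, writer) pairs
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        elif op == "C":
            # Every transaction this one read from must already be committed.
            if any(w not in committed for r, w in reads_from if r == txn):
                return False
            committed.add(txn)
    return True

# T2 reads A after T1 wrote it, so T1 must commit first.
ok = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(ok), is_recoverable(bad))
```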


10. Explain the centralized and linear two-phase commit protocols in a distributed
database.

1. The Linear Two-Phase Commit Protocol:

In the linear 2PC, as we can see in Figures 3 and 4, subordinates can
communicate with each other. The sites are labeled 1 to N, where the coordinator
is numbered as site 1. The propagation of the PREPARE message is done serially,
which means that the time required to complete the transaction is longer than the
time required in centralized or distributed methods. At the end, node N is the one
that issues the Global COMMIT. The two phases are discussed below:

First Phase: (Figure 3) The coordinator sends a PREPARE message to participant 2.
If participant 2 is not willing to COMMIT, then it sends a VOTE ABORT (VA) to
participant 3 and the transaction is aborted at this point. If participant 2 is willing
to commit, it sends a VOTE COMMIT (VC) to participant 3 and enters a READY
state. Participant 3 then sends its vote onward, and so on until node N is reached and issues its vote.


Second Phase: (Figure 4) Node N issues either a GLOBAL ABORT (GA) or a GLOBAL
COMMIT (GC) and sends it to node N-1. Node N-1 then enters an ABORT or
COMMIT state and sends the GA or GC to node N-2, and so on until the final decision
to commit or abort reaches the coordinator node.


2. The Centralized Two-Phase Commit Protocol:

1. First Phase: (in Figure 1) In this phase, when a user wants to COMMIT a
transaction, the coordinator issues a PREPARE message to all the subordinates
[21]. When a subordinate receives the PREPARE message, it does one of two things. If it is
willing to COMMIT, it writes a PREPARE log record, sends a YES VOTE, and enters the
PREPARED state. If it is not willing to COMMIT, it writes an abort record and
sends a NO VOTE. A subordinate sending a NO VOTE doesn’t need to
enter a PREPARED state since it knows that the coordinator will issue an abort. In
this case, the NO VOTE acts like a veto in the sense that only one NO VOTE is
needed to abort the transaction. The following two rules apply to the
coordinator’s decision [7]:

a. If even one participant votes to abort the transaction, the coordinator has to
reach a global abort decision.

b. If all the participants vote to COMMIT, the coordinator has to reach a global
COMMIT decision.


2. Second Phase: (in Figure 2) After the coordinator reaches a decision, it has to relay this
decision to the subordinates. If the decision is COMMIT, then the coordinator moves
into the committing state and sends a COMMIT message to all the subordinates
informing them of the COMMIT. When the subordinates receive the COMMIT
message, they move to the committing state and send an acknowledge (ACK)
message to the coordinator. When the coordinator receives the ACK messages, it
ends the transaction. If, on the other hand, the coordinator reaches an ABORT
decision, it sends an ABORT message to all the subordinates. Here, the
coordinator doesn’t need to send an ABORT message to the subordinate(s) that
gave a NO VOTE.


11. Explain wait-die and wound-wait in distributed deadlock management.

There are two algorithms for this purpose, namely wait-die and wound-wait. Let
us assume that there are two transactions, T1 and T2, where T1 tries to lock a
data item which is already locked by T2. The algorithms are as follows −

• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is
younger than T2, T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted.
Otherwise, if T1 is younger than T2, T1 is allowed to wait.
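
Both rules reduce to one comparison of ages. In this sketch age is modeled by a start timestamp, with a smaller timestamp meaning an older transaction; the timestamps, function names and return strings are assumptions for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """T1 is the requester, T2 the lock holder; a smaller timestamp means older."""
    # An older requester waits; a younger requester dies (is aborted and restarted).
    return "requester waits" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    # An older requester wounds (aborts) the holder; a younger requester waits.
    return "abort holder" if requester_ts < holder_ts else "requester waits"

print(wait_die(1, 2), "|", wait_die(2, 1))
print(wound_wait(1, 2), "|", wound_wait(2, 1))
```

In both schemes the younger transaction is the one that gets aborted, so an old transaction is never restarted again and again.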


Unit 3
1. What are the differences between transient and persistent objects? How is persistence
handled in object-oriented (OO) database systems?

Transient objects are temporary in nature: they exist only while the program that
created them is running. An object that outlives the program that created it, and
so survives even an accidental termination of the program, is called a
persistent object.

1. Transient object: it cannot be serialized, its value is not persistent, and it is stored
in the heap.
2. Persistent object: it can be serialized, its value is persistent as the name implies,
and it is stored in persistent storage such as a database.

A superclass PersistentObject encapsulates the mechanisms for an object of any
class to store itself in, or retrieve itself from, a database. This superclass
implements operations to get an object by object identifier, to store, delete and
update objects, and to iterate through a set of objects (write and read operations).
Each persistent class could be responsible for its own storage.
For each business class that needs to be persistent, there will be an associated
database broker class.
The broker class provides the mechanisms to materialize objects from the
database and dematerialize them back to the database.


2. How is time represented in a temporal database? Compare the different time
dimensions in a temporal database.

Temporal databases store temporal data, i.e. data that is time dependent
(time-varying). Typical temporal database scenarios and applications include
time-dependent/time-varying economic data, such as:

• Share prices
• Exchange rates
• Interest rates
• Company profits

There are three different forms of time dimensions: user-defined time, valid time,
and transaction time.

User-defined time is a time representation designed to meet specific needs of
users.

Valid time concerns the time when an event is true in the real world.

Transaction time concerns the time when an event was present in the database as
stored data.
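
The difference between valid time and transaction time shows up when a fact is recorded late. The sketch below uses a made-up interest-rate history; the class, field and function names are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RateFact:
    rate: float
    valid_from: date    # valid time: when the rate applies in the real world
    recorded_on: date   # transaction time: when the row was stored

# Hypothetical history; the second row was entered nine days late.
history = [
    RateFact(3.0, date(2024, 1, 1), date(2024, 1, 1)),
    RateFact(3.5, date(2024, 3, 1), date(2024, 3, 10)),
]

def rate_as_known(valid_day, known_day):
    """Rate in effect on valid_day, using only rows stored by known_day."""
    visible = [f for f in history
               if f.valid_from <= valid_day and f.recorded_on <= known_day]
    return max(visible, key=lambda f: f.valid_from).rate if visible else None

# Same real-world day, asked at two different points in transaction time.
print(rate_as_known(date(2024, 3, 5), date(2024, 3, 5)))
print(rate_as_known(date(2024, 3, 5), date(2024, 3, 15)))
```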


3. What do you mean by time granularities? Explain with an example.

One of the problems with the definition of an HMM or a dynamic belief network
is that the model depends on the time granularity.

The time granularity can either be fixed, for example each day or each thirtieth of
a second, or it can be event-based, where a time step exists when something
interesting occurs.

If the time granularity were to change, for example from daily to hourly, the
conditional probabilities must be changed.

One way to model the dynamics independently of the time granularity is to
model, for each variable and each value for the variable,

• how long the variable is expected to keep that value and
• what value it will transition to when its value changes.

4. What is an object identifier? State the characteristics of an object identifier.

An object identifier (OID) is a string of decimal numbers that uniquely identifies
an object. These objects are typically an object class or an attribute.

If you do not have an OID, you can specify the object class or attribute name
appended with -oid. For example, if you create the attribute tempID, you can
specify the OID as tempID-oid.

An object is described by four characteristics:

1. Identifier: a system-wide unique id for an object

2. Name: an object may also have a unique name in the DB (optional)

3. Lifetime: determines if the object is persistent or transient

4. Structure: construction of objects using type constructors


5. Short note on logical data models

Logical data models add further information to the conceptual model elements. They define the
structure of the data elements and set the relationships between them.

The advantage of the logical data model is that it provides a foundation for the
physical model. However, the modeling structure remains generic.

At this data modeling level, no primary or secondary key is defined. At this level,
you need to verify and adjust the connector details that were set earlier for relationships.

6. Explain temporal database with the help of the following points:

1. Time ontology 2. Granularity


7. Compare RDBMS with ORDBMS

RDBMS (relational database)                ORDBMS (object-relational database)
Relational Database Management Systems     Object-Relational Database Management Systems
Based on Relational Data Model             Based on Relational Data Model
Dominant model                             Gaining popularity
Supports Structured Query Language (SQL)   Supports Structured Query Language (SQL)
Supports standard data types and           Supports standard data types and new
additional data types                      richer data types

8. Abstract data type in ORDBMS

An abstract data type (ADT) is a user-defined data type which can encapsulate a
range of data values and functions. The functions can be both defined on, and
operate on, the set of values.
In an ORDBMS, users are permitted to create new data types of their own.
This is considered one of the key features of ORDBMS. When you
create a new data type, you have to define the functions or operations that you
can perform on the new data type. For example, if you create a new data type for
storing an Address, then you may add some functions to extract the street name alone,
to find similar addresses, etc.
The combination of a new user-defined data type and the operations defined on
that type is called an Abstract Data Type (ADT).
ADT = User-defined data type + Operations defined on the type


UNIT 4
1. Difference between active and passive database

The active database is the one that is currently being used by the clients that have
mailboxes in that database. All the transactions for that database are being
generated by the server it's on.

The passive database is not being used by clients, or generating transactions. It is
simply applying a copy of the transaction logs it got from the active database server
to its copy of the database to keep it up to date.

2. Difference between structured, semi-structured and unstructured data in XML

1. Structured data

Structured data is familiar to developers. It covers all data that can be stored
in an SQL database, in tables with rows and columns. Such data has relational
keys and can easily be mapped into pre-designed fields. Today, it is the most
commonly processed kind of data in development and the simplest to manage.

However, structured data represents only 5 to 10% of all data. This leads to
semi-structured data.

2. Semi-structured data

Semi-structured data is information that doesn’t reside in a relational database
but that does have some organizational properties that make it easier to analyze.
With some processing you can store it in a relational database (this can be very hard
for some kinds of semi-structured data), but the semi-structured form exists to save
space, improve clarity, or ease computation.

Examples of semi-structured data: CSV, XML and JSON documents. NoSQL
databases are also considered semi-structured.


Like structured data, semi-structured data represents only a small part of all data (5 to
10%), so the last data type is the dominant one: unstructured data.

3. Unstructured data

Unstructured data represents around 80% of data. It often includes text and
multimedia content. Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations, webpages and many other
kinds of business documents. Note that while these sorts of files may have an
internal structure, they are still considered "unstructured" because the data
they contain doesn’t fit neatly in a database.

Unstructured data is everywhere. In fact, most individuals and organizations
conduct their lives around unstructured data. Just as with structured data,
unstructured data is either machine generated or human generated.


3. Define the least fixed point and fixed point.

The least fixed point (lfp or LFP, sometimes also smallest fixed point) of a
function from a partially ordered set to itself is the fixed point which is less than
each other fixed point, according to the set's order. A function need not have a
least fixed point, and cannot have more than one.

A fixed point (sometimes shortened to fixpoint, also known as an invariant point)
of a function is an element of the function's domain that is mapped to itself by
the function. That is to say, c is a fixed point of the function f if f(c) = c. This
means f(f(...f(c)...)) = f^n(c) = c, an important terminating consideration when
recursively computing f. A set of fixed points is sometimes called a fixed set.
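
For a monotone function on finite sets, the least fixed point can be reached by applying the function repeatedly, starting from the empty set. The sketch below uses this to compute the paths of a small graph, the way a recursive rule is evaluated; the edge data and function names are assumptions for illustration.

```python
def least_fixed_point(f, bottom=frozenset()):
    """Iterate f from the least element until f(x) == x (assumes f is monotone on finite sets)."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

edges = {(1, 2), (2, 3), (3, 4)}  # hypothetical base relation

def step(paths):
    """One round of: path(x, y) <- edge(x, y); path(x, z) <- path(x, y), edge(y, z)."""
    derived = {(a, d) for (a, b) in paths for (c, d) in edges if b == c}
    return frozenset(edges | derived)

closure = least_fixed_point(step)
print(sorted(closure))
```

Applying `step` once more to `closure` returns the same set, which is exactly the condition f(c) = c.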

4. Define XML schema. Difference between XML schema and XML DTD

An XML schema is the structural layout of an XML document, expressed in terms
of constraints and contents of the document. Constraints are expressed using a
combination of the following:

• Grammatical rules governing the order of elements
• Data types governing the content of elements and attributes
• Boolean predicates that the content has to satisfy
• Specialized rules including uniqueness and referential integrity constraints

Difference -

1. XML Schema is namespace aware, while DTD is not.

2. XML Schemas are written in XML, while DTDs are not.

3. XML Schema is strongly typed, while DTD is not.

4. XML Schema has a wealth of derived and built-in data types that are not
available in DTD.

5. XML Schema does not allow inline definitions, while DTD does.


5. Describe various application areas of active databases.


6. What are well-formed and valid XML documents?

7. What is a stratified program?

Stratified programming allows the developer to build and execute software at
various levels of abstraction, each level corresponding to a program stratum that
provides a specific degree of functionality.


8. Event condition action (ECA) rules in active databases.

Event condition action (ECA) is a shorthand for referring to the structure of active
rules in event-driven architecture and active database systems.

Such a rule traditionally consists of three parts:

• The event part specifies the signal that triggers the invocation of the rule
• The condition part is a logical test that, if satisfied or evaluates to true,
causes the action to be carried out
• The action part consists of updates or invocations on the local data

This structure was used by the early research in active databases which started to
use the term ECA.
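
The three parts of a rule map directly onto a small data structure. The sketch below is a toy model, not the trigger syntax of any real DBMS: the classes, the event name "stock_updated" and the reorder rule are all made up for illustration.

```python
class Rule:
    """One ECA rule: on `event`, if `condition(data)` holds, run `action(data)`."""
    def __init__(self, event, condition, action):
        self.event, self.condition, self.action = event, condition, action

class ActiveStore:
    """Toy stand-in for an active database (hypothetical)."""
    def __init__(self):
        self.rules = []
        self.log = []

    def signal(self, event, data):
        # Event: find the rules triggered by this signal.
        for rule in self.rules:
            # Condition: run the action only if the test evaluates to true.
            if rule.event == event and rule.condition(data):
                rule.action(data)

store = ActiveStore()
store.rules.append(Rule(
    event="stock_updated",
    condition=lambda row: row["quantity"] < 10,
    action=lambda row: store.log.append(f"reorder {row['item']}"),
))
store.signal("stock_updated", {"item": "bolt", "quantity": 4})
store.signal("stock_updated", {"item": "nut", "quantity": 50})
print(store.log)
```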


ALL UNIT QUESTION

1.SHORT NOTE ON MULTIMEDIA DATABASE


2. What is a geographic information system, and what are the different formats used to represent
geographic data? What is the difference between GIS and a spatial database?

A geographic information system (GIS) is a system designed to capture, store,
manipulate, analyze, manage, and present spatial or geographic data. ....
GIS data represents real objects (such as roads, land use, elevation, trees,
waterways, ... Data restructuring can be performed by a GIS to convert data into
different formats.
A geographic information system can be used for scientific investigation,
resource management, asset management, environmental impact assessment,
urban planning, cartography, criminology, history, sales, marketing, and logistics.
GIS data has two formats: vector and raster.

Geographic information system (GIS)
• GIS is software to visualize and analyze spatial data using spatial analysis
functions.
• GIS uses an SDBMS to store, search, query, and share large spatial data sets.

Spatial database
An SDBMS (spatial database management system):
1. Focuses on efficient storage, querying, and sharing of large spatial datasets.
2. Provides simpler set-based query operations.
3. Example operations: search by region, overlay, nearest neighbor, distance,
adjacency, perimeter, etc.
4. Uses spatial indices and query optimization to speed up queries over large
spatial datasets.
5. May be used by applications other than GIS,
such as astronomy, genomics, and multimedia information systems.


3. What are the differences among immediate, deferred, and detached execution
of active rule actions?

1. Immediate – In the immediate coupling mode, the triggered transaction is
executed immediately after the event has been signaled.

2. Deferred – In the deferred coupling mode, the triggered transaction is
executed at the end of the triggering transaction, but before
the commit of the triggering transaction.

3. Detached or decoupled – In the detached coupling mode, the triggered
transaction is executed as a separate transaction.


4. Explain horizontal and vertical fragmentation in a distributed database.


5. Explain inter-query parallelism, inter-operator parallelism, and intra-operator
parallelism.

1. Inter-query parallelism: each query runs on one processor, but different
queries can be distributed among different nodes. A common use case for this is
transaction processing, where each transaction can be executed on a different
node.

2. Inter-operator parallelism: each query runs on multiple processors. The
parallelism corresponds to different operators of a query running on different
processors.

3. Intra-operator parallelism: a single operator is distributed among multiple
processors. This is also commonly referred to as data parallelism.

6. Compare centralized and distributed databases.
