Anda di halaman 1dari 5

DDBMS

Page 1 sur 5

<-- Return to the Lecture Guide

Distributed Databases
What is a DDBMS? A Distributed DB is a related collection of data just like a normal DB, but physically distributed over a network. The data is split into fragments on separate machines, each running a local DBMS. A Distributed DBMS is the software that manages distribution of data and processing in a fashion that is invisible to users. Local DBMSs can access local data autonomously, or access remote data through the DDBMS and other local DBMSs. Apps that only use local data are called Local Applications Apps that access remote data are called Global Applications To be a DDBMS, every local DBMS must participate in at least one Global App. Homogeneous DDBMSs use the same DBMS on the same platform on each site Heterogeneous DDBMSs use different DBMSs/Platforms and require gateways or other middleware to convert queries/data models between sites Advantages of a DDBMS: Realistically reflects decentralised organisational structures If well designed can strike an optimal balance between local speed and global access More failure-proof than a single DBMS Shared processing and I/O leads to better performance Scalability Disadvantages of a DDBMS: Increased complexity and higher cost Networks compromises security Validity, consistency and integrity control becomes complicated Its a new technology with few standards/best practices

http://afis.ucc.ie/gkiely/is2213/DDBMS.htm

22/03/2012

DDBMS

Page 2 sur 5

Note: A DDBMS is not the same as distributed processing which is centralised DB accessible through a network (rather than the data itself being distributed) A DDBMS is not always the same as a parallel DBMS which is a single DBMS using multiple processors/multiple disks DDBMS Components: Global External Schema User Views. Provides logical data independence Global Conceptual Schema Logical description of entire DB, including entities and relationships, constraints, domains, security etc. Provides physical data independence Fragmentation and Allocation Schema Describes how the data is partitioned and where it is stored 3 Tier Schemas for each Local DBMS As normal, but instead of external schema, the top level is a mapping schema designed to be used to communicate with the frag/alloc schema above. Local DBMS (LDBMS) normal DBMS controlling local data Data Communications (DC) network software Global System Catalogue same as a normal systems catalogue, plus frag/alloc information Distributed DBMS (DDBMS) main functional unit transaction management, backup/recovery etc. When designing a DDBMS we seek to maximise: Locality of reference (letting local apps hitting local data) Reliability and Accessibility (by strategically replicating the data) System Performance (by avoiding over/under utilisation of resources) When designing a DDBMS we seek to minimise: Cost vs. Storage (affects replication strategy) Communication Costs (taking into account consistency maintenance) Design Strategies with which to approach the problem:

http://afis.ucc.ie/gkiely/is2213/DDBMS.htm

22/03/2012

DDBMS

Page 3 sur 5

Centralised Data Storage this is not a DDB at all. No replication so no additional storage costs. Fragmented Data Storage no replication but data is split up and distributed. If done correctly, locality of reference will be very high as is performance and storage and communications costs are low. Reliability and accessibility are only ok. Design Strategies with which to approach the problem: Fully Replicated Storage every site hosts a full copy of the DB. Very high storage costs, improved performance for read trans/low comm costs. Write trans low performance/high comms cost. Partially Replicated Storage combination of above methods. It makes the most sense if done right. You have high locality, high reliability/access, good all-round performance, ok storage costs and low comm. costs. Fragmentation: There are a few reasons why it makes sense to fragment data: By keeping data not required by local apps separate, we improve security By maximising locality of reference, we improve efficiency Since most apps only use subsets of relations, it makes sense to break the relations into subsets for storage across the network. Done right, we open the door for parallel processing There are drawbacks, like increased integrity administration and performance hits for poorly fragmented data sets For a fragmentation effort to be viable: it must be complete (all items in a relations must appear in at least one fragment of the relation), functional dependencies must be preserved, and (other than primary keys) fragments should be disjoint. Types of Fragmentation: Horizontal break up relation into subsets of the tuples in that relation, based on a restriction on one or more attributes. E.g. we could break up a table with student info into one subset for undergrads and one subset for postgrads. Vertical breaking up a relation into subsets of attributes. E.g. breaking up a hypothetical student table into grade/course related columns and contact/personal related columns. Mixed fragments the data multiple times, in different ways. We could do our postgrad/undergrad split and then our grades/course split to each of the fragments

http://afis.ucc.ie/gkiely/is2213/DDBMS.htm

22/03/2012

DDBMS

Page 4 sur 5

Derived fragmenting a relation to correspond with the fragmentation of another relation upon which the 1st relation is dependent in some way. Transparency: Distribution Transparency allows users to ignore the physical fragmentation of data, to varying degrees: Frag. Transparency is high level transparency where a user could write SELECT * FROM Student WHERE year = 2 without needing to specify what fragment of the Student relation contains the data, nor where that fragment is stored. Location Transparency mid level transparency where a user would need to write SELECT * FROM S14 WHERE year = 2 where S14 is the relevant fragment of the Student relation, but still wouldnt need to say where the fragment is stored. Local Mapping Transparency low level transparency where a user would need to write SELECT * FROM S14 AT SITE 7 WHERE year = 2 where S14 is the relevant fragment of the Student relation and SITE 7 is where the fragment is physically located. Distribution transparency is supported by a database name server which aliases unique database object identifiers with user friendly names. Transaction transparency ensures integrity and consistency vis--vis multi-site transactions, concurrent users and DB failure. Local transactions and remote single site transactions are handled without additional difficulty, but multi-site transactions must be broken into sub-transactions (for each site) and the independence, atomicity and durability of a centralised DBMS. Performance Transparency simply means DDBMS perform at the same level as a normal DBMS. This puts a lot of burden on the distributed query processor, which decides what fragment to hit, which copy (if replicated), and which location to use, as well as calculating I/O time, CPU time and communication costs. DBMS Transparency means that a heterogeneous DDBMS will behave like a homogenous DDBMS. Replication: Advantages better access and reliability, improved performance Disadvantages storage and consistency Replication Options: Synchronous Updates all copy updates are part of one transactions commit phase - a lot of admin overhead, comm. Costs and opportunity for failure but often necessary

http://afis.ucc.ie/gkiely/is2213/DDBMS.htm

22/03/2012

DDBMS

Page 5 sur 5

Asynchronous Updates periodic updates of all copies based on one mater copy violates the idea of data independence but can be useful in situations where the cost of synch updates are unwarranted. What we expect from replication management: Copy data, either synch or asynch from one db to another Scalability and mappability (in heterogeneous environments) Replication of procedures, indexes, schema etc. tools for DBAs to manage replication Replicated data ownership models: Master/slave: publish and subscribe model authoritive changes are made only to the master site and published to the slaves asynchronously Workflow: like M/S, but Master status moves from site to site depending on the task at hand Update-Anywhere: Symmetric, shared write authority for all replicas. Synchronous: We can synch up our replicas using regularly scheduled snapshots of the master data, or database triggers (when X happens, do Y)
<-- Return to the Lecture Guide

http://afis.ucc.ie/gkiely/is2213/DDBMS.htm

22/03/2012

Anda mungkin juga menyukai