Anda di halaman 1dari 135

KONSEP DATABASE

Topik-topik
 Terminologi Database
 Model-model Data
 Arsitektur Database
 Arsitektur Sistem Manajemen Database
 Kapabilitas/Kemampuan Database
 Yang terlibat dengan Database
 Database Market
 Emerging Database Technologies
 Hal yang perlu dipelajari selanjutnya
Pengantar
 Mana yang Database dan bukan
 Model Realitas
 Mengapa menggunakan Model?
 Peta sebagai Model Realitas
 Pesan untuk Pembuat Peta
 Kapan menggunakan DBMS?
 Data Modeling
 Process Modeling
 Perancangan Database
 Abstraksi
Mana yang Database dan yang bukan
Kata database biasanya digunakan untuk
mengacu:
 Buku Alamat pribadi dalam Dokumen Word
 Kumpulan nama file dokumen Word
 Kumpulan file Spreadsheet Excel
 file yang sangat besar dan dalam satu kelompok
digunakan untuk analisis statistik
 Pengumpulan, perawatan, dan penggunaan data
dalam pemesanan tiket pesawat terbang
 Data digunakan mendukung peluncuran pesawat
ruang angkasa (NASA space shuttle)
Model Realitas
DML
DATABASE SYSTEM
REALITAS
• struktur DATABASE
• proses DDL

 database merupakan model struktur realitas


 database digunakan mencerminkan proses realitas
 Database system adalah software system yang
mendukung definisi dan penggunaan database
 DDL: Data Definition Language
 DML: Data Manipulation Language
Mengapa menggunakan Model?
 Model sangat berguna ketika ingin menguji dan
memanajemen bagian darii dunia nyata
 Biaya menggunakan model lebih rendaha daripada
menggunakan atau mencoba melakukan dalam
dunia real itu sendiri
 Contoh:
 simulator pesawat terbang
 Simulator tenaga nuklir
 Sistem peringatan banjir
 model ekonomi
 model perambatan panas
 peta
Peta merupakan Model Realitas
Pesan untuk Pembuat Peta
 Model merupakan alat komunikasi
 User dari model harus mempunyai pengetahuan
umum tertentu
 Model ditekankan pada aspek pilihan tertentu
 Model dijelaskan dalam beberapa bahasa
 Model dapat mempunyai kesalahan
 Pesan untuk pembuat peta : “jalan raya jangan
dicat merah, sungai tidak mempunyai jalur yang
menuju ke tengah kota, dan pada gunung tidak
dapat dilihat konturnya” [Kent 78]
Gunakan DBMS Jangan gunakan
untuk DBMS untuk
 persistent storage of data  the initial investment in
 centralized control of data hardware, software, and
training is too high
 control of redundancy  the generality a DBMS
 control of consistency and provides is not needed
integrity  the overhead for security,
 multiple user support concurrency control, and
 sharing of data recovery is too high
 data documentation  data and applications are
simple and stable
 data independence  real-time requirements cannot
 control of access and be met by it
security  multiple user access is not
 backup and recovery needed
Data Modeling
DATABASE SYSTEM
REALITAS
• struktur MODEL
• proses
data modeling

 Model menyajikan persepsi struktur realitas


 Proses data modeling untuk menetapkan persepsi
struktur realitas dan menyajikannya
 Dalam proses data modeling dipilih berbagai aspek dan
abstrak
Process Modeling
DATABASE SYSTEM
REALITAS process modeling
• struktur MODEL
• proses

 Gunakan model runtuk mencerminkan proses realitas


 Proses mungkin disajikan oleh program dengan
embedded database queries dan update
 Proses mungkin dinyatakan dengan ad-hoc database
queries dan update pada saat jalan (run-time)
DML DML
PROG
Perancangan Database
Tujuan perancangan database untuk membuat
database yang memenuhi:
 model struktur realitas
 mendukung queries dand update pemodelan
proses realitas
 berjalan dengan efisien
Abstraksi
Bahasa yang digunakan untuk representasi data
harus mendukung abstraksi
Tiga macam abstraksi:

 Classification (Klasifikasi)
 Aggregation
 Generalization (Generalisasi)
Classification
Dalam klasifikasi dibentuk konsep sedemikian
sehingga dapat diputuskan apakah gejala yang
ada akan menjadi anggota dari ekstensi konsep
tersebut atau tidak.

PELANGGAN

Amir Edi Bud ....... Rita Jo Lisa


Aggregation
Dalam aggregation dibentuk konsep dari konsep
yang ada. Gejala merupakan anggota dari
ekstensi konsep baru disusun dari gejala yang
ada.

PESAWAT
TERBANG
SAYAP KOKPIT

MESIN
Generalization
Dalam generalization dibentuk konsep baru
dengan menekankan pada aspek umum dari konsp
yang ada mengabaikan aspek-aspek khusus.

PELANGGAN

Kelas Kelas
Kelas VIP BISNIS EKONOMI
Generalization (lanjutan)
Kelas bagian mungkin overlap
PELANGGAN

Kelas
Kelas VIP
BISNIS
Kelas bagian mungkin ada banyak super-kelas

MOTORIZED AIRBORNE
VEHICLES VEHICLES

TRUCKS HELICOPTERS GLIDERS


Relasi antar Abstraksi
aggregation generalization
T T T
intension

classification
extension
O O O
Abstraction Concretization
classification exemplification
aggregation decomposition
generalization specialization
TERMINOLOGI DATABASE
 Data Models
 Keys and Identifiers
 Integrity and Consistency
 Triggers and Stored Procedures
 Null Values
 Normalization
 Surrogates - Things and Names
Data Model

data model terdiri dari notasi untuk ekspresi:

 data structures
 integrity constraints
 operations
Data Model - Data Structures
All data models have notation for defining:
 attribute types
 entity types
 relationship types

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231


Data Model - Constraints
Constraints express rules that cannot be expressed by the

 dataconstraints
Static structuresapply
alone:
to database state
 Dynamic constraints apply to change of database state
 E.g., “All FLIGHT-SCHEDULE entities must have precisely one DEPT-
AIRPORT relationship

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231 242 bos


Data Model - Operations
Operations support change and retrieval of data:
 insert FLIGHT-SCHEDULE(97, delta, tu, 258); insert DEPT-
AIRPORT(97, atl);
 select FLIGHT#, WEEKDAY
from FLIGHT-SCHEDULE
where AIRLINE=‘delta’;

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231 242 bos

97 delta tu 258 97 atl


Data Model – Operasi dari
Program
FLIGHT-SCHEDULE
declare C cursor for
FLIGHT# AIRLINE WEEKDAY PRICE
select FLIGHT#, WEEKDAY 101 delta mo 156

from FLIGHT-SCHEDULE 545 american we 110

where AIRLINE=‘delta’; 912 scandinavian fr 450

242 usair mo 231


open C;
97 delta tu 258
repeat
fetch C into :FLIGHT#,
:WEEKDAY;
do your thing;
until done;
close C;
Keys and Identifiers
Keys (or identifiers) are uniqueness constraints
 A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT#’s to be
unique in FLIGHT-SCHEDULE
 Consider the following keys on DEPT-AIRPORT:

FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE FLIGHT# AIRPORT-CODE

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231 242 bos


Integrity and Consistency
 Integrity: does the model reflect reality well?
 Consistency: is the model without internal conflicts?

 a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models


the existence of an entity in the real world
 a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE
because it doesn’t make sense for a non-existing FLIGHT-
SCHEDULE entity to have a DEPT-AIRPORT

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231 242 bos


Triggers and Stored Procedures
 Triggers can be defined to enforce constraints on a database, e.g.,

 DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE


ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#=‘X’
ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#=‘X’;

FLIGHT-SCHEDULE DEPT-AIRPORT

FLIGHT# AIRLINE WEEKDAY PRICE FLIGHT# AIRPORT-CODE

101 delta mo 156 101 atl

545 american we 110 912 cph

912 scandinavian fr 450 545 lax

242 usair mo 231 242 bos


Null Values
CUSTOMER

CUSTOMER# NAME MAIDEN NAME DRAFT STATUS


123-45-6789 Lisa Smith Lisa Jones inapplicable
234-56-7890 George Foreman inapplicable drafted
345-67-8901 unknown Mary Blake inapplicable

 Null-value unknown reflects that the attribute does apply, but


the value is currently unknown. That’s ok!
 Null-value inapplicable indicates that the attribute does not
apply. That’s bad!
 Null-value inapplicable results from the direct use of “catch
all forms” in database design.
 “Catch all forms” are ok in reality, but detrimental in database
design.
Normalization
FLIGHT-SCHEDULE FLIGHT-WEEKDAY

FLIGHT# AIRLINE WEEKDAYS PRICE FLIGHT# WEEKDAY


un
no 101 mo
101 rm delta mo,fr 156
ali
545
zed
american mo,we,fr 110 545 mo

912 scandinavian fr 450 912 fr

101 fr

FLIGHT-SCHEDULE 545 we

545 fr
FLIGHT# AIRLINE WEEKDAY PRICE ju
st
101 delta mo 156 rig
FLIGHT-SCHEDULE ht
545 american mo 110 !
FLIGHT# AIRLINE PRICE
912 scandinavian fr red 450
un 101 delta 156
101 delta fr da156
nt
545 american 110
545 american we 110
912 scandinavian 450
545 american fr 110
Surrogates - Things and Names
reality
name custom# customer
custom# name addr

customer addr
name-based representation

reality
name custom# customer
customer custom# name addr

customer addr
surrogate-based representation
 name-based: a thing is what we know about it
 surrogate-based: “Das ding an sich” [Kant]
 surrogates are system-generated, unique, internal identifiers
DATA MODEL OVERVIEW
 ER-Model
 Hierarchical Model
 Network Model
 Inverted Model - ADABAS
 Relational Model
 Object-Oriented Model(s)
ER-Model
 Data Structures
 Integrity Constraints
 Operations

 The ER-Model is extremely successful as a


database design model
 Translation algorithms to many data models
 Commercial database design tools, e.g., ERwin No
generally accepted query language
 No database system is based on the model
ER-Model - Data Structures
entity type

composite

attribute
relationship

type

attribute
subset
relationship
type
multivalued

attribute

derived

attribute
ER-Model - Integrity Constraints
E1 1 n A
R E2

cardinality: 1:n for E1:E2 in R key attribute


(min,max)
E1 E
R E2

(min,max) participation of E2 in R
E1 E2 E3

E1 R E2 d disjoint

x exclusion
total participation of E2 in R
p partition
E1 R E2

weak entity type E2; identifying relationship


ER Model - Example visa
required

dept domestic international


time flight flight
airport airport
dept
name code p weekdays
1 airport n
airport flight
airport
addr schedule
arriv
1 n
airport 1 flight#
street city zip
arriv
time instance
of
customer# date
n

customer n reserva- n flight


customer instance
name tion

seat#
ER-Model - Operations
 Several navigational query languages have been
proposed
 A closed query language as powerful as relational
languages has not been developed
 None of the proposed query languages has been
generally accepted
Hierarchical Model
 Data Structures
 Integrity Constraints
 Operations

 Commercial systems include IBM’s IMS, MRI’s


System-2000 (now sold by SAS), and CDC’s MARS
IV
Hierarchical Model - Data
flight-sched
Structures
flight#

flight-inst dept-airp arriv-airp


date airport-code airport-code
customer
customer# customer name
 record types: flight-schedule, flight-instance, etc.
 field types: flight#, date, customer#, etc.
 parent-child relationship types (1:n only!!):
(flight-sched,flight-inst), (flight-inst,customer)
 one record type is the root, all other record types is a child of one
parent record type only
 substantial duplication of customer instances
 asymmetrical model of n:m relationship types
Hierarchical Model - Data
Structures - virtual records
customer flight-sched
customer# customer name flight#

P
flight-inst dept-airp arriv-airp
date airport-code airport-code

customer-
pointer

 duplication of customer instances avoided


 still asymmetrical model of n:m relationship types
Hierarchical
flight-sched
flight# Model
flight-inst dept-airp
-
arriv-airp
Operations
date airport-code airport-code
customer
customer# customer name
GET UNIQUE flight-sched (flight#=‘912’) [search flight-sched; get first such flight-sched]

GET UNIQUE flight-sched [for each flight-sched

flight-inst (date=‘102298’) for each flight-inst with date=102298

customer (name=‘Jensen’) for each customer with name=Jensen, get the first one]

GET UNIQUE flight-sched [for each flight-sched

flight-inst (date=‘102298’) for each flight-inst with date=102298, get the first

GET NEXT flight-inst get the next flight-inst, whatever the date]
Network Model
 Data Structures
 Integrity Constraints
 Operations

 Based on the CODASYL-DBTG 1971 report


 Commercial systems include, CA-IDMS and
DMS-1100
Network Model - Data Structures
Type diagram Occurrence diagram

Bachman Diagram The Spaghetti Model


flight-schedule
flight# F1 F2

FR
reservation R1 R2 R3 R4 R5 R6
flight# date customer#

C1 C4
customer CR
customer# customer name
 owner record types: flight-schedule, customer
 member record type: reservations
 DBTG-set types: FR, CR
 n-m relationships cannot be modeled directly
 recursive relationships cannot be modeled directly
Network Model - Integrity
Constraints
flight-schedule
 keys flight#

reservation
 checks flight# date customer# price
check is price>100
flight-schedule
 set retention options: flight#
 fixed
FR
 mandatory
reservation
 optional flight# date customer#
 set insertion options:
customer CR
 automatic
customer# customer name
 manual
FR and CR are fixed and automatic
Network Model - Operations
 The operations in the Network Model are generic,
navigational, and procedural

query: currency indicators:


(1) find flight-schedule where flight#=F2 (F2)

(2) find first reservation of FR (R4)

(3) find next reservation of FR (R5)


F1 F2
(4) find owner of CR FR (C4)

R1 R2 R3 R4 R5 R6

C1 CR C4
Network Model - Operations
 navigation is cumbersome; tuple-at-a-time
 many different currency indicators
 multiple copies of currency indicators may be
needed if the same path is traveled twice
 external schemata are only sub-schemata
Inverted Model - ADABAS
 Data Structures
 Integrity Constraints
 Operations
Relational Model
 Data Structures
 Integrity Constraints
 Operations

 Commercial systems include: ORACLE, DB2,


SYBASE, INFORMIX, INGRES, SQL Server
 Dominates the database market on all platforms
Relational Model - Data Structures
 domains
 attributes
 relations
relation name
attribute names
flight-schedule
flight#: airline: weekday: price:

integer char(20) char(2) dec(6,2)

domain names
Relational Model - Integrity
Constraints
 Keys
 Primary Keys
 Entity Integrity
 Referential Integrity

flight-schedule customer
flight# customer# customer name
p p

reservation
flight# date customer#
Relational Model - Operations
 Powerful set-oriented query languages
 Relational Algebra: procedural; describes how to
compute a query; operators like JOIN, SELECT,
PROJECT
 Relational Calculus: declarative; describes the
desired result, e.g. SQL, QBE
 insert, delete, and update capabilities
Relational Model - Operations
 tuple calculus example (SQL)
select flight#, date
from reservation R, customer C
where R.customer#=C.customer#
and customer-name=‘LEO’;
 algebra example (ISBL)
((reservation join customer) where customer-
name=‘LEO’) [flight#, date];
 domain calculus example (QBE)
reservation customer
flight# date customer# customer# customer-name
.P .P _c _c LEO
Object-Oriented Model(s)
 based on the object-oriented paradigm,
e.g., Simula, Smalltalk, C++, Java
 area is in a state of flux

 object-oriented model has object-oriented repository


model; adds persistence and database capabilities; (see
ODMG-93, ODL, OQL)
 object-oriented commercial systems include GemStone,
Ontos, Orion-2, Statice, Versant, O2

 object-relational model has relational repository model;


adds object-oriented features; (see SQL3)
 object-relational commercial systems include Starburst,
POSTGRES
Object-Oriented Paradigm
 object class
 object attributes, primitive types, values
 object interface, methods; body, implementations
 messages; invoke methods; give method name and
parameters; return a value
 encapsulation
 visible and hidden attributes and methods
 object instance; object constructor & destructor
 object identifier, immutable
 complex objects; multimedia objects; extensible type system
 subclasses; inheritance; multiple inheritance
 operator overloading
 references represent relationships
 transient & persistent objects
Object-Oriented Model - Structures
O2-like syntax
class flight-schedule {
type tuple (flight#: integer,
weekdays: set ( weekday: enumeration {mo,tu,we,th,fr,sa,su})
dept-airport: airport, arriv-airport: airport)
method reschedule(new-dept: airport, new-arriv: airport)}
class international-flight inherit flight-schedule {
type tuple (visa-required:string)
method change-visa-requirement(v: string): boolean}
/* the reschedule method is inherited by international-flight; */
/* when reschedule is invoked in international-flight it may */
/* also invoke change-visa-requirement */
class flight-instance {
type tuple (flight-date: tuple ( year: integer, month: integer, day: integer);
instance-of: flight-schedule,
passengers: set (customer) inv customer::reservations)
method add-passenger(new-passenger:customer):boolean,
/*adds to passengers; invokes customer.make-reservation */
remove-passenger(passenger: customer):boolean}
/*removes from passengers; invokes customer.cancel-reservation*/
class customer {
type tuple (customer#: integer,
customer-name: tuple ( fname: string, lname: string)
reservations: set (flight-instance) inv flight-instance::passengers)
method make-reservation(new-reservation: flight-instance): boolean,
cancel-reservation(reservation: flight-instance): boolean}
Object-Oriented Model - Updates
class customer { O2-like syntax
type tuple (customer#: integer,
customer-name: tuple ( fname: string, lname: string)
reservations: set (flight-instance) inv flight-instance::passengers)

main () {
transaction::begin();
all-customers: set( customer); /*makes persistent root to hold all customers */
customer c= new customer; /*creates new customer object */
c= tuple (customer#: “111223333”,
customer-name: tuple( fname: “Leo”, lname: “Mark”));
all-customers += set( c); /*c becomes persistent by attaching to root */
transaction::commit();}
Object-Oriented Model - Queries
O2-like syntax
“Find the customer#’s of all customers with first name Leo”

select tuple (c#: c.customer#)


from c in customer
where c.customer-name.fname = “Leo”;

“Find passenger lists, each with a flight# and a list of customer names, for flights out
of Atlanta on October 22, 1998”

select tuple(flight#: f.instance-of.flight#,


passengers: select( tuple( c.customer#, c.customer-name.lname)))
from f in flight-instance, c in f.passengers
where f.flight-date=(1998, 10, 22)
and f.instance-of.dept-airport.airport-code=“Atlanta”;
DATABASE ARCHITECTURE
 ANSI/SPARC 3-Level DB Architecture
 Metadata - What is it? Why is it important?
 ISO Information Resource Dictionary System
(ISO-IRDS)
ANSI/SPARC 3-Level DB
Architecture - separating concerns
DML

database system database system


DDL
database schema
schema

data
data

 a database is divided into schema and data


 the schema describes the intension (types)
 the data describes the extension (data)
 Why? Effective! Efficient!
ANSI/SPARC 3-Level DB
Architecture - separating
schema
concerns
schema

data
data

schema
schema internal
internalschema
schema

data
data

external
externalschema
schema conceptual
conceptualschema
schema internal
internalschema
schema

data
data
ANSI/SPARC 3-Level DB
Architecture
external
external external
external external
external
schema
schema1 schema
schema2 schema
schema3
1 2 3

conceptual
conceptual
schema
schema

• external schema:
internal
internal use of data
schema
schema • conceptual schema:
meaning of data
• internal schema:
database
database storage of data
Conceptual Schema
 Describes all conceptually relevant, general, time-
invariant structural aspects of the universe of discourse
 Excludes aspects of data representation and physical
organization, and access

CUSTOMER

NAME ADDR SEX AGE

 An object-oriented conceptual schema would also


describe all process aspects
External Schema
 Describes parts of the information in the
conceptual schema in a form convenient to a
particular user group’s view
 Is derived from the conceptual schema
MALE-TEEN-CUSTOMER

NAME ADDR

TEEN-CUSTOMER(X, Y) =
CUSTOMER(X, Y, S, A)
WHERE SEX=M AND 12<A<20;
CUSTOMER

NAME ADDR SEX AGE


Internal Schema
 Describes how the information described in the conceptual
schema is physically represented to provide the overall best
performance
CUSTOMER

NAME ADDR SEX AGE

CUSTOMER

NAME ADDR SEX AGE

B+-tree on index on
AGE NAME NAME PTR
Physical Data Independence
external
external external
external external
external
schema
schema1 schema
schema2 schema
schema3
1 2 3

conceptual
conceptual
schema
schema

internal
internal Physical data independence is
schema
schema a measure of how much the
internal schema can change
without affecting the
application programs
database
database
Logical Data Independence
external
external external
external external
external
schema
schema1 schema
schema2 schema
schema3
1 2 3

conceptual
conceptual
schema
schema

internal
internal Logical data independence is a
schema
schema measure of how much the
conceptual schema can change
without affecting the
application programs
database
database
Schema Compiler
The schema compiler compiles metadata
schemata and stores them in the
metadatabase

compiler

schemata
• Catalog
• Data Dictionary
• Metadatabase
Query Transformer
Uses metadata to transform a metadata
query at the external schema
level to a query at the storage
level DML

query
query
transformer

data
ANSI/SPARC DBMS Framework
enterprise
administrator

1
schema compiler

3 3
database conceptual application
administrator schema system
processor administrator
13 2 4
14 5
internal external
schema schema
processor metadata processor
query transformer

34 36 38
21 30 31 12
data storage internal conceptual user
internal conceptual external
transformer transformer transformer
Metadata - What is it?
 System metadata:  Business metadata:
 Where data came from  What data are available
 How data were changed  Where data are located
 How data are stored  What the data mean
 How data are mapped  How to access the data
 Who owns data  Predefined reports
 Who can access data  Predefined queries
 Data usage history  How current the data are
Data usage statistics

Metadata - Why is it important?
 System metadata are critical in a DBMS
 Business metadata are critical in a data warehouse
ISO-IRDS - Why?
 Are metadata different from data?
 Are metadata and data stored separately?
 Are metadata and data described by different models?
 Is there a schema for metadata? A metaschema?
 Are metadata and data changed through different
interfaces?
 Can a schema be changed on-line?
 How does a schema change affect data?
DL
ISO-IRDS Architecture
metaschema metaschema; describes all schemata that
can be defined in the data model

data dictionary data dictionary schema; contains copy of


schema metaschema; schema for format
definitions; schema for data about
application data
data dictionary data dictionary data; schema for
data
application data; data about
application data

data raw formatted application data


ISO-IRDS - example
metaschema relations
rel-name att-name dom-name

access-rights
data dictionary user relation operation
schema
relations
rel-name att-name dom-name

data dictionary (u1, supplier, insert)


(u2, supplier, delete)
supplier
s# sname location

data (s1, smith, london)


(s2, jones, boston)
DATABASE MANAGEMENT
SYSTEM ARCHITECTURE
 Teleprocessing Database
 File-Sharing Database
 Client-Server Database - Basic
 Client-Server Database - w/Caching
 Distributed Database
 Federated Database
 Multi-Database
 Parallel Databases
Teleprocessing Database
dumb terminal dumb terminal dumb terminal

communication
lines
OSTP

AP1 AP2 AP3 mainframe


DBMS

OSDB

database
DB
Teleprocessing Database -
characteristics
 Dumb terminals
 APs, DBMS, and DB reside on central computer
 Communication lines are typically phone lines
 Screen formatting transmitted via communication
lines
 User interface character oriented and primitive
 Dumb terminals are gradually being replaced by
micros
File-Sharing Database
AP1 AP2 AP3

DBMS DBMS micros


OSNET OSNET

LAN

OSNET file server

OSDB
micro

database
DB
File-Sharing Database -
characteristics
 APs and DBMS on client micros
 File-Server on server micro
 Clients and file-server communicate via LAN
 Substantial traffic on LAN because large files (and indices) must
be sent to DBMS on clients for processing
 Substantial lock contention for extended periods of time for the
same reason
 Good for extensive query processing on downloaded snapshot
data
 Bad for high-volume transaction processing
Client-Server Database - Basic
AP1 AP2 AP3
micros
OSNET OSNET

LAN

OSNET micro(s) or
DBMS mainframe

OSDB

database
DB
Client-Server Database - Basic -
characteristics
 APs on client micros
 Database-server on micro or mainframe
 Multiple servers possible; no data replication
 Clients and database-server communicate via LAN
 Considerably less traffic on LAN than with file-
server
 Considerably less lock contention than with file-
server
Client-Server Database - w/Caching
AP1 AP2 AP3
micros
DBMS DBMS

OSNET OSNET

LAN
DB DB
OSNET
micro(s) or
DBMS
mainframe
OSDB

database
DB
Client-Server Database - w/Caching
- characteristics
 DBMS on server and clients
 Database-server is primary update site
 Downloaded queries are cached on clients
 Change logs are downloaded on demand
 Cached queries are updated incrementally
 Less traffic on LAN than with basic client-server database
because only initial query result is downloaded followed by
change logs
 Less lock contention than with basic client-server database for
same reason
Distributed Database
AP1 AP2 AP3
micros(s) or
DDBMS DDBMS
mainframes
OSNET&DB OSNET&DB

network

external external external

conceptual

internal

DB DB DB
Distributed Database -
characteristics
 APs and DDBMS on multiple micros or mainframes
 One distributed database
 Communication via LAN or WAN
 Horizontal and/or vertical data fragmentation
 Replicated or non-replicated fragment allocation
 Fragmentation and replication transparency
 Data replication improves query processing
 Data replication increases lock contention and slows
down update transactions
non-partitioned
non-replicated
Distributed Database - Alternatives

partitioned
replicated

replicated
partitioned

D
C
D

D
C

C
A

B
D
C

C
A

A
B

B
increasing cost, complexity, difficulty of control, security risk

-
increasing parallelism, independence, flexibility, availability

+
Federated Database
AP1 AP2 AP3
micros(s) or
DDBMS DDBMS
mainframes
OSNET&DB OSNET&DB

network
federation
schema

export export export


schema1 schema2 schema3

conceptual1 conceptual2 conceptual3

internal1 internal2 internal3

DB DB DB
Federated Database -
characteristics
 Each federate has a set of APs, a DDBMS, and a DB
 Part of a federate’s database is exported, i.e., accessible
to the federation
 The union of the exported databases constitutes the
federated database
 Federates will respond to query and update requests
from other federates
 Federates have more autonomy than with a traditional
distributed database
Multi-Database

AP1 AP2 AP3


micros(s) or
MULTI-DBMS MULTI-DBMS
mainframes
OSNET&DB OSNET&DB
network, e.g
WWW

conceptual1 conceptual2 conceptual3

internal1 internal2 internal3

DB DB DB
Multi-Database - characteristics
 A multi-database is a distributed database without a shared
schema
 A multi-DBMS provides a language for accessing multiple
databases from its APs
 A multi-DBMS accesses other databases via a network, like
the www
 Participants in a multi-database may respond to query and
update requests from other participants
 Participants in a multi-database have the highest possible
level of autonomy
Parallel Databases
 A database in which a single query may be
executed by multiple processors working together
in parallel
 There are three types of systems:
 Shared memory
 Shared disk
 Shared nothing
Parallel Databases - Shared Memory
 processors share memory via
P

M
bus
P  extremely efficient processor
communication via memory
P
writes
P  bus becomes the bottleneck
 not scalable beyond 32 or 64
P processor processors
M memory
disk
Parallel Databases - Shared Disk
 processors share disk via
P
M
interconnection network
M P  memory bus not a bottleneck
 fault tolerance wrt. processor or
M P
memory failure
M P  scales better than shared memory
 interconnection network to disk
subsystem is a bottleneck
 used in ORACLE Rdb
Parallel Databases - Shared
Nothing
M P
 scales better than shared memory
and shared disk
M P
 main drawbacks:
 higher processor communication cost
M P
 higher cost of non-local disk access
 used in the Teradata database
M P
machine
RAID -
redundant array of inexpensive disks
 disk striping improves performance via parallelism
(assume 4 disks worth of data is stored)

 disk mirroring improves reliability via redundancy


(assume 4 disks worth of data is stored)
 mirroring: via copy of data (c); via bit parity (p)

c c c c

p
DATABASE CAPABILITIES
 Data Storage
 Queries
 Optimization
 Indexing
 Concurrency Control
 Recovery
 Security
Data Storage
 Disk management
 File management
 Buffer management
 Garbage collection
 Compression
Queries
SQL queries are composed from the following:
 Set operations
 Selection
 Point
 Cartesian Product
 Range  Union
 Conjunction  Intersection
 Disjunction  Set Difference
 Join  Other
 Natural join
 Equi join
 Duplicate elimination
 Theta join  Sorting
 Outer join  Built-in functions: count, sum,
 Projection avg, min, max
 Recursive (not in SQL)
Query Optimization
select flight#, date reserv
from reserv R, cust C flight# date cust# 10,000 reserv blocks
where R.cust#=C.cust# customer 3,000 cust blocks
and cust-name=‘LEO’; cust# cust-name 30 “Leo” blocks

flight#, date flight#, date

cost: 10,000x30

cust-name=Leo

cust#
cost: 10,000x3,000

cust-name=Leo
cost: 3,000
reserv cust reserv cust
Query Optimization
 Database statistics
 Query statistics
 Index information
 Algebraic manipulation
 Join strategies
 Nested loops
 Sort-merge
 Index-based
 Hash-based
Indexing
Why Bother?
 Disk access time: 0.01-0.03 sec

 Memory access time: 0.000001-0.000003 sec

 Databases are I/O bound

 Rate of improvement of

(memory access time)/(disk access time) >>1


 Things won’t get better anytime soon!

Indexing helps reduce I/O !


Indexing (cont.)
 Clustering vs. non-clustering
 Primary and secondary indices
 I/O cost for lookup:
 Heap: N/2
 Sorted file: log2(N)
 Single-level index: log2(n)+1
 Multi-level index; B+-tree: logfanout(n)+1
 Hashing: 2-3
 View caching; incremental computation
Concurrency Control
flight-inst reserv
flight# date #avail-seats flight# date customer#
T1: T2:
read(flight-inst(flight#,date)
seats:=#avail-seats read(flight-inst(flight#,date)
if seats>0 then { seats:=#avail-seats
seats:=seats-1 if seats>0 then {
seats:=seats-1
write(reserv(flight#,date,customer2))
write(flight-inst(flight#,date,seats))}
write(reserv(flight#,date,customer1))
write(flight-inst(flight#,date,seats))}

overbooking!
Concurrency Control (cont.)
ACID Transactions:
 An ACID transaction is a sequence of database operations

that has the following properties:


 Atomicity

 Either all operations are carries out, or none is


 This property is the responsibility of the concurrency control and
the recovery sub-systems
 Consistency
 A transaction maps a correct database state to another correct state
 This requires that the transaction is correct, which is the
responsibility of the application programmer
Concurrency Control (cont.)
 Isolation
 Although multiple transactions execute concurrently, i.e.
interleaved, not parallel, they appear to execute sequentially
 This is the responsibility of the concurrency control sub-
system
 Durability
 The effect of a completed transaction is permanent
 This is the responsibility of the recovery manager
Concurrency Control (cont.)
 Serializability is a good definition of correctness
 A variety of concurrency control protocols exist
 Two-phase (2PL) locking
 deadlock and livelock possible
 deadlock prevention: wait-die, wound-wait
 deadlock detection: rollback a transaction
 Optimistic protocol: proceed optimistically; back up and
repair if needed
 Pessimistic protocol: do not proceed until knowing that no
back up is needed
Recovery
reserv flight-inst

flight# date customer# flight# date #avail-seats

102298 102398
change-reservation(DL212,102298,DL212,102398,C) 100 50
read(flight-inst(DL212,102298) 100 50
#avail-seats:=#avail-seats+1 100 50
update(flight-inst(DL212,102298,#avail-seats) 101 50
read(flight-inst(DL212,102398) 101 50
#avail-seats:=#avail-seats-1 101 50
update(flight-inst(DL212,102398,#avail-seats) 101 49
update(reserv(DL212,102298,C,DL212,102398,C) 101 49
Recovery (cont.)
Storage types:
 Volatile: main memory
 Nonvolatile: disk

Errors:
 Logical error: transaction fails; e.g. bad input, overflow
 System error: transaction fails; e.g. deadlock
 System crash: power failure; main memory lost, disk survives
 Disk failure: head crash, sabotage, fire; disk lost

What to do?
Recovery (cont.)
 Deferred update (NO-UNDO/REDO):
 don’t change database until ready to commit
 write-ahead to log to disk

 change the database

 Immediate update (UNDO/NO-REDO):


 write-ahead to log on disk

 update database anytime

 commit not allowed until database is completely updated

 Immediate update (UNDO/REDO):


 write-ahead to log on disk

 update database anytime

 commit allowed before database is completely updated

 Shadow paging (NO-UNDO/NO-REDO):


 write-ahead to log in disk

 keep shadow page; update copy only; swap at commit


Security
DAC: Discretionary Access Control
 is used to grant/revoke privileges to users, including

access to files, records, fields (read, write, update


mode)
MAC: Mandatory Access Control
 is used to enforce multilevel security by classifying

data and users into security levels and allowing users


access to data at their own or lower levels only
PEOPLE THAT WORK WITH
DATABASES
 System Analysts
 Database Designers
 Application Developers
 Database Administrators
 End Users
System Analysts
 communicate with each prospective database
user group in order to understand its
 information needs
 processing needs
 develop a specification of each user group’s
information and processing needs
 develop a specification integrating the
information and processing needs of the user
groups
 document the specification
Database Designers
 choose appropriate structures to represent the
information specified by the system analysts
 choose appropriate structures to store the
information in a normalized manner in order to
guarantee integrity and consistency of data
 choose appropriate structures to guarantee an
efficient system
 document the database design
Application Developers
 implement the database design
 implement the application programs to meet the
program specifications
 test and debug the database implementation and
the application programs
 document the database implementation and the
application programs
Database Administrators
 Manage the database structure
 participate in database and application development
 assist in requirement analysis
 participate in database design and creation
 develop procedures for integrity and quality of data
 facilitate changes to database structure
 seek communitywide solutions
 assess impact on all users
 provide configuration control
 be prepared for problems after changes are made
 maintain documentation
Database Administrators (cont.)
 Manage data activity
 establish database standards consistent with data
administration standards
 establish and maintain data dictionary
 establish data proponencies
 work with data proponents to develop data access and
modification rights
 develop, document, and train staff on backup and recovery
procedures
 publish and maintain data activity standards documentation
Database Administrators (cont.)
 Manage the database management system
 generate database application performance reports
 investigate user performance complaints
 assess need for changes in database structure or application
design
 modify database structure
 evaluate and implement new DBMS features
 tune the database
 Establish the database data dictionary
 data names, formats, relationships
 cross-references between data and application programs
 (see metadata slide)
End Users
 Parametric end users constantly query and update the
database. They use canned transactions to support
standard queries and updates.
 Casual end users occasional access the database, but
may need different information each time. They use
sophisticated query languages and browsers.
 Sophisticated end users have complex requirement
and need different information each time. They are
thoroughly familiar with the capabilities of the DBMS.
THE DATABASE MARKET
 Prerelational vs. Relational
 Database Vendors
 Relational Database Products
 Relational Databases for PCs
 Object-Oriented Database Capabilities
Prerelational vs. Relational
billion $
14
prerelational
12 relational

10

0
1994 1995 1996 1997 1998 1999


Prerelational market revenue shrinking about 9%/year. Currently 1.8 billion/year


Relational market revenue growing about 30%/year. Currently 11.5 billion/year


Object-Oriented market revenue about 150 million/year
Database Vendors

Other ($2,272M)
Informix CA Oracle ($1,755M)
Sybase

IBM (IMS+DB2) ($1,460M)


IBM
Other
Sybase ($664M)

Oracle
Informix (+Illustra) ($492M)

CA-IDMS (+Ingress) ($447M)

NEC ($211M)

Fujitsu ($186M)

Total:
Hitachi $7,847M
($117M)

Software AG
Source: (ADABAS)
IDC, 1995 ($136M)
Relational Database Products
We compare the following products:
 ORACLE 7 Version 7.3

 Sybase SQL Server 11

 Informix OnLine 7.2

 Microsoft SQL Server 6.5

 IBM DB2 2.1.1

 CA-OpenIngres 1.2
Relational Database Products COMPARISON
CRITERIA
ORACLE7
VERSION7.3
SYBASE SQL
SERVER11
INFORMIX
ONLINE7.1
Relational Model
Domains no no no
Referential Integ. restrict, except restrict only restrict, except
violation options cascading delete cascading delete
Taylor referential no no no
messages
Referential no no no
WHERE clause
Updatable views yes yes yes
w/check option
Database Objects
User-defined yes yes no
data types
BLOBs yes yes yes
Additional image,video,text, binary,image,text, byte,
data types messaging,spatial money,bit, text up to 2GB
data types varbinary
Table structure heap,clustered heap,clustered no choice
Index structure B-tree,bitmap, B-tree B+-tree,clustered
hash
Tuning facilities table and index index pre-fetch, extents, table
allocation I/O buffer cache, fragmentation by
block size, expression or
table partitioning round robin
Relational Database Products COMPARISON
CRITERIA
MICROSOFT SQL
SERVER6.5
IBM DB2 2.1.1 CA-
OPENINGRES1.2
Relational Model
Domains no no no
Ref. integrity restrict restrict,cascade, restrict only
w/check option set null
Taylor referential no no no
messages
Referential no no no
WHERE clause
Updatable views yes yes, including yes
w/check option union vews
Database objects
User-defined yes yes yes
data types
BLOBs yes yes yes
Additional large objects byte,longbyte,long
data types varchar,spatial,
varbyte, money
Table structure no choice no choice B-tree,hash,heap,
ISAM
Index structure clustered clustered B-tree,hash,ISAM
Tuning facilities fill factors, table & index table&index alloc.
allocation allocation, cluster fill factors,
ratio,cluster factor pre-allocation
COMPARISON ORACLE7 SYBASE SQL INFORMIX
CRITERIA VERSION7.3 SERVER11 ONLINE7.2
Triggers
Relational Database

Level row&set-based set-based row&set-based


Timing before,after after before,after,each
Nesting yes yes yes
Stored procedures
Products

Language PL/SQL Transact-SQL SPL


Nesting yes yes yes
Cursors yes yes yes
External calls RPC RPC system calls
Events yes time-based no
Queries
Locking level table, row table, page db,table,page,row
ANSI SQL comply entry level SQL92 entry level SQL92 entry level SQL92
Cursors forward forward forward,backward
Outer join yes yes yes
ANSI syntax no no no
APIs ODBC DBLIB,CTLIB,ODBC ESQL,TP/XA,CLI,
ODBC
COMPARISON MICROSOFT SQL IBM DB22.1.1 CA-
CRITERIA DERVER6.5 OPENINGRES1.2
Triggers
Relational Database
Level set-based set&row-based row-based
Timing after before,after after
Nesting yes yes yes
Stored procedures
Language Transact-SQL SQL, 3GL SQL-like
Products

Nesting yes yes yes


Cursors yes yes no
External calls system call yes no(db events)
Events no user-def functions db event alerters
Queries
Locking level db,table, page,row db,table, page,row db,table,page
ANSI SQL comply entry level SQL92 entry level SQL92 entry level SQL92
Cursors forward,backward forward forward
,relative,absolute
Outer join yes yes yes
ANSI syntax no no yes
APIs ESQL,DBLIB,ODBC, ESQL,,ODBC ESQL,TP/XA,ODBC
Dist mgt objects
COMPARISON ORACLE7 SYBASE SQL INFORMIX
CRITERIA SERVER11 ONLINE7.2
Database Admin
Tools Oracle Enterp Mgr Sybase SQL Mgr SMI,DB/Cockpit,
Relational Database
Performance Pack SQL Monitor OnPerf
SNMP support yes yes no
Security C2(trusted Oracle) C2 C2,B1online secur
Partial backup & configurable configurable no
recovery
Internet
Products

Internet support OracleWebServer web.sql ESQL,4GLCGI,


Interface Kit
Connectivity,
Distribution
Gateways to other MVS source Adabas,AS/400, Oracle,Sybase,
DBMSs through EDA/SQL DB2,IDMS,IMS, IMS,DB2
(Adabas,IDMS,SQL Informix,Ingres,
/DS,VSAM), any ISAM,SQL Server,
APPC source, AS/ Oracle,Rdb,RMS,
400,DRDA,DB2,Tur seq.flies,SQL/DS,
boimage,Sybase,R SybaseSQL Server,
db,RMS,Informix,C Teradata,VSAM
A-Ingres,SQL
Server,Teradata
Distributed DBs part of base prod OmniConnect Online server
2PC protocol yes yes yes,presumeabort
Heterogeneous gateways DirectConnect no
Optimization yes yes yes
RPC yes yes no
COMPARISON MICROSOFT SQL IBM DB@ 2.1.1 CA-
CRITERIA SERVER6.5 OPENINGRES1.2
Database Admin
Tools Enterprise Mgr, DB Director,Perf IPM, VisualDBA,
Relational Database
Perf Monitor Monitor, IMA
Visual Explain
SNMP support yes yes yes
Security NT integrated three levels C2
Partial backup & per table yes per table
Products

recovery
Internet
Internet support Internet Info Serv DB2 WWW CA-OpenIngres/
(WindowsNT) Connection ICE
Connectivity,
Distribution
Gateways to other no Oracle, Sybase, DB2, Datacom,
DBMSs Informix, MS SQL IMS, IDMS, VSAM,
Server Oracle, Rdb,
Albase, Informix,
Oracle, Sybase
Distributed DBs no DataJoiner CA-OpenIngres*
2PC protocol n/a yes yes,automatic
Heterogeneous no DataJoiner through gateways
Optimization no yes yes
RPC yes no no
COMPARISON ORACLE SYBASE SQL INFORMIX
CRITERIA VERSION7.3 SERVER11 ONLINE7.2
Relational Database
Replication
Recording replic. log/trigger log buffer log
Hot standby yes yes yes
Peer-to-peer yes yes no
To other DBMSs through gateways DirectConnect no
Products

Cascading yes yes no


Additional
restrictions
Name length 30 30 18
Columns 254 250 2767
Column size 2GB 1962 32,767
Tables n/a 2 billion 477 million
Table size n/a storage dependent 64 terabytes
Table width by column storage dependent 32,767
Platforms (OS) most UNIX, OS/2, most UNIX, OS/2, most UNIX,
VAX/VMS, MAC, VAX/VMS, MAC WindowsNT,
WindowsNT, WindowsNT, Windows95
Windows95 Windows95,
COMPARISON MICROSOFT SQL IMB DB2 2.1.1 CA-
CRITERIA SERVER6.5 OPENINGRES1.2
Replication
Relational Database
Recording log log rules(triggers)
Hot standby yes yes yes
Peer-to-peer no yes yes
To other DBMSs through ODBC DataJoiner through gateways
Products

Cascading no no yes
Additional
restrictions
Name length 30 18 32
Columns 250 255 300
Column size 255 4005, except LOB 2008 (BLOBs 2GB)
Tables 2 billion storage dependent n/a
Table size 2 terabytes 64GB n/a
Table width 2048 storage dependent 2008 (BLOBs 2GB)
Platforms (OS) WindowsNT most UNIX, OS/2, most UNIX,VAX/
VAX/VMS, MAC VMS, WindowsNT,
WindowsNT, Windows95 (CA-
Windows95, OpenIngres/
Desktop
Relational Databases for PCs
Relational databases for PCs include:
 Microsoft FoxPro for Windows
 Microsoft FoxPro for DOS
 Borland’s Paradox for Windows
 Borland’s dBASE IV
 Paradox for DOS
 R:BASE
 Microsoft Access
GemStone ONTOS ORION-2 Statice VERSANT
Primary Coop CAD/CAM CAD/CAM - Colab.
Use environ. OIS, MM engineer
Object-Oriented Database Capabilities Version yes yes yes limited yes
Mgt.
Recovery shadowp yes logs & REDO log -
shadowp
Transac. yes yes yes yes yes
Mgt.
Composite no no yes yes yes
Objects
Multiple no yes yes yes yes
Inherit. planned
Concur/ 3 locks 4 locks 5 locks 2PL 4 locks,
Locking optim 2PL
pesim
Distribute yes yes yes yes yes
Support
Dynamic yes yes yes - yes
Evolution limited limited all feature limited
Multimedia yes no yes yes no
Language C,C++,OPAL C++ LISP, C Common C, C++
Interface Smalltalk LISP
Platforms SUN3&4, SUN3&4 Symbolics, Symbolics SUN3&4
Apollo,PCs, OS/2 SUN3, HP,
VAX/VMS VAX/VMS DECstation,
Apollo
Special change Object SQL change browser, change
Feature notific. notific. dev. tools notific.
pri/sha db pri/sha db
O2 Starburst
Primary CAD/CAM, CAD/CAM,
Use GIS, OIS KBS
Version limited no

Object-Oriented Database Capabilities


Mgt.
Recovery yes rollback

Transac. yes yes


Mgt.
Composite yes complex
Objects objects
Multiple yes yes
Inherit.
Concur/ yes rules &
Locking optimistic rollback
Distributed yes yes
Support
Dynamic yes -
Evolution limited
Multimedia yes yes
Language C C, C++
Interface
Platforms SUN OS4.0 IBM PC,
or higher RISC 6000
Special Vis. Interf. -
Feature Powerful QL
EMERGING DB
TECHNOLOGIES
 WEB databases
 Multimedia Databases
 Mobile Databases
 Data Warehousing and Mining
 Geographic Information Systems
 Genome Data Management
 Temporal Databases
 Spatial Databases
WHAT YOU WILL BE ABLE TO
LEARN MORE ABOUT
The Database Certificate Program
 Database Concepts

 Data Modeling

 Relational Database Design

 Performance Issues & Tuning in Relational

Databases
 Data Warehousing and Mining

Anda mungkin juga menyukai