Outline
Database system
Advantages of database system
Database abstraction
Data models
Instances and schemas
Data independence
Data definition language
Data manipulation languages
Database manager
Database administrator and users
Overall system structure
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions
Data isolation
Integrity problems
For example, the balance of a bank account may never fall below a
prescribed amount (say, $25).
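In a DBMS, a rule like this can be declared once in the schema instead of being buried in each application program. A minimal sketch using Python's built-in sqlite3 module; the account table, its columns, and the withdraw helper are illustrative assumptions, not part of the original example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK clause encodes the $25 minimum balance.
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 25))"
)
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Return True if the withdrawal was accepted, False if the DBMS rejected it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a balance below 25.
        return False
```

Here the first withdrawal of 50 from account A succeeds, while a second one would leave a balance of 0 and is refused by the database itself.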
Atomicity of updates
In many applications, it is crucial that, if a failure occurs,
the data be restored to the consistent state that existed
prior to the failure.
Consider a program to transfer $50 from account A to account
B.
If a system failure occurs during the execution of the program, it is
possible that the $50 was removed from account A but was not
credited to account B, resulting in an inconsistent database state.
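The transfer scenario can be sketched with a transaction, again using Python's sqlite3 module. The account table, the transfer function, and the fail_midway flag (which simulates the system failure) are assumptions made for illustration only:

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single atomic unit."""
    try:
        with conn:  # one transaction: commit on success, rollback on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the rollback has already restored the prior consistent state


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()
```

If the failure is simulated after the debit, both balances remain as they were before the call; without the failure, A is debited and B is credited together.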
Security problems
Hard to provide user access to some, but not all, data.
Levels of Abstraction
Physical level: How
The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures
in detail.
Application programs hide details of data types. Views can also hide
information (such as an employee's salary) for security purposes.
Physical schema: the overall physical structure of the
database
Instance: the actual content of the database at a
particular point in time
Analogous to the value of a variable
Data Models
A collection of tools for describing
Data
Data relationships
Data semantics
Data constraints
1. Entity-Relationship data model (mainly for database
design)
2. Relational model (mainly for database implementation)
3. Object-based data models (object-oriented and object-relational database implementation)
4. Semistructured data model (XML for file format
transformations)
5. Other older models:
   1. Network model
   2. Hierarchical model
The set of all entities of the same type and the set of all
relationships of the same type are termed an entity set and
relationship set, respectively.
E-R Diagrams
The overall logical structure (schema) of a database can
be expressed graphically by an E-R diagram, which is
built up from the following components:
Relational Model
The relational model revolves around a
fundamental data structure called a table,
which is a formalization of the intuitive
notion of a table.
Informally, the relational model consists
of:
A class of data structures referred to
as tables.
A collection of methods for building
new tables starting from an initial
collection of tables;
we refer to these methods as relational
algebra operations.
Columns
Rows
The relational model is at a lower level of abstraction than the E-R model. Database
designs are often carried out in the E-R model and then translated to the relational
model.
Database Languages
A database system provides a data definition language to
specify the database schema and a data manipulation
language to express database queries and updates.
SQL Query
The most widely used commercial query
language.
SQL is not a Turing-machine-equivalent
language.
To compute complex functions, SQL
is usually embedded in some higher-level
language.
Application programs generally access
databases through one of:
Language extensions that allow embedded SQL
Application program interfaces (e.g.,
ODBC/JDBC), which allow SQL queries to be
sent to a database
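As an illustration of the second route, the sketch below sends SQL to a database through an application program interface. It uses Python's sqlite3 module as a stand-in for an ODBC/JDBC-style interface; the student table and the helper function name are assumptions made for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1234, "John", 2.8), (5678, "Mary", 3.6)],
)


def names_with_gpa_at_least(conn, threshold):
    """Send a parameterized query through the API and collect the result."""
    cursor = conn.execute(
        "SELECT name FROM student WHERE gpa >= ? ORDER BY name",
        (threshold,),
    )
    return [row[0] for row in cursor]
```

The host language supplies the control flow and parameter handling, while the query itself stays in SQL.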
Database Design
The process of designing the general structure of the database:
Design Approaches
Need to come up with a methodology to ensure
that each of the relations in the database is
good
Two ways of doing so:
Entity Relationship Model
Models an enterprise as a collection of entities and
relationships
Represented diagrammatically by an entity-relationship diagram:
Normalization Theory
Formalize what designs are bad, and test for them
Storage or memory manager
Query processing
Transaction manager
Storage Management
Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application programs
and queries submitted to the system.
The storage manager is responsible for the
interaction with the file manager.
The storage manager translates the various DML
statements into low-level file-system commands.
Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.
Query Processing
The query processor components
include
Transaction Management
Consider the following questions pertaining to the state of the database:
Atomicity
Consistency
Isolation
Durability
Database Architecture
The architecture of a
database system is
greatly influenced by the
underlying computer
system on which the
database is running:
Centralized
Client-server
Parallel (multi-processor)
Distributed
History (cont.)
1980s:
1990s:
Early 2000s:
Later 2000s:
Keys
To talk about a specific student, you have to be
able to identify that student.
As long as no two students have the same name, one
can use the name attribute as a key.
Key: an attribute, or a set of attributes, that uniquely
identifies each entity in a collection. A key is generally a
necessity for electronic databases.
In the college database, the value of the attribute stno is
sufficient to identify a student entity. Since the set {stno}
has no proper, nonempty subsets, it clearly satisfies the
minimality condition and, therefore, it is a key for the
STUDENTS entity set.
For the entity set COURSES, are both cno and cname
keys?
Types of Keys
What can be keys for entity set Patrons and Books and relationship
set Loans?
The set of all attributes, taken together, can serve as a single key (a
super key) that uniquely identifies each entity in an entity set.
Several different attribute combinations may each uniquely identify
every entity in the set; these minimal keys are called candidate keys.
One of these keys is chosen as the primary key; the remaining
keys are alternate keys.
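The uniqueness and minimality conditions can be checked mechanically against one instance of an entity set. The sketch below is only illustrative: the COURSES rows are made-up sample data, and a key found this way holds for that instance only, since declaring a key is a statement about the real world rather than about one snapshot of the data.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys of this particular instance, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample rows for the COURSES entity set.
courses = [
    {"cno": "cs110", "cname": "Introduction to Computing", "cr": 4},
    {"cno": "cs210", "cname": "Computer Programming", "cr": 4},
    {"cno": "cs240", "cname": "Computer Architecture", "cr": 3},
]
```

For these rows, the full attribute set is a super key, while cno and cname are each candidate keys on their own.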
The primary key of a set of entities E is used by other constituents
of the E/R model to refer to the entities of E, and this primary key
is included in the other constituents as a Foreign Key.
The identification of the primary key and of the alternate keys is a
semantic statement:
It reflects our understanding of the role played by various
attributes in the real world.
In other words, choosing the primary key from among the
available keys is a choice of the designer.
The definition of keys for sets of relationships is completely parallel
to the definition of keys for sets of entities.
For example, in an entity set called STUDENT_INFO, a school_ID composed of numbers would be a better choice for a
primary key than first_name or last_name.
No change over time: For a primary key, avoid semantic data because it can change over time.
If primary keys are changed, then the foreign keys must be updated as well.
Since the primary key is the identity of the table or entity, it should be permanent and unchangeable.
Single attribute: The primary key should preferably be composed of only one attribute, although this is not
required.
If the primary key is a composite primary key (one made up of multiple attributes), the keys in
other entities that refer to it will have multiple attributes as well.
Preferably numeric: Primary keys are easier to manage when they are composed of
mostly numeric data.
This is useful because, when new data is entered, the DBMS can employ a counter-style attribute:
with each new entry, the database generates a number and then increments it by one automatically for the
next entry.
Security compliant: The selected primary key must not be an attribute that is considered sensitive
information.
For example, it would not be a good idea to use a person's social security number as a primary key.
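The counter-style numeric key described above can be sketched with SQLite, where an INTEGER PRIMARY KEY AUTOINCREMENT column is assigned by the database on insert. The student_info table layout and the add_student helper are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_info ("
    " school_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # generated by the database
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL)"
)


def add_student(conn, first_name, last_name):
    """Insert a student without supplying a key; return the generated school_id."""
    cursor = conn.execute(
        "INSERT INTO student_info (first_name, last_name) VALUES (?, ?)",
        (first_name, last_name),
    )
    return cursor.lastrowid
```

Two students with identical names still receive distinct, stable identifiers, which is the reason names make poor primary keys.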
Participation Constraints
The E/R model allows us to impose constraints on the
number of relationships in which an entity is allowed to
participate.
If (E, u, v, R) is a participation constraint, we may add u : v
to whatever other labels may be on the edge joining E to R.
When there is no upper limit to the number of
relationships in which an entity may participate, we write
u : +.
If every student must choose an advisor, and an instructor
may not advise more than 7 students, we have the
participation constraints
(STUDENTS, 1, 1, ADVISING)
and
(INSTRUCTORS, 0, 7, ADVISING)
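A participation constraint (E, u, v, R) can be checked against an instance by counting how often each entity of E occurs in R. The following sketch assumes the relationship set is stored as a list of (student, instructor) pairs; the function name and the sample identifiers are hypothetical:

```python
from collections import Counter


def violations(entities, relationships, position, low, high=None):
    """Entities whose participation count lies outside [low, high].

    position selects the entity's slot in each relationship tuple;
    high=None means there is no upper limit (written u : + in the text).
    """
    counts = Counter(pair[position] for pair in relationships)
    return sorted(
        e for e in entities
        if counts[e] < low or (high is not None and counts[e] > high)
    )


# Hypothetical instance of the ADVISING relationship set.
students = ["s1", "s2", "s3"]
instructors = ["i1", "i2"]
advising = [("s1", "i1"), ("s2", "i1")]
```

With this data, student s3 violates (STUDENTS, 1, 1, ADVISING) because no advisor has been chosen, while no instructor violates (INSTRUCTORS, 0, 7, ADVISING).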
Types of Participation
Constraints
For example, the distinguishing feature among employee entities can be the job the employee
performs. Another, coexistent, specialization could be based on whether the person is a
temporary (limited-term) employee or a permanent employee, resulting in the entity sets
temporary-employee and permanent-employee.
In terms of an E-R diagram, specialization is
depicted by a triangle component labeled ISA.
The label ISA stands for "is a" and represents,
for example, that a customer is a person.
The ISA relationship may also be referred to as
a superclass-subclass relationship.
Higher- and lower-level entity sets are depicted
as regular entity sets
Before Aggregation
Quadratic Relationship
Review - Concepts
The relational model is made up of tables.
A row of a table: a relational instance/tuple
A column of a table: an attribute
A table: a schema/relation
Cardinality: number of rows
Degree: number of columns
Review - Example
SID | Name | Major | GPA
1234 | John | CS | 2.8
5678 | Mary | EE | 3.6
Each column is an attribute; degree = 4.
Each row is a tuple/relational instance; cardinality = 2.
The table as a whole is a schema/relation.
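The two measures from the review can be read directly off a table held in memory. A small sketch, assuming a relation is represented as a dictionary with an attribute tuple and a list of tuples (a representation chosen here only for illustration):

```python
# The student relation from the review example.
student = {
    "attributes": ("SID", "Name", "Major", "GPA"),
    "tuples": [
        (1234, "John", "CS", 2.8),
        (5678, "Mary", "EE", 3.6),
    ],
}


def degree(relation):
    """Number of columns (attributes)."""
    return len(relation["attributes"])


def cardinality(relation):
    """Number of rows (tuples)."""
    return len(relation["tuples"])
```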
[E-R diagram: Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept), related by Advisor]
Student
SID | Name | Major | GPA
1234 | John | CS | 2.8
5678 | Mary | EE | 3.6
Professor
SSN | Name | Dept
9999 | Smith | Math
8888 | Lee | CS
[E-R diagram: Student (Name, Major, GPA) owns the weak entity set Children (Name, Age)]
Children
Age | Name | Parent_SID
10 | Bart | 1234
 | Lisa | 5678
Identifying Relationship
No relational model representation necessary
[E-R diagram with labels: Student, Name, Major, GPA, study, Degree, Major, ID Code]
SID | Maj_ID Co | S_Degree
9999 | 07 | 1234
8888 | 05 | 5678
1:1 Relationship
[E-R diagram: Student (SID, Name, Major, GPA) and Laptop (S/N #, Brand), related by Have (attribute: Condition)]
SID | Name | Major | GPA | LP_S/N | Hav_Cond
9999 | Bart | Economy | -4.0 | 123-456 | Own
8888 | Lisa | Physics | 4.0 | 567-890 | Loan
N:1 Relationship
[E-R diagram: Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept), related by Advisor (attribute: Semester)]
SID | Name | Major | GPA | Pro_SSN | Ad_Sem
9999 | Bart | Economy | -4.0 | 123-456 | Fall 2006
8888 | Lisa | Physics | 4.0 | 567-890 | Fall 2005
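One way to carry an N:1 relationship such as Advisor into tables is to place the primary key of the "1" side (Professor) in the table of the "N" side (Student) as a foreign key, together with the relationship's own attribute. A sketch in SQLite, with lowercase column names adapted from the table above; the professor names and departments are made up, and the helper function is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE professor (ssn TEXT PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " sid INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL,"
    " pro_ssn TEXT REFERENCES professor(ssn),"  # foreign key to the advisor
    " ad_sem TEXT)"  # attribute of the Advisor relationship
)
# Hypothetical professor rows; only the SSN values come from the example table.
conn.executemany(
    "INSERT INTO professor VALUES (?, ?, ?)",
    [("123-456", "Smith", "Math"), ("567-890", "Lee", "CS")],
)
conn.execute(
    "INSERT INTO student VALUES (9999, 'Bart', 'Economy', -4.0, '123-456', 'Fall 2006')"
)
conn.commit()


def can_insert_student(conn, row):
    """Try to insert a student row; False if the advisor SSN does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Each student row holds at most one advisor, while the same professor SSN may appear in many student rows, which is exactly the N:1 shape.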
[E-R diagram: E-Set 1 (P-Key1), E-Set 2 (P-Key2), and E-Set 3 (P-Key3) joined by a relationship with D-Attribute; Another Set (A-Key) also participates]
P-Key1 | P-Key2 | P-Key3 | A-Key | D-Attribute
9999 | 8888 | 7777 | 6666 | Yes
1234 | 5678 | 9012 | 3456 | No
[E-R diagram: Professor (Name, Address), where the composite attribute Address has the components Street and City]
SSN | Name | Street | City
9999 | Dr. Smith | 50 1st St. | Fake City
8888 | Dr. Lee | 1 B St. | San Jose
[E-R diagram: Student (Name, Major, GPA) with the multivalued attribute Children]
SID | Name | Major | GPA
1234 | John | CS | 2.8
5678 | Homer | EE | 3.6
Stud_SID | Children
1234 | Johnson
1234 | Mary
5678 | Bart
5678 | Lisa
5678 | Maggie
Class Hierarchy
Example 1
[E-R diagram: Person (SSN, Name, Gender) with ISA subclass Student (SID, Status, Major, GPA)]
SSN | SID | Status | Major | GPA
1234 | 9999 | Full | CS | 2.8
5678 | 8888 | Part | EE | 3.6
SSN | Name | Gender
1234 | Homer | Male
5678 | Marge | Female
Class Hierarchy
Example 2
[E-R diagram: SJSU people (SSN, Name) with ISA subclasses Student (SID, Major, GPA) and Faculty]
SSN | Name | SID | Major | GPA
1234 | John | 9999 | CS | 2.8
5678 | Mary | 8888 | EE | 3.6
SSN | Name | Dept
1234 | Homer | C.S.
5678 | Marge | Math
Representing Aggregation
[E-R diagram: Student (SID, Name) and Professor (SSN, Name) related by Advisor; the aggregate is related to Dept by member]
SID (primary key of Advisor) | Code (primary key of Dept)
1234 | 04
5678 | 08
Normalization Definition
Levels of Normalization
Redundancy
Complexity
Most databases should be 3NF or BCNF in order to avoid the database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level
ISBN | Title | AuName | AuPhone | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Sleepy, Snoopy, Grumpy | 321-321-1111, 232-234-1234, 665-235-6532 | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Jones, Smith | 123-333-3333, 654-223-3455 | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Joyce | 666-666-6666 | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Roman | 444-444-4444 | Big House | 123-456-7890 | $25.00
The author (AuName) and AuPhone columns are not scalar.
1NF - Decomposition
1.
2.
3.
Example (1NF)
ISBN | AuName | AuPhone
0-321-32132-1 | Sleepy | 321-321-1111
0-321-32132-1 | Snoopy | 232-234-1234
0-321-32132-1 | Grumpy | 665-235-6532
0-55-123456-9 | Jones | 123-333-3333
0-55-123456-9 | Smith | 654-223-3455
0-123-45678-0 | Joyce | 666-666-6666
1-22-233700-0 | Roman | 444-444-4444
ISBN | Title | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Big House | 123-456-7890 | $25.00
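The step from the non-scalar book table to the two 1NF tables can be sketched as follows. The function name is hypothetical, and the code assumes that the comma-separated author names and phone numbers are listed in matching order, which the original table suggests but does not state:

```python
def to_1nf(books):
    """Split non-scalar AuName/AuPhone values into one row per author."""
    book_rows, author_rows = [], []
    for book in books:
        # Scalar attributes stay in the book table.
        book_rows.append(
            {k: book[k] for k in ("ISBN", "Title", "PubName", "PubPhone", "Price")}
        )
        names = [s.strip() for s in book["AuName"].split(",")]
        phones = [s.strip() for s in book["AuPhone"].split(",")]
        # Assumption: the i-th phone belongs to the i-th author.
        for name, phone in zip(names, phones):
            author_rows.append({"ISBN": book["ISBN"], "AuName": name, "AuPhone": phone})
    return book_rows, author_rows


# Two rows taken from the non-1NF table.
books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": "Sleepy, Snoopy, Grumpy",
     "AuPhone": "321-321-1111, 232-234-1234, 665-235-6532",
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": "Jones, Smith",
     "AuPhone": "123-333-3333, 654-223-3455",
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$22.95"},
]
book_rows, author_rows = to_1nf(books)
```

Every value in the resulting rows is scalar, and ISBN links each author row back to its book.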
Functional Dependencies
If one set of attributes in a table determines
another set of attributes in the table, then the
second set of attributes is said to be
functionally dependent on the first set of
attributes.
Example 1
ISBN | Title | Price
0-321-32132-1 | Balloon | $34.00
0-55-123456-9 | Main Street | $22.95
0-123-45678-0 | Ulysses | $34.00
1-22-233700-0 | Visual Basic | $25.00
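Whether a functional dependency is contradicted by a given table can be tested by checking that rows agreeing on the determining attributes also agree on the dependent ones. A minimal sketch over the rows of Example 1; the function name is an assumption, and a passing check shows only that this instance does not violate the dependency, not that the dependency holds in general:

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]
```

In this table ISBN determines Title and Price, whereas Price does not determine Title, because two different books cost $34.00.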
Functional Dependencies
Example 2
PubID | PubName | PubPhone
 | Big House | 999-999-9999
 | Small House | 123-456-7890
 | Alpha Press | 111-111-1111
Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies:
{PubID} -> {PubPhone}
{PubID} -> {PubName}
{PubName, PubPhone} -> {PubID}
Example 3
AuID | AuName | AuPhone
 | Sleepy | 321-321-1111
 | Snoopy | 232-234-1234
 | Grumpy | 665-235-6532
 | Jones | 123-333-3333
 | Smith | 654-223-3455
 | Joyce | 666-666-6666
 | Roman | 444-444-4444
Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies:
{AuID} -> {AuPhone}
{AuID} -> {AuName}
{AuName, AuPhone} -> {AuID}
FD Example
Database to track reviews of papers submitted to an
academic conference. Prospective authors submit papers
for review and possible acceptance in the published
conference proceedings. Details of the entities
FD Example
Functional Dependencies
AuthNo -> AuthName, AuthEmail, AuthAddress
AuthEmail -> AuthNo
PaperNo -> Primary-AuthNo, Title, Abstract, Status
RevNo -> RevName, RevEmail, RevAddress
RevEmail -> RevNo
RevNo, PaperNo -> AuthComm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5
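Given a list of functional dependencies like the one for the conference database, the attributes determined by a starting set can be computed with the standard attribute-closure procedure: keep applying any dependency whose left side is already covered until nothing new is added. The sketch below encodes the dependencies as (left, right) tuples; this encoding and the names closure and FDS are choices made here for illustration:

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The functional dependencies of the conference review example.
FDS = [
    (("AuthNo",), ("AuthName", "AuthEmail", "AuthAddress")),
    (("AuthEmail",), ("AuthNo",)),
    (("PaperNo",), ("Primary-AuthNo", "Title", "Abstract", "Status")),
    (("RevNo",), ("RevName", "RevEmail", "RevAddress")),
    (("RevEmail",), ("RevNo",)),
    (("RevNo", "PaperNo"),
     ("AuthComm", "Prog-Comm", "Date",
      "Rating1", "Rating2", "Rating3", "Rating4", "Rating5")),
]
```

For instance, the closure of {AuthEmail} contains AuthNo and, through it, the author's name and address, which shows that AuthEmail identifies an author just as AuthNo does.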
Scheme {City, Street, HouseNumber, HouseColor, CityPopulation}
1.
2.
3.
4.
5.
2NF - Decomposition
1.
2.
3.
2NF - Decomposition
Example 2 (Convert to 2NF)
Old Scheme {Studio, Movie, Budget, StudioCity}
New Scheme {Movie, Studio, Budget}
New Scheme {Studio, StudioCity}
BuildingID | Contractor | Fee
100 | Randolph | 1200
150 | Ingersoll | 1100
200 | Randolph | 1200
250 | Pitkin | 1100
300 | Randolph | 1200
Primary Key {BuildingID}
{BuildingID} -> {Contractor}
{Contractor} -> {Fee}
{BuildingID} -> {Fee}
Fee transitively depends on the BuildingID
Both Contractor and Fee depend on the entire key hence 2NF
3NF - Decomposition
1.
2.
3.
3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme {Studio, StudioCity, CityTemp}
New Scheme {Studio, StudioCity}
New Scheme {StudioCity, CityTemp}
BuildingID | Contractor
100 | Randolph
150 | Ingersoll
200 | Randolph
250 | Pitkin
300 | Randolph
Contractor | Fee
Randolph | 1200
Ingersoll | 1100
Pitkin | 1100
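The decomposition of the building table that removes the transitive dependency of Fee on BuildingID can be reproduced by projecting the original rows onto the two new schemes. This is a sketch only; the project helper and the variable names are assumptions, and the rejoin at the end simply confirms that no information was lost:

```python
def project(rows, attrs):
    """Distinct rows restricted to attrs, in first-seen order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


buildings = [
    {"BuildingID": 100, "Contractor": "Randolph", "Fee": 1200},
    {"BuildingID": 150, "Contractor": "Ingersoll", "Fee": 1100},
    {"BuildingID": 200, "Contractor": "Randolph", "Fee": 1200},
    {"BuildingID": 250, "Contractor": "Pitkin", "Fee": 1100},
    {"BuildingID": 300, "Contractor": "Randolph", "Fee": 1200},
]

assignments = project(buildings, ["BuildingID", "Contractor"])
fees = project(buildings, ["Contractor", "Fee"])

# Rejoin on Contractor to confirm the original table can be recovered.
fee_of = {row["Contractor"]: row["Fee"] for row in fees}
rejoined = [{**row, "Fee": fee_of[row["Contractor"]]} for row in assignments]
```

After the split, each contractor's fee is stored once instead of once per building, so changing a fee touches a single row.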
candidate keys.
BCNF is a refinement of third normal form that drops 3NF's restriction
to non-key attributes, so the condition applies to every attribute.
Third normal form and BCNF are not the same if the following conditions are true:
BCNF - Decomposition
1.
2.
3.
4.
Use your own judgment when decomposing schemas
BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment }
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieTitle, PersonName}
Bill Durham | Durham | Drama
The Code Warrier | New York | Horror
Manager | Child | Employee
Jim | Beth | Alice
Mary | Bob | Jane
Mary | NULL | Adam
Primary Key {Manager, Child, Employee}
Each manager can have more than one child
Each manager can supervise more than one employee
4NF Violated
Employee | Skill | Language
1234 | Cooking | French
1234 | Cooking | German
1453 | Carpentry | Spanish
1453 | Cooking | Spanish
2345 | Cooking | Spanish
4NF - Decomposition
1.
2.
Movie | Genre
Hard Code | Comedy
Bill Durham | Drama
The Code Warrier | Horror
Movie | ScreeningCity
Hard Code | Los Angeles
Hard Code | New York
Bill Durham | Santa Cruz
Bill Durham | Durham
The Code Warrier | New York
4NF - Decomposition
Example 2 (Convert to 4NF)
Manager | Child
Jim | Beth
Mary | Bob
Manager | Employee
Jim | Alice
Mary | Jane
Mary | Adam
Employee | Skill
1234 | Cooking
1453 | Carpentry
1453 | Cooking
2345 | Cooking
Employee | Language
1234 | French
1234 | German
1453 | Spanish
2345 | Spanish
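A table can be split into two tables sharing one attribute without losing information exactly when joining the two projections gives back the original rows. The sketch below tests this for the Employee, Skill, Language table; the function names are hypothetical, and the check applies to the given instance only:

```python
def project(rows, columns):
    """Set of distinct value tuples for the given columns."""
    return {tuple(row[c] for c in columns) for row in rows}


def is_lossless_split(rows, shared, left, right):
    """True if rows equals the join of its (shared, left) and (shared, right) projections."""
    left_part = project(rows, [shared, left])
    right_part = project(rows, [shared, right])
    joined = {
        (s, l, r)
        for (s, l) in left_part
        for (s2, r) in right_part
        if s == s2
    }
    return joined == project(rows, [shared, left, right])


rows = [
    {"Employee": 1234, "Skill": "Cooking", "Language": "French"},
    {"Employee": 1234, "Skill": "Cooking", "Language": "German"},
    {"Employee": 1453, "Skill": "Carpentry", "Language": "Spanish"},
    {"Employee": 1453, "Skill": "Cooking", "Language": "Spanish"},
    {"Employee": 2345, "Skill": "Cooking", "Language": "Spanish"},
]
```

For these rows the split into Employee-Skill and Employee-Language is lossless, because every skill of an employee is paired with every language of that employee. A table in which an employee's skills were tied to particular languages would fail the check.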
Rule Zero
The system must qualify as relational, as a
database, and as a management system.
For a system to qualify as a relational database
management system (RDBMS), that system
must use its relational facilities (exclusively) to
manage the database.