
Unit 1: Understand the concept of DBMS & RDBMS


By
Asmatullah Khan,
CL/CP, GIOE,
Secunderabad.

Outline

Database system
Advantages of database system
Database abstraction
Data models
Instances and schemas
Data independence
Data definition language
Data manipulation languages
Database manager
Database administrator and users
Overall system structure
Entity and entity sets
Relationship and relationship sets
Super key, candidate key and primary key
Mapping constraints
Reducing ER diagrams to tables
Generalization, specialization and aggregation
Functional dependencies
Normalization: 1st NF, 2nd NF, 3rd NF
E. F. Codd's rules for RDBMS

Database Management System (DBMS)


A database can be summarily described as a repository for data.
A database management system (DBMS) is an aggregate of data, hardware, software, and users that helps an enterprise manage its operational data.
A DBMS contains information about a particular enterprise:
  A collection of interrelated data
  A set of programs to access the data
  An environment that is both convenient and efficient to use

Database Applications:

Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax deductions

Databases can be very large.


Databases touch all aspects of our lives

Drawbacks of using file systems to store data


Data redundancy and inconsistency
  Multiple file formats, duplication of information in different files
  Since different programmers create the files and application programs over a long period, the various files are likely to have different formats and the programs may be written in several programming languages.
  Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of a particular customer may appear in a file that consists of savings-account records and in a file that consists of checking-account records.
  This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree.

Difficulty in accessing data
  Need to write a new program to carry out each new task

Drawbacks of using file systems to store data (Cont.)

Data isolation
  Multiple files and formats
  Because data are scattered in various files, and files may be in different formats, writing new application programs to retrieve the appropriate data is difficult.

Integrity problems
  Integrity constraints (e.g., account balance > 0) become buried in program code rather than being stated explicitly
  Hard to add new constraints or change existing ones
  The data values stored in the database must satisfy certain types of consistency constraints.
  For example, the balance of a bank account may never fall below a prescribed amount (say, $25).
  Developers enforce these constraints in the system by adding appropriate code in the various application programs.
  However, when new constraints are added, it is difficult to change the programs to enforce them.

Drawbacks of using file systems to store data (Cont.)

Atomicity of updates
  In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure.
  Consider a program to transfer $50 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $50 was removed from account A but was not credited to account B, resulting in an inconsistent database state.
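
The $50 transfer can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name account, the starting balances, and the fail_midway flag that simulates the system failure are assumptions, not part of the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical stand-in for a crash between the two updates.
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # unchanged: the partial debit was undone

transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both updates applied together
```

With plain files, the application itself would have to detect the half-finished transfer and undo it; here the rollback is left to the database system.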

Concurrent access by multiple users

Security problems
Hard to provide user access to some, but not all, data.

Database systems offer solutions to all of the above problems; these solutions are among their major advantages.

View of Data and its Abstraction


The need for efficiency has led designers to use complex data structures to represent data in the database.
Since many database-system users are not computer trained, developers hide the complexity from users through several levels of abstraction.

Levels of Abstraction
Physical level: How
  Describes how a record (e.g., instructor) is stored.
  The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

Logical level: What
  Describes the data stored in the database and the relationships among the data.
  Describes the entire database in terms of a small number of relatively simple structures:
    type instructor = record
        ID : string;
        name : string;
        dept_name : string;
        salary : integer;
    end;

View level: Who
  Application programs hide details of data types. Views can also hide information (such as an employee's salary) for security purposes.
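
As a rough sketch of the view level, the snippet below defines a view over the instructor record that leaves out the salary. The view name instructor_public and the sample row are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Hypothetical sample row, for illustration only.
conn.execute("INSERT INTO instructor VALUES ('00001', 'Example Name', 'Physics', 50000)")

# The view exposes every column except salary.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cur = conn.execute("SELECT * FROM instructor_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
```

A user who is given access only to instructor_public can look up instructors without ever seeing the salary column.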

Instances and Schemas


Similar to types and variables in programming languages
Logical schema: the overall logical structure of the database
  Example: The database consists of information about a set of customers and accounts in a bank and the relationship between them
  Analogous to type information of a variable in a program
Physical schema: the overall physical structure of the database
Instance: the actual content of the database at a particular point in time
  Analogous to the value of a variable
Physical data independence: the ability to modify the physical schema without changing the logical schema
  Applications depend on the logical schema
  In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
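
One way to see physical data independence in practice: adding an index changes how the data are stored and accessed, but a query written against the logical schema stays the same and returns the same rows. The table, the sample rows, and the index name idx_balance below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400)],  # hypothetical rows
)

# The application's query refers only to the logical schema.
query = "SELECT account_number FROM account WHERE balance > 450 ORDER BY account_number"

before = conn.execute(query).fetchall()

# Physical change: a new access structure on disk; the logical schema is untouched.
conn.execute("CREATE INDEX idx_balance ON account (balance)")

after = conn.execute(query).fetchall()
```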

Data Models
A collection of tools for describing
  Data
  Data relationships
  Data semantics
  Data constraints
1. Entity-Relationship data model (mainly for database design)
2. Relational model (mainly for database implementation)
3. Object-based data models (object-oriented and object-relational database implementation)
4. Semistructured data model (XML for file format transformations)
5. Other older models:
   1. Network model
   2. Hierarchical model

Entity-Relationship Model

The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection of basic objects, called entities, and of relationships among these objects.
An entity is a thing or object in the real world that is distinguishable from other objects.
  For example, each person is an entity, and bank accounts can be considered as entities.
Entities are described in a database by a set of attributes.
  For example, the attributes account-number and balance may describe one particular account in a bank, and they form attributes of the account entity set.
A relationship is an association among several entities.
  For example, a depositor relationship associates a customer with each account that she has.
The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and a relationship set, respectively.

E-R Diagrams
The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram, which is built up from the following components:
  Rectangles, which represent entity sets
  Ellipses, which represent attributes
  Diamonds, which represent relationships among entity sets
  Lines, which link attributes to entity sets and entity sets to relationships
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform.
One important constraint is mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set.
For example, if each account must belong to only one customer, the E-R model can express that constraint.

Sample E-R Diagram


The E-R diagram indicates that there are two
entity sets, customer and account, with
attributes as outlined earlier.
The diagram also shows a relationship depositor
between customer and account.

Relational Model
The relational model revolves around a fundamental data structure, the relation, commonly called a table because it formalizes the intuitive notion of a table.
Informally, the relational model consists of:
  A class of data structures referred to as tables.
  A collection of methods for building new tables starting from an initial collection of tables; we refer to these methods as relational algebra operations.
  A collection of constraints imposed on the data contained in tables.
The relational model uses a collection of tables to represent both data and the relationships among those data.
Each table has multiple columns, and each column has a unique name.

[Figure: A Sample Relational Database, with columns and rows labeled]

The relational model is at a lower level of abstraction than the E-R model. Database designs are often carried out in the E-R model and then translated to the relational model.

Database Languages
A database system provides a data definition language to specify the database schema and a data manipulation language to express database queries and updates.

Two classes of languages
  Pure: used for proving properties about computational power and for optimization
    Relational algebra
    Tuple relational calculus
    Domain relational calculus
  Commercial: used in commercial systems
    SQL is the most widely used commercial language

In practice, the data definition and data manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.
The commands in the language are classified into different categories based on their function:
  DDL: Data Definition Language
  DML: Data Manipulation Language
  DCL: Data Control Language
  TCL: Transaction Control Language

Data Definition Language (DDL)

The data storage and definition language.
These statements define the implementation details of the database schemas, which are usually hidden from the users.
The data values stored in the database must satisfy certain consistency constraints.
Specification notation for defining the database schema
Example:
    create table instructor (
        ID         char(5),
        name       varchar(20),
        dept_name  varchar(20),
        salary     numeric(8,2))
Execution of the above DDL statement creates the instructor table.
In addition, it updates a special set of tables called the data dictionary or data directory.
The data dictionary contains metadata (i.e., data about data)
  Database schema
  Integrity constraints
    Primary key (ID uniquely identifies instructors)
  Authorization
    Who can access what
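
The effect of a DDL statement on the data dictionary can be observed in SQLite, whose catalog is the sqlite_master table. The sketch below runs the instructor definition and then reads the metadata back. Declaring ID as the primary key inside the statement is an assumption added here to match the integrity constraint listed above; other systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,   -- assumption: key declared in the DDL
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2)
    )
""")

# SQLite's data dictionary: one row per schema object.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Column-level metadata: (column name, declared type, primary-key flag).
columns = [
    (row[1], row[2], row[5])
    for row in conn.execute("PRAGMA table_info(instructor)")
]
column_names = [c[0] for c in columns]
```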

Data Manipulation Language (DML)


Data manipulation is
  The retrieval of information stored in the database
  The insertion of new information into the database
  The deletion of information from the database
  The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model.
There are basically two types:
  Procedural DMLs require a user to specify what data are needed and how to get those data.
  Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data.

The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.
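
The difference between the two styles can be sketched as follows. The instructor rows are made up, and the procedural half is an ordinary Python loop standing in for a procedural DML, not an actual product's language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [  # hypothetical rows
        ("1", "A", "Physics", 90000),
        ("2", "B", "History", 60000),
        ("3", "C", "Physics", 70000),
    ],
)

# Procedural style: say HOW to get the data (scan every record, test it, collect it).
procedural = []
for _id, name, dept_name, salary in conn.execute("SELECT * FROM instructor"):
    if dept_name == "Physics" and salary > 80000:
        procedural.append(name)

# Declarative style: say only WHAT is wanted; the system chooses the access path.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor WHERE dept_name = 'Physics' AND salary > 80000"
    )
]
```

Both produce the same answer, but only the declarative form leaves the system free to pick a better evaluation strategy, for example by using an index.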

SQL Query
The most widely used commercial query language
SQL is not a Turing-machine-equivalent language.
To be able to compute complex functions, SQL is usually embedded in some higher-level language
Application programs generally access databases through one of
  Language extensions to allow embedded SQL
  Application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database

Sample SQL Query


This query in the SQL language finds the name of the customer whose customer_id is 192-83-7465:
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'

The query specifies that those rows from the table customer where the customer_id is 192-83-7465 must be retrieved, and the customer_name attribute of these rows must be displayed.

What is the result of the following query?
    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465'
      and depositor.account_number = account.account_number
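
To try both queries, the sketch below loads a small bank database into SQLite and runs them through Python. The sample rows are assumptions made for illustration. Column names use underscores in place of hyphens, since a hyphen is not valid in an unquoted SQL identifier, and the customer id is passed as a string parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    -- Hypothetical sample data.
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201');
""")

cid = "192-83-7465"

# First query: the name of the customer with this id.
names = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    (cid,),
).fetchall()

# Second query: the balance of every account this customer is a depositor of.
customer_balances = sorted(
    conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number",
        (cid,),
    ).fetchall()
)
```

The second query therefore answers the question above: it lists the balances of all accounts belonging to the customer with that id.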

Database Design
The process of designing the general structure of the database:

Logical design: deciding on the database schema.
  Database design requires that we find a good collection of relation schemas.
  Business decision: What attributes should we record in the database?
  Computer science decision: What relation schemas should we have and how should the attributes be distributed among the various relation schemas?

Physical design: deciding on the physical layout of the database

Database Design (Cont.)


Is there any problem with this relation?

Design Approaches
Need to come up with a methodology to ensure that each of the relations in the database is good
Two ways of doing so:
  Entity-Relationship Model
    Models an enterprise as a collection of entities and relationships
    Represented diagrammatically by an entity-relationship diagram

  Normalization Theory
    Formalize what designs are bad, and test for them

Object-Relational Data Models


Relational model: flat, atomic values
Object Relational Data Models
Extend the relational data model by including object
orientation and constructs to deal with added data
types.
Allow attributes of tuples to have complex types,
including non-atomic values such as nested relations.
Preserve relational foundations, in particular the
declarative access to data, while extending modeling
power.
Provide upward compatibility with existing relational
languages.

XML: Extensible Markup Language


Defined by the WWW Consortium (W3C)
Originally intended as a document markup language, not a database language
The ability to specify new tags, and to create nested tag structures, made XML a great way to exchange data, not just documents
XML has become the basis for many new-generation data interchange formats.
A wide variety of tools is available for parsing, browsing and querying XML documents/data

Database Manager or Engine

  Storage or memory manager
  Query processing
  Transaction manager

Storage Management
Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application programs
and queries submitted to the system.
The storage manager is responsible for the
interaction with the file manager.
The storage manager translates the various DML
statements into low-level file-system commands.
Thus, the storage manager is responsible for storing,
retrieving, and updating data in the database.

Storage manager components

The storage manager components include:
  Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks the authority of users to access data.
  Transaction manager, which ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicting.
  File manager, which manages the allocation of space on disk storage and the data structures used to represent information stored on disk.
  Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory. The buffer manager is a critical part of the database system, since it enables the database to handle data sizes that are much larger than the size of main memory.

The storage manager implements several data structures as part of the physical system implementation:
  Data files, which store the database itself.
  Data dictionary, which stores metadata about the structure of the database, in particular the schema of the database.
  Indices, which provide fast access to data items that hold particular values.

Query Processing
The query processor components include:
  DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
  DML compiler, which translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
    The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the alternatives.
  Query evaluation engine, which executes low-level instructions generated by the DML compiler.

Transaction Management
Consider the following questions pertaining to the state of the database:
  What if the system fails?
  What if more than one user is concurrently updating the same data?

A transaction is a collection of operations that performs a single logical function in a database application.
Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate any database-consistency constraints. Transactions are expected to have the following properties:
  Atomicity
  Consistency
  Isolation
  Durability

Ensuring the atomicity and durability properties is the responsibility of the database system itself, specifically of the transaction-management component, which has the following components:
  A failure-recovery manager, which detects system failures and restores the database to the state that existed prior to the occurrence of the failure.
  A concurrency-control manager, which controls the interaction among concurrent transactions to ensure the consistency of the database.

Database Users and Administrators


The users of a DBMS are classified based on their roles and interests in accessing and managing the databases.
Database Administrator: once a database is created, it is the job of the database administrator to make decisions about the nature of the data to be stored in the database and the access policies to be enforced, and to monitor and tune the performance of the database, etc.
End Users: they have limited access rights and need only minimal technical knowledge of the database.
Application Programmers: their role is to work within existing DBMS systems and, using a combination of the query languages and higher-level languages, to create various reports based on the data contained in the database.


Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
  Centralized
  Client-server
  Parallel (multi-processor)
  Distributed

History of Database Systems


1950s and early 1960s:
  Data processing using magnetic tapes for storage
    Tapes provided only sequential access
  Punched cards for input

Late 1960s and 1970s:
  Hard disks allowed direct access to data
  Network and hierarchical data models in widespread use
  Ted Codd defines the relational data model
    Would win the ACM Turing Award for this work
    IBM Research begins System R prototype
    UC Berkeley begins Ingres prototype
  High-performance (for the era) transaction processing

History (cont.)
1980s:
  Research relational prototypes evolve into commercial systems
    SQL becomes an industry standard
  Parallel and distributed database systems
  Object-oriented database systems

1990s:
  Large decision support and data-mining applications
  Large multi-terabyte data warehouses
  Emergence of Web commerce

Early 2000s:
  XML and XQuery standards
  Automated database administration

Later 2000s:
  Giant data storage systems
    Google BigTable, Yahoo PNuts, Amazon, ..

Database design through E-R Model based diagrams

The E/R model uses the notions of entity, relationship, and attribute.
The database of the college used for our running example reflects the following information:
  Students: any student who has ever registered at the college;
  Instructors: anyone who has ever taught at the college;
  Courses: any course ever taught at the college;
  Advising: which instructor currently advises which student; and
  Grades: the grade received by each student in each course, including the semester and the instructor.

Individual entities and individual relationships are grouped into homogeneous sets of entities (STUDENTS, COURSES, and INSTRUCTORS) and homogeneous sets of relationships (ADVISING, GRADES).
  STUDENTS represents all the student entities, and ADVISING all the individual advising relationships.
We refer to such sets as entity sets and relationship sets, respectively.

The notion of role helps explain the significance of entities in relationships.
  Roles appear as labels of the edges of the E/R diagram.

Attributes: properties of entities and relationships are described by attributes.
Each attribute A has an associated set of values, which we refer to as the domain of A and denote by Dom(A).
The set of attributes of a set of entities E is denoted by Attr(E); similarly, the set of attributes of a set of relationships R is denoted by Attr(R).
Domains of attributes consist of atomic values.
  This means that the elements of such domains must be simple values such as integers, dates, or strings of characters.
If s is a student entity, then the values associated to s are denoted by stno(s), name(s), addr(s), city(s), state(s), zip(s).

A DBMS must support attribute domains.
  Such support includes validity checks and implementation of operations specific to the domains, such as:
    string concatenation for strings of characters,
    various computations involving dates, and
    arithmetic operations on numeric domains.

E/R Diagram for College Database

Keys
In order to talk about a specific student, you have to be able to identify that student.
As long as no two students have the same name, one can use the name attribute as a key.
A key (an attribute, or a set of attributes, that uniquely identifies each entity in a collection) is generally a necessity for electronic databases.
In the college database, the value of the attribute stno is sufficient to identify a student entity. Since the set {stno} has no proper, nonempty subsets, it clearly satisfies the minimality condition and, therefore, it is a key for the STUDENTS entity set.

For the entity set COURSES, are both cno and cname keys?

Types of Keys
What can be keys for the entity sets Patrons and Books and the relationship set Loans?
The set of all attributes can be considered as one single key, a super key, that uniquely identifies each entity in an entity set.
It is also possible to have several different combinations of attributes as keys for a set of entities, each uniquely identifying every entity; all these keys are called candidate keys.
One of these keys is chosen as the primary key; the remaining keys are alternate keys.
The primary key of a set of entities E is used by other constituents of the E/R model to refer to the entities of E, and this primary key is included in the other constituents as a foreign key.
The identification of the primary key and of the alternate keys is a semantic statement:
  It reflects our understanding of the role played by various attributes in the real world.
  In other words, choosing the primary key from among the available keys is a choice of the designer.
The definition of keys for sets of relationships is completely parallel to the definition of keys for sets of entities.
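
The uniqueness and minimality conditions can be checked mechanically against a sample of entities, as in the sketch below. The course rows and the cr attribute are hypothetical. Because a key is a semantic statement, a check like this can only rule out a set of attributes that fails on the sample; it cannot prove that a set of attributes is a key for every possible instance.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all the attributes in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys of this sample, found by trying smaller sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k).issubset(attrs) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical COURSES sample; cr (credits) repeats, so it cannot identify a course.
courses = [
    {"cno": "c1", "cname": "Course One", "cr": 4},
    {"cno": "c2", "cname": "Course Two", "cr": 4},
    {"cno": "c3", "cname": "Course Three", "cr": 3},
]

all_attrs_is_superkey = is_superkey(courses, ["cno", "cname", "cr"])
keys = candidate_keys(courses, ["cno", "cname", "cr"])
```

On this sample both cno and cname come out as candidate keys; which one becomes the primary key remains the designer's choice.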

Some Notational Types of Keys

Composite primary key: a primary key that is made up of more than one attribute.
Surrogate primary key: a system-assigned primary key, generally numeric and auto-incremented.
Natural key: a real-world, generally accepted identifier used to distinguish real-world objects.
Candidate key: a minimal superkey, that is, one that does not contain a proper subset of attributes that is itself a superkey.
Foreign key: an attribute in one table whose values must match the primary key in another table.

Characteristics of a Primary Key

A primary key is an attribute that uniquely identifies the entity that it resides in.
For a primary key to be useful and functional, it should have several characteristics:

Non-null values: Primary key attributes cannot be empty; that is, a primary key must contain a value and cannot be null.
Unique values: The primary key values must be unique, as they identify each entity of the table.
Nonintelligent: The primary key should be factless; that is, it should not be composed of semantic data.
  For example, in an entity called STUDENT_INFO, school_ID composed of numbers would be a better choice for a primary key than first_name or last_name.
No change over time: Avoid semantic data in a primary key because it can change over time.
  If primary keys are changed, then the foreign keys must be updated as well.
  Since primary keys are the identity of the table or entity, they should be permanent and unchangeable.
Single attribute: The primary key should preferably be composed of only one attribute, although this is not required.
  If the primary key is a composite primary key (one made up of multiple attributes), the foreign keys in other entities that refer to it will have multiple attributes as well.
Preferably numeric: Primary keys are easier and better managed when they are composed of mostly numeric data.
  This is useful because, when new data is being entered, the DBMS can employ a counter-style attribute: with each new entry, the database program generates a number and then increments it by one automatically for the next entry.
Security compliant: The selected primary key must not be an attribute that is considered sensitive information.
  For example, it would not be a good idea to set a person's social security number as a primary key.

Participation Constraints
The E/R model allows us to impose constraints on the
number of relationships in which an entity is allowed to
participate.
If (E, u, v,R) is a participation constraint we may add u : v
to whatever other labels may be on the edge joining E to R.
When there is no upper limit to the number of
relationships in which an entity may participate, we write
u : +.
If every student must choose an advisor, and an instructor
may not advise more than 7 students, we have the
participation constraints
(STUDENTS, 1, 1, ADVISING)
and
(INSTRUCTORS, 0, 7, ADVISING)
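
A participation constraint (E, u, v, R) can be read as a check on how many relationships of R each entity of E appears in. The helper below is a minimal sketch under that reading: the function name satisfies, the sample entities, and the encoding of the unbounded case u : + as v=None are all assumptions made for illustration.

```python
from collections import Counter

def satisfies(entities, relationships, position, u, v=None):
    """Check (E, u, v, R): every entity takes part in at least u and at most v
    relationships. `position` says which slot of each relationship tuple holds
    the entity; v=None stands for 'no upper limit' (written u : +)."""
    counts = Counter(rel[position] for rel in relationships)
    for entity in entities:
        n = counts[entity]
        if u > n or (v is not None and n > v):
            return False
    return True

# Hypothetical data: ADVISING relationships as (student, instructor) pairs.
students = ["s1", "s2", "s3"]
instructors = ["i1", "i2"]
advising = [("s1", "i1"), ("s2", "i1"), ("s3", "i2")]

students_ok = satisfies(students, advising, 0, 1, 1)        # (STUDENTS, 1, 1, ADVISING)
instructors_ok = satisfies(instructors, advising, 1, 0, 7)  # (INSTRUCTORS, 0, 7, ADVISING)

# If s3 has no advisor, the first constraint is violated.
missing_advisor_ok = satisfies(students, advising[:2], 0, 1, 1)
```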

Types of Participatory Constraints

Let R be a binary set of relationships between U and V with participation constraints (U, p, q, R) and (V, m, n, R). The set of relationships R from U to V is:
1. one-to-one if p = 0, q = 1 and m = 0, n = 1;
2. one-to-many if p = 0, q > 1 and m = 0, n = 1;
3. many-to-one if p = 0, q = 1 and m = 0, n > 1;
4. many-to-many if p = 0, q > 1 and m = 0, n > 1.
A recursive relationship is a binary relationship connecting a set of entities to itself.

Weak and Strong Entities and Identity Relationship

Suppose that we need to expand our database by adding information about student loans by adding a set of entities called LOANS.
The existence of a loan entity in the E/R model of the college database is conditioned upon the existence of a student entity corresponding to the student to whom that loan was awarded. This type of dependency is called an existence dependency.
An entity is said to be existence dependent on another entity when the entity's (weak entity) existence depends solely on the existence of the other entity (strong entity).

E is a set of weak entities if the following conditions are satisfied:
1. The set of entities E does not have a key, and
2. the participation constraint (E, 1, k, R) is satisfied for some k ≥ 1.

Weak entity sets are represented in E/R diagrams by dashed boxes.
No weak entity can exist in E unless it is involved in a relationship of R with a strong entity. Such a relationship is called an identity relationship, in which the primary key of the strong entity set is added as a foreign key to the existing set of attributes of the weak entity.

If a student entity is deleted, the LOANS entities that depend on the student entity should also be removed.
Note that the attributes of the LOANS entity set (source, amount, year) are not sufficient to identify an entity in this set.
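
The existence dependency of LOANS on STUDENTS can be sketched in SQLite with a foreign key that cascades deletes. The sample rows and the choice of (stno, source, year) as the primary key of loans are assumptions; the slides only state that source, amount, and year alone do not identify a loan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE students (stno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE loans (
        stno   INTEGER NOT NULL REFERENCES students(stno) ON DELETE CASCADE,
        source TEXT,
        amount INTEGER,
        year   INTEGER,
        PRIMARY KEY (stno, source, year)   -- assumed key: owner's key plus own attributes
    )
""")

# Hypothetical data: two loans that agree on source, amount, and year.
conn.execute("INSERT INTO students VALUES (1011, 'Student A'), (2661, 'Student B')")
conn.execute(
    "INSERT INTO loans VALUES (1011, 'BANK1', 1000, 2020), (2661, 'BANK1', 1000, 2020)"
)
conn.commit()

# Deleting the strong entity also removes the weak entities that depend on it.
conn.execute("DELETE FROM students WHERE stno = 1011")
conn.commit()

remaining = conn.execute("SELECT stno, source, amount, year FROM loans").fetchall()
```

The two loans differ only in stno, which is why the student's key has to be part of the loan's key.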

Enhanced E/R features: Specialization

Refinement from an initial entity set into successive levels of entity subgroupings represents a top-down design process in which distinctions are made explicit.
The process of designating subgroupings within an entity set is called specialization.
The specialization of an entity set person, with attributes name, street, and city, allows us to distinguish among persons according to whether they are employees with employee-id and salary, or customers with customer-id.
An entity set may be specialized by more than one distinguishing feature.
  For example, the distinguishing feature among employee entities can be the job the employee performs. Another, coexistent, specialization could be based on whether the person is a temporary (limited-term) employee or a permanent employee, resulting in the entity sets temporary-employee and permanent-employee.
In terms of an E-R diagram, specialization is depicted by a triangle component labeled ISA.
The label ISA stands for "is a" and represents, for example, that a customer "is a" person.
The ISA relationship may also be referred to as a superclass-subclass relationship.
Higher- and lower-level entity sets are depicted as regular entity sets.

Enhanced E/R features: Aggregation

One limitation of the E-R model is that it cannot express relationships among relationships.
Consider an E/R diagram of a banking database system that contains the ternary relationship works-on among employee, branch, and job.
Now, suppose we want to record managers for tasks performed by an employee at a branch; that is, we want to record managers for (employee, branch, job) combinations. Let us assume that there is an entity set manager.
One alternative for representing this relationship is to create a quaternary relationship manages between employee, branch, job, and manager.
  (A quaternary relationship is required; a binary relationship between manager and employee would not permit us to represent which (branch, job) combinations of an employee are managed by which manager.)

The best way to model a situation such as the one just described is to use aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level entities.
Thus, for our example, we regard the relationship set works-on as a higher-level entity set called works-on.

[Figure: Before aggregation, quaternary relationship]

[Figure: Aggregated works-on set]

E/R Diagram Symbols


Alternate E/R Diagram
Symbols

Banking System Database E/R Diagram

Transforming E-R model to Relational Model


Design to Implementation

Review - Concepts
The relational model is made up of tables.

A row of a table      =  a relational instance/tuple
A column of a table   =  an attribute
A table               =  a schema/relation
Cardinality           =  number of rows
Degree                =  number of columns

Review - Example

SID    Name   Major   GPA
1234   John   CS      2.8
5678   Mary   EE      3.6

Each column is an attribute; the degree is 4.
Each row is a tuple/relational instance; the cardinality is 2.
The table as a whole is a schema/relation.

From ER Model to Relational Model


So how do we convert an ER diagram into a
table?? Simple!!
Basic Ideas:
Build a table for each entity set
Build a table for each relationship set if necessary (more
on this later)
Make a column in the table for each attribute in the
entity set
Indivisibility Rule and Ordering Rule
Primary Key

Example: Strong Entity Set

[E/R diagram: entity set Student (SID, Name, Major, GPA) linked by the relationship Advisor to entity set Professor (SSN, Name, Dept)]

Student
SID    Name   Major   GPA
1234   John   CS      2.8
5678   Mary   EE      3.6

Professor
SSN    Name    Dept
9999   Smith   Math
8888   Lee     CS

Representation of Weak Entity Set

A weak entity set cannot exist alone.
To build a table/schema for a weak entity set:
  Construct a table with one column for each attribute in the weak entity set
  Remember to include the discriminator
  Add one extra column on the right side of the table, and put in it the primary key of the strong entity set (the entity set that the weak entity set depends on)
  Primary key of the weak entity set = discriminator + foreign key

Example Weak Entity Set


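[E-R diagram: entity set Student (SID, Name, Major, GPA) linked by the relationship set owns to the weak entity set Children (Name, Age)]

Children
Age | Name | Parent_SID
10  | Bart | 1234
    | Lisa | 5678

* Primary key of Children is Parent_SID + Name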
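
A minimal sqlite3 sketch of the weak entity rule, assuming Name is the discriminator of Children. The helper try_insert and the extra rows passed to it are hypothetical and only there to exercise the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Children (
        Name       TEXT NOT NULL,                            -- discriminator
        Age        INTEGER,
        Parent_SID INTEGER NOT NULL REFERENCES Student(SID), -- key of the strong entity set
        PRIMARY KEY (Parent_SID, Name)                       -- discriminator + foreign key
    )
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1234, "John"), (5678, "Mary")])
conn.executemany(
    "INSERT INTO Children VALUES (?, ?, ?)",
    [("Bart", 10, 1234), ("Lisa", None, 5678)],
)

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Children VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

same_name_other_parent = try_insert(("Bart", 7, 5678))  # allowed: different parent
duplicate_key = try_insert(("Bart", 11, 1234))          # rejected: same Parent_SID + Name
orphan = try_insert(("Maggie", 1, 4321))                # rejected: no such student
```

The last two inserts fail, which reflects that a child row is identified only together with its parent and cannot exist without one.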

Representation of Relationship Set


This is a little more complicated
Unary/Binary Relationship set
Depends on the cardinality and participation of the relationship
Two possible approaches
N-ary (multiple) Relationship set
Primary Key Issue
Identifying Relationship
No relational model representation necessary

Representing Relationship Set


Unary/Binary Relationship
For a one-to-one relationship without total participation:
Build a table with two columns, one column for each participating entity set's primary key. Add further columns, one for each descriptive attribute of the relationship set (if any).
For a one-to-one relationship with one entity set having total participation:
Add one extra column on the right side of the table of the entity set with total participation and put in it the primary key of the entity set without total participation, as required by the relationship.

Example One-to-One Relationship Set


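[E-R diagram: entity sets Student (SID, Name, Major, GPA) and Major (ID Code) linked by the relationship set study, with the descriptive attribute Degree]

SID  | Maj_ID_Co | S_Degree
9999 | 07        | 1234
8888 | 05        | 5678

* Primary key can be either SID or Maj_ID_Co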

Example One-to-One Relationship Set


[E-R diagram: Student (SID, Name, Major, GPA) linked 1:1 by the relationship set Have (descriptive attribute Condition) to Laptop (S/N #, Brand)]

SID  | Name | Major   | GPA  | LP_S/N  | Hav_Cond
9999 | Bart | Economy | -4.0 | 123-456 | Own
8888 | Lisa | Physics | 4.0  | 567-890 | Loan

* Primary key can be either SID or LP_S/N

Representing Relationship Set


Unary/Binary Relationship
For a one-to-many relationship without total participation:
Same thing as one-to-one without total participation
For a one-to-many/many-to-one relationship with the entity set on the many side having total participation:
Add one extra column on the right side of the table of the entity set on the many side and put in it the primary key of the entity set on the one side, as required by the relationship.

Example Many-to-One Relationship Set


[E-R diagram: Student (SID, Name, Major, GPA) linked N:1 by the relationship set Advisor (descriptive attribute Semester) to Professor (SSN, Name, Dept)]

SID  | Name | Major   | GPA  | Pro_SSN | Ad_Sem
9999 | Bart | Economy | -4.0 | 123-456 | Fall 2006
8888 | Lisa | Physics | 4.0  | 567-890 | Fall 2005

* Primary key of this table is SID
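
A minimal sqlite3 sketch of the augmentation approach for this example. The professor names and departments are assumptions added for illustration; NOT NULL on Pro_SSN is one way to model total participation on the many side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Professor (SSN TEXT PRIMARY KEY, Name TEXT, Dept TEXT)")
conn.execute("""
    CREATE TABLE Student (
        SID     INTEGER PRIMARY KEY,
        Name    TEXT,
        Major   TEXT,
        GPA     REAL,
        Pro_SSN TEXT NOT NULL REFERENCES Professor(SSN), -- key of the "one" side
        Ad_Sem  TEXT                                     -- descriptive attribute of Advisor
    )
""")
# Professor names and departments are hypothetical.
conn.executemany(
    "INSERT INTO Professor VALUES (?, ?, ?)",
    [("123-456", "Smith", "Math"), ("567-890", "Lee", "CS")],
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?, ?)",
    [
        (9999, "Bart", "Economy", -4.0, "123-456", "Fall 2006"),
        (8888, "Lisa", "Physics", 4.0, "567-890", "Fall 2005"),
    ],
)

# The advisor is reached by following the extra column.
advisor_of_bart = conn.execute(
    "SELECT p.Name FROM Student s JOIN Professor p ON p.SSN = s.Pro_SSN WHERE s.SID = 9999"
).fetchone()[0]

# A student without an advisor violates total participation.
try:
    conn.execute("INSERT INTO Student VALUES (7777, 'Maggie', 'Art', 3.0, NULL, NULL)")
    missing_advisor_allowed = True
except sqlite3.IntegrityError:
    missing_advisor_allowed = False
```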

Representing Relationship Set


Unary/Binary Relationship
For a many-to-many relationship:
Same thing as a one-to-one relationship without total participation: build a separate table for the relationship set.
The primary key of this new schema is the union of the foreign keys referencing both entity sets.
No augmentation approach is possible
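
The slides give no many-to-many example, so the following sqlite3 sketch uses a hypothetical Enrolls relationship between Student and a Course entity set. All table names, column names, and rows here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (SID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Course (CID TEXT PRIMARY KEY, Title TEXT)")

# Separate table for the relationship set; its key is the union of both foreign keys.
conn.execute("""
    CREATE TABLE Enrolls (
        SID   INTEGER REFERENCES Student(SID),
        CID   TEXT REFERENCES Course(CID),
        Grade TEXT,                      -- descriptive attribute of the relationship
        PRIMARY KEY (SID, CID)
    )
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1234, "John"), (5678, "Mary")])
conn.executemany(
    "INSERT INTO Course VALUES (?, ?)",
    [("DB1", "Databases"), ("OS1", "Operating Systems")],
)
conn.executemany(
    "INSERT INTO Enrolls VALUES (?, ?, ?)",
    [(1234, "DB1", "A"), (1234, "OS1", "B"), (5678, "DB1", "A")],
)

courses_of_john = [r[0] for r in conn.execute(
    "SELECT CID FROM Enrolls WHERE SID = 1234 ORDER BY CID")]
students_in_db1 = [r[0] for r in conn.execute(
    "SELECT SID FROM Enrolls WHERE CID = 'DB1' ORDER BY SID")]
```

A single extra column on either entity table could hold only one partner per row, which is why augmentation does not work here.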

Representing Relationship Set


N-ary Relationship
Intuitively simple
Build a new table with one column for each attribute in the union of the primary keys of all participating entity sets.
Add additional columns for descriptive attributes of the relationship set (if necessary)
The primary key of this table is the union of all primary keys of entity sets that are on the many side
That is it, we are done.

Example N-ary Relationship Set


[E-R diagram: entity sets E-Set 1 (P-Key1), E-Set 2 (P-Key2), E-Set 3 (P-Key3) and Another Set (A-Key) joined by "A relationship", which has the descriptive attribute D-Attribute]

P-Key1 | P-Key2 | P-Key3 | A-Key | D-Attribute
9999   | 8888   | 7777   | 6666  | Yes
1234   | 5678   | 9012   | 3456  | No

* Primary key of this table is P-Key1 + P-Key2 + P-Key3

Representing Relationship Set


Identifying Relationship
This is what you have to know:
You DON'T have to build a table/schema for the identifying relationship set once you have built a table/schema for the corresponding weak entity set
Reason:
It is a special case of one-to-many with total participation
It reduces redundancy

Representing Composite Attribute


The relational model's Indivisibility Rule applies
One column for each component attribute
NO column for the composite attribute itself
[E-R diagram: Professor (SSN, Name, Address), where Address is a composite attribute made up of Street and City]

SSN  | Name      | Street     | City
9999 | Dr. Smith | 50 1st St. | Fake City
8888 | Dr. Lee   | 1 B St.    | San Jose

Representing Multivalue Attribute


For each multivalue attribute in an entity set/relationship set:
Build a new relation schema with two columns
One column for the primary keys of the entity set/relationship set that has the multivalue attribute
Another column for the multivalue attribute. Each cell of this column holds only one value, so each value is represented as a unique tuple
Primary key for this schema is the union of all attributes

Example Multivalue attribute


[E-R diagram: Student (SID, Name, Major, GPA) with the multivalue attribute Children]

Student
SID  | Name  | Major | GPA
1234 | John  | CS    | 2.8
5678 | Homer | EE    | 3.6

Children
Stud_SID | Children
1234     | Johnson
1234     | Mary
5678     | Bart
5678     | Lisa
5678     | Maggie

The primary key for the Children table is Stud_SID + Children, the union of all attributes

Representing Class Hierarchy


Two general approaches depending on disjointness and completeness
For non-disjoint and/or non-complete class hierarchy:
Create a table for each super class entity set according to the normal entity set translation method.
Create a table for each subclass entity set with a column for each of the attributes of that entity set plus one for each attribute of the primary key of the super class entity set
This primary key from the super class entity set is also used as the primary key for this new table

Class Hierarchy
Example 1
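[E-R diagram: superclass Person (SSN, Name, Gender) with ISA subclass Student (SID, Status, Major, GPA)]

Student
SSN  | SID  | Status | Major | GPA
1234 | 9999 | Full   | CS    | 2.8
5678 | 8888 | Part   | EE    | 3.6

Person
SSN  | Name  | Gender
1234 | Homer | Male
5678 | Marge | Female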

Representing Class Hierarchy


Two general approaches depending on disjointness and completeness
For disjoint AND complete mapping class hierarchy:
DO NOT create a table for the super class entity set
Create a table for each subclass entity set that includes all attributes of that subclass entity set and the attributes of the superclass entity set
Simple and intuitive enough, need an example?

Class Hierarchy
Example 2

[E-R diagram: superclass SJSU people (SSN, Name) with ISA subclasses Student (SID, Major, GPA) and Faculty (Dept); disjoint and complete mapping]

No table created for superclass entity set

Student
SSN  | Name | SID  | Major | GPA
1234 | John | 9999 | CS    | 2.8
5678 | Mary | 8888 | EE    | 3.6

Faculty
SSN  | Name  | Dept
1234 | Homer | C.S.
5678 | Marge | Math

Representing Aggregation
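[E-R diagram: the relationship set Advisor between Student (SID, Name) and Professor (SSN, Name) is aggregated and linked by the relationship set member to Dept (Code, Name)]

member
SID  | Code
1234 | 04
5678 | 08

SID: primary key of Advisor
Code: primary key of Dept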

Eliminating redundant information through restructuring of database tables.

Normalization Definition

This is the process which allows you to winnow out redundant data within your database.
This involves restructuring the tables to successively meet higher forms of normalization.
A properly normalized database should have the following characteristics:
Scalar values in each field
Absence of redundancy.
Minimal use of null values.
Minimal loss of information.

Levels of Normalization

Levels of normalization are based on the amount of redundancy in the database.
Various levels of normalization are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Domain Key Normal Form (DKNF)

[Figure labels: Redundancy, Number of Tables, Complexity]

Most databases should be in 3NF or BCNF in order to avoid database anomalies.

Levels of Normalization
[Figure: nested levels 1NF, 2NF, 3NF, 4NF, 5NF, DKNF]

Each higher level is a subset of the lower level

First Normal Form (1NF)


A table is considered to be in 1NF if all the fields contain only scalar values (as opposed to lists of values).
Example (Not 1NF)

ISBN          | Title        | AuName                 | AuPhone                                  | PubName     | PubPhone     | Price
0-321-32132-1 | Balloon      | Sleepy, Snoopy, Grumpy | 321-321-1111, 232-234-1234, 665-235-6532 | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street  | Jones, Smith           | 123-333-3333, 654-223-3455               | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses      | Joyce                  | 666-666-6666                             | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Roman                  | 444-444-4444                             | Big House   | 123-456-7890 | $25.00

The author (AuName) and AuPhone columns are not scalar

1NF - Decomposition
1. Place all items that appear in the repeating group in a new table
2. Designate a primary key for each new table produced.
3. Duplicate in the new table the primary key of the table from which the repeating group was extracted or vice versa.

Example (1NF)

ISBN          | AuName | AuPhone
0-321-32132-1 | Sleepy | 321-321-1111
0-321-32132-1 | Snoopy | 232-234-1234
0-321-32132-1 | Grumpy | 665-235-6532
0-55-123456-9 | Jones  | 123-333-3333
0-55-123456-9 | Smith  | 654-223-3455
0-123-45678-0 | Joyce  | 666-666-6666
1-22-233700-0 | Roman  | 444-444-4444

ISBN          | Title        | PubName     | PubPhone     | Price
0-321-32132-1 | Balloon      | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street  | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses      | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Big House   | 123-456-7890 | $25.00
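
The 1NF decomposition steps can be sketched in plain Python. This is an illustration only: the function name to_1nf is hypothetical, and it assumes the i-th phone number in a cell belongs to the i-th author in the same row.

```python
# Two non-1NF rows from the example: AuName and AuPhone hold lists of values.
books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"]},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"],
     "AuPhone": ["123-333-3333", "654-223-3455"]},
]

def to_1nf(rows, key, repeating):
    """Split rows into a base table and a table for the repeating group."""
    base, group = [], []
    for row in rows:
        # Step 1: everything outside the repeating group stays in the base table.
        base.append({k: v for k, v in row.items() if k not in repeating})
        # Steps 2-3: one scalar row per list entry, carrying the original key.
        for values in zip(*(row[col] for col in repeating)):
            new_row = {key: row[key]}
            new_row.update(dict(zip(repeating, values)))
            group.append(new_row)
    return base, group

book_table, author_table = to_1nf(books, "ISBN", ["AuName", "AuPhone"])
```

Every value in author_table is now a scalar, and ISBN links each author row back to its book.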

Functional Dependencies
If one set of attributes in a table determines another set of attributes in the table, then the second set of attributes is said to be functionally dependent on the first set of attributes.
Example 1

ISBN          | Title        | Price
0-321-32132-1 | Balloon      | $34.00
0-55-123456-9 | Main Street  | $22.95
0-123-45678-0 | Ulysses      | $34.00
1-22-233700-0 | Visual Basic | $25.00

Table Scheme: {ISBN, Title, Price}
Functional Dependencies:
{ISBN} -> {Title}
{ISBN} -> {Price}

Functional Dependencies
Example 2
PubID | PubName     | PubPhone
      | Big House   | 999-999-9999
      | Small House | 123-456-7890
      | Alpha Press | 111-111-1111

Table Scheme: {PubID, PubName, PubPhone}
Functional Dependencies:
{PubID} -> {PubPhone}
{PubID} -> {PubName}
{PubName, PubPhone} -> {PubID}

Example 3

AuID | AuName | AuPhone
     | Sleepy | 321-321-1111
     | Snoopy | 232-234-1234
     | Grumpy | 665-235-6532
     | Jones  | 123-333-3333
     | Smith  | 654-223-3455
     | Joyce  | 666-666-6666
     | Roman  | 444-444-4444

Table Scheme: {AuID, AuName, AuPhone}
Functional Dependencies:
{AuID} -> {AuPhone}
{AuID} -> {AuName}
{AuName, AuPhone} -> {AuID}
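
Whether sample data is consistent with a functional dependency can be checked mechanically. The sketch below uses the book table from Example 1; the function name fd_holds is hypothetical. Data can only refute a dependency, never prove it, since a dependency is a statement about all legal table contents.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows contradicts lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street", "Price": "$22.95"},
    {"ISBN": "0-123-45678-0", "Title": "Ulysses", "Price": "$34.00"},
    {"ISBN": "1-22-233700-0", "Title": "Visual Basic", "Price": "$25.00"},
]

isbn_to_title = fd_holds(books, ["ISBN"], ["Title"])    # consistent with the data
price_to_title = fd_holds(books, ["Price"], ["Title"])  # refuted: $34.00 has two titles
```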

FD Example
Database to track reviews of papers submitted to an academic conference. Prospective authors submit papers for review and possible acceptance in the published conference proceedings. Details of the entities:

Author information includes a unique author number, a name, a mailing address, and a unique (optional) email address.
Paper information includes the primary author, the paper number, the title, the abstract, and review status (pending, accepted, rejected)
Reviewer information includes the reviewer number, the name, the mailing address, and a unique (optional) email address
A completed review includes the reviewer number, the date, the paper number, comments to the authors, comments to the program chairperson, and ratings (overall, originality, correctness, style, clarity)

FD Example
Functional Dependencies

AuthNo -> AuthName, AuthEmail, AuthAddress
AuthEmail -> AuthNo
PaperNo -> Primary-AuthNo, Title, Abstract, Status
RevNo -> RevName, RevEmail, RevAddress
RevEmail -> RevNo
RevNo, PaperNo -> AuthComm, Prog-Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5

Second Normal Form (2NF)

For a table to be in 2NF, there are two requirements:
The database is in first normal form
All nonkey attributes in the table must be functionally dependent on the entire primary key
Note: Remember that we are dealing with non-key attributes

Example 1 (Not 2NF)
Scheme {Title, PubId, AuId, Price, AuAddress}
1. Key {Title, PubId, AuId}
2. {Title, PubId, AuId} -> {Price}
3. {AuID} -> {AuAddress}
4. AuAddress does not belong to a key
5. AuAddress functionally depends on AuId which is a subset of a key

Second Normal Form (2NF)

Example 2 (Not 2NF)
Scheme {City, Street, HouseNumber, HouseColor, CityPopulation}
1. Key {City, Street, HouseNumber}
2. {City, Street, HouseNumber} -> {HouseColor}
3. {City} -> {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on the City which is a proper subset of the key

Example 3 (Not 2NF)
Scheme {studio, movie, budget, studio_city}
1. Key {studio, movie}
2. {studio, movie} -> {budget}
3. {studio} -> {studio_city}
4. studio_city is not a part of a key
5. studio_city functionally depends on studio which is a proper subset of the key

2NF - Decomposition
1. If a data item is functionally dependent on only a part of the primary key, move that data item and that part of the primary key to a new table.
2. If other data items are functionally dependent on the same part of the key, place them in the new table also
3. Make the partial primary key copied from the original table the primary key for the new table.

Example 1 (Convert to 2NF)
Old Scheme {Title, PubId, AuId, Price, AuAddress}
New Scheme {Title, PubId, AuId, Price}
New Scheme {AuId, AuAddress}

2NF - Decomposition
Example 2 (Convert to 2NF)
Old Scheme {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme {City, Street, HouseNumber, HouseColor}
New Scheme {City, CityPopulation}

Example 3 (Convert to 2NF)
Old Scheme {Studio, Movie, Budget, StudioCity}
New Scheme {Movie, Studio, Budget}
New Scheme {Studio, StudioCity}

Third Normal Form (3NF)


This form dictates that all non-key attributes of a table must be functionally dependent on a candidate key directly, i.e. there can be no interdependencies among non-key attributes.
For a table to be in 3NF, there are two requirements:
The table should be in second normal form
No attribute is transitively dependent on the primary key

Example (Not in 3NF)
Scheme {Title, PubID, PageCount, Price}
1. Key {Title, PubId}
2. {Title, PubId} -> {PageCount}
3. {PageCount} -> {Price}
4. Both Price and PageCount depend on a key hence 2NF
5. Transitively {Title, PubID} -> {Price} hence not in 3NF

Third Normal Form (3NF)

Example 2 (Not in 3NF)
Scheme {Studio, StudioCity, CityTemp}
1. Primary Key {Studio}
2. {Studio} -> {StudioCity}
3. {StudioCity} -> {CityTemp}
4. {Studio} -> {CityTemp}
5. Both StudioCity and CityTemp depend on the entire key hence 2NF
6. CityTemp transitively depends on Studio hence violates 3NF
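
Transitive dependencies such as {Studio} -> {CityTemp} can be derived by computing an attribute closure. A minimal Python sketch, using only the two direct dependencies of Example 2; the function name closure is illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Direct dependencies from Example 2.
fds = [({"Studio"}, {"StudioCity"}), ({"StudioCity"}, {"CityTemp"})]

studio_closure = closure({"Studio"}, fds)    # includes CityTemp via StudioCity
city_closure = closure({"StudioCity"}, fds)  # does not include Studio
```

Studio reaches CityTemp only through StudioCity, a non-key attribute, which is the 3NF violation described above.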

Example 3 (Not in 3NF)


Scheme {BuildingID, Contractor, Fee}

BuildingID | Contractor | Fee
100        | Randolph   | 1200
150        | Ingersoll  | 1100
200        | Randolph   | 1200
250        | Pitkin     | 1100
300        | Randolph   | 1200

1. Primary Key {BuildingID}
2. {BuildingID} -> {Contractor}
3. {Contractor} -> {Fee}
4. {BuildingID} -> {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key hence 2NF

3NF - Decomposition
1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the original entity.

Example 1 (Convert to 3NF)


Old Scheme {Title, PubID, PageCount, Price}
New Scheme {PageCount, Price}
New Scheme {Title, PubID, PageCount}

3NF - Decomposition
Example 2 (Convert to 3NF)
Old Scheme {Studio, StudioCity, CityTemp}
New Scheme {Studio, StudioCity}
New Scheme {StudioCity, CityTemp}

Example 3 (Convert to 3NF)

Old Scheme {BuildingID, Contractor, Fee}
New Scheme {BuildingID, Contractor}
New Scheme {Contractor, Fee}

BuildingID | Contractor
100        | Randolph
150        | Ingersoll
200        | Randolph
250        | Pitkin
300        | Randolph

Contractor | Fee
Randolph   | 1200
Ingersoll  | 1100
Pitkin     | 1100

Boyce-Codd Normal Form (BCNF)

BCNF does not allow dependencies between attributes that belong to candidate keys.
BCNF is a refinement of the third normal form in which it drops the restriction of a non-key attribute from the 3rd normal form.
Third normal form and BCNF are not the same if the following conditions are true:
The table has two or more candidate keys
At least two of the candidate keys are composed of more than one attribute
The keys are not disjoint, i.e. the composite candidate keys share some attributes

Example 1 - Address (Not in BCNF)


Scheme {City, Street, ZipCode}
1. Key1 {City, Street}
2. Key2 {ZipCode, Street}
3. No non-key attribute hence 3NF
4. {City, Street} -> {ZipCode}
5. {ZipCode} -> {City}
6. Dependency between attributes belonging to a key

Boyce-Codd Normal Form (BCNF)

Example 2 - Movie (Not in BCNF)
Scheme {MovieTitle, MovieID, PersonName, Role, Payment}
1. Key1 {MovieTitle, PersonName}
2. Key2 {MovieID, PersonName}
3. Both Role and Payment functionally depend on both candidate keys thus 3NF
4. {MovieID} -> {MovieTitle}
5. Dependency between MovieID & MovieTitle violates BCNF

Example 3 - Consulting (Not in BCNF)
Scheme {Client, Problem, Consultant}
1. Key1 {Client, Problem}
2. Key2 {Client, Consultant}
3. No non-key attribute hence 3NF
4. {Client, Problem} -> {Consultant}
5. {Client, Consultant} -> {Problem}
6. Dependency between attributes belonging to keys violates BCNF

BCNF - Decomposition
1. Place the two candidate primary keys in separate entities
2. Place each of the remaining data items in one of the resulting entities according to its dependency on the primary key.

Example 1 (Convert to BCNF)


Old Scheme {City, Street, ZipCode }
New Scheme1 {ZipCode, Street}
New Scheme2 {City, Street}

Loss of relation {ZipCode} -> {City}


Alternate New Scheme1 {ZipCode, Street }
Alternate New Scheme2 {ZipCode, City}

Decomposition Loss of Information


1. If decomposition does not cause any loss of information it is called a lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost it is called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a collection of smaller schemas that are in BCNF form. However the dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal form that also preserves the dependencies.

3NF may be better than BCNF in some cases

Use your own judgment when decomposing schemas
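
One way to test a decomposition for information loss on sample data is to project the table, rejoin the pieces, and compare with the original. The sketch below applies this to the Address scheme; the address rows and the function names are assumptions made for illustration.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Join two projected relations on the attributes they share."""
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Rows combine only if they agree on every shared attribute.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                joined.add(tuple(sorted(merged.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if rejoining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical rows that satisfy {City, Street} -> {ZipCode} and {ZipCode} -> {City}.
addresses = [
    {"City": "San Jose", "Street": "1st St", "ZipCode": "95112"},
    {"City": "San Jose", "Street": "2nd St", "ZipCode": "95113"},
    {"City": "Santa Cruz", "Street": "1st St", "ZipCode": "95060"},
]

first_split = is_lossless(addresses, ["ZipCode", "Street"], ["City", "Street"])
alternate_split = is_lossless(addresses, ["ZipCode", "Street"], ["ZipCode", "City"])
```

On this data the split on Street produces spurious rows, because "1st St" occurs in two cities, while the split on ZipCode rejoins cleanly.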

BCNF - Decomposition
Example 2 (Convert to BCNF)
Old Scheme {MovieTitle, MovieID, PersonName, Role, Payment }
New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieTitle, PersonName}

Loss of relation {MovieID} -> {MovieTitle}

New Scheme {MovieID, PersonName, Role, Payment}
New Scheme {MovieID, MovieTitle}

We got the {MovieID} -> {MovieTitle} relationship back

Example 3 (Convert to BCNF)


Old Scheme {Client, Problem, Consultant}
New Scheme {Client, Consultant}
New Scheme {Client, Problem}

Fourth Normal Form (4NF)

Fourth normal form eliminates independent many-to-one relationships between columns.

To be in Fourth Normal Form,
a relation must first be in Boyce-Codd Normal Form.
a given relation may not contain more than one multivalued attribute.

Example (Not in 4NF)


Scheme {MovieName, ScreeningCity, Genre}
Primary Key: {MovieName, ScreeningCity, Genre}
1. All columns are a part of the only candidate key, hence BCNF
2. Many Movies can have the same Genre
3. Many Cities can have the same movie
4. Violates 4NF

Movie            | ScreeningCity | Genre
Hard Code        | Los Angeles   | Comedy
Hard Code        | New York      | Comedy
Bill Durham      | Santa Cruz    | Drama
Bill Durham      | Durham        | Drama
The Code Warrier | New York      | Horror

Fourth Normal Form (4NF)

Example 2 (Not in 4NF)
Scheme {Manager, Child, Employee}
1. Primary Key {Manager, Child, Employee}
2. Each manager can have more than one child
3. Each manager can supervise more than one employee
4. 4NF Violated

Manager | Child | Employee
Jim     | Beth  | Alice
Mary    | Bob   | Jane
Mary    | NULL  | Adam

Example 3 (Not in 4NF)


Scheme {Employee, Skill, ForeignLanguage}
1. Primary Key {Employee, Skill, Language}
2. Each employee can speak multiple languages
3. Each employee can have multiple skills
4. Thus violates 4NF

Employee | Skill     | Language
1234     | Cooking   | French
1234     | Cooking   | German
1453     | Carpentry | Spanish
1453     | Cooking   | Spanish
2345     | Cooking   | Spanish

4NF - Decomposition
1. Move the two multi-valued relations to separate tables
2. Identify a primary key for each of the new entities.

Example 1 (Convert to 4NF)
Old Scheme {MovieName, ScreeningCity, Genre}
New Scheme {MovieName, ScreeningCity}
New Scheme {MovieName, Genre}

Movie            | Genre
Hard Code        | Comedy
Bill Durham      | Drama
The Code Warrier | Horror

Movie            | ScreeningCity
Hard Code        | Los Angeles
Hard Code        | New York
Bill Durham      | Santa Cruz
Bill Durham      | Durham
The Code Warrier | New York

4NF - Decomposition
Example 2 (Convert to 4NF)

Old Scheme {Manager, Child, Employee}
New Scheme {Manager, Child}
New Scheme {Manager, Employee}

Manager | Child
Jim     | Beth
Mary    | Bob

Manager | Employee
Jim     | Alice
Mary    | Jane
Mary    | Adam

Example 3 (Convert to 4NF)


Old Scheme {Employee, Skill, ForeignLanguage}
New Scheme {Employee, Skill}
New Scheme {Employee, ForeignLanguage}

Employee | Skill
1234     | Cooking
1453     | Carpentry
1453     | Cooking
2345     | Cooking

Employee | Language
1234     | French
1234     | German
1453     | Spanish
2345     | Spanish
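
The 4NF split of Example 3 can be reproduced with two projections in Python. This is a sketch on the rows shown in the example; the variable names are illustrative.

```python
# Rows of the original {Employee, Skill, ForeignLanguage} table.
rows = [
    (1234, "Cooking", "French"),
    (1234, "Cooking", "German"),
    (1453, "Carpentry", "Spanish"),
    (1453, "Cooking", "Spanish"),
    (2345, "Cooking", "Spanish"),
]

# Each multivalued fact moves to its own table; sets remove the repeated pairs.
skills = sorted({(emp, skill) for emp, skill, _ in rows})
languages = sorted({(emp, lang) for emp, _, lang in rows})

# Joining the two tables on Employee rebuilds the original rows.
rejoined = sorted(
    (emp, skill, lang)
    for emp, skill in skills
    for emp2, lang in languages
    if emp == emp2
)
```

Each skill and each language is now stored once per employee, and nothing is lost because the join returns the original five rows.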

Fifth Normal Form (5NF)

Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once it is in fifth normal form it cannot be broken into smaller relations without changing the facts or the meaning.

Domain Key Normal Form (DKNF)

The relation is in DKNF when there can be no insertion or deletion anomalies in the database.

Codd proposed thirteen rules (numbered zero to twelve) and said that if a Database Management System meets these rules, it can be called a Relational Database Management System.
These rules are called Codd's 12 rules. Hardly any commercial product follows all of them.
Refer to the notes section for a short explanation.

Rule Zero
The system must qualify as relational, as a
database, and as a management system.
For a system to qualify as a relational database
management system (RDBMS), that system
must use its relational facilities (exclusively) to
manage the database.

Rule 1 : The information rule:


All information in the database is to be represented in
one and only one way, namely by values in column
positions within rows of tables.

Rule 2 : The guaranteed access rule:


All data must be accessible.
This rule is essentially a restatement of the
fundamental requirement for primary keys.
It says that every individual scalar value in the
database must be logically addressable by specifying
the name of the containing table, the name of the
containing column and the primary key value of the
containing row.

Rule 5 : The comprehensive data sub language rule:


The system must support at least one relational language that
1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6 : The view updating rule:
All views that can be updated theoretically must also be updatable by the system.

Rule 7 : High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators.
This means that data can be retrieved from a relational database in sets
constructed of data from multiple rows and/or multiple tables.
This rule states that insert, update, and delete operations should be supported
for any retrievable set rather than just for a single row in a single table.
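
Set-at-a-time operators can be illustrated with sqlite3. In this sketch the Book table and its numeric prices are adapted from the earlier book example, and the price thresholds are assumptions; each statement acts on every row that matches its condition rather than on one record at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (ISBN TEXT PRIMARY KEY, PubName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [
        ("0-321-32132-1", "Small House", 34.00),
        ("0-55-123456-9", "Small House", 22.95),
        ("0-123-45678-0", "Alpha Press", 34.00),
    ],
)

# One UPDATE changes the whole set of matching rows.
cur = conn.execute("UPDATE Book SET Price = Price + 1 WHERE PubName = 'Small House'")
updated = cur.rowcount

# One DELETE removes the whole set of matching rows.
cur = conn.execute("DELETE FROM Book WHERE Price > 30")
deleted = cur.rowcount

remaining = conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
```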

Rule 8 : Physical data independence:


Changes to the physical level (how the data is stored, whether in arrays
or linked lists etc.) must not require a change to an application based
on the structure.

Rule 9 : Logical data independence:


Changes to the logical level (tables, columns, rows, and so on) must not
require a change to an application based on the structure.
Logical data independence is more difficult to achieve than physical
data independence.

Rule 10 : Integrity independence:


Integrity constraints must be specified separately from application
programs and stored in the catalog.
It must be possible to change such constraints as and when appropriate
without unnecessarily affecting existing applications.

Rule 11 : Distribution independence:


The distribution of portions of the database to various locations should
be invisible to users of the database.
Existing applications should continue to operate successfully :
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the
system.

Rule 12: The non subversion rule:


If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.
