
Normalization

Informal Design guidelines for relational schemas

So far we have assumed that attributes are grouped into relation schemas using the common sense of the database designer.
We still need some formal measure of why one grouping of attributes into a relation schema may be better than another.
The goodness of a relation schema can be discussed at two levels:
Logical level: how users interpret the relation schema and the meaning of its attributes.
A good design enables users to understand the meaning of the data in the relation and to formulate queries correctly.

Implementation level: how tuples are stored and updated in the base relations.


Database design may be done with two approaches:
Bottom-up approach: considers the basic relationships among individual attributes and uses them to construct relation schemas.
Its problem is having to collect a large number of binary relationships among attributes as a starting point.
Top-down approach: starts with a number of groupings of attributes into relations that exist together naturally. The relations

Informal measures of quality

Semantics of the attributes
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing the possibility of generating spurious tuples

Semantics of attributes
Attributes belonging to a relation have a certain real-world meaning and a proper interpretation associated with them.
The ease with which the meaning of a relation's attributes can be explained is an informal measure of how well the relation is designed.

Guideline 1
Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and relationship types into a single relation.

Although there is nothing logically wrong with such a combination, it is considered poor design and causes problems, as discussed next.

Redundant information in Tuples

One goal of schema design is to minimize the storage space used by the base relations.
The grouping of attributes into relations affects storage space.
Compare the space used by EMPLOYEE and DEPARTMENT with that used by EMP_DEPT, which is the result of a NATURAL JOIN of EMPLOYEE and DEPARTMENT.

The natural join of the two relations may be stored for performance reasons.
The disadvantage is redundancy: the department attributes (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department.
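
The redundancy can be reproduced with a small Python sketch. Relations are modeled as lists of dicts; only the attribute names come from the slides, while the sample values and the helper natural_join are hypothetical and written for illustration.

```python
# Relations as lists of dicts; the values are hypothetical sample data.
employee = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4},
]
department = [
    {"Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Dnumber": 4, "Dname": "Administration", "Dmgr_ssn": "333"},
]


def natural_join(r, s):
    """Combine every pair of tuples from r and s that agree on all shared attributes."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # attributes the two relations share
    return [
        {**t1, **t2}
        for t1 in r
        for t2 in s
        if all(t1[a] == t2[a] for a in common)
    ]


emp_dept = natural_join(employee, department)
for t in emp_dept:
    print(t)
```

Department 5 has two employees here, so its Dname and Dmgr_ssn values are stored twice in emp_dept, whereas DEPARTMENT stores them only once.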

Update Anomalies
Another serious problem with combining two relations, as in EMP_DEPT, is update anomalies.
Insertion anomalies are of two types:
To insert a new employee who works in department 5 into EMP_DEPT, we must enter all the attribute values of department 5 correctly so that they are consistent with the values in the other tuples for department 5. With separate EMPLOYEE and DEPARTMENT relations we do not have to worry about this, because only the department number is entered for the employee and the department details are matched from DEPARTMENT.
It is difficult to insert into EMP_DEPT a new department that has no employees yet. The only way is to place NULL values in the employee attributes. The problem is that Ssn is the primary key of EMP_DEPT, so it cannot be NULL.

Deletion anomalies: if we delete from EMP_DEPT the tuple of an employee who happens to be the last employee of a department, the information about that department is also lost.
Modification anomalies: if we change the value of one of the attributes of a department, say its manager, we must update the tuples of all employees of that department; otherwise the data becomes inconsistent.
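
The deletion and modification anomalies can be sketched on a hypothetical EMP_DEPT state (invented values, attribute names from the slides). The helper inconsistent_departments is an illustrative assumption: it reports departments whose tuples disagree on Dname or Dmgr_ssn, which should never happen if every employee tuple carries the same department data.

```python
# Hypothetical EMP_DEPT state; the values are invented for illustration.
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration", "Dmgr_ssn": "333"},
]


def inconsistent_departments(rel):
    """Return the department numbers whose tuples disagree on Dname or Dmgr_ssn."""
    seen = {}
    bad = set()
    for t in rel:
        info = (t["Dname"], t["Dmgr_ssn"])
        # The first tuple of a department fixes the expected values.
        if seen.setdefault(t["Dnumber"], info) != info:
            bad.add(t["Dnumber"])
    return bad


# Modification anomaly: the manager of department 5 changes,
# but only one of its two employee tuples is updated.
emp_dept[0]["Dmgr_ssn"] = "111"
print(inconsistent_departments(emp_dept))  # {5}

# Deletion anomaly: Zelaya is the last employee of department 4,
# so deleting that tuple also removes every fact about department 4.
emp_dept = [t for t in emp_dept if t["Ssn"] != "333"]
print(any(t["Dnumber"] == 4 for t in emp_dept))  # False
```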

Guideline 2

Design a relation so that no update anomalies are present.
Note that the guidelines may sometimes have to be violated in order to improve the performance of certain queries.
In that case the anomalies should be noted and accounted for; for example, triggers can be used to apply the updates automatically so that we do not end up with inconsistencies.

NULL Values in tuple


When we group many attributes together into one relation, many of the attributes may not apply to all tuples, and we end up with NULLs in those tuples.
This wastes a lot of space and makes the meaning of the attributes harder to understand.
It is also unclear how NULLs should be accounted for in aggregate functions such as COUNT and SUM.
SELECT and JOIN operations involve comparisons; when NULL values are present, the results may be unpredictable.
NULL can have multiple interpretations:
The attribute does not apply to this tuple.
The value is unknown.
The value is known to exist but is absent (not yet recorded).

Guideline 3

Avoid placing attributes in a base relation whose values may frequently be absent (NULL).
For example, if only 10% of employees have individual offices, there is little justification for including Office_no in the EMPLOYEE relation.
Instead, a relation EMP_OFFICE(Essn, Office_no) can be created to hold tuples only for the employees who have individual offices.

Generation of spurious tuples

Suppose we divide a relation into two relations.

[Figure: original relation]

A natural join of the two relations, performed to get back the original one, results in spurious tuples (marked with *): tuples that are not valid because they did not exist in the original relation.
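
A minimal Python sketch of the effect, assuming a hypothetical relation WORKS(Ename, Pname, Plocation) with invented values. It is divided into two projections whose only common attribute, Plocation, is not a key of either projection, and the projections are then rejoined on Plocation; every tuple of the join that was not in WORKS is spurious.

```python
# Hypothetical relation WORKS(Ename, Pname, Plocation); values are invented.
works = {
    ("Smith", "ProductX", "Bellaire"),
    ("Wong", "ProductY", "Bellaire"),
    ("Wong", "ProductZ", "Houston"),
}

# Divide WORKS into two projections that share only Plocation,
# which is not a key of either projection.
emp_locs = {(e, loc) for e, _, loc in works}   # (Ename, Plocation)
proj_locs = {(p, loc) for _, p, loc in works}  # (Pname, Plocation)

# Natural join on the common attribute Plocation.
rejoined = {
    (e, p, loc1)
    for e, loc1 in emp_locs
    for p, loc2 in proj_locs
    if loc1 == loc2
}

# Tuples produced by the join that were never in WORKS.
spurious = rejoined - works
for t in sorted(spurious):
    print("*", t)
```

Both employees work at Bellaire, so the join pairs each of them with both Bellaire projects. Guideline 4 asks for joins on (primary key, foreign key) pairs precisely to rule this out.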

Guideline 4

Design relation schemas so that they can be joined with equality conditions on attributes that are (primary key, foreign key) pairs, in a way that guarantees that no spurious tuples are generated.

Summary of guidelines
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.
Design a relation so that no update anomalies are present.
Avoid placing attributes in a base relation whose values may frequently be absent (avoid NULL values).
Avoid the generation of spurious tuples when joining improperly related relations.

Functional Dependencies

A functional dependency (FD) $X \to Y$ between two sets of attributes X and Y of a relation schema R is a constraint such that, for any two tuples $t_1$ and $t_2$ in any relation state r of R that have $t_1[X] = t_2[X]$, they must also have $t_1[Y] = t_2[Y]$.
The value of the Y component of a tuple in r depends on the values of the X component.
So X functionally determines Y in a relation schema R if and only if, whenever two tuples of r agree on their X-value, they must necessarily agree on their Y-value.

An FD is a property of the semantics, or meaning, of the attributes.
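
The definition translates directly into a check over one relation state. This is a sketch assuming tuples are Python dicts keyed by attribute name; the helper fd_holds and the sample EMP_DEPT values are hypothetical.

```python
from itertools import combinations


def fd_holds(r, X, Y):
    """Check X -> Y on relation state r (a list of dicts).

    Returns False as soon as two tuples agree on every attribute in X
    but differ on some attribute in Y.
    """
    for t1, t2 in combinations(r, 2):
        if all(t1[a] == t2[a] for a in X) and any(t1[b] != t2[b] for b in Y):
            return False
    return True


# Hypothetical EMP_DEPT state; values are invented for illustration.
r = [
    {"Ssn": "111", "Dnumber": 5, "Dmgr_ssn": "222"},
    {"Ssn": "222", "Dnumber": 5, "Dmgr_ssn": "222"},
    {"Ssn": "333", "Dnumber": 4, "Dmgr_ssn": "333"},
]

print(fd_holds(r, ["Dnumber"], ["Dmgr_ssn"]))  # True
print(fd_holds(r, ["Dmgr_ssn"], ["Ssn"]))      # False
```

Because an FD is a property of the semantics of the attributes, a single state can only refute an FD: a True result means only that this particular state does not violate it.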

Inference rules for Functional Dependencies

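Some FDs are semantically obvious, but others can be derived from the dependencies in F, the set of FDs specified on the relation schema.
These are inferred, or deduced, from the other FDs.
For example, if each department has only one manager, then $\text{Dnumber} \to \text{Mgr\_ssn}$.
If each manager has a unique phone number, then $\text{Mgr\_ssn} \to \text{Mgr\_phone}$.
Together, these two FDs imply that $\text{Dnumber} \to \text{Mgr\_phone}$.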

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F and is denoted by $F^+$.
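
Enumerating all of $F^+$ is rarely practical. A standard alternative, sketched here assuming FDs are given as pairs of Python sets, is to compute the attribute closure $X^+$ (all attributes determined by X under F): $X \to Y$ is in $F^+$ exactly when $Y \subseteq X^+$. The helpers closure and implies are illustrative names; the example reuses the department and manager FDs above.

```python
def closure(attrs, fds):
    """Return X+, every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# The two FDs from the department/manager example.
F = [
    ({"Dnumber"}, {"Mgr_ssn"}),
    ({"Mgr_ssn"}, {"Mgr_phone"}),
]

print(sorted(closure({"Dnumber"}, F)))         # ['Dnumber', 'Mgr_phone', 'Mgr_ssn']
print(implies(F, {"Dnumber"}, {"Mgr_phone"}))  # True
print(implies(F, {"Mgr_phone"}, {"Dnumber"}))  # False
```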

Normalization

Normalization of data can be considered a process of analyzing the given relation schemas based on their FDs and primary keys to minimize redundancy and update anomalies.
Unsatisfactory relation schemas that do not meet certain normal form tests are decomposed (normalized) into smaller relation schemas that meet the tests.

First Normal Form

The domain of an attribute must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
So 1NF disallows multivalued attributes, composite attributes, and relations within relations.

[Figure: relation that is not in 1NF]

Solution 1: Remove the attribute Dlocation, which violates 1NF, and place it in a separate relation along with the primary key Dnumber. The primary key of the new relation is then the combination {Dnumber, Dlocation}.
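
Solution 1 can be sketched as follows, assuming a hypothetical DEPARTMENT state in which the multivalued Dlocation is stored as a Python list (values invented). The name DEPT_LOCATIONS for the new relation is also an assumption.

```python
# Hypothetical DEPARTMENT state with a multivalued Dlocation (violates 1NF).
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocation": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocation": ["Stafford"]},
]

# DEPARTMENT keeps only its atomic attributes.
dept = [{"Dnumber": d["Dnumber"], "Dname": d["Dname"]} for d in department]

# DEPT_LOCATIONS(Dnumber, Dlocation): one tuple per (department, location).
# The pair {Dnumber, Dlocation} is its primary key, so a set of pairs suffices.
dept_locations = {(d["Dnumber"], loc) for d in department for loc in d["Dlocation"]}

print(dept)
print(sorted(dept_locations))
```

Each tuple of DEPT_LOCATIONS now holds a single atomic location, and no DEPARTMENT attribute is repeated or NULL.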

Solution 2: Expand the key so that there is a separate tuple in the original relation for each location of a department.
Disadvantage: redundancy.

Solution 3: If it is known that at most three locations can exist for a department, replace the Dlocation attribute with three attributes: Dlocation1, Dlocation2, and Dlocation3.
Disadvantage: NULL values for departments with fewer than three locations.

Which one is best?
Of the three, the first solution is best because it introduces neither redundancy nor NULL values.

1NF also disallows relations within relations (nested relations).

Some definitions

Candidate key: if a relation schema has more than one key, each key is called a candidate key.
One of the candidate keys is designated as the primary key.

Prime attribute: an attribute is prime if it is a member of some candidate key.
An attribute is nonprime if it is not a prime attribute.

A functional dependency $X \to Y$ is a full functional dependency if removal of any attribute A from X means that the dependency does not hold anymore; that is, for any attribute $A \in X$, $(X - \{A\})$ does not functionally determine Y.
A functional dependency $X \to Y$ is a partial dependency if some attribute $A \in X$ can be removed from X and the dependency still holds; that is, for some $A \in X$, $(X - \{A\}) \to Y$.
For example, $\{\text{Ssn}, \text{Pnumber}\} \to \text{Hours}$ is a full dependency, but $\{\text{Ssn}, \text{Pnumber}\} \to \text{Ename}$ is not, because $\text{Ssn} \to \text{Ename}$ holds.
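
Given a set of FDs, the full/partial distinction can be tested with attribute closure. Because closure is monotone, it is enough to try removing one attribute at a time: if $X - \{A\}$ cannot determine Y, no smaller subset can either. The FD set is an assumption reconstructed from the example: $\{\text{Ssn}, \text{Pnumber}\} \to \text{Hours}$ and $\text{Ssn} \to \text{Ename}$ as stated, plus $\text{Pnumber} \to \{\text{Pname}, \text{Plocation}\}$, taken here to be the FD3 mentioned in the 2NF discussion. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_dependency(X, Y, fds):
    """True if X -> Y follows from fds and fails once any one attribute is removed from X."""
    X, Y = set(X), set(Y)
    if not Y <= closure(X, fds):
        raise ValueError("X -> Y does not follow from fds")
    return all(not Y <= closure(X - {a}, fds) for a in X)


# Assumed FD set for the Ssn/Pnumber example.
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),        # assumed FD1
    ({"Ssn"}, {"Ename"}),                   # assumed FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # assumed FD3
]

print(is_full_dependency({"Ssn", "Pnumber"}, {"Hours"}, fds))  # True
print(is_full_dependency({"Ssn", "Pnumber"}, {"Ename"}, fds))  # False
```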

Second Normal Form

A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.

The nonprime attribute Ename violates 2NF because of FD2, and Pname and Plocation violate 2NF because of FD3.

So if a relation is not in 2NF, it can be 2NF-normalized into a number of 2NF relations in which each nonprime attribute is associated only with the part of the primary key on which it is fully functionally dependent.
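
The 2NF normalization step can be sketched in Python, assuming the primary key is the only candidate key (so every non-key attribute is nonprime) and assuming the same FD set as in the Ssn/Pnumber example: $\{\text{Ssn}, \text{Pnumber}\} \to \text{Hours}$, $\text{Ssn} \to \text{Ename}$, and $\text{Pnumber} \to \{\text{Pname}, \text{Plocation}\}$. For each nonprime attribute, minimal_part shrinks the key to a minimal part that still determines it, and attributes with the same part are grouped into one relation. All function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_part(key, attr, fds):
    """Greedily drop key attributes while the rest still determines attr."""
    part = set(key)
    for k in sorted(key):
        if attr in closure(part - {k}, fds):
            part.discard(k)
    return frozenset(part)


def decompose_2nf(attributes, key, fds):
    """Map each part of the key to the relation formed by that part and the
    nonprime attributes fully dependent on it (key is the only candidate key)."""
    groups = {frozenset(key): set()}  # keep a relation for the whole key
    for a in sorted(set(attributes) - set(key)):
        groups.setdefault(minimal_part(key, a, fds), set()).add(a)
    return {tuple(sorted(k)): sorted(k | v) for k, v in groups.items()}


fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
attributes = ["Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"]
result = decompose_2nf(attributes, {"Ssn", "Pnumber"}, fds)
for part, attrs in result.items():
    print(part, attrs)
```

With these FDs the sketch yields three relations: {Ssn, Pnumber, Hours}, {Ssn, Ename}, and {Pnumber, Pname, Plocation}.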

Third Normal Form

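A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.

For example, $\text{Ssn} \to \text{Dnumber}$ and $\text{Dnumber} \to \text{Dmgr\_ssn}$, so $\text{Ssn} \to \text{Dmgr\_ssn}$ is a transitive dependency: the nonprime attribute Dmgr_ssn depends on the primary key Ssn through Dnumber.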

General definitions of 2nd and 3rd Normal Form

So far we have considered only partial and transitive dependencies on the primary key.
We now give more general definitions of 2NF and 3NF that take all candidate keys into account.

General definitions of 2nd Normal Form

A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on every key of R, that is, not partially dependent on any key of R.

General definitions of 3rd Normal Form

A relation schema R is in 3NF if, whenever a nontrivial FD $X \to Y$ holds in R, either
a) X is a superkey of R, or
b) every attribute in $Y - X$ is a prime attribute of R.

FD4 violates 3NF because Area is not a superkey.
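
The general 3NF test can be sketched as a check over a given FD set. Assumptions: the FDs are pairs of Python sets, each right-hand side is split into single attributes, the set of prime attributes is supplied by the caller (finding all candidate keys is not shown), and only the listed FDs are checked rather than all of $F^+$. The EMP_DEPT schema below is a hypothetical reduction to five attributes with Ssn as its only candidate key.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, prime):
    """List (X, A) pairs where X -> A comes from fds and fails the general 3NF test."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = set(attributes) <= closure(lhs, fds)
        for a in sorted(rhs - lhs):  # ignore the trivial part of the FD
            if not is_superkey and a not in prime:
                violations.append((tuple(sorted(lhs)), a))
    return violations


# Hypothetical EMP_DEPT schema; Ssn is assumed to be its only candidate key.
attributes = {"Ssn", "Ename", "Dnumber", "Dname", "Dmgr_ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]
print(third_nf_violations(attributes, fds, prime={"Ssn"}))
```

Both violations come from Dnumber, which is not a superkey, determining nonprime attributes; this is the transitive dependency $\text{Ssn} \to \text{Dnumber} \to \text{Dmgr\_ssn}$ from the 3NF example.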


Boyce-Codd Normal Form

Stricter than 3NF.
Every relation in BCNF is also in 3NF, but the converse is not necessarily true.
A relation schema R is in BCNF if, whenever a nontrivial FD $X \to Y$ holds in R, X is a superkey of R.

FD5 satisfies the 3NF condition but violates BCNF.
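
The BCNF test drops the prime-attribute escape clause of 3NF. The sketch below assumes FDs as pairs of Python sets and uses a hypothetical relation TEACH(Student, Course, Instructor) in which each instructor teaches one course; it is not the relation behind FD5. The function name bcnf_violations is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs in fds whose left-hand side is not a superkey."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and not set(attributes) <= closure(lhs, fds)
    ]


# Hypothetical TEACH(Student, Course, Instructor): each instructor teaches one course.
attributes = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
print(bcnf_violations(attributes, fds))
```

{Student, Instructor} is also a candidate key here, so Course is prime and the 3NF test passes, yet Instructor is not a superkey: TEACH is in 3NF but not in BCNF.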

Multivalued dependency and 4th Normal Form

End of chapter 7
