FD and Normalization

Normalization
Informal Design guidelines for relational schemas
So far we have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer.
We still need some formal measure of why one grouping of attributes into a relation schema may be better than another.
The goodness of a relation schema can be discussed at two levels:
Logical level: how users interpret the relation schemas and the meaning of their attributes. A good design enables users to understand the meaning of the data in the relations and to formulate their queries correctly.
Implementation level: how the tuples in a base relation are stored and updated.
Informal Design guidelines for relational schemas
Database design may be done with two approaches:
Bottom-up approach: considers the basic relationships among individual attributes and uses those to construct relation schemas.
Its problem is having to collect a large number of binary relationships among attributes as a starting point.
Top-down approach: starts with a number of groupings of attributes into relations that exist together naturally. The relations
Informal measures of quality
Semantics of the attributes
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing the possibility of generating spurious tuples
Semantics of attributes
The attributes belonging to one relation have a certain real-world meaning and a proper interpretation associated with them.
The ease with which the meaning of a relation's attributes can be explained is an informal measure of how well the relation is designed.
Guideline 1
Design a relation schema so that it is easy to explain its meaning.
Do not combine attributes from multiple entity types and relationship types into a single relation.
Although there is nothing logically wrong with doing so, it is considered poor design and causes problems, as we will discuss.
Redundant information in Tuples
One goal of schema design is to minimize the storage space used by the base relations.
How attributes are grouped into relation schemas has an effect on storage space.
Compare the space used by the EMPLOYEE and DEPARTMENT relations with that used by EMP_DEPT, which is the result of applying a NATURAL JOIN to EMPLOYEE and DEPARTMENT.

EMP_DEPT is the natural join of the two relations and may be stored for performance reasons.
Its disadvantage is redundancy: the values of Dnumber, Dname, and Dmgr_ssn are repeated for every employee who works for that department.
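To see where the redundancy comes from, the Python sketch below computes EMP_DEPT as the natural join of tiny EMPLOYEE and DEPARTMENT instances. The sample rows and the helper `natural_join` are hypothetical, and relations are modeled simply as lists of dicts.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t1 in r:
        for t2 in s:
            common = set(t1) & set(t2)
            # Combine two tuples only if they agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})
    return result

# Hypothetical sample data.
employee = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4},
]
department = [
    {"Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Dnumber": 4, "Dname": "Administration", "Dmgr_ssn": "333"},
]

emp_dept = natural_join(employee, department)
for t in emp_dept:
    print(t)

# Dname and Dmgr_ssn of department 5 are now stored once per employee of department 5.
research_rows = [t for t in emp_dept if t["Dnumber"] == 5]
print(len(research_rows), "copies of", research_rows[0]["Dname"])
```

DEPARTMENT stores each department's name and manager once; EMP_DEPT repeats them for every employee of that department.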
Update Anomalies
Another serious problem with combining two relations, as in the previous slide, is update anomalies.
Insertion anomalies: these are of two types.
To insert a new employee who works in department 5 into EMP_DEPT, we must enter all the attribute values of department 5 correctly so that they are consistent with the values in the other tuples for department 5. With separate EMPLOYEE and DEPARTMENT relations we do not have to worry about this, as only the department number is entered for the employee, and the rest of the department information is matched through it.
It is difficult to insert a new department that has no employees yet into EMP_DEPT. The only way is to place NULL values in the employee attributes. The problem is that Ssn is the primary key of EMP_DEPT, so it cannot be NULL.
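This second insertion anomaly can be reproduced with Python's built-in `sqlite3` module. The EMP_DEPT table definition and the department 6 row below are hypothetical; the point is only that the key attribute cannot be left NULL, so the department cannot be stored on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMP_DEPT schema. NOT NULL is spelled out because SQLite
# otherwise allows NULL in a non-INTEGER PRIMARY KEY column.
conn.execute("""
    CREATE TABLE EMP_DEPT (
        Ssn      TEXT NOT NULL PRIMARY KEY,
        Ename    TEXT,
        Dnumber  INTEGER,
        Dname    TEXT,
        Dmgr_ssn TEXT
    )""")

# A new department with no employees yet: every employee attribute,
# including the key Ssn, would have to be NULL.
try:
    conn.execute(
        "INSERT INTO EMP_DEPT VALUES (NULL, NULL, 6, 'Marketing', NULL)")
    inserted = True
except sqlite3.IntegrityError as exc:
    inserted = False
    print("rejected:", exc)
```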
Update Anomalies
Deletion anomalies: if we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information about that department is lost as well.
Modification anomalies: if we change the value of one of the attributes of a department, say its manager, we must update the tuples of all employees who work in that department; otherwise, the data will become inconsistent.
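A minimal in-memory sketch of the deletion and modification anomalies, using a hypothetical EMP_DEPT instance stored as a list of Python dicts (`departments` is an illustrative helper):

```python
# Hypothetical EMP_DEPT instance: one dict per employee tuple.
emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "222"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration", "Dmgr_ssn": "333"},
]

def departments(rel):
    """Department numbers still recorded somewhere in the relation."""
    return {t["Dnumber"] for t in rel}

# Deletion anomaly: removing the only employee of department 4
# also removes every fact about department 4.
after_delete = [t for t in emp_dept if t["Ssn"] != "333"]
print(departments(after_delete))  # {5}

# Modification anomaly: changing the manager of department 5 in only
# one tuple leaves two conflicting values for the same department.
emp_dept[0]["Dmgr_ssn"] = "444"
managers = {t["Dmgr_ssn"] for t in emp_dept if t["Dnumber"] == 5}
print(sorted(managers))  # ['222', '444']
```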
Guideline 2
Design a relation so that no update anomalies are present.
Note that these guidelines may sometimes have to be violated in order to improve the performance of certain queries.
In that case, the anomalies should be noted and accounted for; e.g., triggers can be used to perform automatic updates so that we do not end up with inconsistencies.
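One way to follow this advice is sketched below with SQLite through Python's `sqlite3` module: a hypothetical trigger `sync_dept_mgr` copies a department's new manager into every EMP_DEPT tuple of that department, so updating one tuple cannot leave the others stale. Trigger syntax differs between database systems, so treat this as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP_DEPT (
        Ssn TEXT NOT NULL PRIMARY KEY,
        Ename TEXT, Dnumber INTEGER, Dname TEXT, Dmgr_ssn TEXT
    );
    -- Hypothetical trigger: propagate a manager change to the whole department.
    CREATE TRIGGER sync_dept_mgr
    AFTER UPDATE OF Dmgr_ssn ON EMP_DEPT
    FOR EACH ROW
    BEGIN
        UPDATE EMP_DEPT SET Dmgr_ssn = NEW.Dmgr_ssn
        WHERE Dnumber = NEW.Dnumber AND Dmgr_ssn <> NEW.Dmgr_ssn;
    END;
    INSERT INTO EMP_DEPT VALUES
        ('111', 'Smith', 5, 'Research', '222'),
        ('222', 'Wong', 5, 'Research', '222'),
        ('333', 'Zelaya', 4, 'Administration', '333');
""")

# Update a single tuple; the trigger brings the other tuple of department 5 in line.
conn.execute("UPDATE EMP_DEPT SET Dmgr_ssn = '444' WHERE Ssn = '111'")
rows = conn.execute(
    "SELECT DISTINCT Dmgr_ssn FROM EMP_DEPT WHERE Dnumber = 5").fetchall()
print(rows)  # [('444',)]
```

The `Dmgr_ssn <> NEW.Dmgr_ssn` condition also keeps the trigger from looping if recursive triggers are enabled.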
NULL Values in Tuples
When we group many attributes together into one relation, many of the attributes may not apply to all tuples, and we end up with many NULLs in those tuples.
This can waste a lot of space and makes the meaning of the attributes harder to understand.
It is also unclear how NULLs should be accounted for in aggregate operations such as COUNT and SUM.
SELECT and JOIN operations involve comparisons; when NULL values are present, their results may be unpredictable.
NULL can have multiple interpretations:
The attribute does not apply to this tuple.
The value is unknown.
The value is known but absent.
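The effect of NULLs on aggregates and comparisons can be observed directly in SQLite via Python's `sqlite3` module. The EMPLOYEE rows and the Office_no column below are hypothetical; standard SQL treats these particular queries the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Office_no INTEGER);
    INSERT INTO EMPLOYEE VALUES ('111', 101), ('222', NULL), ('333', NULL);
""")

# COUNT(*) counts every row; COUNT(Office_no) and SUM(Office_no) skip NULLs.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(Office_no), SUM(Office_no) FROM EMPLOYEE").fetchone()
print(counts)  # (3, 1, 101)

# Comparing with NULL yields UNKNOWN, so the NULL rows satisfy neither
# predicate: together the two queries see only 1 of the 3 employees.
eq = conn.execute("SELECT COUNT(*) FROM EMPLOYEE WHERE Office_no = 101").fetchone()[0]
ne = conn.execute("SELECT COUNT(*) FROM EMPLOYEE WHERE Office_no <> 101").fetchone()[0]
print(eq, ne)  # 1 0
```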
Guideline 3
Avoid placing attributes in a base relation whose values may frequently be absent (NULL).
E.g., if only 10% of employees have individual offices, there is little justification for including an Office_no attribute in the EMPLOYEE relation.
Rather, a relation EMP_OFFICE(Essn, Office_no) can be created to include tuples only for the employees with individual offices.
Generation of spurious tuples
Suppose we divide the original relation into two relations.
(Figure: original relation)
Taking the natural join of both relations to get back the original one results in spurious tuples (marked with *), i.e., tuples that are not valid.
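The same effect can be reproduced on a small hypothetical relation (Ssn, Pnumber, Plocation): the Python sketch below splits it into (Ssn, Plocation) and (Pnumber, Plocation) and natural-joins the pieces back on the non-key attribute Plocation. Relations are lists of dicts, and `project`, `natural_join`, and `as_set` are illustrative helpers.

```python
def project(rel, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(t[a] for a in attrs) for t in rel}
    return [dict(zip(attrs, vals)) for vals in seen]

def natural_join(r, s):
    """Combine tuples that agree on all shared attributes."""
    result = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                result.append({**t1, **t2})
    return result

def as_set(rel, attrs):
    """Relation as a set of value tuples in a fixed attribute order."""
    return {tuple(t[a] for a in attrs) for t in rel}

attrs = ("Ssn", "Pnumber", "Plocation")
original = [
    {"Ssn": "111", "Pnumber": 1, "Plocation": "Bellaire"},
    {"Ssn": "222", "Pnumber": 2, "Plocation": "Bellaire"},
]
# Decompose so that the only common attribute is the non-key Plocation.
r1 = project(original, ("Ssn", "Plocation"))
r2 = project(original, ("Pnumber", "Plocation"))

joined = as_set(natural_join(r1, r2), attrs)
spurious = joined - as_set(original, attrs)
print(sorted(spurious))  # [('111', 2, 'Bellaire'), ('222', 1, 'Bellaire')]
```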
Guideline 4
Design relation schemas so that they can be joined with equality conditions on attributes that are appropriately related (primary key, foreign key) pairs, in a way that guarantees that no spurious tuples are generated.
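For a decomposition of R into two relations $R_1$ and $R_2$, a standard test for this guarantee is that the common attributes $R_1 \cap R_2$ functionally determine all of $R_1 - R_2$ or all of $R_2 - R_1$, i.e., the join attributes form a key of one side. A minimal Python sketch of that test, built on attribute closure and a hypothetical FD $\text{Pnumber} \to \text{Plocation}$ (function names are illustrative):

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if joining the projections on r1 and r2 cannot create spurious tuples."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

# Hypothetical FD: each project is at exactly one location.
F = [({"Pnumber"}, {"Plocation"})]

# Joining on the non-key attribute Plocation is lossy ...
print(is_lossless_binary({"Ssn", "Plocation"}, {"Pnumber", "Plocation"}, F))  # False
# ... while joining on Pnumber, a key of (Pnumber, Plocation), is lossless.
print(is_lossless_binary({"Ssn", "Pnumber"}, {"Pnumber", "Plocation"}, F))  # True
```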
Summary of guidelines
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.
Design a relation so that no update anomalies are present.
Avoid placing attributes in a base relation whose values may frequently be absent (avoid NULL values).
Avoid the generation of spurious tuples when joining improperly related relations.
Functional Dependencies
A functional dependency $X \to Y$ between two sets of attributes X and Y of a relation schema R specifies a constraint on every relation state r of R: for any two tuples $t_1$ and $t_2$ in r that have $t_1[X] = t_2[X]$, they must also have $t_1[Y] = t_2[Y]$.
The values of the Y component of a tuple in r depend on the values of its X component.
So X functionally determines Y in a relation schema R if and only if, whenever two tuples of r agree on their X-value, they necessarily agree on their Y-value.
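The definition translates directly into a check on a single relation state. A minimal Python sketch, with relations as lists of dicts and hypothetical EMP_DEPT rows; `fd_holds` is an illustrative name:

```python
def fd_holds(rel, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    seen = {}  # X-value -> the Y-value first observed with it
    for t in rel:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True

emp_dept = [
    {"Ssn": "111", "Ename": "Smith", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "Wong", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "Zelaya", "Dnumber": 4, "Dname": "Administration"},
]
print(fd_holds(emp_dept, ["Dnumber"], ["Dname"]))  # True
print(fd_holds(emp_dept, ["Dnumber"], ["Ename"]))  # False
```

A single state can only refute an FD; a True result does not establish the dependency, which is a property of the meaning of the attributes rather than of one state.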
Functional Dependencies
An FD is a property of the semantics, or meaning, of the attributes.
Inference rules for Functional Dependencies
Some FDs are semantically obvious, but others can be derived from the dependencies in a given set F. These are inferred, or deduced, from the other FDs.
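E.g., if each department has only one manager, then $\text{Dept\_no} \to \text{Mgr\_ssn}$.
If each manager has a unique phone number, then $\text{Mgr\_ssn} \to \text{Mgr\_phone}$.
Together these two imply that $\text{Dept\_no} \to \text{Mgr\_phone}$.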
Inference rules for Functional Dependencies
The set of all dependencies that includes F, as well as all dependencies that can be inferred from F, is called the closure of F and is denoted by $F^+$.
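$F^+$ can be very large, so in practice one usually computes the closure $X^+$ of an attribute set X instead: $X \to Y$ is in $F^+$ exactly when $Y \subseteq X^+$. A minimal Python sketch of the standard closure algorithm, applied to the department and manager FDs from the inference example (function names are illustrative):

```python
def closure(attrs, fds):
    """X+: repeatedly add the right side of any FD whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ iff rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

F = [({"Dept_no"}, {"Mgr_ssn"}), ({"Mgr_ssn"}, {"Mgr_phone"})]
print(sorted(closure({"Dept_no"}, F)))         # ['Dept_no', 'Mgr_phone', 'Mgr_ssn']
print(implies(F, {"Dept_no"}, {"Mgr_phone"}))  # True
print(implies(F, {"Mgr_phone"}, {"Dept_no"}))  # False
```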
Normalization
Normalization of data can be considered a process of analyzing the given relation schemas, based on their FDs and primary keys, to minimize redundancy and update anomalies.
Unsatisfactory relation schemas that do not meet certain normal form tests are decomposed (normalized) into smaller relation schemas that meet the tests.
First Normal Form
The domain of an attribute must include only atomic (simple, indivisible) values, and the value of any attribute in a tuple must be a single value from the domain of that attribute.
So 1NF disallows multivalued attributes, composite attributes, and relations within relations.
First Normal Form
Not in 1NF
First Normal Form
Solution 1: Remove the attribute Dlocation that violates 1NF and place it in a separate relation along with the primary key Dnumber. The primary key of the new relation is then the combination {Dnumber, Dlocation}.
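Solution 1 can be sketched in Python by turning each element of a multivalued Dlocations value into its own tuple of a new relation. The sample data and the relation name DEPT_LOCATIONS are hypothetical.

```python
# Hypothetical DEPARTMENT state that violates 1NF: Dlocations holds several values.
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]

# DEPARTMENT keeps only its single-valued attributes ...
department_1nf = [
    {k: v for k, v in t.items() if k != "Dlocations"} for t in department
]
# ... and each (department, location) pair becomes one tuple of DEPT_LOCATIONS,
# whose primary key is the combination {Dnumber, Dlocation}.
dept_locations = [
    {"Dnumber": t["Dnumber"], "Dlocation": loc}
    for t in department
    for loc in t["Dlocations"]
]

for row in dept_locations:
    print(row)
```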
First Normal Form
Solution 2: Expand the key so that there is a separate tuple in the original relation for each location of a department.
Disadvantage: redundancy.
First Normal Form
Solution 3: If it is known that at most three locations can exist for a department, replace the Dlocation attribute with three attributes: Dlocation1, Dlocation2, and Dlocation3.
Disadvantage: NULL values for departments with fewer than three locations.
First Normal Form
Which one is best?
Of the three, the first solution is best, as it introduces neither redundancy nor NULL values.
First Normal Form
1NF also disallows relations within relations (nested relations).
Some definitions
Candidate key: if a relation schema has more than one key, each key is called a candidate key. One of the candidate keys is designated as the primary key.
Prime attribute: an attribute is prime if it is a member of some candidate key. An attribute is nonprime if it is not a prime attribute.
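Given the FDs, candidate keys can be found as the minimal attribute sets whose closure is the whole schema. The brute-force Python sketch below is exponential in the number of attributes, so it only suits small schemas; it uses a hypothetical EMPLOYEE schema in which both Ssn and an assumed Emp_id column identify an employee.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # a superset of a key is a superkey, not a candidate key
            if closure(x, fds) >= schema:
                keys.append(x)
    return keys

schema = {"Ssn", "Emp_id", "Ename"}  # Emp_id is a hypothetical second identifier
F = [({"Ssn"}, {"Emp_id", "Ename"}), ({"Emp_id"}, {"Ssn"})]

keys = candidate_keys(schema, F)
prime = set().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['Emp_id'], ['Ssn']]
print(sorted(prime))                    # ['Emp_id', 'Ssn']
print(sorted(schema - prime))           # ['Ename']
```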
Some definitions
A functional dependency $X \to Y$ is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; i.e., for any attribute $A \in X$, $(X - \{A\})$ does not functionally determine Y.
A functional dependency $X \to Y$ is a partial dependency if some attribute $A \in X$ can be removed from X and the dependency still holds; i.e., for some $A \in X$, $(X - \{A\}) \to Y$.
E.g., $\{\text{Ssn}, \text{Pnumber}\} \to \text{Hours}$ is a full dependency, but $\{\text{Ssn}, \text{Pnumber}\} \to \text{Ename}$ is not, because $\text{Ssn} \to \text{Ename}$ holds.
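Both definitions can be checked with attribute closures. In the Python sketch below, the EMP_PROJ dependencies are an assumption consistent with the example: FD1 {Ssn, Pnumber} → Hours, FD2 Ssn → Ename, and FD3 Pnumber → {Pname, Plocation}. The function name is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_dependency(lhs, rhs, fds):
    """X -> Y is full if it holds but fails once any single attribute is dropped from X."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("X -> Y does not hold under the given FDs")
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

# Assumed EMP_PROJ dependencies.
F = [
    ({"Ssn", "Pnumber"}, {"Hours"}),        # FD1
    ({"Ssn"}, {"Ename"}),                   # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD3
]
print(is_full_dependency({"Ssn", "Pnumber"}, {"Hours"}, F))  # True  (full)
print(is_full_dependency({"Ssn", "Pnumber"}, {"Ename"}, F))  # False (partial)
```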
Second Normal Form
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
The nonprime attribute Ename violates 2NF due to FD2, and Pname and Plocation violate 2NF due to FD3.
Second Normal Form

So if a relation is not in 2NF, it can be 2NF-normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent.
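This normalization step can be sketched in Python: each nonprime attribute is placed with the smallest part of the primary key that determines it, and the full key is kept as its own relation so the pieces can still be joined. The sketch assumes the primary key is the only candidate key, reuses the assumed EMP_PROJ FDs (FD1 to FD3), and `decompose_2nf` is a hypothetical name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose_2nf(schema, pk, fds):
    """Group each nonprime attribute with the part of pk it fully depends on."""
    groups = {frozenset(pk): set()}  # keep the full key so the pieces still join back
    for attr in sorted(schema - pk):
        for size in range(1, len(pk) + 1):
            parts = [frozenset(c) for c in combinations(sorted(pk), size)
                     if attr in closure(set(c), fds)]
            if parts:  # a smallest determining subset, so the dependency on it is full
                groups.setdefault(parts[0], set()).add(attr)
                break
    return [set(key) | attrs for key, attrs in groups.items()]

schema = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
F = [
    ({"Ssn", "Pnumber"}, {"Hours"}),        # FD1 (assumed)
    ({"Ssn"}, {"Ename"}),                   # FD2 (assumed)
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD3 (assumed)
]
for rel in decompose_2nf(schema, {"Ssn", "Pnumber"}, F):
    print(sorted(rel))
```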
Third Normal Form
A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.
E.g., $\text{Ssn} \to \text{Dnumber}$ and $\text{Dnumber} \to \text{Dmgr\_ssn}$, so $\text{Ssn} \to \text{Dmgr\_ssn}$ is a transitive dependency.
General definitions of 2nd and 3rd Normal Form
So far we have considered only partial and transitive dependencies on the primary key. Now we give more general definitions of 2NF and 3NF that take all candidate keys of a relation into account.
General definitions of 2nd Normal Form
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on every key of R, i.e., A is not partially dependent on any key of R.
General definitions of 3rd Normal Form
A relation schema R is in 3NF if, whenever a nontrivial FD $X \to A$ holds in R, either
a) X is a superkey of R, or
b) A is a prime attribute of R.
FD4 violates 3NF because Area is not a superkey (and the attribute it determines is not prime).
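The general 3NF test can be sketched in Python. The relation LOTS1(Property_id, County_name, Lot_no, Area, Price) and its FDs are assumptions made for illustration, with FD4 taken to be Area → Price. As in common textbook treatments, only the given FDs are checked rather than all of $F^+$, while the candidate keys (and hence the prime attributes) are computed from the full set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= schema:
                keys.append(x)
    return keys

def violations_3nf(schema, fds):
    """FD parts X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # only the nontrivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad

schema = {"Property_id", "County_name", "Lot_no", "Area", "Price"}
F = [
    ({"Property_id"}, {"County_name", "Lot_no", "Area", "Price"}),
    ({"County_name", "Lot_no"}, {"Property_id", "Area", "Price"}),
    ({"Area"}, {"Price"}),  # FD4 (assumed)
]
print(violations_3nf(schema, F))  # [(['Area'], 'Price')]
```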
General definitions of 3rd Normal Form
Boyce-Codd Normal Form
BCNF is stricter than 3NF: every relation in BCNF is also in 3NF, but the converse is not necessarily true.
A relation schema R is in BCNF if, whenever a nontrivial FD $X \to A$ holds in R, X is a superkey of R.
Boyce-Codd Normal Form
The relation with FD5 is in 3NF but not in BCNF.
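To see how a relation can satisfy 3NF but not BCNF, the Python sketch below checks both definitions on an assumed relation LOTS1A(Property_id, County_name, Lot_no, Area), taking FD5 to be Area → County_name. Because Area is not a superkey, FD5 breaks BCNF; because County_name belongs to the candidate key {County_name, Lot_no}, it does not break 3NF. The schema, FDs, and function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= schema:
                keys.append(x)
    return keys

def violations_3nf(schema, fds):
    """FD parts X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(sorted(lhs), a) for lhs, rhs in fds for a in sorted(rhs - lhs)
            if not closure(lhs, fds) >= schema and a not in prime]

def violations_bcnf(schema, fds):
    """Nontrivial FDs whose left side is not a superkey."""
    return [(sorted(lhs), sorted(rhs - lhs)) for lhs, rhs in fds
            if rhs - lhs and not closure(lhs, fds) >= schema]

schema = {"Property_id", "County_name", "Lot_no", "Area"}
F = [
    ({"Property_id"}, {"County_name", "Lot_no", "Area"}),
    ({"County_name", "Lot_no"}, {"Property_id", "Area"}),
    ({"Area"}, {"County_name"}),  # FD5 (assumed)
]
print(violations_3nf(schema, F))   # []  (County_name is prime)
print(violations_bcnf(schema, F))  # [(['Area'], ['County_name'])]
```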
Multivalued dependency and 4th Normal Form
Multivalued dependency and 4th Normal Form
End of chapter 7
