CH 15

Chapter 15
Basics of Functional Dependencies and Normalization for Relational Databases

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley
Chapter 15 Outline
Informal Design Guidelines for Relation Schemas
Functional Dependencies
Normal Forms Based on Primary Keys
General Definitions of Second and Third Normal Forms
Boyce-Codd Normal Form
Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction
Levels at which we can discuss goodness of relation schemas
Logical (or conceptual) level
Implementation (or physical storage) level
Approaches to database design:
Bottom-up or top-down
Informal Design Guidelines for Relation Schemas

Measures of quality
Making sure attribute semantics are clear
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing possibility of generating spurious tuples
Imparting Clear Semantics to Attributes in Relations

Semantics of a relation
Meaning resulting from interpretation of attribute values in a tuple
The easier it is to explain the semantics of a relation, the better the schema design
Functional Dependency
A relationship between attributes in which one attribute (or group of attributes) determines the value of another attribute in the same table
Illustration
The price of one cookie can determine the price of a box of 12 cookies
(CookiePrice, Qty) -> BoxPrice
Determinants
The attribute (or attributes) that we use as the starting point (the variable on the left side of the equation) is called a determinant
(CookiePrice, Qty) -> BoxPrice
Here (CookiePrice, Qty) is the determinant
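A functional dependency can be tested against a concrete table state. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the function name `fd_holds` and the sample prices are hypothetical and not part of the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this determinant
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for the cookie illustration
boxes = [
    {"CookiePrice": 2, "Qty": 12, "BoxPrice": 24},
    {"CookiePrice": 2, "Qty": 6, "BoxPrice": 12},
    {"CookiePrice": 3, "Qty": 12, "BoxPrice": 36},
    {"CookiePrice": 2, "Qty": 12, "BoxPrice": 24},
]

print(fd_holds(boxes, ["CookiePrice", "Qty"], ["BoxPrice"]))  # True
print(fd_holds(boxes, ["Qty"], ["BoxPrice"]))                 # False
```

Qty alone is not a determinant here because two boxes of 12 have different prices.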
Candidate/Primary Keys and Functional Dependency

By definition, a candidate key of a relation functionally determines all other attributes in the row
Likewise, by definition, a primary key of a relation functionally determines all other attributes in the row
Primary Key and Functional Dependency Example

(EmployeeID) -> (EmpLastName, EmpPhone)
(ProjectID) -> (ProjectName, StartDate)
Normalization
A process of analyzing a relation to ensure that it is well formed
More specifically, if a relation is normalized (well formed), rows can be inserted, deleted, or modified without creating update anomalies
Normalization Principles
Relational design principles for normalized relations:
To be a well-formed relation, every determinant must be a candidate key
Any relation that is not well formed should be broken into two or more well-formed relations
Normalization Example
(StudentID) -> (StudentName, DormName, DormCost)
However, if
(DormName) -> (DormCost)
Then DormCost should be placed into its own relation, resulting in the relations:
(StudentID) -> (StudentName, DormName)
(DormName) -> (DormCost)
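The student and dorm example can be replayed on data. This is a small sketch with hypothetical rows; `is_unique` is an illustrative helper that checks whether a set of attributes behaves like a key in one given state, which is a necessary condition for being a candidate key but not a proof of it.

```python
# Hypothetical rows for the unnormalized student relation
students = [
    {"StudentID": 1, "StudentName": "Ann", "DormName": "North", "DormCost": 3100},
    {"StudentID": 2, "StudentName": "Ben", "DormName": "North", "DormCost": 3100},
    {"StudentID": 3, "StudentName": "Cho", "DormName": "South", "DormCost": 2800},
]

def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs in this state."""
    values = [tuple(r[a] for a in attrs) for r in rows]
    return len(values) == len(set(values))

print(is_unique(students, ["StudentID"]))  # True
print(is_unique(students, ["DormName"]))   # False: a determinant that is not a key

# Decompose: DormCost moves into its own relation keyed by DormName
student = [{k: r[k] for k in ("StudentID", "StudentName", "DormName")} for r in students]
dorm_cost = {r["DormName"]: r["DormCost"] for r in students}
dorm = [{"DormName": n, "DormCost": c} for n, c in dorm_cost.items()]

print(is_unique(dorm, ["DormName"]))  # True
print(len(dorm))                      # 2
```

After the split, each dorm cost is stored once, and every determinant is unique within its own relation.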
Normalization Example
(AttorneyID, ClientID) -> (ClientName, MeetingDate, Duration)
However, if
(ClientID) -> (ClientName)
Then ClientName should be placed into its own relation, resulting in the relations:
(AttorneyID, ClientID) -> (MeetingDate, Duration)
(ClientID) -> (ClientName)
Guideline 1
Design a relation schema so that it is easy to explain its meaning
Do not combine attributes from multiple entity types and relationship types into a single relation (unless the attribute is a foreign key)
Example of violating Guideline 1: Figure 15.3
Guideline 1 (contd.)
Although there is nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship.
Guideline 2
Design base relation schemas so that no anomalies are present in the relations
If any anomalies are present:
Note them clearly
Make sure that the programs that update the database will operate correctly
Guideline 2 (contd.)
Insert Anomaly: To insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT.
Guideline 2 (contd.)
Delete Anomaly: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
Guideline 2 (contd.)
Modification Anomaly: In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong.
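The modification and delete anomalies can be reproduced on a toy EMP_DEPT state. This is a sketch only: the attribute names (`Ename`, `Ssn`, `Dnumber`, `Dmgr_ssn`) and all values are assumptions made for illustration, and `managers_by_department` is a hypothetical helper.

```python
# Hypothetical EMP_DEPT rows: department data is repeated per employee
emp_dept = [
    {"Ename": "Smith", "Ssn": "111", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ename": "Wong", "Ssn": "333", "Dnumber": 5, "Dmgr_ssn": "333"},
    {"Ename": "Zelaya", "Ssn": "222", "Dnumber": 4, "Dmgr_ssn": "987"},
]

def managers_by_department(rows):
    """Map each department number to the set of manager values stored for it."""
    result = {}
    for row in rows:
        result.setdefault(row["Dnumber"], set()).add(row["Dmgr_ssn"])
    return result

# Modification anomaly: only one of the department 5 tuples is updated
emp_dept[0]["Dmgr_ssn"] = "888"
inconsistent = {d: m for d, m in managers_by_department(emp_dept).items() if len(m) > 1}
print(sorted(inconsistent))  # [5]

# Delete anomaly: removing the last employee of department 4 loses the department
remaining = [r for r in emp_dept if r["Ssn"] != "222"]
print(4 in managers_by_department(remaining))  # False
```

Department 5 now shows two different managers, and department 4 has vanished along with its last employee.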
NULL Values in Tuples

May group many attributes together into a fat relation
Can end up with many NULLs
Problems with NULLs
Wasted storage space
Problems understanding meaning
Guideline 3
Avoid placing attributes in a base relation whose values may frequently be NULL
If NULLs are unavoidable:
Make sure that they apply in exceptional cases only, not to a majority of tuples
Generation of Spurious Tuples

Figure 15.5(a)
Relation schemas EMP_LOCS and EMP_PROJ1
NATURAL JOIN
Result produces many more tuples than the original set of tuples in EMP_PROJ
Called spurious tuples
Represent spurious information that is not valid
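The effect can be seen on a very small state. The sketch below assumes simplified attribute lists, EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn, Pnumber, Plocation), with hypothetical rows; `project`, `natural_join`, and `as_set` are illustrative helpers.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(left, right):
    """Combine rows that agree on every attribute name the inputs share."""
    common = set(left[0]) & set(right[0]) if left and right else set()
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent representation of a list of rows."""
    return {tuple(sorted(r.items())) for r in rows}

# Hypothetical original relation
emp_proj = [
    {"Ssn": "111", "Ename": "Smith", "Pnumber": 1, "Plocation": "Bellaire"},
    {"Ssn": "222", "Ename": "Wong", "Pnumber": 2, "Plocation": "Bellaire"},
    {"Ssn": "222", "Ename": "Wong", "Pnumber": 3, "Plocation": "Houston"},
]

emp_locs = project(emp_proj, ["Ename", "Plocation"])
emp_proj1 = project(emp_proj, ["Ssn", "Pnumber", "Plocation"])

# Plocation is the only shared attribute and is neither a key nor a foreign key
joined = natural_join(emp_locs, emp_proj1)
spurious = as_set(joined) - as_set(emp_proj)
print(len(emp_proj), len(joined), len(spurious))  # 3 5 2
```

The join pairs every employee name at Bellaire with every project at Bellaire, so two of the five result tuples never existed in the original relation.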
Guideline 4
Design relation schemas to be joined with equality conditions on attributes that are appropriately related
Guarantees that no spurious tuples are generated
Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
Summary and Discussion of Design Guidelines

Anomalies cause redundant work to be done
Waste of storage space due to NULLs
Difficulty of performing operations and joins due to NULL values
Generation of invalid and spurious data during joins
What FDs hold here?
Examples of FD constraints
Social security number determines employee name
SSN -> ENAME
Project number determines project name and location
PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number determine the hours per week that the employee works on the project
{SSN, PNUMBER} -> HOURS
FDs are a property of the meaning of the data and must hold at all times: a given state of the database can rule out certain FDs, but it cannot prove that an FD holds
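Ruling out an FD from a state amounts to finding two tuples that agree on the left-hand side and differ on the right-hand side. The sketch below uses hypothetical rows and a hypothetical helper, `counterexample`; a result of `None` only means this particular state does not contradict the FD.

```python
def counterexample(rows, lhs, rhs):
    """Return two rows that agree on lhs but differ on rhs, or None."""
    first_seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        other = first_seen.setdefault(key, row)
        if any(other[a] != row[a] for a in rhs):
            return other, row
    return None

# Hypothetical state combining employee and works-on data
works_on = [
    {"SSN": "123", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "123", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith"},
    {"SSN": "453", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "Joyce"},
]

print(counterexample(works_on, ["SSN"], ["HOURS"]) is not None)        # True: SSN -> HOURS is ruled out
print(counterexample(works_on, ["SSN", "PNUMBER"], ["HOURS"]) is None)  # True: not contradicted
print(counterexample(works_on, ["SSN"], ["ENAME"]) is None)             # True: not contradicted
```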
Normal Forms Based on Primary Keys

Normalization process
Approaches for relational schema design
Perform a conceptual schema design using a conceptual model, then map the conceptual design into a set of relations
Design relations based on external knowledge derived from an existing implementation of files, forms, or reports
Normalization of Relations
Takes a relation schema through a series of tests
Certify whether it satisfies a certain normal form
Proceeds in a top-down fashion
Normal form tests
Normalization of Relations (contd.)

Properties that the relational schemas should have:
Nonadditive join property

Extremely critical
No spurious tuples
Dependency preservation property

Desirable but sometimes sacrificed for other factors
Practical Use of Normal Forms

Normalization carried out in practice
Resulting designs are of high quality and meet the desirable properties stated previously
Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF
Do not need to normalize to the highest possible normal form
Definitions of Keys and Attributes Participating in Keys

Definition of superkey and key
Candidate key
If more than one key in a relation schema

One is primary key
Others are secondary keys
First Normal Form

Part of the formal definition of a relation in the basic (flat) relational model
Only attribute values permitted are single atomic (or indivisible) values
Techniques to achieve first normal form
Remove attribute and place in separate relation
Expand the key
Use several atomic attributes
First Normal Form (contd.)

Does not allow nested relations
Each tuple can have a relation within it
To change to 1NF:
Remove nested relation attributes into a new relation
Propagate the primary key into it
Unnest relation into a set of 1NF relations
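The unnesting steps above can be sketched as follows, assuming a nested relation is represented as a list stored inside each row. The relation, the attribute names (`Ssn`, `Ename`, `Projs`, `Pnumber`, `Hours`), and the function `unnest` are illustrative assumptions.

```python
def unnest(rows, key, nested_attr):
    """Split rows that carry a nested relation into two flat relations."""
    parent, child = [], []
    for row in rows:
        # The parent keeps every attribute except the nested relation
        parent.append({a: v for a, v in row.items() if a != nested_attr})
        for inner in row[nested_attr]:
            # Propagate the parent's primary key into each nested tuple
            child.append({**{k: row[k] for k in key}, **inner})
    return parent, child

# Hypothetical nested relation: each employee tuple contains a relation of projects
emp_proj = [
    {"Ssn": "123", "Ename": "Smith",
     "Projs": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "453", "Ename": "Joyce",
     "Projs": [{"Pnumber": 1, "Hours": 20.0}]},
]

emp, works = unnest(emp_proj, ["Ssn"], "Projs")
print(len(emp), len(works))  # 2 3
print(works[0])              # {'Ssn': '123', 'Pnumber': 1, 'Hours': 32.5}
```

The new relation's primary key is the combination of the propagated key and the key of the nested relation, here (Ssn, Pnumber).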
1NF
We would separate or decompose the table into two tables
There is one attribute in common that is the primary key of one of the tables
Second Normal Form

Uses the concepts of FDs and primary key
Definitions
Prime attribute: an attribute that is a member of the primary key K

Nonprime attribute: not a member of any candidate key.
Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
{SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
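The two examples can be checked mechanically from a set of FDs. This is a minimal sketch, assuming FDs are given as pairs of Python sets; `closure` and `is_full_fd` are illustrative names. Dropping one attribute at a time is enough, because any smaller determining subset is contained in one of those reduced sets.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and fails once any attribute is removed from lhs."""
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

# FDs taken from the examples in this chapter
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, fds))  # False: partial dependency
```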
Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key
The test for 2NF only applies to relations where the primary key contains multiple attributes
If a nonprime attribute can be determined from a proper subset of the primary key, the relation is not in 2NF and should be decomposed
2NF Example
Prime attributes are Ssn and Pnumber. Non-prime attributes are Hours, Ename, Pname, Plocation
Since Ssn -> Ename, Ename is not fully dependent on the primary key, and FD2 violates 2NF
Also, since Pnumber -> {Pname, Plocation}, the RHS is not fully dependent on the primary key, so FD3 violates 2NF
General Algorithm for Decomposition
Break the original relation into two relations:
One table contains the attributes from the violating FD (include attributes from the closure as well)
The other table contains the remaining attributes plus the attributes from the LHS of the violating FD
EMP_PROJ becomes E1(Ssn, Ename) and E2(Ssn, Pnumber, Hours, Pname, Plocation) from the FD2 violation
Then E2 becomes (Pnumber, Pname, Plocation) and (Ssn, Pnumber, Hours) from the FD3 violation
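The decomposition step can be sketched on attribute sets. This is an illustration under the assumption that FDs are pairs of Python sets and that FD1 is {Ssn, Pnumber} -> Hours; `closure` and `decompose` are hypothetical helper names.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def decompose(relation, lhs, fds):
    """Split a relation on the determinant of a violating FD."""
    determined = closure(lhs, fds) & relation
    first = determined                   # LHS plus everything it determines
    second = (relation - determined) | lhs  # remaining attributes plus the LHS
    return first, second

emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),       # FD1 (assumed numbering)
    ({"Ssn"}, {"Ename"}),                  # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD3
]

e1, e2 = decompose(emp_proj, {"Ssn"}, fds)   # FD2 violation
e3, e4 = decompose(e2, {"Pnumber"}, fds)     # FD3 violation
print(sorted(e1))  # ['Ename', 'Ssn']
print(sorted(e3))  # ['Plocation', 'Pname', 'Pnumber']
print(sorted(e4))  # ['Hours', 'Pnumber', 'Ssn']
```

The three resulting attribute sets match the relations named in the walkthrough above.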
Normalizing into 2NF

The final 2NF decomposition is: (Ssn, Ename), (Pnumber, Pname, Plocation), and (Ssn, Pnumber, Hours)
Be sure there is a relation with the original primary key and the attributes that are fully dependent on it.
Third Normal Form

Based on concept of transitive dependency
Problematic FD
Left-hand side is part of primary key
Left-hand side is a nonkey attribute
General Definitions of Second and Third Normal Forms
General Definition of Second Normal Form
General Definition of Third Normal Form
Boyce-Codd Normal Form

Every relation in BCNF is also in 3NF
Relation in 3NF is not necessarily in BCNF
Difference:
Condition which allows A to be prime is absent from BCNF
Most relation schemas that are in 3NF are also in BCNF
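The difference between the two tests can be sketched in code. The schema below, TEACH(Student, Course, Instructor) with {Student, Course} -> Instructor and Instructor -> Course, is a commonly used illustration and is an assumption here, not taken from the slides; all function names are hypothetical. The check looks only at the listed FDs and finds candidate keys by brute force, so it is suitable for small schemas only.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            attrs = set(combo)
            # Smaller sizes are visited first, so supersets of known keys are skipped
            if closure(attrs, fds) == attributes and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def normal_form_checks(attributes, fds):
    """Return (is_3nf, is_bcnf) with respect to the listed FDs."""
    prime = set().union(*candidate_keys(attributes, fds))
    is_3nf = is_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == attributes:
            continue  # determinant is a superkey: fine for both forms
        for attr in rhs - lhs:  # ignore trivial parts of the FD
            is_bcnf = False     # BCNF has no exception for prime attributes
            if attr not in prime:
                is_3nf = False
    return is_3nf, is_bcnf

teach = {"Student", "Course", "Instructor"}
fds = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
print(normal_form_checks(teach, fds))  # (True, False)
```

Instructor -> Course has a determinant that is not a superkey, so BCNF fails; 3NF still holds because Course is a prime attribute.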

Summary
Informal guidelines for good design
Functional dependency
Basic tool for analyzing relational schemas
Normalization:
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
