
Chapter 15

Basics of Functional Dependencies and Normalization for Relational Databases


Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Chapter 15 Outline
Informal Design Guidelines for Relation Schemas
Functional Dependencies
Normal Forms Based on Primary Keys
General Definitions of Second and Third Normal Forms
Boyce-Codd Normal Form

Copyright 2011 Ramez Elmasri and Shamkant Navathe

Introduction
Levels at which we can discuss the goodness of relation schemas:

Logical (or conceptual) level
Implementation (or physical storage) level

Approaches to database design:

Bottom-up or top-down

Informal Design Guidelines for Relation Schemas


Measures of quality:

Making sure attribute semantics are clear
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing the possibility of generating spurious tuples

Imparting Clear Semantics to Attributes in Relations


Semantics of a relation

Meaning resulting from the interpretation of attribute values in a tuple

The easier it is to explain the semantics of a relation, the better the schema design

Functional Dependency
A relationship between attributes in which one attribute (or group of attributes) determines the value of another attribute in the same table

Illustration:

The price of one cookie can determine the price of a box of 12 cookies
(CookiePrice, Qty) -> BoxPrice
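One way to make the definition concrete is to test whether a dependency holds in a given set of rows. The following is a minimal Python sketch; the function name `fd_holds` and the sample rows are illustrative assumptions and do not come from the slides.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: BoxPrice equals CookiePrice * Qty
boxes = [
    {"CookiePrice": 0.50, "Qty": 12, "BoxPrice": 6.00},
    {"CookiePrice": 0.25, "Qty": 12, "BoxPrice": 3.00},
    {"CookiePrice": 0.50, "Qty": 12, "BoxPrice": 6.00},
]

print(fd_holds(boxes, ["CookiePrice", "Qty"], ["BoxPrice"]))  # True
print(fd_holds(boxes, ["Qty"], ["BoxPrice"]))                 # False
```

Qty alone does not determine BoxPrice here, because two rows share Qty = 12 but have different box prices.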

Determinants
The attribute (or attributes) that we use as the starting point (the variable on the left side of the equation) is called a determinant

(CookiePrice, Qty) -> BoxPrice

Here (CookiePrice, Qty) is the determinant

Candidate/Primary Keys and Functional Dependency


By definition:

A candidate key of a relation will functionally determine all other attributes in the row
Likewise, a primary key of a relation will functionally determine all other attributes in the row

Primary Key and Functional Dependency Example


(EmployeeID) -> (EmpLastName, EmpPhone)

(ProjectID) -> (ProjectName, StartDate)

Normalization
A process of analyzing a relation to ensure that it is well formed

More specifically, if a relation is normalized (well formed), rows can be inserted, deleted, or modified without creating update anomalies

Normalization Principles
Relational design principles for normalized relations:

To be a well-formed relation, every determinant must be a candidate key
Any relation that is not well formed should be broken into two or more well-formed relations
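The first principle can be checked mechanically with an attribute closure. The sketch below is only an illustration: the helper names `closure` and `bad_determinants` are assumptions, and it tests the weaker superkey condition (the determinant's closure covers the whole schema) without checking minimality. It uses the student and dormitory dependencies from the example that follows in these slides.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is already covered
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bad_determinants(schema, fds):
    """Return the determinants that do not determine every attribute of schema."""
    return [lhs for lhs, _ in fds if not schema <= closure(lhs, fds)]


schema = {"StudentID", "StudentName", "DormName", "DormCost"}
fds = [
    ({"StudentID"}, {"StudentName", "DormName", "DormCost"}),
    ({"DormName"}, {"DormCost"}),
]

print(bad_determinants(schema, fds))  # [{'DormName'}]
```

DormName is a determinant but not a key of the relation, which is the signal that the relation should be split.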

Normalization Example
(StudentID) -> (StudentName, DormName, DormCost)

However, if

(DormName) -> (DormCost)

Then DormCost should be placed into its own relation, resulting in the relations:

(StudentID) -> (StudentName, DormName)
(DormName) -> (DormCost)
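The effect of this split on stored data can be sketched by projecting sample rows onto the two new relations. The rows, names, and the `project` helper below are hypothetical and serve only to show that each dorm cost ends up stored once.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


# Hypothetical rows: DormCost is repeated for every student in the same dorm
students = [
    {"StudentID": 1, "StudentName": "Ana", "DormName": "East", "DormCost": 3000},
    {"StudentID": 2, "StudentName": "Ben", "DormName": "East", "DormCost": 3000},
    {"StudentID": 3, "StudentName": "Cy", "DormName": "West", "DormCost": 3500},
]

student = project(students, ["StudentID", "StudentName", "DormName"])
dorm = project(students, ["DormName", "DormCost"])

print(len(student))  # 3
print(dorm)
```

After the projection, the cost of the East dorm appears in a single tuple of the dorm relation instead of once per student.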

Normalization Example
(AttorneyID, ClientID) -> (ClientName, MeetingDate, Duration)

However, if

(ClientID) -> (ClientName)

Then ClientName should be placed into its own relation, resulting in the relations:

(AttorneyID, ClientID) -> (MeetingDate, Duration)
(ClientID) -> (ClientName)

Guideline 1
Design a relation schema so that it is easy to explain its meaning

Do not combine attributes from multiple entity types and relationship types into a single relation (unless the attribute is a foreign key)

Example of violating Guideline 1: Figure 15.3

Guideline 1 (contd.)

Although there is nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship.

Guideline 2
Design base relation schemas so that no anomalies are present in the relations

If any anomalies are present:

Note them clearly
Make sure that the programs that update the database will operate correctly

Guideline 2 (contd)
Insert Anomaly: To insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT.

Guideline 2 (contd)
Delete Anomaly: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.

Guideline 2 (contd)
Modification Anomaly: In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong.
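A modification anomaly of this kind can be reproduced in a few lines. The rows and attribute names below are a simplified, hypothetical stand-in for EMP_DEPT, and `managers_by_department` is an illustrative helper, not something defined in the slides.

```python
# Hypothetical EMP_DEPT rows; the manager of department 5 is stored once per employee
emp_dept = [
    {"Ename": "Smith", "Dnumber": 5, "Dmgr": "Wong"},
    {"Ename": "Narayan", "Dnumber": 5, "Dmgr": "Wong"},
    {"Ename": "Zelaya", "Dnumber": 4, "Dmgr": "Wallace"},
]


def managers_by_department(rows):
    """Collect the set of manager values recorded for each department."""
    managers = {}
    for row in rows:
        managers.setdefault(row["Dnumber"], set()).add(row["Dmgr"])
    return managers


# A careless update changes the manager in only one of the department 5 tuples
emp_dept[0]["Dmgr"] = "Borg"

inconsistent = {
    dept: sorted(names)
    for dept, names in managers_by_department(emp_dept).items()
    if len(names) > 1
}
print(inconsistent)  # {5: ['Borg', 'Wong']}
```

If the manager were stored in a separate department relation, the update would touch exactly one tuple and this inconsistency could not arise.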

NULL Values in Tuples


May group many attributes together into a "fat" relation

Can end up with many NULLs

Problems with NULLs:

Wasted storage space
Problems understanding meaning

Guideline 3
Avoid placing attributes in a base relation whose values may frequently be NULL

If NULLs are unavoidable:

Make sure that they apply in exceptional cases only, not to a majority of tuples

Generation of Spurious Tuples


Figure 15.5(a)

Relation schemas EMP_LOCS and EMP_PROJ1

NATURAL JOIN

Result produces many more tuples than the original set of tuples in EMP_PROJ
Called spurious tuples
Represent spurious information that is not valid
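The effect can be reproduced on a tiny example. In the sketch below, the rows are hypothetical and the relations are simplified versions of EMP_PROJ, EMP_LOCS, and EMP_PROJ1 with only three attributes in total; `natural_join` is an illustrative helper.

```python
def natural_join(r, s):
    """Join two lists of dict rows on all attribute names they share."""
    joined = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                joined.append({**a, **b})
    return joined


# Hypothetical original relation (a simplified EMP_PROJ)
emp_proj = [
    {"Ename": "Smith", "Pname": "ProductX", "Plocation": "Houston"},
    {"Ename": "Wong", "Pname": "ProductZ", "Plocation": "Houston"},
]

# Decompose on Plocation, which is neither a primary key nor a foreign key
emp_locs = [{"Ename": r["Ename"], "Plocation": r["Plocation"]} for r in emp_proj]
emp_proj1 = [{"Pname": r["Pname"], "Plocation": r["Plocation"]} for r in emp_proj]

result = natural_join(emp_locs, emp_proj1)
spurious = [row for row in result if row not in emp_proj]

print(len(emp_proj), len(result))  # 2 4
print(spurious)
```

The join returns four tuples from an original two; the extra pairs (Smith with ProductZ, Wong with ProductX) were never in EMP_PROJ.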

Guideline 4
Design relation schemas to be joined with equality conditions on attributes that are appropriately related

Guarantees that no spurious tuples are generated

Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations


Summary and Discussion of Design Guidelines


Anomalies cause redundant work to be done
Waste of storage space due to NULLs
Difficulty of performing operations and joins due to NULL values
Generation of invalid and spurious data during joins

What FDs hold here?


Examples of FD constraints
Social security number determines employee name

SSN -> ENAME

Project number determines project name and location

PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number determine the hours per week that the employee works on the project

{SSN, PNUMBER} -> HOURS


FDs are a property of the meaning of the data and must hold at all times. A given state of the database can rule out certain FDs, but it cannot prove that an FD holds.

Normal Forms Based on Primary Keys


Normalization process

Approaches for relational schema design:

Perform a conceptual schema design using a conceptual model, then map the conceptual design into a set of relations
Design relations based on external knowledge derived from an existing implementation of files, forms, or reports

Normalization of Relations
Takes a relation schema through a series of tests

Certify whether it satisfies a certain normal form
Proceeds in a top-down fashion

Normal form tests

Normalization of Relations (contd.)


Properties that the relational schemas should have:

Nonadditive join property

Extremely critical
No spurious tuples

Dependency preservation property

Desirable but sometimes sacrificed for other factors

Practical Use of Normal Forms


Normalization carried out in practice

Resulting designs are of high quality and meet the desirable properties stated previously
Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF

Do not need to normalize to the highest possible normal form

Definitions of Keys and Attributes Participating in Keys


Definition of superkey and key

Candidate key

If more than one key in a relation schema:

One is the primary key
Others are secondary keys

First Normal Form


Part of the formal definition of a relation in the basic (flat) relational model

Only attribute values permitted are single atomic (or indivisible) values

Techniques to achieve first normal form:

Remove the attribute and place it in a separate relation
Expand the key
Use several atomic attributes

First Normal Form (contd.)


Does not allow nested relations

In a nested relation, each tuple can have a relation within it

To change to 1NF:

Remove nested relation attributes into a new relation
Propagate the primary key into it
Unnest relation into a set of 1NF relations
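The unnesting steps above can be sketched as a small function. The nested rows, the attribute name `Projs`, and the helper `unnest` are hypothetical and only illustrate how the primary key is propagated into the new relation.

```python
def unnest(rows, key, nested_attr):
    """Split nested rows into a flat parent relation and a flat child relation.

    The parent key is propagated into every child tuple.
    """
    parent, child = [], []
    for row in rows:
        parent.append({k: v for k, v in row.items() if k != nested_attr})
        for inner in row[nested_attr]:
            child.append({key: row[key], **inner})
    return parent, child


# Hypothetical nested relation: each employee tuple holds a relation of projects
emp_proj = [
    {"Ssn": "123", "Ename": "Smith",
     "Projs": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "456", "Ename": "Wong",
     "Projs": [{"Pnumber": 2, "Hours": 10.0}]},
]

employees, works_on = unnest(emp_proj, "Ssn", "Projs")
print(employees)
print(works_on)
```

In the resulting child relation, the key becomes the combination of the propagated Ssn and the partial key Pnumber of the nested relation.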


1NF
We would separate or decompose the table into two tables

There is one attribute in common that is the primary key of one of the tables

Second Normal Form


Uses the concepts of FDs and primary key

Definitions:

Prime attribute: an attribute that is a member of the primary key K

Nonprime attribute: an attribute that is not a member of any candidate key

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds

Examples:

{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
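The definition of a full functional dependency translates directly into a test: remove each attribute of the left-hand side in turn and see whether the dependency still follows. The sketch below assumes the three FDs listed earlier in these slides as the given set; the helper names `closure` and `is_full_fd` are illustrative.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds and fails once any lhs attribute is removed."""
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    # Removing one attribute at a time is enough to detect a partial dependency
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, fds))  # False
```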

Second Normal Form


A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key

The test for 2NF only applies to relations where the primary key contains multiple attributes

If a nonprime attribute can be determined from a proper subset of the primary key, the relation is NOT in 2NF and should be decomposed (dependence on another nonprime attribute is a transitive dependency, which is the concern of 3NF)

2NF Example
Prime attributes are Ssn and Pnumber. Nonprime attributes are Hours, Ename, Pname, and Plocation

Since Ssn -> Ename, Ename is not fully dependent on the primary key, and FD2 violates 2NF

Also, since Pnumber -> {Pname, Plocation}, the RHS is not fully dependent on the primary key, so FD3 violates 2NF

General Algorithm for Decomposition

Break the original relation into two relations:

One table contains the attributes from the violating FD (include attributes from the closure as well)
The other table contains the remaining attributes plus the attributes from the LHS of the violating FD

EMP_PROJ becomes E1 (Ssn, Ename) and E2 (Ssn, Pnumber, Hours, Pname, Plocation) from the FD2 violation

Then E2 becomes (Pnumber, Pname, Plocation) and (Ssn, Pnumber, Hours) from the FD3 violation
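The decomposition step above can be sketched with set operations. This is one possible reading of the procedure, not a definitive implementation: the function names `closure` and `decompose` are assumptions, and relations are modeled simply as sets of attribute names.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose(relation, lhs, fds):
    """Split relation on a violating FD whose left-hand side is lhs."""
    first = closure(lhs, fds) & relation   # LHS plus everything it determines
    second = (relation - first) | lhs      # remaining attributes plus the LHS
    return first, second


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),                   # FD2
    ({"Pnumber"}, {"Pname", "Plocation"}),  # FD3
]

e1, e2 = decompose(emp_proj, {"Ssn"}, fds)   # split on the FD2 violation
e3, e4 = decompose(e2, {"Pnumber"}, fds)     # split E2 on the FD3 violation

print(sorted(e1))  # ['Ename', 'Ssn']
print(sorted(e3))  # ['Plocation', 'Pname', 'Pnumber']
print(sorted(e4))  # ['Hours', 'Pnumber', 'Ssn']
```

The three resulting attribute sets match the relations named in the worked EMP_PROJ example.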

Normalizing into 2NF


The final 2NF decomposition is:

(Ssn, Ename)
(Pnumber, Pname, Plocation)
(Ssn, Pnumber, Hours)

Be sure there is a relation with the original primary key and the attributes that are fully dependent on it.

Third Normal Form


Based on the concept of transitive dependency

Problematic FDs:

Left-hand side is part of the primary key
Left-hand side is a nonkey attribute

General Definitions of Second and Third Normal Forms


General Definition of Second Normal Form


General Definition of Third Normal Form


Boyce-Codd Normal Form


Every relation in BCNF is also in 3NF

Relation in 3NF is not necessarily in BCNF

Difference:

Condition which allows A to be prime is absent from BCNF

Most relation schemas that are in 3NF are also in BCNF



Summary
Informal guidelines for good design

Functional dependency

Basic tool for analyzing relational schemas

Normalization:

1NF, 2NF, 3NF, BCNF, 4NF, 5NF
