Chapter 15 Outline
Informal Design Guidelines for Relation Schemas
Functional Dependencies
Normal Forms Based on Primary Keys
General Definitions of Second and Third Normal Forms
Boyce-Codd Normal Form
Introduction
Levels at which we can discuss goodness of relation schemas
Bottom-up or top-down
Making sure attribute semantics are clear
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing possibility of generating spurious tuples
Semantics of a relation: the meaning resulting from the interpretation of attribute values in a tuple
A relation whose semantics are easy to explain indicates a better schema design
Functional Dependency
A relationship between attributes in which one attribute (or group of attributes) determines the value of another attribute in the same table
Illustration
The price of one cookie can determine the price of a box of 12 cookies
(CookiePrice, Qty) -> BoxPrice
Determinants
The attribute (or attributes) that we use as the starting point (the variable on the left side of the equation) is called a determinant
(CookiePrice, Qty) -> BoxPrice
Determinant: (CookiePrice, Qty)
(ProjectID) -> (ProjectName, StartDate)
Determinant: (ProjectID)
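A functional dependency can be tested against sample rows. The sketch below is illustrative only: the function name fd_holds and the sample cookie data are assumptions, not part of the slides. A passing check shows only that the given rows do not contradict the dependency; a failing check rules the dependency out.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first RHS value seen for this determinant value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, invented for illustration
boxes = [
    {"CookiePrice": 2, "Qty": 12, "BoxPrice": 24},
    {"CookiePrice": 2, "Qty": 6, "BoxPrice": 12},
    {"CookiePrice": 3, "Qty": 12, "BoxPrice": 36},
]

print(fd_holds(boxes, ["CookiePrice", "Qty"], ["BoxPrice"]))  # True
print(fd_holds(boxes, ["Qty"], ["BoxPrice"]))  # False
```

Qty alone is not a determinant of BoxPrice here, because two rows share Qty = 12 but have different box prices.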
Normalization
A process of analyzing a relation to ensure that it is well formed
More specifically, if a relation is normalized (well formed), rows can be inserted, deleted or modified without creating update anomalies
Normalization Principles
Relational design principles for normalized relations:
To be a well-formed relation, every determinant must be a candidate key
Any relation that is not well formed should be broken into two or more well-formed relations
Normalization Example
(StudentID) -> (DormName, DormCost)
However, if
(DormName) -> (DormCost)
Then DormCost should be placed into its own relation, resulting in the relations:
(StudentID) -> (DormName)
(DormName) -> (DormCost)
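The rule that every determinant must be a candidate key can be checked mechanically with attribute closure. The sketch below is a minimal illustration under stated assumptions: the relation is assumed to contain only StudentID, DormName and DormCost (the original example may list more attributes), and the helper names closure, is_candidate_key and is_well_formed are hypothetical.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(attrs, relation, fds):
    """True if attrs determines every attribute and no attribute can be dropped."""
    attrs = set(attrs)
    if not closure(attrs, fds) >= relation:
        return False
    return all(not closure(attrs - {a}, fds) >= relation for a in attrs)


def is_well_formed(relation, fds):
    """True if every determinant (FD left-hand side) is a candidate key."""
    return all(is_candidate_key(lhs, relation, fds) for lhs, _ in fds)


# Assumed attribute set for the student/dorm example
student = {"StudentID", "DormName", "DormCost"}
student_fds = [
    ({"StudentID"}, {"DormName", "DormCost"}),
    ({"DormName"}, {"DormCost"}),
]
print(is_well_formed(student, student_fds))  # False: DormName is not a key

# After moving DormCost into its own relation
student_dorm = {"StudentID", "DormName"}
dorm = {"DormName", "DormCost"}
print(is_well_formed(student_dorm, [({"StudentID"}, {"DormName"})]))  # True
print(is_well_formed(dorm, [({"DormName"}, {"DormCost"})]))  # True
```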
Copyright 2011 Ramez Elmasri and Shamkant Navathe
Normalization Example
(AttorneyID, ClientID)
However, if
(ClientID) -> (ClientName)
Then ClientName should be placed into its own relation, resulting in the relations:
Guideline 1
Design a relation schema so that it is easy to explain its meaning
Do not combine attributes from multiple entity types and relationship types into a single relation (unless the attribute is a foreign key)
Example of violating Guideline 1: Figure 15.3
Guideline 1 (contd.)
Although there is nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship.
Guideline 2
Design base relation schemas so that no anomalies are present in the relations
If any anomalies are present:
Note them clearly
Make sure that the programs that update the database will operate correctly
Guideline 2 (contd.)
Insert Anomaly: To insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT.
Guideline 2 (contd.)
Delete Anomaly: If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database.
Guideline 2 (contd.)
Modification Anomaly: In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of department 5), we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong.
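The modification and delete anomalies can be reproduced on a small in-memory copy of EMP_DEPT. This is a sketch only: the attribute names Ssn, Dnumber and Dmgr_ssn, the sample values, and the helper name conflicts are assumptions made for illustration.

```python
from collections import defaultdict


def conflicts(rows, key_attr, dependent_attr):
    """Map each key value to its dependent values, keeping only keys with more than one."""
    values = defaultdict(set)
    for row in rows:
        values[row[key_attr]].add(row[dependent_attr])
    return {k: v for k, v in values.items() if len(v) > 1}


# Hypothetical EMP_DEPT rows: department data is repeated per employee
emp_dept = [
    {"Ssn": "111", "Dnumber": 5, "Dmgr_ssn": "900"},
    {"Ssn": "222", "Dnumber": 5, "Dmgr_ssn": "900"},
    {"Ssn": "333", "Dnumber": 4, "Dmgr_ssn": "800"},
]
before = conflicts(emp_dept, "Dnumber", "Dmgr_ssn")

# Modification anomaly: only one of the two department 5 tuples is updated
emp_dept[0]["Dmgr_ssn"] = "901"
after_update = conflicts(emp_dept, "Dnumber", "Dmgr_ssn")
print(sorted(after_update[5]))  # ['900', '901']

# Delete anomaly: removing the last employee of department 4 loses the department
remaining = [row for row in emp_dept if row["Ssn"] != "333"]
known_departments = {row["Dnumber"] for row in remaining}
print(4 in known_departments)  # False
```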
Guideline 3
Avoid placing attributes in a base relation whose values may frequently be NULL
If NULLs are unavoidable:
Make sure that they apply in exceptional cases only, not to a majority of tuples
NATURAL JOIN
Result produces many more tuples than the original set of tuples in EMP_PROJ
Called spurious tuples
Represent spurious information that is not valid
Guideline 4
Design relation schemas to be joined with equality conditions on attributes that are appropriately related
Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
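Spurious tuples are easiest to see by decomposing a small relation badly and joining it back. The sketch below assumes a cut-down EMP_PROJ with invented rows; project and natural_join are hypothetical helpers written for this illustration, not library calls.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join rows that agree on all attributes the two relations share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


# Hypothetical EMP_PROJ rows
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 10, "Plocation": "Houston"},
    {"Ssn": "222", "Pnumber": 2, "Hours": 20, "Plocation": "Houston"},
]

# Bad decomposition: the shared attribute Plocation is not a key of either part
emp_locs = project(emp_proj, ["Ssn", "Plocation"])
proj_hours = project(emp_proj, ["Pnumber", "Hours", "Plocation"])
joined = natural_join(emp_locs, proj_hours)
spurious = [t for t in joined if t not in emp_proj]
print(len(joined), len(spurious))  # 4 2

# Better decomposition: the shared attribute Pnumber is the key of the second part
works_on = project(emp_proj, ["Ssn", "Pnumber", "Hours"])
proj_loc = project(emp_proj, ["Pnumber", "Plocation"])
rejoined = natural_join(works_on, proj_loc)
print(len(rejoined))  # 2
```

Joining on Plocation pairs every Houston employee with every Houston project, which is why the first join doubles the tuple count.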
Examples of FD constraints
Social security number determines employee name
Employee SSN and project number determine the hours per week that the employee works on the project
FDs are a property of the meaning of data and must hold at all times
A given state of the database cannot establish that an FD holds, but it can rule certain FDs out
Perform a conceptual schema design using a conceptual model, then map the conceptual design into a set of relations
Design relations based on external knowledge derived from an existing implementation of files or forms or reports
Normalization of Relations
Takes a relation schema through a series of tests
Resulting designs are of high quality and meet the desirable properties stated previously
In practice, designers pay particular attention to normalization only up to 3NF, BCNF, or at most 4NF
Remove attribute and place in separate relation
Expand the key
Use several atomic attributes
To change to 1NF:
Remove nested relation attributes into a new relation
Propagate the primary key into it
Unnest relation into a set of 1NF relations
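The unnesting steps above can be sketched on rows that carry a nested relation as a list. The relation layout, the attribute names (Ssn, Ename, Projs, Pnumber, Hours), the sample values and the function name unnest are all assumptions made for illustration.

```python
def unnest(rows, key, nested_attr):
    """Split rows holding a nested relation into a flat parent and a child relation.

    The parent's primary key attributes are propagated into every child row.
    """
    parent, child = [], []
    for row in rows:
        parent.append({a: v for a, v in row.items() if a != nested_attr})
        for inner in row[nested_attr]:
            child.append({**{k: row[k] for k in key}, **inner})
    return parent, child


# Hypothetical non-1NF relation: Projs is a nested relation
emp_proj = [
    {"Ssn": "111", "Ename": "Smith",
     "Projs": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "222", "Ename": "Wong",
     "Projs": [{"Pnumber": 3, "Hours": 40.0}]},
]

employees, works_on = unnest(emp_proj, ["Ssn"], "Projs")
print(employees[0])  # {'Ssn': '111', 'Ename': 'Smith'}
print(works_on[1])   # {'Ssn': '111', 'Pnumber': 2, 'Hours': 7.5}
```

The child relation's primary key becomes the combination of the propagated key and the nested relation's own key, here (Ssn, Pnumber).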
1NF
We would separate or decompose the table into two tables
There is one attribute in common that is the primary key of one of the tables
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
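Whether a dependency is full or partial can be tested by dropping each left-hand attribute in turn and recomputing the attribute closure. The sketch below uses the two FDs from the examples above; the helper names closure and is_full_fd are hypothetical.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and fails once any single lhs attribute is removed."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the dependency does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, fds))  # False: partial dependency
```

Removing one attribute at a time is enough, because if any smaller subset determined the right-hand side, so would every larger subset containing it.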
2NF Example
Prime attributes are Ssn and Pnumber. Non-prime attributes are Hours, Ename, Pname, Plocation
Since Ssn -> Ename, Ename is not fully dependent on the primary key, and FD2 violates 2NF
Also, since Pnumber -> {Pname, Plocation}, the RHS is not fully dependent on the primary key, so FD3 violates 2NF
Break the original relation into two relations:
One table contains the attributes from the violating FD. (Include attributes from the closure as well.)
The other table contains the remaining attributes plus the attributes from the LHS of the violating FD.
EMP_PROJ becomes E1(Ssn, Ename) and E2(Ssn, Pnumber, Hours, Pname, Plocation) from the FD2 violation.
Then E2 becomes (Pnumber, Pname, Plocation) and (Ssn, Pnumber, Hours) from the FD3 violation.
Be sure there is a relation with the original primary key and the attributes that are fully dependent on it.
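The two decomposition steps for EMP_PROJ can be traced with plain sets of attribute names. This is a minimal sketch: the function name decompose is hypothetical, and the caller is assumed to pass a right-hand side that already includes any closure attributes.

```python
def decompose(relation, violating_fd):
    """Split a relation (set of attribute names) on a violating FD (lhs, rhs).

    Returns (table with the FD's attributes, table with the remaining
    attributes plus the FD's left-hand side).
    """
    lhs, rhs = violating_fd
    fd_table = lhs | rhs
    rest = (relation - rhs) | lhs
    return fd_table, rest


emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fd2 = ({"Ssn"}, {"Ename"})
fd3 = ({"Pnumber"}, {"Pname", "Plocation"})

e1, e2 = decompose(emp_proj, fd2)   # split on the FD2 violation
proj, works_on = decompose(e2, fd3)  # split E2 on the FD3 violation

print(sorted(e1))        # ['Ename', 'Ssn']
print(sorted(proj))      # ['Plocation', 'Pname', 'Pnumber']
print(sorted(works_on))  # ['Hours', 'Pnumber', 'Ssn']
```

The last relation keeps the original primary key (Ssn, Pnumber) together with Hours, the only attribute fully dependent on it.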
Problematic FD
Difference:
Summary
Informal guidelines for good design
Functional dependency
Normalization: