
1.1 Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
- Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
- Only foreign keys should be used to refer to other entities.
- Entity and relationship attributes should be kept apart as much as possible.

Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.
Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition
Copyright 2004 Ramez Elmasri and Shamkant Navathe


Figure 10.1 A simplified COMPANY relational database schema

Note: The above figure is now called Figure 10.1 in Edition 4



1.2 Redundant Information in Tuples and Update Anomalies

Mixing attributes of multiple entities may cause problems:
- Information is stored redundantly, wasting storage.
- Update anomalies arise:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies


EXAMPLE OF AN UPDATE ANOMALY (1)


Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project number P1 from Billing to CustomerAccounting requires the same update in the tuples of all 100 employees working on project P1. If any tuple is missed, the relation becomes inconsistent.
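The cost of this anomaly can be sketched in a few lines of Python. The employee numbers, names and hours below are hypothetical filler, and the separate `project` mapping is an assumed decomposed design, not something given on the slide.

```python
# Hypothetical data: 100 employees who all work on project P1 ("Billing").
emp_proj = [
    {"emp": f"E{i}", "proj": "P1", "ename": f"Name{i}", "pname": "Billing", "hours": 10}
    for i in range(1, 101)
]

# Single-relation design: the project name is repeated in every tuple,
# so every matching tuple has to be changed.
touched = 0
for row in emp_proj:
    if row["proj"] == "P1":
        row["pname"] = "CustomerAccounting"
        touched += 1

# Assumed decomposed design: the name is stored once, keyed by project number.
project = {"P1": "Billing"}
project["P1"] = "CustomerAccounting"  # one change, no chance of partial update

print(touched)
```

In the single-relation design the loop touches one tuple per employee. In the decomposed design the same rename is a single assignment.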


EXAMPLE OF AN UPDATE ANOMALY (2)


Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.

Delete Anomaly: When a project is deleted, all the employees who work on that project are deleted as well. Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.


Figure 10.3 Two relation schemas suffering from update anomalies

Note: The above figure is now called Figure 10.3 in Edition 4



Figure 10.4 Example States for EMP_DEPT and EMP_PROJ

Note: The above figure is now called Figure 10.4 in Edition 4



Guideline to Redundant Information in Tuples and Update Anomalies


GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies. If any anomalies are present, note them so that applications can be made to take them into account.


1.3 Null Values in Tuples


GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
- Attributes that are NULL frequently could be placed in separate relations (with the primary key).

Reasons for nulls:
- attribute not applicable or invalid
- attribute value unknown (may exist)
- value known to exist, but unavailable


1.4 Spurious Tuples


Bad designs for a relational database may result in erroneous results for certain JOIN operations.
The "lossless join" property is used to guarantee meaningful results for join operations.

GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Spurious Tuples (2)


There are two important properties of decompositions:
(a) non-additivity (losslessness) of the corresponding join
(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed. (See Chapter 11).

Spurious Tuples
If we decompose a relation R into smaller relations and then apply a natural join to them, the result may contain more tuples than the original relation R. The extra tuples are called spurious tuples. Spurious tuples are normally generated when the join is made on attributes that are neither a primary key nor a foreign key.

An example follows.


ENAME   ENO   PNUMBER   HOURS   PNAME   PLOCATION
RAVI    1     PROJ1     10      XYZ     DELHI
KIRAN   3     PROJ3     10      PQR     NOIDA
SAHIL   2     PROJ2     10      DEF     DELHI


Example of Spurious Tuples

EMP_LOCS
ENAME   PLOCATION
RAVI    DELHI
KIRAN   NOIDA
SAHIL   DELHI

EMP_PROJ1
ENO   PNUMBER   HOURS   PNAME   PLOCATION
1     PROJ1     10      XYZ     DELHI
2     PROJ2     10      DEF     DELHI
3     PROJ3     10      PQR     NOIDA

Natural join of EMP_LOCS and EMP_PROJ1 (on PLOCATION)
ENAME   ENO   PNUMBER   HOURS   PNAME   PLOCATION
RAVI    1     PROJ1     10      XYZ     DELHI
RAVI    2     PROJ2     10      DEF     DELHI       *
KIRAN   3     PROJ3     10      PQR     NOIDA
SAHIL   1     PROJ1     10      XYZ     DELHI       *
SAHIL   2     PROJ2     10      DEF     DELHI

Records marked with * (shown in red on the original slide) are spurious tuples.
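The join in this example can be reproduced with plain Python sets. This is a minimal sketch using the three tuples from the slides; the variable names are chosen here for illustration, and the join is written by hand rather than with a database library.

```python
# Original relation instance: (ENAME, ENO, PNUMBER, HOURS, PNAME, PLOCATION)
original = {
    ("RAVI", 1, "PROJ1", 10, "XYZ", "DELHI"),
    ("KIRAN", 3, "PROJ3", 10, "PQR", "NOIDA"),
    ("SAHIL", 2, "PROJ2", 10, "DEF", "DELHI"),
}

# Decompose by projection.
emp_locs = {(t[0], t[5]) for t in original}   # EMP_LOCS(ENAME, PLOCATION)
emp_proj1 = {t[1:] for t in original}         # EMP_PROJ1(ENO, ..., PLOCATION)

# Natural join on the only shared attribute, PLOCATION.
joined = {
    (ename,) + rest
    for (ename, plocation) in emp_locs
    for rest in emp_proj1
    if rest[-1] == plocation
}

# Tuples that appear in the join but were never in the original relation.
spurious = joined - original
for t in sorted(spurious):
    print(t)
```

PLOCATION is neither a key of EMP_PROJ1 nor of EMP_LOCS, so both DELHI employees are paired with both DELHI projects, which is where the two extra tuples come from.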


Need for Schema Refinement: to reduce redundant storage. Redundant storage means that the same information is stored repeatedly in the database.


1. Redundant storage: some information is stored repeatedly.
2. Update anomalies: if one copy of repeated data is updated, an inconsistency is created unless all copies are similarly updated.
3. Insertion anomalies: it may not be possible to store some information unless some other information is stored as well.
4. Deletion anomalies: it may not be possible to delete some information without losing some other information as well.

Consider the relation:

EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

EmpID   ProjID   Ename   Pname
10      006      E1      P1
20      006      E2      P1
30      007      E3      P2
40      007      E4      P2
50      008      E5      P3
60      008      E6      P3
70      009      E7      P4

Primary Key: EmpID and ProjID


EMPID   ENAME   PROJID
10      E1      006
20      E2      006
30      E3      007
40      E4      007
50      E5      008
60      E6      008
70      E7      009

PROJID   PNAME
006      P1
007      P2
008      P3
009      P4


Decomposition: replacing a relation with a collection of smaller relations. Each of the smaller relations contains a (strict) subset of the attributes of the original relation.


Decomposition can also create more problems than it solves. Two important questions must be asked repeatedly:
1. Do we need to decompose a relation?
2. What problems (if any) does a given decomposition cause?


For the first question: several normal forms have been proposed for relations. If a relation is in one of these normal forms, we know that certain kinds of problems cannot arise. If decomposition is still needed, we must choose the decomposition carefully.


For the second question: two properties of decompositions are of particular interest.
A. Lossless-join property
B. Dependency-preservation property


The lossless-join property enables us to recover any instance of the decomposed relation from the corresponding instances of the smaller relations.

The dependency-preservation property enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations.


A serious drawback of decomposition is that queries over the original relation may require us to join the decomposed relations. If such queries are common (regularly performed), the performance penalty of decomposing the relation may not be acceptable.


Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FD X -> Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y


Row   A    B    C    D
1     A1   B1   C1   D1
2     A1   B2   C2   D1
3     A2   B1   C3   D1
4     A1   B1   C1   D2

FD: AB -> C

Row   Value of AB   Value of C
1     A1B1          C1
4     A1B1          C1
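The pairwise definition of FD satisfaction translates directly into code. The sketch below checks the four-row instance above; `satisfies_fd` is a helper name introduced here for illustration. It tests one instance only, so a True result says the instance does not violate the FD, not that the FD holds for the schema in general.

```python
from itertools import combinations

def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        differ_on_rhs = any(t1[a] != t2[a] for a in rhs)
        if agree_on_lhs and differ_on_rhs:
            return False
    return True

# The instance from the slide, one dict per tuple.
r = [
    {"A": "A1", "B": "B1", "C": "C1", "D": "D1"},
    {"A": "A1", "B": "B2", "C": "C2", "D": "D1"},
    {"A": "A2", "B": "B1", "C": "C3", "D": "D1"},
    {"A": "A1", "B": "B1", "C": "C1", "D": "D2"},
]

ab_to_c = satisfies_fd(r, ["A", "B"], ["C"])  # rows 1 and 4 agree on AB and on C
ab_to_d = satisfies_fd(r, ["A", "B"], ["D"])  # rows 1 and 4 agree on AB, differ on D
a_to_c = satisfies_fd(r, ["A"], ["C"])        # rows 1 and 2 agree on A, differ on C
print(ab_to_c, ab_to_d, a_to_c)
```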


It is natural to ask whether we need to decompose relations produced by translating an ER diagram. The following examples will illustrate why decomposition of relations produced through ER design might be necessary.


Only FDs that determine all attributes of a relation (i.e. key constraints) can be expressed in the ER Model.

Key constraint: {ssn} -> {ssn, name, lot, rating, hourly_wages, hours_worked}
FD (not a key constraint): rating -> hourly_wages


Hourly_Emps
Ssn     Name   Lot   Rating   Hourly_wage   Hours_worked
12369   E1     48    8        10            40
12345   E2     22    8        10            30
12569   E3     35    5        7             30
47895   E4     35    5        7             32
12567   E5     35    8        10            40


Consider the following ER diagram.

[ER diagram: relationship Contracts (quantity) among Parts (PID, PName), Suppliers (Sid, SName) and Departments (did, dname)]

Contractid   Sid   Quantity   Pid   did
C11          S1    12         P1    D2
C12          S2    13         P2    D3
C13          S1    14         P1    D2
C14          S1    15         P1    D2

Redundant information: Pid = P1 is repeated in every contract between supplier S1 and department D2.

The company has a policy that a department purchases at most one part from any given supplier. If there are several contracts between the same supplier and department, we know that the same part must be involved in all of them.
FD: DS -> P (D = did, S = Sid, P = Pid)


Contractid   Quantity   Sid   did
C11          12         S1    D2
C12          13         S2    D3
C13          14         S1    D2
C14          15         S1    D2

Sid   did   Pid
S1    D2    P1
S2    D3    P2


[ER diagram: Employees, Works_in, Departments; attributes EID, Name, since, did, dName, lot, budget]

[ER diagram: Employees, Works_in, Departments; attributes EID, Name, since, did, dName, budget, lot]

Workers table
Eid   Name   Lot   Did   since
101   E1     L1    10    1999
102   E2     L1    10    1999
103   E3     L2    20    2001
104   E4     L2    20    2002
105   E5     L1    10    2000

Redundant information: the Lot value is repeated for every employee of the same department.

All employees are assigned parking lots based on their departments, so the FD did -> lot holds.


Eid   Name   Did   since
101   E1     10    1999
102   E2     10    1999
103   E3     20    2001
104   E4     20    2002
105   E5     10    2000

did   Lot
10    L1
20    L2
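Whether this decomposition loses information can be checked on the example instance by projecting and rejoining. The sketch below uses the five Workers tuples from the slides; the names `workers2` and `dept_lots` for the two smaller relations are assumptions made here, since the slides leave them unnamed. It checks this one instance only and does not prove the lossless-join property in general.

```python
# Workers(eid, name, lot, did, since), as on the slide.
workers = {
    (101, "E1", "L1", 10, 1999),
    (102, "E2", "L1", 10, 1999),
    (103, "E3", "L2", 20, 2001),
    (104, "E4", "L2", 20, 2002),
    (105, "E5", "L1", 10, 2000),
}

# Decompose on the FD did -> lot (relation names are hypothetical).
workers2 = {(eid, name, did, since) for (eid, name, lot, did, since) in workers}
dept_lots = {(did, lot) for (_, _, lot, did, _) in workers}

# Natural join on did, restoring the original column order.
rejoined = {
    (eid, name, lot, did, since)
    for (eid, name, did, since) in workers2
    for (d, lot) in dept_lots
    if d == did
}

print(rejoined == workers)  # no tuples lost, no spurious tuples added
```

Here the join attribute did is the key of the smaller relation, so each Workers tuple matches exactly one lot. The spurious-tuple example earlier joined on an attribute that was not a key.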


Sailor_id   Boat_id     Date       Credit_card_no
S1          Interlake   1 Jan 11   123696589
S1          Redocean    1 Feb 11   123696589
S2          Interlake   2 Jan 11   593569633
S2          Redocean    2 Feb 11   593569633
S1          Seanation   3 Feb 11   123696589


Sailor_id   Boat_id     Date
S1          Interlake   1 Jan 11
S1          Redocean    1 Feb 11
S2          Interlake   2 Jan 11
S2          Redocean    2 Feb 11
S1          Seanation   3 Feb 11

Sailor_id   Credit_card_no
S1          123696589
S2          593569633


Given a set of FDs over a relation schema R, there are typically several additional FDs that hold over R whenever all of the given FDs hold.

Example: Workers(ssn, name, lot, did, since)
Stated FDs: ssn -> did, did -> lot
Implied FD: ssn -> lot


The set of all FDs implied by a given set F of FDs is called the closure of F and is denoted by F+.

Informally, the FDs in F+ fall into two groups:
1. All FDs stated in the set F.
2. All FDs that can be derived from the set F.

FDs are derived from F using three rules called Armstrong's Axioms. These rules can be applied repeatedly to infer all FDs implied by a set of FDs.


Reflexivity: If Y ⊆ X then X -> Y (trivial dependency)
  {sname, sid} -> sname

Augmentation: If X -> Y then XW -> YW
  course_no -> subj, so {course_no, grade} -> {subj, grade}

Transitivity: If X -> Y and Y -> Z then X -> Z
  eid -> did and did -> lot, so eid -> lot


A trivial FD is one in which the right side contains only attributes that also appear on the left side.


Armstrong's Axioms are sound in that they generate only FDs in F+ when applied to a set F of FDs. They are complete in that repeated application of these rules will generate all FDs in the closure F+.


Union: If X -> Y and X -> Z then X -> YZ
Pseudotransitivity: If X -> Y and WY -> Z then XW -> Z
Decomposition: If X -> YZ then X -> Y and X -> Z


Proof of decomposition
1. X -> YZ (given)
2. YZ -> Y (using IR1, reflexivity, and knowing that YZ ⊇ Y)
3. X -> Y (using IR3, transitivity, on 1 and 2)


Example: F = {A -> C, B -> C, CD -> E}. Let us show that AD -> E:
1) A -> C (given)
2) AD -> CD (Augmentation)
3) CD -> E (given)
4) AD -> E (2, 3 and Transitivity)

R(C, S, Z). Given FDs: CS -> Z, Z -> C

1. Z -> C (given)
2. SZ -> SC (Augmentation of 1 with S)
3. SZ -> Z (Transitivity of 2 and the given CS -> Z)
4. SZ -> C (Transitivity of 3 and 1)


Question: Let R(C, S, J, D, P, Q, V). Find other FDs using the rules.
Given FDs:
1. C -> CSJDPQV
2. JP -> C
3. SD -> P
Ans.
4. From 2 and 1, by transitivity: JP -> CSJDPQV
5. Using augmentation on 3 with J: SDJ -> JP. Together with 4 (JP -> CSJDPQV), transitivity gives SDJ -> CSJDPQV


If we just want to check whether a given dependency, say X -> Y, is in the closure of a set F of FDs, we can do so efficiently without computing F+. We first compute the attribute closure X+ with respect to F, which is the set of attributes A such that X -> A can be inferred using Armstrong's Axioms. X -> Y is in F+ exactly when Y ⊆ X+.


closure = X;
repeat until there is no change: {
    if there is an FD U -> V in F such that U ⊆ closure,
        then set closure = closure ∪ V
}
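The loop above can be written as a short Python function. This is a sketch, not a tuned implementation: the names `attribute_closure` and `implies` are chosen here, FDs are represented as (left side, right side) pairs, and the two test inputs reuse the Workers FDs and the R(C, S, J, D, P, Q, V) exercise from the earlier slides.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # If U is contained in the closure, add V to it.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is contained in X+."""
    return set(rhs) <= attribute_closure(lhs, fds)


# Workers example: stated FDs ssn -> did and did -> lot.
workers_fds = [(["ssn"], ["did"]), (["did"], ["lot"])]
ssn_closure = attribute_closure(["ssn"], workers_fds)

# R(C, S, J, D, P, Q, V): attributes are single letters, so a string works as a set.
r_fds = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
sdj_closure = attribute_closure("SDJ", r_fds)

print(sorted(ssn_closure), sorted(sdj_closure))
```

The second call confirms the result derived by hand with the axioms: the closure of SDJ contains every attribute of R, so SDJ -> CSJDPQV is in F+.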


Three types of Normal Forms

1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies.

Initially Codd proposed three normal forms; later Boyce and Codd proposed a stronger definition of 3NF, known as BCNF.


Normalization of data can be looked upon as a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion and update anomalies.

The normal form of a relation refers to the highest normal form condition it meets.


The process of normalization should not be considered in isolation; two other properties should also be considered:
1. Lossless join or nonadditive join property, which guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
2. Dependency preservation property, which ensures that each functional dependency is represented in some individual relation resulting after decomposition.


Definitions of Keys and Attributes Participating in Keys (2)


If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
- A prime attribute must be a member of some candidate key.
- A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.


3.2 First Normal Form


Disallows composite attributes, multivalued attributes, and nested relations; attributes whose values for an individual tuple are non-atomic
Nested Relations: multivalued attributes that are themselves composite are called nested relations.


Figure 10.8 Normalization into 1NF

Note: The above figure is now called Figure 10.8 in Edition 4



Figure 10.9 Normalizing nested relations into 1NF

Note: The above figure is now called Figure 10.9 in Edition 4



3.3 Second Normal Form (1)

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of some candidate key.
- Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.

Second Normal Form (2)

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.

NOTE: If the primary key contains a single attribute, the test need not be applied at all.


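2NF NORMALIZATION

(ENO, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
FD1: {ENO, PNUMBER} -> HOURS
FD2: ENO -> ENAME
FD3: PNUMBER -> {PNAME, PLOCATION}

2NF decomposition:
(ENO, PNUMBER, HOURS)         FD1
(ENO, ENAME)                  FD2
(PNUMBER, PNAME, PLOCATION)   FD3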
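The 2NF test for this relation can be sketched as a search for partial dependencies. The code assumes that {ENO, PNUMBER} is the only candidate key and that FD1 to FD3 are the only dependencies to consider; `partial_dependencies` is a helper name introduced here. A fuller check would also consider implied FDs, for example via attribute closure.

```python
KEY = frozenset({"ENO", "PNUMBER"})

# FD1..FD3 as (determinant, dependents) pairs.
FDS = [
    (frozenset({"ENO", "PNUMBER"}), frozenset({"HOURS"})),        # FD1
    (frozenset({"ENO"}), frozenset({"ENAME"})),                   # FD2
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # FD3
]

def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key and which
    determine at least one attribute outside the key."""
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]

violations = partial_dependencies(KEY, FDS)

# One relation per violating determinant, plus the key with what remains.
moved = set().union(*(rhs for _, rhs in violations))
all_attrs = set().union(*(lhs | rhs for lhs, rhs in FDS))
decomposition = [sorted(lhs | rhs) for lhs, rhs in violations]
decomposition.append(sorted(all_attrs - moved))

for schema in decomposition:
    print(schema)
```

FD2 and FD3 are reported as partial dependencies, and splitting them off leaves (ENO, PNUMBER, HOURS), the same three schemas as in the decomposition above.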


3.4 Third Normal Form (1)

Definition: Transitive functional dependency - a FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME


Third Normal Form (2)

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP (SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

3NF NORMALIZATION

(ENAME, ENO, DOB, ADDR, DNO, DNAME, DMGRENO)

3NF decomposition:
(ENAME, ENO, DOB, ADDR, DNO)
(DNO, DNAME, DMGRENO)
