Lecture 10
Telephone
Customer ID First Name Surname
Number
123 Robert Ingram 555-861-2025
555-403-1659
456 Jane Wright
555-776-4100
789 Maria Fernandez 555-808-9633
Telephone
Customer ID First Name Surname
Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
456 Jane Wright 555-776-4100
789 Maria Fernandez 555-808-9633
Normalization:
Eliminates anomalies
CSE 303: Ashikur Rahman 9
Data Anomalies
When a database is poorly designed we get anomalies:
One person may have multiple phones, but lives in only one city
Anomalies:
• Redundancy = repeated data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ?
11
CSE 303: Ashikur Rahman
Relation Decomposition
Break the relation into two:
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
R
A1 ... Am B1 ... Bm
t’
How about?
title, year starName
CSE 303: Ashikur Rahman 18
Examples
An FD holds, or does not hold on an instance:
Position Phone
name color
category department name, category price
color, category price
CSE 303: Ashikur Rahman 22
Armstrong’s Rules (1/3)
A1 ... Am B1 ... Bm
A1, A2, …, An B1
A1, A2, …, An B2
.....
A1, A2, …, An Bm
CSE 303: Ashikur Rahman 23
Armstrong’s Rules (2/3)
where i = 1, 2, ..., n
A1 … Am
Why ?
name color
category department name, category price
color, category price
27
{name, category}+ =
{ name, category, color, department, price }
R(A,B,C,D,E,F) A, B C
B, C A, D
D E
C, F B
R(A,B,C,D,E,F) A, B C
B, C A, D
D E
C, F B
R(A,B,C,D) A, B C
B, C D
A, D C
C, D A
A, D B
1. Compute {A,D}+
R(A,B,C,D) A, B C
B, C D B, D A
C, D A
A, D B
A+ = A, B+ = B, C+ = C, D+ = D
AB+ =ABCD, AC+=AC, AD+=ABCD,
What are all the keys?
BC+=ABCD, BD+=BD, CD+=ABCD
ABC+ = ABD+ =What + = ABCD (no need to compute– why ?)
ACDare all the superkeys that are not keys?
BCD+ = ABCD, ABCD+ = ABCD
A+ = A, B+ = B, C+ = C, D+ = D
AB+ =ABCD, AC+=AC, AD+=ABCD,
What are all the keys?
BC+=ABCD, BD+=BD, CD+=ABCD
ABC+ = ABD+ =What + = ABCD (no need to compute– why ?)
ACDare all the superkeys that are not keys?
BCD+ = ABCD, ABCD+ = ABCD
A, B C
Keys:
A, D B
AB , AD
B D
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ =ABCD, AC+=AC, AD+=ABCD,
What are all the keys?
BC+=BCD, BD+=BD, CD+=CD
ABC+ = ABD+ What
= ACD + = ABCD (no need to compute– why ?)
are all the superkeys that are not keys?
BCD+ = BCD, ABCD+ = ABCD
A, B C
Superkeys that are not Keys:
A, D B
ABC , ABD, ACD, ABCD
B D
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ =ABCD, AC+=AC, AD+=ABCD,
What are all the keys?
BC+=BCD, BD+=BD, CD+=CD
ABC+ = ABD+ What
= ACD + = ABCD (no need to compute– why ?)
are all the superkeys that are not keys?
BCD+ = BCD, ABCD+ = ABCD
student address
room, time course
student, course room, time
ABC ABC
or
BCA BAC
what are the keys here ?
52
Can you design FDs such that there are three keys ?
Eliminating Anomalies
Main idea:
• X A is OK if X is a (super)key
• X A is not OK otherwise
In other words: there are no “bad” FDs, that is the left side of
every nontrivial FD must be a super key.
Equivalently:
" X, either (X+ = X) or (X+ = all attributes)
CSE 303: Ashikur Rahman 55
All two-attribute relations are in BCNF
R(A,B)
Case 1: Suppose there is no dependency
Nothing is being violated, so fine
R1 R2 57
Example
Name SSN PhoneNumber City
Fred 123-45-6789 206-555-1234 Seattle
Fred 123-45-6789 206-555-6543 Seattle
Joe 987-65-4321 908-555-2121 Westfield
Joe 987-65-4321 908-555-1234 Westfield
let Y = X+ - X
let Z = [all attributes] - X+
decompose R into R1(X Y) and R2(X Z)
continue to decompose recursively R1 and R2
CSE 303: Ashikur Rahman 60
R(A,B,C,D) Find X s.t.: X ≠X+ ≠ [all attributes]
A B
Example BC
R(A,B,C,D)
A+ = ABC ≠ ABCD
R1(A,B,C) R2(A,D)
B+ = BC ≠ ABC
What are
R11(B,C) R12(A,B) the keys ?
Iteration 1: Person
SSN+ = SSN, name, age, hairColor
Decompose into: P(SSN, name, age, hairColor)
Phone(SSN, phoneNumber)
Iteration 2: P
SSN+ = SSN, name, age, hairColor
age+ = age, hairColor What are
Decompose: People(SSN, name, age) the keys ?
Hair(age, hairColor)
Phone(SSN, phoneNumber) 62
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)