
# Ch 6 Normalization Part 2

## 1 Relational Database Design Algorithms

### Relational Decomposition

We start with a universal relation schema $R$ containing all the database attributes: $R = \{A_1, A_2, \ldots, A_n\}$. The design goal is a decomposition $D$ of $R$ into $m$ relation schemas: $D = \{R_1, R_2, \ldots, R_m\}$. Each relation schema $R_i$ contains a subset of the attributes of $R$, and every attribute in $R$ should appear in at least one $R_i$.

### The Dependency Preservation Property

The database designers define a set $F$ of functional dependencies that should hold on the attributes of $R$. The decomposition $D$ should preserve these dependencies: informally, the collection of all dependencies that hold on the individual relations $R_i$ should be equivalent to $F$. Formally:

Define the projection of $F$ on $R_i$, denoted by $\pi_{R_i}(F)$, to be the set of FDs $X \rightarrow Y$ in $F^+$ such that $(X \cup Y) \subseteq R_i$. A decomposition $D = \{R_1, R_2, \ldots, R_m\}$ is dependency preserving if

$$\left(\pi_{R_1}(F) \cup \pi_{R_2}(F) \cup \cdots \cup \pi_{R_m}(F)\right)^+ = F^+$$

This property makes it possible to ensure that the FDs in $F$ hold simply by ensuring that the dependencies on each relation $R_i$ hold individually.

There is an algorithm, called the relational synthesis algorithm, that decomposes $R$ into a dependency-preserving decomposition $D = \{R_1, R_2, \ldots, R_m\}$ with respect to $F$ such that each $R_i$ is in 3NF:

1. Find a minimal set of FDs $G$ equivalent to $F$ (a minimal cover).
2. For each left-hand side $X$ of an FD $X \rightarrow A$ in $G$, create a relation schema $R_i$ in $D$ with the attributes $X \cup \{A_1, A_2, \ldots, A_k\}$, where $X \rightarrow A_1, \ldots, X \rightarrow A_k$ are all the FDs in $G$ with $X$ as left-hand side.
3. If any attributes in $R$ are not placed in any $R_i$, create another relation in $D$ for these attributes.

It can be proven that all dependencies in $F$ are preserved and that all relation schemas in $D$ are in 3NF.

Problems:

- A minimal cover $G$ for $F$ must be found first; this is the costly step, since it requires repeated attribute-closure computations.
- Several minimal covers can exist for $F$; the result of the algorithm can differ depending on which one is chosen.

### The Lossless (Non-Additive) Join Property

Informally, this property ensures that no spurious tuples appear when the relations in the decomposition are joined.

Formally, a decomposition $D = \{R_1, R_2, \ldots, R_m\}$ of $R$ has the lossless join property with respect to a set $F$ of FDs if, for every relation instance $r(R)$ whose tuples satisfy all the FDs in $F$,

$$\pi_{R_1}(r) * \pi_{R_2}(r) * \cdots * \pi_{R_m}(r) = r$$

where $*$ denotes the natural join. This condition ensures that whenever a relation instance $r(R)$ satisfies $F$, no spurious tuples are generated by joining the decomposed relations $r(R_i)$.
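The lossless join condition can be tried out on a concrete instance. The following is a minimal Python sketch, not a general test over all legal instances: the relation `r`, its attribute names (`emp`, `dept`, `mgr`), and the helper functions `project` and `natural_join` are all hypothetical and chosen only for illustration. The instance is assumed to satisfy the FD `dept → mgr`.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs; duplicates collapse in the set."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two projected relations (sets of sorted (attr, value) tuples)."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            # Two tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


# Hypothetical instance: dept -> mgr holds, but mgr -> dept does not
# (manager m1 runs both d1 and d3).
r = [
    {"emp": "e1", "dept": "d1", "mgr": "m1"},
    {"emp": "e2", "dept": "d1", "mgr": "m1"},
    {"emp": "e3", "dept": "d2", "mgr": "m2"},
    {"emp": "e4", "dept": "d3", "mgr": "m1"},
]
original = project(r, ["emp", "dept", "mgr"])

# Decomposition 1: {emp, dept} and {dept, mgr}, joined on dept.
good = natural_join(project(r, ["emp", "dept"]), project(r, ["dept", "mgr"]))

# Decomposition 2: {emp, mgr} and {dept, mgr}, joined on mgr.
bad = natural_join(project(r, ["emp", "mgr"]), project(r, ["dept", "mgr"]))
spurious = bad - original

print(good == original)   # True: the join reproduces r exactly
print(len(spurious))      # 3: tuples that were never in r
```

In the first decomposition the shared attribute `dept` determines `mgr`, so the join rebuilds exactly the original tuples. In the second, `mgr` determines neither `emp` nor `dept`, and the join pairs employees with departments they never belonged to.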

Since we actually store the decomposed relations as base relations, this condition is necessary to generate meaningful results for queries involving joins.

There is an algorithm for testing whether a decomposition $D$ satisfies the lossless join property with respect to a set $F$ of FDs. There is also an algorithm for decomposing $R$ into BCNF relations such that the decomposition has the lossless join property with respect to a set of FDs on $R$:

1. Set $D := \{R\}$.
2. While there is a relation schema $Q$ in $D$ that is not in BCNF:
   - choose one $Q$ in $D$ that is not in BCNF;
   - find an FD $X \rightarrow Y$ in $Q$ that violates BCNF;
   - replace $Q$ in $D$ by two relation schemas $(Q - Y)$ and $(X \cup Y)$.

This is based on two properties of lossless join decompositions:

1. The decomposition $D = \{R_1, R_2\}$ of $R$ has the lossless join property with respect to $F$ if and only if either
   - the FD $(R_1 \cap R_2) \rightarrow (R_1 - R_2)$ is in $F^+$, or
   - the FD $(R_1 \cap R_2) \rightarrow (R_2 - R_1)$ is in $F^+$.
2. If $D = \{R_1, R_2, \ldots, R_m\}$ of $R$ has the lossless join property with respect to $F$, and $D_1 = \{Q_1, Q_2, \ldots, Q_k\}$ of $R_i$ has the lossless join property with respect to $\pi_{R_i}(F)$, then $D_2 = \{R_1, R_2, \ldots, R_{i-1}, Q_1, Q_2, \ldots, Q_k, R_{i+1}, \ldots, R_m\}$ has the lossless join property with respect to $F$.

No algorithm can guarantee a dependency-preserving decomposition into BCNF relations, because such a decomposition does not always exist. A modification of the synthesis algorithm guarantees both the lossless join and dependency preservation properties, but it produces 3NF relations (not BCNF). Fortunately, many 3NF relations are also in BCNF.

Lossless join and dependency-preserving decomposition into 3NF relations:

1. Find a minimal set of FDs $G$ equivalent to $F$.
2. For each left-hand side $X$ of an FD $X \rightarrow Y$ in $G$, create a relation schema $R_i$ in $D$ with the attributes $X \cup \{A_1, A_2, \ldots, A_k\}$, where the $A_j$ are all the attributes appearing on the right-hand side of an FD in $G$ with $X$ as left-hand side.
3. If any attributes in $R$ are not placed in any $R_i$, create another relation in $D$ for these attributes.
4. If none of the relations in $D$ contains a key of $R$, create a relation that contains a key of $R$ and add it to $D$.

### Null Values and Dangling Tuples

Null values may create problems if they appear as join attributes. The distinction between the result of a regular join and an outer join becomes important when specifying queries: some queries require a regular join and other queries require an outer join.

Dangling tuples: sometimes the attributes of a relation are "partitioned" into several relations with the primary key repeated in each of the relations. Tuples whose primary key does not appear in all of the relations are called "dangling tuples".
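As a small illustration of dangling tuples, the sketch below models two partitions of a hypothetical EMPLOYEE relation as Python dictionaries keyed by the primary key `ssn`. The data, the relation names, and the helpers `regular_join` and `left_outer_join` are assumptions made for this example only; a real system would express the same thing with SQL joins.

```python
# Hypothetical EMPLOYEE data partitioned into two relations keyed by ssn.
emp_name = {"111": "Alice", "222": "Bob", "333": "Carol"}   # EMP_1(ssn, name)
emp_dept = {"111": "Research", "333": "Admin"}               # EMP_2(ssn, dept)


def regular_join(left, right):
    """Regular (inner) join on the key: keeps only keys present in both relations."""
    return [(k, left[k], right[k]) for k in sorted(left) if k in right]


def left_outer_join(left, right):
    """Left outer join on the key: keeps every left tuple, padding with None (null)."""
    return [(k, left[k], right.get(k)) for k in sorted(left)]


# Keys that appear in only one of the two relations identify dangling tuples.
dangling = sorted(set(emp_name) ^ set(emp_dept))

inner = regular_join(emp_name, emp_dept)
outer = left_outer_join(emp_name, emp_dept)

print(dangling)  # ['222']
print(inner)     # Bob's tuple is missing
print(outer)     # Bob's tuple is kept, with None for the department
```

The regular join silently drops employee `222`, while the outer join keeps the tuple and fills the missing department with a null, which is why the choice of join matters for queries over partitioned relations.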

If a regular join is taken on the relations on the primary key to rebuild the tuples, the dangling tuples do not appear in the result.

## 2 Further Dependencies and Normal Forms

Functional dependencies (FDs) are used to specify one very common type of constraint. Other types of constraints cannot be specified by FDs alone. Additional dependencies include multivalued dependencies (MVDs), join dependencies (JDs), and inclusion dependencies. Some of these dependencies lead to normal forms beyond 3NF and BCNF.

### Multivalued Dependencies and 4NF

Informally, a set of attributes $X$ multidetermines a set of attributes $Y$ if the value of $X$ determines a set of values for $Y$ (independently of any other attributes). A multivalued dependency (MVD) is written as $X \twoheadrightarrow Y$ and specifies a constraint on all relation instances $r(R)$.

Formally, let $R$ be a relation schema, let $X$ and $Y$ be subsets of the attributes in $R$, and let $Z = R - (X \cup Y)$ (the remaining attributes). An MVD $X \twoheadrightarrow Y$ holds in $R$ if, whenever two tuples $t_1$ and $t_2$ exist in a relation instance $r(R)$ with $t_1[X] = t_2[X]$, two tuples $t_3$ and $t_4$ must also exist in $r(R)$ such that:

- $t_3[X] = t_4[X] = t_1[X] = t_2[X]$
- $t_3[Y] = t_1[Y]$ and $t_4[Y] = t_2[Y]$
- $t_3[Z] = t_2[Z]$ and $t_4[Z] = t_1[Z]$

The MVD constraint implies that a value of $X$ determines a set of values of $Y$ independently from the values of $Z$. Property of MVDs: if $X \twoheadrightarrow Y$ holds, then $X \twoheadrightarrow Z$ also holds.

An MVD $X \twoheadrightarrow Y$ is called a trivial MVD if either (a) $Y \subseteq X$ or (b) $X \cup Y = R$. A trivial MVD always holds according to the formal MVD definition.

Given a set $F$ of FDs and MVDs, we can infer additional FDs and MVDs that hold whenever the dependencies in $F$ hold. The following is a sound and complete set of inference rules for FDs and MVDs (notation: $XZ$ stands for $X \cup Z$):

1. (Reflexive rule for FDs) If $Y \subseteq X$, then $X \rightarrow Y$.
2. (Augmentation rule for FDs) If $X \rightarrow Y$, then $XZ \rightarrow YZ$.
3. (Transitive rule for FDs) If $X \rightarrow Y$ and $Y \rightarrow Z$, then $X \rightarrow Z$.
4. (Complementation rule for MVDs) If $X \twoheadrightarrow Y$, then $X \twoheadrightarrow Z$, where $Z = R - (X \cup Y)$.
5. (Augmentation rule for MVDs) If $X \twoheadrightarrow Y$ and $Z \subseteq W$, then $WX \twoheadrightarrow YZ$.
6. (Transitive rule for MVDs) If $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$, then $X \twoheadrightarrow (Z - Y)$.
7. (Replication rule, FD to MVD) If $X \rightarrow Y$, then $X \twoheadrightarrow Y$.
8. (Coalescence rule for FDs and MVDs) If $X \twoheadrightarrow Y$ and there exists $W$ such that (a) $Z \subseteq Y$, (b) $W \rightarrow Z$, and (c) $W \cap Y$ is empty, then $X \rightarrow Z$.

Notes:

- By rule 7, every FD is also an MVD.
- Rules 1 to 8 can derive the closure $F^+$ of a set of dependencies $F$.

#### Fourth Normal Form (4NF)

3NF and BCNF do not deal with multivalued dependencies.
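To make the formal MVD definition concrete, and to show the kind of redundancy that FD-based normal forms do not address, here is a minimal Python sketch that checks an MVD against a single relation instance. The function `satisfies_mvd`, the relation `emp`, and its attributes (`ename`, `pname`, `dname`) are hypothetical names introduced for this example. The instance is assumed to have no nontrivial FDs, so FD-based normal forms have nothing to flag, yet every project of an employee is repeated once per dependent.

```python
from itertools import product


def satisfies_mvd(rows, attrs, x, y):
    """Return True if X ->> Y holds on this instance (a list of dicts over attrs)."""
    z = [a for a in attrs if a not in x and a not in y]   # Z = R - (X u Y)
    present = {tuple(row[a] for a in attrs) for row in rows}
    # Every ordered pair is visited, so the t4 of (t1, t2) is the t3 of (t2, t1).
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # Required tuple t3: X and Y values from t1, Z values from t2.
            t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
            if tuple(t3[a] for a in attrs) not in present:
                return False
    return True


attrs = ["ename", "pname", "dname"]
# Hypothetical instance: Smith works on projects X and Y and has dependents John and Anna.
emp = [
    {"ename": "Smith", "pname": "X", "dname": "John"},
    {"ename": "Smith", "pname": "Y", "dname": "Anna"},
    {"ename": "Smith", "pname": "X", "dname": "Anna"},
    {"ename": "Smith", "pname": "Y", "dname": "John"},
]

print(satisfies_mvd(emp, attrs, ["ename"], ["pname"]))      # True
print(satisfies_mvd(emp, attrs, ["ename"], ["dname"]))      # True (complementation)
print(satisfies_mvd(emp[:3], attrs, ["ename"], ["pname"]))  # False: (Smith, Y, John) is missing
```

Checking one instance can only refute an MVD, never prove that it holds on the schema, since the dependency is a constraint on all legal instances.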

A relation schema with some nontrivial MVDs may not be a good design. 4NF takes care of these problems and implies BCNF (every relation in 4NF is also in BCNF).

Formal definition of 4NF: a relation schema $R$ is in 4NF with respect to a set $F$ of (functional and multivalued) dependencies if, for every nontrivial multivalued dependency $X \twoheadrightarrow Y$ in $F^+$, $X$ is a superkey of $R$.

Notes:

- Since every FD is also an MVD, 4NF implies BCNF.
- If all dependencies in $F$ are FDs, the definition of 4NF becomes the definition of BCNF.

There is an algorithm for decomposing $R$ into 4NF relations such that the decomposition has the lossless join property with respect to a set $F$ of FDs and MVDs on $R$:

1. Set $D := \{R\}$.
2. While there is a relation schema $Q$ in $D$ that is not in 4NF:
   - choose one $Q$ in $D$ that is not in 4NF;
   - find a nontrivial MVD $X \twoheadrightarrow Y$ in $Q$ that violates 4NF;
   - replace $Q$ in $D$ by two relation schemas $(Q - Y)$ and $(X \cup Y)$.

### Join Dependencies and 5NF

A join dependency $JD(R_1, R_2, \ldots, R_n)$ is a constraint on $R$. It specifies that every legal instance $r(R)$ should have a lossless join decomposition into $R_1, R_2, \ldots, R_n$. An MVD is a special case of a JD where $n = 2$. A $JD(R_1, R_2, \ldots, R_n)$ is a trivial JD if some $R_i = R$.

Fifth normal form (5NF): a relation schema $R$ is in 5NF with respect to a set $F$ of FDs, MVDs, and JDs if, for every nontrivial $JD(R_1, R_2, \ldots, R_n)$ in $F^+$, each $R_i$ is a superkey of $R$. 5NF is also called PJNF (project-join normal form).
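A join dependency with $n = 3$ can hold even when no two-way decomposition is lossless. The following minimal Python sketch checks a JD against one instance by joining the projections and comparing with the original. The function `satisfies_jd`, the helpers, and the instance over attributes `A`, `B`, `C` are hypothetical and serve only as an illustration; a single instance can refute a JD but cannot establish it for the schema.

```python
from functools import reduce


def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs; duplicates collapse in the set."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations given as sets of sorted (attr, value) tuples."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def satisfies_jd(rows, components):
    """True if joining the projections onto all components rebuilds the instance."""
    all_attrs = sorted(rows[0])
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return joined == project(rows, all_attrs)


# Hypothetical instance over R(A, B, C).
r = [
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c1"},
]
AB, BC, CA = ["A", "B"], ["B", "C"], ["C", "A"]

print(satisfies_jd(r, [AB, BC, CA]))  # True: the three-way join is lossless
print(satisfies_jd(r, [AB, BC]))      # False: (a2, b1, c2) is spurious
print(satisfies_jd(r, [AB, CA]))      # False
print(satisfies_jd(r, [BC, CA]))      # False
```

Joining only two of the projections introduces spurious tuples; the third projection filters them out again, which is exactly the situation that JDs and 5NF are meant to capture.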
