Anda di halaman 1dari 13

Normalization : 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF

Normalization:
Normalization refers to rigorous standards for good design, designed formally, and methods for
testing a DB’s design as given below.To design a good relational database schema, by which an existing
schema is modified to bring its component tables into compliance (fulfillment) with a series of progressive
normal forms as below.

Normalization is a process of looking at the table(relational schema) in the RDBMS to test whether
they pass a series of test. The tests are used to avoid update anomalies, redundancy, and ambiguity from the
data base schema. If the table fails (bad relation) in a test, the solution is decomposing the attributes of bad
relation (table) into smaller tables.

The goal of database normalization is to ensure that every non-key column in every table is directly
dependent on the key, the whole key and nothing but the key and with this goal come benefits in the form of
reduced redundancies, fewer anomalies, and improved efficiencies (While normalization tends to increase
the duplication of data, it does not introduce redundancy, which is unnecessary duplication.).
Normalization up to BC normal form (BCNF) is based on functional dependencies and key
constraints.

About the Key:

Column(s) C is primary key for table T if:

Property 1: All columns in T are functionally dependent on C

Property 2: No sub-collection of columns in C (assuming C is a collection of columns and not


just a single column) has Property 1

Superkey:

Any combination of attributes that uniquely identifies each tuple in a relation is .


Candidate Keys:

Any minimal combination of attributes that uniquely identifies each tuple in a relation – you can’t
remove any attribute and still have unique identification.
Alternate Keys / Secondary Keys:

Candidate keys not chosen as primary keys are called alternate keys.
Primary key:

The candidate key chosen to be the unique identifier for a relation is called Primary key.

Prime Attribute:

An attribute that is a member of a candidate key is called Prime Attribute.

Non-Prime Attribute:

An attribute that is not a member of a candidate key is called Prime Attribute.

1. First Normal Form:

The first normal form (or 1NF) requires that the values in each column of a table are atomic. By
atomic we mean that there are no multiple values in a column.

Example:

1NF isn’t very interesting – it is a stepping stone to others. For instance, if we have (s#, name, city),
where each supplier may have several branch locations in different cities, then the relation has a domain that
allows sets of cities for values, thus they aren’t atomic.
Not in 1NF

Normalized to 1NF.
The solution is to decompose the table into 2 tables: (s#, name) and (s#, location).

2. Second Normal Form:

A relation is in 2NF if and only if it is in 1NF and every nonprime attribute is fully functionally
dependent on the primary (Composite) key.

Example 1:

Where, the Underline represents the Primary Key attributes and the arrows represent the
Functional Dependencies of the relation R1.

The primary key is (s#, p#), and FDs are


city  tax
s#  tax,city
s#,p#  qty,

Where tax is determined by the city, the tax and city are determined by s# and qty is determined by both the

s#,p# . Note that this structure will have update anomalies as below.
Anomalies because of Partial Dependancy:

Insert. You cannot enter the existence of a new supplier and city unless that supplier is shipping a part. This
is because of the integrity rule, that all fields in the key must have values. p# is part of the key.
Delete. If a supplier has only one shipment, and it gets deleted, you also delete all knowledge of that
supplier, such as the city.
Update. Because of the redundancy, if the city of a supplier moves, then you must either find all
occurrences of the supplier and change the city or change one occurrence, and have an inconsistent DB.

The solution is to divide this into two tables, where the key of the new table will be the one that was
independently determining the values of some attributes. So now we have

The functional dependency diagram shows that each of them now contains attributes that are fully
dependent on the primary key of each relation. Insert – can now insert the existence of a supplier into R2a,
without a shipment. Delete – can now delete a shipment from R2b without losing information about the
supplier. Update – can now update the city in only one place.

3. Third Normal Form:

A relation is in 3NF if and only if it is in 2NF and every nonprime attribute is non-transitively
dependent on the primary key.

Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables
violate the Third Normal Form when one column depends on another column, which in turn depends on
the primary key (a transitive dependency).

One way to identify transitive dependencies is to look at your table and see if any columns would
require updating if another column in the table was updated. If such a column exists, it probably violates
3NF.

A transitive dependence is when


r.a→r.b and r.b→r.c hold. Therefore, the transitive dependency
r.a→r.c also holds. This can be seen in the functional dependency for R2a.

Tax rate is dependent on city. City is dependent on s#. Therefore, tax rate is dependent on s# through city.
This shows by the fact that there are arrows that originate from places other than the key. This also gives
anomalies.
Anomalies because of transitive dependency:

Insert – cannot enter that a city has a tax rate unless we have a supplier there. Again, this is because we
cannot have a null primary key.

Delete – if there is only one supplier in a city, when we delete the supplier, we delete the tax information for
that city.

Update – if we change the tax rate for a city, we must either 1. find all suppliers in that city and change the
status for it or 2. change only one and have an inconsistent DB.

The solution is to break the relation into two relations. The point here is to get rid of the extra arrows, and
make simple functional dependencies. So the two new relations are

Now the functional dependency diagrams are simple, there are no transitive dependencies, all
attributes are fully dependent on the key, and they are in 3NF.

4. Boyce-Codd Normal Form (BCNF):

A relation is in BCNF if and only if every determinant is a candidate key. A determinant is any
attribute on which another attribute is functionally dependent.
This Normal Form was developed to deal with relations where the relation has multiple candidate
keys, and the candidate keys are composite, and the candidate keys are overlapped. When a relation has
more than one candidate key, anomalies may occur even though the relation is in 3NF.
BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite)
on which some other attribute is fully functionally dependent. A relation is in BCNF is, and only if, every
determinant is a candidate key.
Example:

Where, sname is also unique. The candidate keys are (s#, p#) and (sname, p#). s# determines sname,
and sname determines s#, so they are both determinants. But they are not candidate keys – they are part of
different candidate keys.
update anomaly because of multiple Determinant:
If you update the sname in one tuple, you must either update it in all tuples with the same s#, or be
inconsistent. Ditto with p#.
The solution is to make two projections of R5:

Now, each determinant in each relation is also a candidate key. You can update sname or s# in one
place (taking into account the issues of foreign keys).

Complexity of more tables:

A complete normalization of tables is desirable, but you may find that in practice that full
normalization can introduce complexity to your design and application. More tables often lead more JOIN
operations, and in most database management systems (DBMSs) such JOIN operations can be costly,
leading to decreased performance.

The key lies in finding a balance where the first three normal forms are generally met without
creating an exceedingly complicated schema.
5. Multivalued Dependencies – 4NF:

A problem with multi-valued dependencies (MVD) occurs when you are trying to express two
independent 1:N relations, or multi-valued attributes, in the same relation. For example, in your initial
design process, you may have seen something like:

Example:

Where, the manager is associated (multi-determines) a set of phone numbers, and also a set of
employees, but the phone numbers and the employees have nothing to do with each other. Of course, you
can’t have a relation that looks like the ones above – it is excluded by 1NF.

You are trying to express the idea that the manager is associated with a set of employees, and a set of phone
numbers, but that the employees and the phone numbers are independent of one another. So, you might
design a relation that looks like:

But that implies a relationship (connection) between 999-1212 and George. To avoid that
appearance, you would have to store all combinations of phone# and employee.

Two 1:N relations (or multivalued attributes), A:B, A:C, where B and C are independent of each other.

A->>B A determines a set of (Multiple) values B


A->>C A determines a set of (Multiple) values C

The only time a multi-valued dependency (mvd) is a problem is when you have more than one mvd, and the
B and C values are independent.

A trivial mvd is one where:


1. The B attribute(s) are a subset of the A attributes. That is, if you made them distinct from each other,
there would no longer be an mvd. E.g., ABC->>B.

2. The union of A and B make up the entire relation – there are no other attributes in the table.

Otherwise, you have a nontrivial mvd, and these are the potential problems.

There are lots of redundancies allow room for anomalies. Note that the relations with non-trivial mvd’s tend
to be all-key relations, where the key is the entire relation.

The cure: 4NF

A relation is in 4NF if for every nontrivial mvd A->>B, C is a superkey (any combination of
attributes) that is a unique ID in the relation (non-minimal) for the relation. The manager table used as an
example above is not in 4NF. mgr, phone# is a nontrivial mvd because phone# is not a subset of mgr, and
there is also employee. Similarly, mgr, employee is a nontrivial mvd. As usual, the way to get a relation into
4NF is to decompose it, to get the mvds into separate relations:

Which are now trivial mvds, making up the entire table? This decomposition will have the lossless
join property.

The Overall Idea

Remember that the goal here is to get a good design. Starting from an ER diagram is one way,
although you still mgiht want to check normalization of tables. But starting with a bunch of tables and then
normalizing them (or starting with one enormous table) is another approach. We have been talking about
normalization as something that you do regarding just one table in the database. It is also important to look
at your DB design in terms of how the tables relate to each other, and how you can combine them. Merely
having a bunch of tables in 3NF or BCNF is not enough.

Some definitions:

In a database design, we have a decomposition D of the universal relation R. This is the way that
all of the attributes have been decomposed into tables. There is a set of functional dependencies F that hold
over the attributes of R; this depends on the semantics of the DB and how things work in the world it
models.
1. Dependency Preserving Decomposition:

In decomposition, it is possible to lose a functional dependency – this is undesirable, so a good


decomposition will preserve dependencies. There are two ways of storing functional dependencies: they can
be in the same table, or they can be inferred from different tables.

2. Lossless (Additive) Joins:

Another important feature of a good decomposition is that it gives lossless joins. This is the problem
of spurious tuples. The term “lossless” refers to the problem of losing information – the way that you lose
information here is by getting noise (spurious tuples) into your table.

Properties of lossless join decomposition:

1. For 2-relation DB schemas: the attributes in both relations must functionally determine either those
attributes that appear in only the first relation, or those that appear in only the second relation.

2. Once you have established decomposition with the lossless join property, you can further decompose
one of its tables without losing the property. So, to decompose and maintain lossless joins:

For each table in the DB that isn’t in BCNF, find the functional dependency that is in
violation (that is, contains a determinant that is not part of a candidate key), and break the relation
into two. One relation contains the A and B (AB) attributes from the functional dependency A
->>BC. The other contains the rest of the attributes (i.e A C).

You can’t always perform the “ideal” decomposition that is in BCNF and preserves dependencies.
You may only be able to get to 3NF. You must then decide whether to leave it there, and build in protection
for update anomalies, or to decompose even further, with the resulting loss of performance.

In terms of design, remember that it isn’t a good idea to design a table that will get too many nulls. It
is better to break it up into another table. However, this could also result in the problem of dangling tuples.
The representation of a “thing” is broken up into 2 tables. To get the full information on the “thing”, you
join the tables together. However, if some tuples have either null value on a join attribute, or don’t appear at
all in one table, they won’t appear in the result, unless you know in advance that you should do an outer
join.
6. Fifth Normal Form (5NF) and Join Dependencies :

5NF, also known as Project-join normal form (PJ/NF) is based on the idea of a lossless JOIN or
the lack of a join-projection anomaly. This problem occurs when you have an n-way relationship, where
n>2. A quick check for 5NF is to see if the table is in 3NF and all the candidate keys are single columns.

Join dependency

A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a
constraint on the states r of R. The constraint states that every legal state r of R should have a non-additive
join decomposition into R1, R2, ..., Rn; that is, for every such r we have

Note: an MVD is a special case of a JD where n = 2.

• Lossless-join property refers to when we decompose a relation into two relations - we can rejoin
the resulting relations to produce the original relation.

• Consider a table of supply with no MVD is in 4NF but not in 5NF if it has JD(R1,R2,R3).

• The SUPPLY relation with the join dependency is decomposed into three relations R1,R2, and $3
that are each in 5NF.

• Applying a natural join to any two of these relations produces spurious tuples, but applying a natural
join to all three together does not.
Basic Definitions in Normalization

What are update Anomalies?

The Problems resulting from data redundancy in an un-normalized database table are collectively
known as update anomalies. So any database insertion, deletion or modification that leaves the database in
an inconsistent state is said to have caused an update anomaly. They are classified as
 Insertion anomalies
 Deletion anomalies
 Modification anomalies

Tbl_Staff_Branch

Tbl_Staff

Tbl_Branch

• Insertion anomalies: To insert the details of a new member of staff located at branch B1 into
the Tbl_Staff_Branch Table shown above, we must enter the correct details of branch number B1 so
that the branch details are consistent with the values for branch B1 in other rows.
To insert the details of a new branch that currently has no members of staff into
the Tbl_Staff_Branch table, it is necessary to enter nulls for the staff details which is not allowed
as staffID is the primary key. But if you normalize Tbl_Staff_Branch, which is in Second Normal Form
(2NF) to Third Normal Dorm (3NF), you end up with Tbl_Staff and Tbl_Branch and you shouldn't
have the problems mentioned above.

• Deletion anomalies: If we delete a row from the Tbl_Staff_Branch table that represents the last
member of staff located at that branch, (for e.g. row with Branch numbers B",B3 or B4) the detals about
that branch are also lost from the Database.

• Modification anomalies: Should we need to change the address of a perticular branch in


the Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this
modification is not carried out on all the relevent rows, the database will become inconsistent.

What’s a spurious tuple?

A spurious tuple is, basically, a record in a database that gets created when two tables are joined
badly. In database-ese, spurious tuples are created when two tables are joined on attributes that are neither
primary keys nor foreign keys.

What is Functional Dependency? what are the different types of Functional Dependencies?

Functional Dependency is a constraint between two sets of attributes from the database.
Functional Dependency describes the relationship between attributes (columns) in a table. They are
fundamental to the process of Normalization.
For example, if A and B are attributes of a table, B is functionally dependent on A, if each value
of A is associated with exactly one value of B (so, you can say, 'A functionally determines B').

Functional dependency between A and B


• Attribute or group of attributes on the left hand side of the arrow of a functional dependency is
referred to as 'determinant'.
• Attribute or group of attributes on the right hand side of the arrow of a functional dependency is
referred to as 'dependent'.
Simple example would be StaffID functionally determines Position in the above tables.

Functional Dependency can be classified as follows:

• Full Functional dependency Indicates that if A and B are attributes(columns)of a table, B is fully
functionally dependent on A if B is functionally dependent on A ,but not on any proper subset of A.
E.g. StaffID---->BranchID

• Partial Functional Dependency Indicates that if A and B are attributes of a table , B is partially
dependent on A if there is some attribute that can be removed from A and yet the dependency still
holds.
Say for Ex, consider the following functional dependency that exists in the Tbl_Staff table:
StaffID,Name -------> BranchID
BranchID is functionally dependent on a subset of A (StaffID,Name), namely StaffID.

• Transitive Functional Dependency: A condition where A , B and C are attributes of a table such
that if A is functionally dependent on B and B is functionally dependent on C then C is Transitively
dependent on A via B.
Say for Ex, consider the following functional dependencies that exists in
the Tbl_Staff_Branch table:
StaffID---->Name,Sex,Position,Sal,BranchID,Br_Address
BranchID----->Br_Address
So, StaffID attribute functionally determines Br_Address via BranchID attribute.

What is closure of a set of FDs?


If F is a set of FDs on a relation schema R, then F+, the closure of F, is the smallest set of FDs such that
F+ F and no FD can be derived from F by using the inference axioms that are not contained in F+ . if R is
not specified, it is assumed to contain all the attributes that appear in F.

Anda mungkin juga menyukai