
Conceptual Database Design
Normalization
GIS Applications
Spring 2007

How should data items be organized to form relations that describe entities and relationships between entities?

• Normalization is a foundation for relational database design
• It involves removing redundant data from relational tables by decomposing a relational table into smaller tables by projection.

Objectives
• Minimize unnecessary redundancies
• Minimize unwanted side effects of database updates
  Inadvertent deletions or insertion errors

Normalization
• Normalization theory is based on the concepts of normal forms.
• A relational table is said to be in a particular normal form if it satisfies a certain set of constraints.
• There are currently five normal forms that have been defined.
• The first three normal forms defined by E. F. Codd are typically required of all tables in a relational database
• The fourth and fifth normal forms are relatively rare.
• Based on the analysis of functional dependencies among attributes.

Functional Dependence
• The concept of functional dependence is the basis for the first four normal forms.
• Given a relation R, a column Y of R is said to be functionally dependent upon column X of R if and only if each value of X in R is associated with precisely one value of Y at any given time
• Saying that column Y is functionally dependent upon X is the same as saying the values of column X identify the values of column Y.
• Such a functional dependency is denoted as X → Y
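
The definition of functional dependence above can be checked against sample data. The sketch below is only an illustration: the function name `holds_fd` and the column names are assumptions, and the rows are borrowed from the student table used elsewhere in the slides. A check like this can only show that X → Y holds in the rows at hand, not that it is a rule of the schema.

```python
def holds_fd(rows, x, y):
    """Return True if, in these rows, each value of column x maps to one value of column y."""
    seen = {}
    for row in rows:
        # setdefault records the first Y seen for this X; a different Y later breaks X -> Y.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


# Sample rows (hypothetical column names, values from the slides' student table).
students = [
    {"student_no": 38214, "name": "Bright", "major": "IS"},
    {"student_no": 38214, "name": "Bright", "major": "EE"},
    {"student_no": 69173, "name": "Smith", "major": "PM"},
]

name_depends_on_student = holds_fd(students, "student_no", "name")
major_depends_on_student = holds_fd(students, "student_no", "major")
print(name_depends_on_student, major_depends_on_student)
```

Here Student # identifies Student Name, but not Major, because student 38214 has two majors.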

Candidate and Primary Keys
Superkey: a set of one or more attributes that uniquely identifies a specific instance of an entity
Candidate key: any subset of the attributes of a superkey that is also a superkey and not reducible to another superkey
Primary key: a selection from the set of candidate keys, used to index a relation

Primary Keys
Every relation (entity) must have a primary key
To qualify as a primary key, an attribute must have the following properties:
• it must have a non-null value for each instance of the entity
• the value must be unique for each instance of an entity
• the values must not change or become null during the life of each entity instance

Student #  Student Name  Major
38214      Bright        IS
69173      Smith         PM

Composite Keys
Sometimes it requires more than one attribute to uniquely identify an entity. A primary key made up of more than one attribute is known as a composite key.

Student #  Student Name  Major
38214      Bright        IS
38214      Bright        EE
69173      Smith         PM

Full Functional Dependence
• Applies to tables with composite keys
• Column Y in relational table R is fully functionally dependent on X of R if it is functionally dependent on X and not functionally dependent upon any proper subset of X.
• Full functional dependence means that when a primary key is composite, then the other columns must be identified by the entire key and not just some of the columns that make up the key.
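
Full functional dependence on a composite key can be tested by checking the whole key and then every smaller subset of it. This is a minimal sketch under assumptions: the helper names `determines` and `is_fully_dependent` and the column names are hypothetical, and the rows come from the course enrollment table used in the slides.

```python
from itertools import combinations


def determines(rows, xs, y):
    """True if the combined values of columns xs identify one value of column y."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in xs)
        if seen.setdefault(key, row[y]) != row[y]:
            return False
    return True


def is_fully_dependent(rows, xs, y):
    """True if y depends on all of xs and on no smaller, non-empty subset of xs."""
    if not determines(rows, xs, y):
        return False
    return not any(
        determines(rows, subset, y)
        for size in range(1, len(xs))
        for subset in combinations(xs, size)
    )


# Hypothetical column names; values follow the slides' enrollment table.
enrollment = [
    {"student_no": 38214, "course_no": "IS 350", "title": "Database", "grade": "A"},
    {"student_no": 38214, "course_no": "IS 465", "title": "Sys Anal", "grade": "C"},
    {"student_no": 69173, "course_no": "IS 465", "title": "Sys Anal", "grade": "A"},
    {"student_no": 69173, "course_no": "PM 300", "title": "Op Res", "grade": "B"},
]
key = ("student_no", "course_no")

grade_is_full = is_fully_dependent(enrollment, key, "grade")
title_is_full = is_fully_dependent(enrollment, key, "title")
print(grade_is_full, title_is_full)
```

Grade needs both Student # and Course #, so it is fully dependent on the composite key. Course Title is identified by Course # alone, so it is only partially dependent.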

Foreign Keys
• A foreign key is an attribute that completes a relationship by identifying the parent entity.
• Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity.
• Every relationship in the model must be supported by a foreign key.
• Foreign keys are formed in dependent entities by migrating the entire primary key from the parent entity. If the primary key is composite, it may not be split.

Steps in Normalization
• Assemble data items from user views
• Convert to un-normalized relations
• Convert to first normal form (1NF)
• Convert to second normal form (2NF)
• Convert to third normal form (3NF)
Should result in simple relations that correspond to entities or associations between entity classes
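
Referential integrity through a foreign key, as described in the Foreign Keys slide, can be tried out with Python's built-in `sqlite3` module. This is a sketch, not part of the original slides: the table and column names are assumptions, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent entity: student. Dependent entity: enrollment, which migrates student_no.
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.execute(
    "CREATE TABLE enrollment ("
    " student_no INTEGER NOT NULL REFERENCES student(student_no),"
    " course_no TEXT NOT NULL,"
    " grade TEXT,"
    " PRIMARY KEY (student_no, course_no))"
)
conn.execute("INSERT INTO student VALUES (38214, 'Bright', 'IS')")
conn.execute("INSERT INTO enrollment VALUES (38214, 'IS 350', 'A')")

# An enrollment that points at a non-existent student violates referential integrity.
try:
    conn.execute("INSERT INTO enrollment VALUES (99999, 'IS 350', 'B')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(orphan_rejected, enrollment_count)
```

The second insert is refused because no parent row with that student number exists, so only the valid enrollment remains.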

Un-Normalized Relations
Un-normalized relations contain one or more repeating groups: multiple values at the intersection of rows and columns

Student #  Student Name  Major  Course #  Course Title     Instructor Name  Instructor Location  Grade
38214      Bright        IS     IS 350    Databases        Codd             B104                 A
                                IS 465    System Analysis  Kemp             B213                 C
69173      Jones         PM     IS 465    System Analysis  Kemp             B213                 A
                                PM 300    Prod Mang        Lewis            D317                 B
                                QM 440    Op Res           Kemp             B213                 C

Contains redundant information, e.g. IS 465 appears in more than one row

Un-Normalized Relations
[Diagram: Student # to Student Name (one-to-one relationship), Student # to Major (one-to-one relationship), Student # to Course # (one-to-many relationship)]
Need to address the one-to-many relationships

Normalized relations: First Normal Form
• A relation is in first normal form if the underlying domains contain only atomic values
• There are no repeating groups within a tuple
• Most relational systems require a database to be in 1NF

First Normal Form
Remove repeating groups and form 2 new relations: migrate the primary key, and assure there is a valid new primary key

Student #  Student Name  Major
38214      Bright        IS
69173      Smith         PM

Student #  Course #  Course Title  Instructor Name  Instructor Location  Grade
38214      IS 350    Database      Codd             B104                 A
38214      IS 465    Sys Anal      Kemp             B213                 C
69173      IS 465    Sys Anal      Kemp             B213                 A
69173      PM 300    Op Res        Lewis            D317                 B
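
The step of removing repeating groups can be sketched as a small transformation. The code below is an illustration only: the function name `to_first_normal_form` and the field names are assumptions, and the input rows follow the un-normalized table as printed in the slides.

```python
# Un-normalized records: "courses" is a repeating group inside each student row.
unnormalized = [
    {"student_no": 38214, "name": "Bright", "major": "IS", "courses": [
        ("IS 350", "Databases", "Codd", "B104", "A"),
        ("IS 465", "System Analysis", "Kemp", "B213", "C"),
    ]},
    {"student_no": 69173, "name": "Jones", "major": "PM", "courses": [
        ("IS 465", "System Analysis", "Kemp", "B213", "A"),
        ("PM 300", "Prod Mang", "Lewis", "D317", "B"),
        ("QM 440", "Op Res", "Kemp", "B213", "C"),
    ]},
]


def to_first_normal_form(records):
    """Split each record into a student row and one enrollment row per course."""
    students, enrollments = [], []
    for rec in records:
        students.append((rec["student_no"], rec["name"], rec["major"]))
        for course in rec["courses"]:
            # Migrate the student's key into every row of the former repeating group.
            enrollments.append((rec["student_no"], *course))
    return students, enrollments


students, enrollments = to_first_normal_form(unnormalized)
for row in enrollments:
    print(row)
```

Every value in the two resulting relations is atomic, and the enrollment relation is keyed by the combination of student number and course number.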

Identification of Primary Key

Student #  Student Name  Major
38214      Bright        IS
69173      Smith         PM
(primary key: Student #)

Student #  Course #  Course Title  Instructor Name  Instructor Location  Grade
38214      IS 350    Database      Codd             B104                 A
38214      IS 465    Sys Anal      Kemp             B213                 C
69173      IS 465    Sys Anal      Kemp             B213                 A
69173      PM 300    Op Res        Lewis            D317                 B
(composite primary key: Student #, Course #)

Insert anomaly
Insertion of a new course cannot occur until a student has registered for the course, since Student # is part of the composite key

Update anomaly
Changing a course title or course number requires searching all tuples to find every occurrence of that course number or title

Deletion anomaly
Dropping the only student enrolled in a course removes the course's last tuple, so the associated course and instructor information is lost

Student #  Course #  Course Title  Instructor Name  Instructor Location  Grade
38214      IS 350    Database      Codd             B104                 A
38214      IS 465    Sys Anal      Kemp             B213                 C
69173      IS 465    Sys Anal      Kemp             B213                 A
69173      PM 300    Op Res        Lewis            D317                 B
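
Both anomalies above can be reproduced on the four sample rows. This is a rough sketch; the tuple layout and the helper name `known_courses` are assumptions made for illustration.

```python
# (student_no, course_no, title, instructor, location, grade), as in the 1NF table.
enrollments = [
    (38214, "IS 350", "Database", "Codd", "B104", "A"),
    (38214, "IS 465", "Sys Anal", "Kemp", "B213", "C"),
    (69173, "IS 465", "Sys Anal", "Kemp", "B213", "A"),
    (69173, "PM 300", "Op Res", "Lewis", "D317", "B"),
]


def known_courses(rows):
    """Course numbers the relation still has any information about."""
    return {row[1] for row in rows}


# Deletion anomaly: student 38214 drops IS 350, the only tuple for that course.
after_drop = [r for r in enrollments if not (r[0] == 38214 and r[1] == "IS 350")]
lost_courses = known_courses(enrollments) - known_courses(after_drop)

# Update anomaly: renaming IS 465 must touch every tuple that repeats its title.
rows_to_update = [r for r in enrollments if r[1] == "IS 465"]

print(lost_courses, len(rows_to_update))
```

After the drop, nothing in the relation records that IS 350 exists or who teaches it, and a title change for IS 465 has to be applied in two places.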

Functional Dependencies

Student #  Course #  Course Title  Instructor Name  Instructor Location  Grade
38214      IS 350    Database      Codd             B104                 A
38214      IS 465    Sys Anal      Kemp             B213                 C
69173      IS 465    Sys Anal      Kemp             B213                 A
69173      PM 300    Op Res        Lewis            D317                 B

Course # → Course Title
Course # → Instructor Name
Course # → Instructor Location
Student #, Course # → Grade

Second Normal Form
A relation is in second normal form if it is in 1NF and every non-key attribute is fully dependent on the primary key

Course Title, Instructor Name and Instructor Location are partially dependent (only on Course #) on the primary key

Second Normal Form
To convert from first to second normal form, remove partial dependencies
Create 2 new relations, one with attributes fully dependent on the primary key, the other with attributes only partially dependent

Student #  Course #  Grade
38214      IS 350    A
38214      IS 465    C
69173      IS 465    B
69173      PM 300    C

Course #  Course Title  Instructor Name  Instructor Location
IS 350    Database      Codd             B104
IS 465    Sys Anal      Kemp             B213
PM 300    Prod man      Lewis            D317
QM 440    Op Res        Kemp             B213

Courses are independent of Student # and so can be inserted or deleted independently; only a single tuple needs to be updated in the course relation

Transitive dependencies
A non-key attribute is dependent on one or more non-key attributes
[Diagram: Course #, Course Title, Instructor Name, Instructor Location, with two links labeled "one-to-one relationship"]
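
The conversion from first to second normal form described above is a pair of projections. The sketch below assumes a hypothetical helper `project` and hypothetical column names, and uses the four rows of the first normal form enrollment table.

```python
def project(rows, columns):
    """Project dict rows onto the given columns, removing duplicate tuples (order kept)."""
    return list(dict.fromkeys(tuple(row[c] for c in columns) for row in rows))


first_nf = [
    {"student_no": 38214, "course_no": "IS 350", "title": "Database",
     "instructor": "Codd", "location": "B104", "grade": "A"},
    {"student_no": 38214, "course_no": "IS 465", "title": "Sys Anal",
     "instructor": "Kemp", "location": "B213", "grade": "C"},
    {"student_no": 69173, "course_no": "IS 465", "title": "Sys Anal",
     "instructor": "Kemp", "location": "B213", "grade": "A"},
    {"student_no": 69173, "course_no": "PM 300", "title": "Op Res",
     "instructor": "Lewis", "location": "D317", "grade": "B"},
]

# Attributes fully dependent on the composite key (student_no, course_no).
grades = project(first_nf, ["student_no", "course_no", "grade"])
# Attributes that depend on course_no alone move to their own relation.
courses = project(first_nf, ["course_no", "title", "instructor", "location"])

for row in courses:
    print(row)
```

The course relation now holds one tuple per course, so the repeated IS 465 details collapse into a single row that can be inserted, updated, or deleted without touching any student.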

Insertion anomaly
Since instructor is dependent on Course # as primary key, no information about an instructor can be added until an instructor has been assigned to a course

Delete anomaly
Deleting data for a course results in deleting instructor information

Update anomaly

Course #  Course Title  Instructor Name  Instructor Location
IS 350    Database      Codd             B104
IS 465    Sys Anal      Kemp             B213
PM 300    Prod man      Lewis            D317
QM 440    Op Res        Kemp             B213

To update instructor information the entire relation must be searched, since instructor information occurs more than once.

Third Normal Form
• A relation is in third normal form if it is in 2NF and contains no transitive dependencies
• Every non-key attribute is fully dependent on the primary key and there are no transitive dependencies

Instructor Name  Instructor Location
Codd             B104
Kemp             B213
Lewis            D317

Non-key attributes that participate in the transitive dependency form a new relation

Course #  Course Title  Instructor Name
IS 350    Database      Codd
IS 465    Sys Anal      Kemp
PM 300    Prod Mang     Lewis
QM 440    Op Res        Kemp

Foreign key: a non-key attribute in one relation that serves as a primary key in another relation

Boyce-Codd Normal Form
Occurs in the case of overlapping candidate keys
• Each student can major in several subjects
• For each major a student has one advisor
• Each major has several advisors
• Each advisor advises only one major

Student #  Major    Advisor
123        Physics  Einstein
123        Music    Mozart
456        Biol     Darwin
789        Physics  Bohr
999        Physics  Einstein

There are 2 possible candidate keys: Student #-Major or Student #-Advisor, and they are overlapping. (Major-Advisor cannot be a key: Physics-Einstein appears for both students 123 and 999.)
Attributes that are part of a candidate key are dependent on part of another candidate key.

Boyce-Codd Normal Form
A relation is in Boyce-Codd normal form if it is in 3NF and no attribute of a candidate key depends on part of another candidate key (equivalently, every determinant is a candidate key)

Student #  Major    Advisor
123        Physics  Einstein
123        Music    Mozart
456        Biol     Darwin
789        Physics  Bohr
999        Physics  Einstein

Need to project into 2 new relations

Student #  Advisor
123        Einstein
123        Mozart
456        Darwin
789        Bohr
999        Einstein

Advisor   Major
Einstein  Physics
Mozart    Music
Darwin    Biol
Bohr      Physics

Fourth Normal Form
Removes multi-valued dependencies
Multi-valued dependency: when 3 attributes (A, B, C) exist in a relation and for each value of A there is a well defined set of values for B and a well defined set of values for C, yet B and C are independent of each other

Computer  Package  Outlet
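
The Boyce-Codd projection of the Student # / Major / Advisor relation can be checked for losslessness by joining the two projections back together. This is a minimal sketch with assumed variable names; it relies on the stated rule that each advisor advises only one major.

```python
# (student_no, major, advisor), as in the Student # / Major / Advisor table.
rows = [
    (123, "Physics", "Einstein"),
    (123, "Music", "Mozart"),
    (456, "Biol", "Darwin"),
    (789, "Physics", "Bohr"),
    (999, "Physics", "Einstein"),
]

# Project into the two new relations.
student_advisor = sorted({(s, a) for s, _, a in rows})
advisor_major = sorted({(a, m) for _, m, a in rows})

# Advisor -> Major holds, so each advisor appears exactly once in advisor_major.
major_of = dict(advisor_major)
advisor_is_key = len(major_of) == len(advisor_major)

# Natural join on advisor rebuilds the original relation.
rejoined = {(s, major_of[a], a) for s, a in student_advisor}
lossless = rejoined == set(rows)

print(advisor_is_key, lossless)
```

Because Advisor determines Major, the join on Advisor reproduces exactly the five original tuples, so no information is lost by the split.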

Fourth Normal Form

Computer  Package    Outlet
Apple     Visicalc   Computerland
Apple     Applestar  Computerland
Apple     Visicalc   Byte Shop
Zenith    Wordstar   Computershop
Zenith    Supercalc  Computershop
Zenith    Wordstar   Byte Shop

Several redundancies exist in the relation
Can generate deletion and update anomalies
Project to 2 new relations

Computer  Package
Apple     Visicalc
Apple     Applestar
Zenith    Wordstar
Zenith    Supercalc

Computer  Outlet
Apple     Computerland
Apple     Byte Shop
Zenith    Computershop
Zenith    Byte Shop

Normalization Summary
Leads to simpler (to implement) applications and to more maintainable systems
Based on a set of rules that define normal forms, of which the first three are most important:
First normal form: All column values are atomic
Second normal form: All column values depend on the value of the primary key: no partial dependencies
Third normal form: No column value depends on the value of any other column except the primary key: no transitive dependencies
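
The fourth normal form projection of the Computer / Package / Outlet relation can be sketched in a few lines. The variable names are assumptions. Note that joining the two projections pairs every package with every outlet of the same computer, so the join matches the original relation exactly only when packages and outlets are truly independent in the data.

```python
# (computer, package, outlet), as in the Computer / Package / Outlet table.
rows = [
    ("Apple", "Visicalc", "Computerland"),
    ("Apple", "Applestar", "Computerland"),
    ("Apple", "Visicalc", "Byte Shop"),
    ("Zenith", "Wordstar", "Computershop"),
    ("Zenith", "Supercalc", "Computershop"),
    ("Zenith", "Wordstar", "Byte Shop"),
]

# dict.fromkeys removes duplicates while keeping first-seen order.
computer_package = list(dict.fromkeys((c, p) for c, p, _ in rows))
computer_outlet = list(dict.fromkeys((c, o) for c, _, o in rows))

# Natural join of the two projections on computer.
joined = {
    (c, p, o)
    for c, p in computer_package
    for c2, o in computer_outlet
    if c == c2
}
original_preserved = set(rows) <= joined

print(computer_package)
print(computer_outlet)
print(len(joined), original_preserved)
```

With the six sample rows the join contains all original tuples plus the package and outlet combinations that the sample does not list, which is what treating the two attributes as independent implies.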

Limits of Normalization
• Normalization rules are guidelines
• In certain circumstances 3NF or higher may not be desirable

Customer(Name, Street, City, State, Zipcode)
Does not meet 3NF

Customer(Name, Street, Zipcode)
Location(Zipcode, City, State)
3NF may not be efficient in terms of regular queries

Need to apply judgment and common sense

Limits of Normalization
• There can be a number of cases where there is a compelling need for non-first normal form structures.
• Spatial data objects are one of them
• The object-relational model supports the ability to implement non-first normal form structures
  arrays
  nested tables
