
DEPARTMENT - INFORMATION TECHNOLOGY
SUBHAM ROY (91)
SUDIP MAJUMDER (88)
Paper Name: Database Management System Paper Code: IT-601

1. MAKAUT Syllabus

Paper name: Database Management System


Code: IT601
Contacts: 3L
Credits: 3

Pre-requisites: CS302 (Data Structure & Algorithm), M101 & M201 (Mathematics),
IT401 (Object Oriented Programming & UML)

Detailed Syllabus:

Introduction [4L]
Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator,
Database Users, Three Schema architecture of DBMS.
Entity-Relationship Model [6L]
Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram,
Weak Entity Sets, Extended E-R features.
Relational Model [5L]
Structure of relational Databases, Relational Algebra, Relational Calculus, Extended
Relational Algebra Operations, Views, Modifications of the Database.
SQL and Integrity Constraints [8L]
Concept of DDL, DML, DCL. Basic Structure, Set operations, Aggregate Functions, Null
Values, Domain Constraints, Referential Integrity Constraints, assertions, views, Nested
Subqueries, Database security application development using SQL, Stored procedures and
triggers.
Relational Database Design [9L]
Functional Dependency, Different anomalies in designing a Database, Normalization using
functional dependencies, Decomposition, Boyce-Codd Normal Form, 3NF, Normalization
using multi-valued dependencies, 4NF, 5NF
Internals of RDBMS [7L]
Physical data structures, Query optimization: join algorithm, statistics and cost based
optimization. Transaction processing, Concurrency control and Recovery Management:
transaction model properties, state serializability, lock-based protocols, two-phase locking.
File Organization & Index Structures [6L]
File & Record Concept, Placing file records on Disk, Fixed and Variable sized Records,
Types of Single-Level Index (primary, secondary, and clustering), Multilevel Indexes,
Dynamic Multilevel Indexes using B-tree and B+-tree.
2. Recommended Books:
a. Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw-Hill.
b. C. J. Date, “Introduction to Database Management”, Vols. I, II, III, Addison-Wesley.
c. J. D. Ullman, “Principles of Database Systems”, Galgotia Publications.

3. Course Outcomes:
IT601.1 Develop a good description of the data, its relationships and constraints.
IT601.2 Use functional dependencies to express relational schemas in a well-normalized form.
IT601.3 Maintain the quality of data in the database.
IT601.4 Identify bad database designs.

4. Day wise Lesson Plan with book reference: Times New Roman 12 (format given
below). Note that the video link in the lesson plan is optional.

Sl. No | Day | Module | Topic
1 | 1 | 5 | Introduction to DBMS
2 | 2 | 5 | Functional Dependency
3 | 3 | 5 | Different anomalies in designing a Database
4 | 4 | 5 | Normalization using functional dependencies
5 | 5 | 5 | Decomposition
6 | 6 | 5 | Boyce-Codd Normal Form
7 | 7 | 5 | 3NF
8 | 8 | 5 | Normalization using multi-valued dependencies
9 | 9 | 5 | 4NF, 5NF
10 | 10 | 6 | Physical data structures
11 | 11 |   | Query optimization: join algorithm, statistics
12 | 12 | 6 | Cost-based optimization
13 | 13 | 6 | Relational Calculus
14 | 14 | 3 | Practice of Relational Calculus
15 | 15 | 3 | File & Record Concept, Placing file records on Disk, Fixed and Variable sized Records
16 | 16 | 7 | Types of Single-Level Index (primary, secondary, and clustering)
17 | 17 | 7 | Multilevel Indexes
18 | 18 | 7 | Dynamic Multilevel Indexes using B-tree and B+-tree

No video links were given (the column is optional). Recommended book for every topic:
Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw-Hill.

5. Course Information
PROGRAMME: IT    DEGREE: BTech

COURSE: Database Management System    SEMESTER: 6    CREDITS: 3

COURSE CODE: IT601    COURSE TYPE: Theory

CORRESPONDING LAB COURSE CODE (IF ANY): IT691

CONTACT HOURS:
DAY 1
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Basic Knowledge about DBMS

Lecture 1 (1 hr)

Topics Covered: Introduction to Database and DBMS

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about the basics of Database and DBMS

Notes:
Database: A collection of logically related information stored in a consistent fashion.

E.g.: phone book, bank records (checking statements, etc.)

The storage format typically appears to users as some kind of tabular list (table, spreadsheet).

Jobs of a Database:

✔ Stores information in a highly organized manner
✔ Manipulates information in various ways, some of which are not available in other
applications or are easier to accomplish with a database
✔ Models some real-world process or activity through electronic means
o Often called modeling a business process
o Often replicates the process only in appearance or end result

DBMS: A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.

Relational Database:

Relational database was proposed by Edgar Codd (of IBM Research) around 1969.

✔ A relational database organizes data in tables (or relations).


✔ A table is made up of rows and columns.
✔ A row is also called a record (or tuple).
✔ A column is also called a field (or attribute).
✔ A database table is similar to a spreadsheet.
✔ However, the relationships that can be created among the tables enable a relational
database to efficiently store huge amounts of data, and effectively retrieve selected data.

Primary Key

✔ In the relational model, a table cannot contain duplicate rows, because that would create
ambiguities in retrieval.
✔ To ensure uniqueness, each table should have a column (or a set of columns), called the
primary key, that uniquely identifies every record of the table.
✔ A primary key is called a simple key if it is a single column; it is called a composite key if it
is made up of several columns.

The most important logical criteria in database design are the reduction/elimination of
redundancy and the maintenance of database consistency.

Normal Relations: The relations that store each fact (tuple) only once in the database and that
remain consistent following database operations (updates, insertions and deletions).

Normalization: Process of decomposing unsatisfactory “bad” relations by breaking up their
attributes into smaller relations.

First Normal Form (1NF)

✔ A table is in first normal form if there are no repeating groups.
✔ Repeating groups: a set of logically related fields or values that occur multiple times in
one record
1. a non-atomic value, or multiple values, stored in a field
2. multiple fields in the same table that hold logically similar values

Let’s learn by doing:

1. What is the difference between a database and a table?



2. State significant difference between file system and DBMS.

3. What are the basic components of DBMS?

4.  Why is the following table NOT in first normal form (1NF)?

5. How can the following table be changed to first normal form?



DAY 2
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Functional Dependency

Lecture 1 (1 hr)

Topics Covered: Functional Dependency

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Functional Dependency

Notes:

Functional Dependency

A set of attributes Y is functionally dependent on a set of attributes X if a given set of values for
each attribute in X determines unique values for the set of attributes in Y.
We use the notation: X →Y to denote that Y is functionally dependent on X. The set of attributes
X is known as the determinant of the FD, X →Y.

Rules of Functional Dependency:

The splitting/combining rule of FDs

Attributes on the right are independent of each other:
– Consider a,b,c → d,e,f
– “Attributes a, b, and c functionally determine d, e, and f”
⇨ No mention of d relating to e or f directly

Splitting rule (useful to split up the right side of an FD):
– abc → def becomes abc → d, abc → e and abc → f

There is no safe way to split the left side:
– abc → def is NOT the same as ab → def and c → def!

Combining rule (useful to combine right sides):
– if abc → d, abc → e, abc → f hold, then abc → def holds

Trivial FDs
Not all functional dependencies are useful:
– A → A always holds
– abc → a also always holds (the right side is a subset of the left side)

An FD with an attribute on both sides is “trivial”:
– Simplify by removing L ∩ R from R: abc → ad becomes abc → d
– Or, in singleton form, delete trivial FDs: abc → a and abc → d becomes just abc → d

Transitive rule
• The transitive rule holds for FDs:
– Consider the FDs a → b and b → c; then a → c holds
– Consider the FDs ad → b and b → cd; then ad → cd holds, or just ad → c (because of the
trivial dependency rule)

Cyclic functional dependencies:

• Attributes on the right side of one FD may appear on the left side of another!
– Simple example: assume relation (A, B) & FDs: A → B, B → A
– What does this say about A and B?
• Example:
– studentID → email, email → studentID

Geometric view of FDs

▪ Let D be the domain of tuples in R
♣ Every possible tuple is a point in D
▪ An FD X on R restricts the tuples in R to a subset of D
♣ Points in D which violate X cannot be in R
▪ Example: D(x,y,z)
♣ xy → z
▪ e.g. z = abs(x) + abs(y)
♣ z → x,y
▪ e.g. x = y = abs(z)/2
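Whether a given FD actually holds in a concrete relation instance can be checked mechanically. A Python sketch (relation and data invented, echoing the studentID/email example above):

```python
def fd_holds(rows, lhs, rhs):
    """True iff the FD lhs -> rhs holds in this instance: equal
    lhs-values always come with equal rhs-values."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Cyclic FDs, as in the studentID/email example (made-up rows):
r = [
    {"studentID": 1, "email": "a@uni.edu", "dept": "IT"},
    {"studentID": 2, "email": "b@uni.edu", "dept": "IT"},
]
print(fd_holds(r, ["studentID"], ["email"]))  # True
print(fd_holds(r, ["email"], ["studentID"]))  # True  (the cycle)
print(fd_holds(r, ["dept"], ["studentID"]))   # False
```

Note that an instance can only refute an FD; holding on one instance does not prove the FD is a constraint of the schema.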

Let’s learn by doing:

1. List all functional dependencies satisfied by the relation in the following table

2. Use the definition of functional dependency to argue that each of Armstrong’s axioms
(reflexivity, augmentation, and transitivity) is sound.

3. Consider the following proposed rule for functional dependencies: If α → β and γ → β, then
α → γ. Prove that this rule is not sound by showing a relation r that satisfies α → β and γ →
β, but does not satisfy α → γ.

DAY 3

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Functional Dependency

Lecture 1 (1 hr)

Topics Covered: Different anomalies in designing a Database

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about tables and databases.

Notes:
Anomalies:
♣ Insertion anomaly: occurs when a row cannot be added to a relation, because not all data
are available (or one has to invent “dummy” data)
♣ Deletion anomaly: occurs when data is deleted from a relation, and other critical data are
unintentionally lost
♣ Update anomaly: occurs when one must make many changes to reflect the modification of
a single datum

Anomalies are primarily caused by:

▪ data redundancy: replication of the same field in multiple tables, other than foreign keys
▪ functional dependencies whose determinants are not candidate keys, including
● partial dependency
● transitive dependency
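A toy relation that mixes module and lecturer facts shows all three anomalies; a Python sketch (data and field names invented):

```python
# One relation mixing module and lecturer facts (made-up data).
rows = [
    {"module": "M1", "lecturer": "L1", "text": "T1"},
    {"module": "M1", "lecturer": "L1", "text": "T2"},
    {"module": "M3", "lecturer": "L2", "text": "T3"},
]

# Update anomaly: changing the lecturer of M1 means touching two rows.
touched = sum(1 for r in rows if r["module"] == "M1")
print(touched)  # 2

# Deletion anomaly: removing module M3 silently loses lecturer L2.
rows = [r for r in rows if r["module"] != "M3"]
print(any(r["lecturer"] == "L2" for r in rows))  # False

# Insertion anomaly: a new lecturer with no module yet has no row to live in.
```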

Let’s learn by doing:

Find the three types of anomaly in the following table:

StudentNum | CourseNum | StudentName | Address | Course
S21 | 9201 | Malti | Etawa | Accounts
S21 | 9267 | Malti | Etawa | Accounts
S24 | 9267 | Smitha | Golsi | Physics
S30 | 9201 | Rabi | Malancha | Computing
S30 | 9322 | Rabi | Malancha | Maths
DAY 4

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Normalization

Lecture 1 (1 hr)

Topics Covered: Normalization using functional dependencies

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Functional Dependency

Notes:
Removing FDs

❖ Suppose we have a relation R with scheme S and the FD A → B, where A ∩ B = {}
❖ Let C = S – (A ∪ B)
❖ In other words:
♣ A – attributes on the left-hand side of the FD
♣ B – attributes on the right-hand side of the FD
♣ C – all other attributes
❖ It turns out that we can split R into two parts:
♣ R1, with scheme C ∪ A
♣ R2, with scheme A ∪ B
❖ The original relation can be recovered as the natural join of R1 and R2:
❖ R = R1 NATURAL JOIN R2

Problems Resolved in 2NF
♣ Problems in 1NF:
♣ INSERT – Can’t add a module with no texts
♣ UPDATE – To change the lecturer for M1, we have to change two rows
♣ DELETE – If we remove M3, we remove L2 as well
♣ In 2NF the first two are resolved

Closure of a Set of FDs

♣ The set of all FDs logically implied by F is called the closure of F.
♣ The closure of F is denoted by F+.
♣ Given a set F, we can find all FDs in F+ by applying Armstrong’s Axioms.
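Rather than enumerating F+ directly, one usually computes attribute closures with the standard fixpoint algorithm. A Python sketch (the FD set is the one from exercise 1 below):

```python
def closure(attrs, fds):
    """Attribute closure attrs+ : repeatedly fire FDs (lhs, rhs),
    given as pairs of attribute sets, until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F = {A -> BC, CD -> E, B -> D, E -> A} on R = (A, B, C, D, E)
F = [({"A"}, {"B", "C"}),
     ({"C", "D"}, {"E"}),
     ({"B"}, {"D"}),
     ({"E"}, {"A"})]
print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C', 'D', 'E'] -> A is a candidate key
print(sorted(closure({"B"}, F)))  # ['B', 'D']
```

An FD X → Y is in F+ exactly when Y is a subset of the closure of X, so this one routine answers membership questions without materializing F+.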

Let’s learn by doing:

1. Compute the closure of the following set F of functional dependencies for the relation
schema R = (A, B, C, D, E):
A → BC, CD → E, B → D, E → A. List the candidate keys for R.

2. Give an example with explanation where a database is in 1NF and not in 2NF.
DAY 5

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Learn Normalization in Relational database

Lecture 1 (1 hr)

Topics Covered: Decomposition

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Functional Dependency

Notes:
Decomposition
Redundancy causes problems. Solution => decompose the schema so that each piece of
information is represented only once.
Definition: Let R be a relation scheme. {R1, ..., Rn} is a decomposition of R if R = R1 ∪ ... ∪
Rn (i.e., all of R’s attributes are represented).
We will deal mostly with binary decomposition:
R into {R1, R2} where R = R1 ∪ R2
E.g.: student(s_id, name, dept_id, dept_head, dept_phone, grade)
⇨ student(s_id, name, dept_id, grade), dept(dept_id, dept_head, dept_phone)
✔ Lossless: data should not be lost or created when splitting relations up
✔ Dependency preservation: it is desirable that FDs are preserved when splitting relations up
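For a binary decomposition there is a simple lossless-join test: {R1, R2} is lossless iff the common attributes functionally determine all of R1 or all of R2. A Python sketch reusing the attribute-closure idea (FD set taken from exercise 1 below):

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """{r1, r2} is a lossless-join decomposition iff the common
    attributes determine all of r1 or all of r2."""
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

F = [({"A"}, {"B", "C"}), ({"C", "D"}, {"E"}),
     ({"B"}, {"D"}), ({"E"}, {"A"})]
print(lossless_binary({"A", "B", "C"}, {"A", "D", "E"}, F))  # True: A -> ABC
```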

Let’s learn by doing:

1. Suppose that we decompose the schema R = (A, B, C, D, E) into (A, B, C) & (A, D, E).
Show that this decomposition is a lossless-join decomposition if the following set F of
functional dependencies holds:
A → BC, CD → E, B → D, E → A

2. Consider the relation R ( A, B, C, D, E ) with the set of F = { A → C, B → C, C → D, DC → C,


CE → A }. Suppose the relation has been decomposed by the relations R1 ( A, D ) R2 ( A, B )
R3 ( B, E ) R4 ( C, D, E ), R5 ( A, E ). Is this decomposition lossy or lossless?

DAY 6

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Learn Normalization in Relational database

Lecture 1 (1 hr)

Topics Covered: Boyce-Codd Normal Form

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Functional Dependency

Notes:
Boyce-Codd Normal Form

A relation schema R is in BCNF with respect to a set F of functional dependencies if for all
functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of
the following holds:

♣ α → β is trivial (i.e., β ⊆ α)
♣ α is a superkey for R

Example

R = (A, B, C), F = {A → B; B → C}, Key = {A}

♣ R is not in BCNF (B → C holds, but B is not a superkey)
♣ Decompose into R1 = (A, B), R2 = (B, C)
❖ R1 and R2 are in BCNF
❖ Lossless-join decomposition
❖ Dependency preserving
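The BCNF test on this example can be mechanized with attribute closures; checking the given FDs in F (rather than all of F+) is a standard simplification that suffices for BCNF. A Python sketch:

```python
def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(R, fds):
    """Nontrivial FDs in F whose left side is not a superkey of R."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= R]

R = {"A", "B", "C"}
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(bcnf_violations(R, F))  # [({'B'}, {'C'})]: B -> C breaks BCNF, as above
```

Each violating FD α → β also tells you how to decompose: split off (α ∪ β) and keep R minus (β − α), which is exactly the R1/R2 split shown in the example.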

Let’s learn by doing:


1. Give a lossless-join decomposition into BCNF of schema R = (A, B, C, D, E) with FDs
A → BC, CD → E, B → D, E → A.

2. Consider the following collection of relations and dependencies. For each relation,
(a) determine the candidate keys, and (b) if a relation is not in BCNF, then
decompose it into a collection of BCNF relations.
a. R1(A, C, B, D, E), A → B, C → D

b. R2(A, B, F), AB → F, B → F

c. R(A, B, C, D, E) with functional dependencies D → B, CE → A



DAY 7

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Learn Normalization in Relational database

Lecture 1 (1 hr)

Topics Covered: 3NF

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Functional Dependency

Notes:
Third Normal Form
❖ Allows some redundancy (with resultant problems)
❖ But FDs can be checked on individual relations without computing a join
❖ There is always a lossless-join, dependency-preserving decomposition into 3NF

A relation schema R is in third normal form (3NF) if for all α → β in F+ at least one of the
following holds:
❖ α → β is trivial (i.e., β ⊆ α)
❖ α is a superkey for R
❖ Each attribute A in β – α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
❖ If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above
must hold).
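For small schemas the candidate keys, and hence the prime attributes needed by the third condition, can be found by brute force. A Python sketch using the schema of exercise 1 below (exponential search and checking only the FDs in F, the usual practical test; for illustration only):

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    """Minimal attribute sets whose closure is all of R (brute force)."""
    keys = []
    for n in range(1, len(R) + 1):
        for combo in combinations(sorted(R), n):
            s = set(combo)
            if closure(s, fds) >= R and not any(k < s for k in keys):
                keys.append(s)
    return keys

def is_3nf(R, fds):
    """Every nontrivial FD must have a superkey left side, or all its
    extra right-side attributes prime (in some candidate key)."""
    prime = set().union(*candidate_keys(R, fds))
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= R:
            continue
        if not (rhs - lhs) <= prime:
            return False
    return True

# Exercise 1: R(ABCD) with ABC -> D and D -> A
R = {"A", "B", "C", "D"}
F = [({"A", "B", "C"}, {"D"}), ({"D"}, {"A"})]
print([tuple(sorted(k)) for k in candidate_keys(R, F)])  # [('A', 'B', 'C'), ('B', 'C', 'D')]
print(is_3nf(R, F))  # True: A is prime, so D -> A satisfies the third condition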

Let’s learn by doing:

1. R(ABCD), ABC → D, D → A, is R in 3NF?



2. Compare BCNF with 3NF

3. The relation schema Student_Performance (name, courseNo, rollNo, grade) has the
following FDs:
{name, courseNo} → grade, {rollNo, courseNo} → grade, name → rollNo, rollNo → name
Find the highest normal form of this relation scheme.

DAY 8

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Learn Normalization in Relational database

Lecture 1 (1 hr)

Topics Covered: Normalization using multi-valued dependencies

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about Multi-valued Dependency

Notes:
Multi-valued Dependency
❖ R = XYZ: relation scheme. An MVD X →→ Y holds iff each X-value in R is associated
with a set of Y-values in a way that does not depend on the Z-values.
❖ Formally, for any pair of tuples t1, t2 of r(R) such that t1[X] = t2[X],
❖ there exist t3, t4 in r such that
o t1[X] = t2[X] = t3[X] = t4[X]
o t3[Y] = t1[Y]
o t3[Z] = t2[Z]
o t4[Y] = t2[Y]
o t4[Z] = t1[Z]
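This t1..t4 definition can be transcribed almost literally into a checker for a concrete instance. A Python sketch (relation and data invented; the t4 witness for the pair (t1, t2) is exactly the t3 witness for (t2, t1), so checking t3 over all ordered pairs suffices):

```python
def mvd_holds(rows, X, Y, Z):
    """Check the MVD X ->-> Y on an instance (list of dicts), where
    Z = R - X - Y. Direct transcription of the t1..t4 definition."""
    rel = {tuple(sorted(t.items())) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if any(t1[a] != t2[a] for a in X):
                continue
            # Required witness t3: X and Y values from t1, Z values from t2.
            t3 = {**{a: t1[a] for a in X + Y}, **{a: t2[a] for a in Z}}
            if tuple(sorted(t3.items())) not in rel:
                return False
    return True

# Made-up instance: for each course, teachers and books vary independently.
r = [{"course": "DB", "teacher": t, "book": b}
     for t in ("T1", "T2") for b in ("B1", "B2")]
print(mvd_holds(r, ["course"], ["teacher"], ["book"]))       # True
print(mvd_holds(r[:3], ["course"], ["teacher"], ["book"]))   # False
```

Dropping one tuple from the full teacher x book combination breaks the independence, which is why the truncated instance fails.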

Let’s learn by doing:

1. List all nontrivial MVDs in

2. Add tuples to the following table so that it will satisfy X →→ Y.

3. Prove that α → β implies α →→ β.



DAY 9

Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Functional Dependency, Different anomalies in
designing a Database, Normalization using functional dependencies, Decomposition,
Boyce-Codd Normal Form, 3NF, Normalization using multi-valued dependencies, 4NF, 5NF
Course Outcomes: Learn Normalization in Relational database

Lecture 1 (1 hr)

Topics Covered: 4NF and 5NF

Prerequisites: Have you read

● Set theory
● Entity Relationship
● Data Structure

Objectives: Impart basic knowledge about 4NF and 5NF

Notes:
Fourth Normal Form (4NF)
Relation schema R is in 4NF
❖ w.r.t. a set of dependencies D (FDs & MVDs)
❖ if for all MVDs α →→ β in D+,
α →→ β is a trivial MVD, OR
α is a superkey for R
❖ Effect: any nontrivial MVD must then have a superkey as its determinant, i.e. it must also
behave as an FD

Lemma: If R is in 4NF then it is in BCNF

Example
R = (A, B, C, G, H, I), F = {A →→ B, B →→ HI, CG →→ H}
R is not in 4NF since A →→ B holds and A is not a superkey for R

Join Dependencies
Let R be a relation schema and R1, R2, . . . , Rn be a decomposition of R. The join dependency
*(R1, R2, . . . , Rn) is used to restrict the set of legal relations to those for which R1, R2, . . . , Rn
is a lossless-join decomposition of R. Formally, if R = R1 ∪ R2 ∪ . . . ∪ Rn, we say that a
relation r(R) satisfies the join dependency *(R1, R2, . . . , Rn) if

r = ΠR1(r) ⋈ ΠR2(r) ⋈ . . . ⋈ ΠRn(r)

Project-join normal form (PJNF) is defined in the same way as BCNF and 4NF, except that join
dependencies are used.
A relation schema R is in PJNF with respect to a set D of functional, multi-valued, and join
dependencies if, for all join dependencies in D+ of the form *(R1, R2, . . . , Rn), where each
Ri ⊆ R and R = R1 ∪ R2 ∪ . . . ∪ Rn, at least one of the following holds:
✔ *(R1, R2, . . . , Rn) is a trivial join dependency.
✔ Every Ri is a superkey for R.
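The membership test implied by this definition can be sketched directly: project r onto each Ri, natural-join the projections back, and compare with r. A Python illustration on made-up tuples:

```python
def natural_join(r1, r2):
    """Natural join of relations given as lists of dicts."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                out.append({**t1, **t2})
    return out

def satisfies_jd(rows, schemes):
    """r satisfies *(R1, ..., Rn) iff joining its projections yields r."""
    parts = [[{a: t[a] for a in s} for t in rows] for s in schemes]
    joined = parts[0]
    for p in parts[1:]:
        joined = natural_join(joined, p)
    canon = lambda rel: {tuple(sorted(t.items())) for t in rel}
    return canon(joined) == canon(rows)

r = [{"A": 1, "B": 1, "C": 1},
     {"A": 1, "B": 2, "C": 2}]
print(satisfies_jd(r, [{"A", "B"}, {"A", "C"}]))  # False: the join adds spurious tuples
print(satisfies_jd(r, [{"A", "B", "C"}]))         # True: trivial join dependency
```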

Let’s learn by doing:

1. Consider the following schema.

R = (A, B, C, D, E), S = (G, H, I, J), F = {A → B, B → C, B → E, B → D, G → H, G → I, I → J}
Normalize the above schema with the given constraints to 4NF.
DAY 10
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Physical data structures, Query optimization: join
algorithm, statistics and cost based optimization.
Course Outcomes: Knowledge on Query Optimization

Lecture 1 (1 hr)

Topics Covered: Query processing and optimization

Prerequisites: Have you read

● Set theory
● Entity Relationship
● RDBMS

Objectives: Understanding of Query Optimization

Notes:
Queries

A query is a language expression that describes data to be retrieved from a database.

Steps in query processing:

Query Optimization:
Query optimization is the process of selecting an efficient execution plan for evaluating a query.
After parsing, the parsed query is passed to the query optimizer, which generates different
execution plans for evaluating it and selects the plan with the least estimated cost.
Different scenarios call for different execution plans, as the following cases show.

Scenario 1: when both data files are small.

Load both F1 and F2 into main memory and then do the cross product: pick one record at a
time from F1, take its cross product with every record of F2, and repeat for every record of
F1. As both data files are small, this does not require a lot of memory.

Scenario 2: when F1 is very large and F2 is small.

In this case we cannot load F1 into main memory, as it would require a large amount of space
and would not be an efficient plan.
Solution: load only F2 into main memory, read the records of F1 directly from secondary
memory, and do the cross product. Since we do not need to write records back into F1, this
plan has the least amount of I/O and no large space overhead.
Amount of I/O = size of F1 (no. of pages in F1) + size of F2 (no. of pages in F2) = m + n
This is the best execution plan for the given scenario, as each file must be read at least once.

Scenario 3: when a limited, constant amount of main memory is available.

Case 1: Let two pages, p1 and p2, of main memory be available to perform the cross product
of F1 and F2. We read a page of F1 into p1 and a page of F2 into p2; keeping p1 fixed, we
reload p2 with the pages of F2 one after another (n times), performing the cross product each
time. We then load the next page of F1 into p1 and repeat, m times in total.
Amount of I/O = m x n
Case 2: Find the best way to perform the cross product when 4 pages of main memory are
available (p1, p2, p3, p4).
We have two solutions for this:
Solution 1: read 2 pages of F1 at a time and 2 pages of F2, perform the cross product between
them, then replace the two pages of F2 with the next two and repeat until F2 is exhausted
(n/2 times). Then read the next 2 pages of F1 and repeat the whole process, m/2 times.
Hence amount of I/O = (m/2) x (n/2)
Solution 2: read 3 pages of F1 at a time and 1 page of F2, perform the cross product between
them, then replace the page of F2 with the next one and repeat until F2 is exhausted (n
times). Then read the next 3 pages of F1 and repeat, m/3 times.
Hence amount of I/O = (m/3) x n
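The formulas above count each multi-page fetch as one read operation. Under that accounting (a sketch; the function and variable names are mine), the plans compare as follows for m = n = 100 pages:

```python
import math

def read_ops(m, n, b1, b2):
    """Read operations for a block-wise cross product: every b1-page
    group of F1 forces one pass over F2 fetched in b2-page chunks."""
    return math.ceil(m / b1) * math.ceil(n / b2)

m, n = 100, 100
print(read_ops(m, n, 1, 1))  # 10000 (Case 1: m x n)
print(read_ops(m, n, 2, 2))  # 2500  (Solution 1: (m/2) x (n/2))
print(read_ops(m, n, 3, 1))  # 3400  (Solution 2: (m/3) x n, rounded up)
```

Note the accounting matters: counting individual pages read instead of read operations would rank the 3+1 split ahead of the 2+2 split, since each pass over F2 then costs n pages regardless of chunk size.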

Scenario 4: when the amount of memory available is not constant but a fraction of the data files.

Read the first half of the pages of F1 (p1) and the first half of the pages of F2 (p2) and do the
cross product; then read the other half of F2 (p3) and do the cross product with the same part
of F1 (p1); then read the other half of F1 (p4) and do the cross product first with the most
recently read part of F2 (p3) and then with the first part of F2 (p2).
Amount of I/O = (m/2) + (n/2) + (n/2) + (m/2) + (n/2)

Natural join (|X|):

Scenario 1: if the files are sorted on the common attribute.

Take a record from F1, then one from F2; if they match, store the result in a temporary file,
and repeat this process for all records of F1. Amount of I/O = size of F1 + size of F2.
As with the cross product, we need different sorting methods depending on the scenario, e.g.
the sizes of the data files. The best sorting method here does not require the whole file to fit
in main memory.

Scenario X: if both F1 and F2 are very large, we should use (external) merge sort.
Scenario Y: if the data files are very large and the available memory is some fraction of a data file:
Suppose main memory equal to 5% of a data file of 100 pages is available. We read 5 pages
of F1, sort them, and write them out as one sorted run; then the next 5 pages, and so on, so
that we end up with several sorted runs. We then keep merging these sorted runs until we
get one final sorted file.
Amount of I/O = one whole scan of the file to create the 20 sorted runs + the cost of merging.

Comparison between the above types of memory availability (constant and a fraction of the
data file):

1. In the first, sufficient memory is available, whereas in the second we have only some
fraction of the data file.
2. In Scenario X the number of sorted runs decreases as we go: we start with many sorted
runs, which are reduced to one file.
3. In Scenario Y we start with a fixed number of sorted runs, which after merging are
reduced to one file.

Selection operation (σ):
If the given data file is large, then instead of loading it into main memory we take a
temporary page in main memory and read records from the data file (in secondary memory)
one by one; if a record satisfies the required condition, we store it.

Projection (Π):
In the given data files we scan table by table as above, and if we find the required attribute
in a particular table, we store that attribute and its values.

Let’s learn by doing:

How will Oracle (with the rule-based optimizer) evaluate this query?
SELECT E.ENAME
FROM EMP E
WHERE DEPTNO = 20
AND SAL >= 2000
AND ENAME LIKE 'F%'

DAY 11
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Physical data structures, Query optimization: join
algorithm, statistics and cost based optimization.
Course Outcomes: Knowledge on Query Optimization

Lecture 1 (1 hr)

Topics Covered: Statistics and cost-based optimization

Prerequisites: Have you read

● Set theory
● Entity Relationship
● RDBMS

Objectives: Understanding of statistics and cost-based optimization

Notes:
Distributed Cost Model
Two different types of cost functions can be used
Reduce total time
✔ Reduce each cost component (in terms of time) individually, i.e., do as little for
each cost component as possible
✔ Optimize the utilization of the resources (i.e., increase system throughput)
Reduce response time
✔ Do as many things in parallel as possible
✔ May increase total time because of increased total activity

Total time: Sum of the time of all individual components


❖ Local processing time: CPU time + I/O time
❖ Communication time: fixed time to initiate a message + time to transmit the data

The individual components of the total cost have different weights:

● Wide area networks
❖ Message initiation and transmission costs are high
❖ Local processing cost is low (fast mainframes or minicomputers)
❖ Ratio of communication to I/O costs is 20:1
● Local area networks
❖ Communication and local processing costs are more or less equal
❖ Ratio of communication to I/O costs is 1:1.6 (10 Mb/s network)

Response time: elapsed time between the initiation and the completion of a query

Response time = TCPU × seq#insts + TI/O × seq#I/Os + TMSG × seq#msgs + TTR × seq#bytes

– where seq#x (x in instructions, I/Os, messages, bytes) is the maximum number of x which
must be done sequentially.
• Any processing and communication done in parallel is ignored.
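As a rough illustration of the two objectives, a toy calculator with invented coefficients and counts (nothing here comes from a real system):

```python
def total_time(t, counts):
    """Sum of per-unit cost times total count over all components."""
    return sum(t[k] * counts[k] for k in t)

def response_time(t, seq_counts):
    """Same formula applied to the sequential (critical-path) counts."""
    return sum(t[k] * seq_counts[k] for k in t)

# Hypothetical coefficients (seconds per unit) and a query where two
# sites each send one 1000-byte message in parallel:
t      = {"msg": 0.1, "byte": 1e-6}
totals = {"msg": 2, "byte": 2000}   # all work, wherever it happens
seq    = {"msg": 1, "byte": 1000}   # longest sequential chain
print(total_time(t, totals))    # ~0.202
print(response_time(t, seq))    # ~0.101: parallelism halves response time
```

This makes the trade-off in the notes concrete: a plan can lower response time by doing more work in parallel even though its total time (and hence resource consumption) stays the same or grows.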

Database Statistics
✔ The primary cost factor is the size of intermediate relations
that are produced during the execution and
must be transmitted over the network, if a subsequent operation is located on a different site
✔ It is costly to compute the size of the intermediate relations precisely.
✔ Instead global statistics of relations and fragments are computed and used to provide
approximations

♣ Let R(A1, A2, . . . , Ak) be a relation fragmented into R1, R2, . . . , Rr.


♣ Relation statistics
❖ min and max values of each attribute: min{Ai}, max{Ai}
❖ length of each attribute: length(Ai)
❖ number of distinct values of each attribute (its domain cardinality): card(dom(Ai))
♣ Fragment statistics
❖ cardinality of the fragment: card(Ri)
❖ cardinality of each attribute of each fragment: card(ΠAi(Rj))
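One way these statistics are used is to estimate the size of intermediate relations under the usual uniform-distribution assumption. A sketch, with invented relation sizes:

```python
# Size estimates for intermediate relations, computed from global statistics
# under a uniformity assumption. card_R stands for card(R), and card_dom_A
# for card(dom(A)) of the attribute A involved in the operation.

def card_selection(card_R, card_dom_A):
    # Estimated card of sigma_{A = value}(R): each value equally likely.
    return card_R / card_dom_A

def card_join(card_R, card_S, card_dom_A):
    # Rough estimate of card(R join S) on common attribute A.
    return (card_R * card_S) / card_dom_A

# 10,000 employees, 50 departments, 50 distinct department ids:
print(card_selection(10_000, 50))   # 200.0 employees expected per department
print(card_join(10_000, 50, 50))    # 10000.0 rows expected in the join
```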
Let’s learn by doing:
1. Consider the SQL query
SELECT *
FROM employee, department
WHERE employee.dept_id = department.dept_id
What evaluation plan would a query optimizer likely choose to get the least estimated cost?
DAY 12
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: File & Record Concept, Placing file records on Disk,
Fixed and Variable sized Records, Types of Single-
Level Index (primary, secondary, and clustering),
Multilevel Indexes, Dynamic Multilevel Indexes using
B tree and B+ tree
Course Outcomes: File & Record Concept, Placing file records on Disk

Lecture 1 (1 hr)

Topics Covered:​ ​File system and Memory

Prerequisites​: Have you Read

● Memory
● Entity Relationship
● File System

Objectives:​ Understanding File & Record Concept in Disk.

Notes:
Physical Storage
Storage media are classified by:
o speed with which data can be accessed
o cost per unit of data
o reliability
o data loss on power failure or system crash
o physical failure of the storage device
Can differentiate storage into:
o volatile storage: loses contents when power is switched off
o non-volatile storage: contents persist even when power is switched off.
Includes secondary and tertiary storage, as well as battery-backed-up
main memory.
Let’s learn by doing:

1. Do a comparative study on various levels of RAID.


DAY 13
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: File & Record Concept, Placing file records on Disk,
Fixed and Variable sized Records, Types of Single-
Level Index (primary, secondary, and clustering),
Multilevel Indexes, Dynamic Multilevel Indexes using
B tree and B+ tree
Course Outcomes: Knowledge of Fixed and Variable sized Records

Topics Covered:​ ​Fixed and Variable sized Records


Prerequisites​: Have you Read

● Memory
● Entity Relationship
● File System

Objectives:​ Learn about Fixed and Variable sized Records and use them in real problem.

Notes:
File organization
⇨​ The database is stored as a collection of files. Each file is a sequence of records. A
record is a sequence of fields.
⇨​ One approach:
o assume record size is fixed
o each file has records of one particular type only
o different files are used for different relations
⇨​ This case is easiest to implement; will consider variable length records later.
Fixed Length Records
Simple approach:
Store record i starting from byte n * (i – 1), where n is the size of each record.
Record access is simple, but records may cross blocks
o Modification: do not allow records to cross block boundaries
Deletion of record i:
alternatives:
move records i + 1, . . ., n to i, . . . , n – 1
move record n to i
do not move records, but link all free records on a free list
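The byte-offset arithmetic and the "move record n to i" deletion alternative can be sketched in a few lines. This is an in-memory toy, not a real file manager, and the 16-byte record size is an arbitrary assumption:

```python
# A minimal in-memory fixed-length record file. Each record occupies exactly
# RECORD_SIZE bytes, so with 0-based numbering record i starts at byte
# RECORD_SIZE * i. Deletion moves the last record into the freed slot.

RECORD_SIZE = 16  # bytes per record (arbitrary for this sketch)

class FixedLengthFile:
    def __init__(self):
        self.data = bytearray()

    def num_records(self):
        return len(self.data) // RECORD_SIZE

    def insert(self, record: bytes):
        assert len(record) == RECORD_SIZE
        self.data += record

    def read(self, i: int) -> bytes:
        off = RECORD_SIZE * i
        return bytes(self.data[off:off + RECORD_SIZE])

    def delete(self, i: int):
        # Move the last record into slot i, then shrink the file.
        last = self.num_records() - 1
        if i != last:
            self.data[RECORD_SIZE * i:RECORD_SIZE * (i + 1)] = self.read(last)
        del self.data[RECORD_SIZE * last:]

f = FixedLengthFile()
f.insert(b"A".ljust(RECORD_SIZE))
f.insert(b"B".ljust(RECORD_SIZE))
f.insert(b"C".ljust(RECORD_SIZE))
f.delete(0)                  # record C moves into slot 0
print(f.read(0)[:1])         # b'C'
print(f.num_records())       # 2
```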
Variable Length Record

Variable-length records arise in database systems in several ways:


Storage of multiple record types in a file.
Record types that allow variable lengths for one or more fields.
Record types that allow repeating fields (used in some older data models).

Slotted page structure
In the slotted page structure, a block header stores the number of record entries, the end of
free space in the block, and the location and size of each record; the records themselves are
stored contiguously starting from the end of the block, so they can be moved within the page
without changing their record identifiers, which refer to the slot entries.
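A toy version of a slotted page, assuming the layout just described. The page size and the header-size arithmetic are simplifications for illustration:

```python
# A toy slotted page: a header holding the entry count and free-space
# pointer plus an (offset, size) slot per record, with record bodies packed
# from the end of the page toward the header.

PAGE_SIZE = 256  # assumed page size for this sketch

class SlottedPage:
    def __init__(self):
        self.slots = []            # one (offset, size) entry per record
        self.free_end = PAGE_SIZE  # records grow downward from the page end
        self.data = bytearray(PAGE_SIZE)

    def insert(self, record: bytes) -> int:
        # Rough header-size estimate: count + pointer + one slot per record.
        header = 4 + 4 * (len(self.slots) + 1)
        if self.free_end - len(record) < header:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1          # slot number acts as record id

    def read(self, slot: int) -> bytes:
        off, size = self.slots[slot]
        return bytes(self.data[off:off + size])

page = SlottedPage()
rid = page.insert(b"variable-length record")
print(page.read(rid))   # b'variable-length record'
```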

Let’s learn by doing:


1. A block can hold either 3 records or 10 key pointers. If a database contains n
records, how many blocks do we need to hold the data file?

DAY 14
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: File & Record Concept, Placing file records on Disk,
Fixed and Variable sized Records, Types of Single-
Level Index (primary, secondary, and clustering),
Multilevel Indexes, Dynamic Multilevel Indexes using
B tree and B+ tree
Course Outcomes: Knowledge of Index

Topics Covered:​ ​Types of Single-Level Index (primary, secondary, and clustering),

Prerequisites​: Have you Read

● Memory
● Entity Relationship
● File System
Objectives:​ Impart knowledge of Indexing

Notes:
Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
o E.g., author catalog in a library
Search key: an attribute or set of attributes used to look up records in a
file. An index file consists of records (called index entries) of the form
⟨search-key, pointer⟩

Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across “buckets” using a “hash
function”.
Indexing techniques are evaluated on the basis of: access types supported efficiently,
access time, insertion time, deletion time, and space overhead.

In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog
in library.
Primary index​: in a sequentially ordered file, the index whose search key specifies the
sequential order of the file.
o Also called clustering index
o The search key of a primary index is usually but not necessarily the primary key.
Secondary index: ​an index whose search key specifies an order different from the sequential
order of the file. Also called non-clustering index.
Index-sequential file: ​ordered sequential file with a primary index.

Dense index: an index record appears for every search-key value in the file.
Sparse index: contains index records for only some search-key values.
o Applicable when records are sequentially ordered on the search key
To locate a record with search-key value K we:
⇨ Find the index record with the largest search-key value ≤ K
⇨ Search the file sequentially starting at the record to which the index record points
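The sparse-index lookup rule can be sketched over an in-memory "file"; the block size of 3 and the key values below are invented:

```python
# Sparse-index lookup over a sequentially ordered file: one index entry per
# block (the first key in the block). Find the largest indexed key <= K,
# then scan forward within that block.

import bisect

# Sorted data file, grouped into blocks of 3 (key, payload) records.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g"), (15, "h"), (17, "i")],
]
# Sparse index: first search key of every block.
index_keys = [blk[0][0] for blk in blocks]   # [1, 7, 13]

def lookup(K):
    # Largest index entry with key <= K points at the block to scan.
    b = bisect.bisect_right(index_keys, K) - 1
    if b < 0:
        return None
    for key, payload in blocks[b]:
        if key == K:
            return payload
    return None

print(lookup(9))    # 'e' -- found via the second index entry
print(lookup(10))   # None -- key not present in the file
```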
Clustering Index
A clustering index is defined on an ordered data file, usually on a non-key column whose
values are not unique for each record. Records that share the same value of the indexed
column are stored (clustered) together, and one index entry points to each such group. Where
a unique value is needed to identify a record faster, two or more columns may be combined
and the index built on the combination.

Let’s learn by doing:


Suppose we have a relation R(a, b, c, d, e) and there are at least 1000 distinct values for each of the
attributes. Consider each of the following query workloads, independently of each other. If it is
possible to speed it up significantly by adding up to two additional indexes to relation R, specify for
each index (1) which attribute or set of attributes form the search key of the index, and (2) whether
the index should be clustered or unclustered.

DAY 15
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: File & Record Concept, Placing file records on Disk,
Fixed and Variable sized Records, Types of Single-
Level Index (primary, secondary, and clustering),
Multilevel Indexes, Dynamic Multilevel Indexes using
B tree and B+ tree
Course Outcomes: Knowledge of Multilevel Indexes

Topics Covered:​ ​Multilevel Indexes

Prerequisites​: Have you Read

● Memory
● Entity Relationship
● File System

Objectives:​ Working with Multilevel Indexes

Notes:
Multilevel Index
★ If primary index does not fit in memory, access becomes expensive.
★ To reduce number of disk accesses to index records, treat primary index kept on disk as a
sequential file and construct a sparse index on it.
outer index – a sparse index of primary index
inner index – the primary index file
★ If even outer index is too large to fit in main memory, yet another level of index can
be created, and so on.
★ Indices at all levels must be updated on insertion or deletion from the file.
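The level count follows from repeatedly building a sparse index on the previous level. A sketch of the arithmetic; the entry count and blocking factor are illustrative, not the exact figures of the exercise below:

```python
# Number of levels in a multilevel index: each level is a sparse (primary)
# index on the level below, so the block count shrinks by the blocking
# factor bfr per level until one block (the top level) remains.

import math

def multilevel_index(first_level_entries, bfr):
    levels, total_blocks = 0, 0
    blocks = math.ceil(first_level_entries / bfr)
    while True:
        levels += 1
        total_blocks += blocks
        if blocks == 1:
            return levels, total_blocks
        blocks = math.ceil(blocks / bfr)

# Illustrative figures: 50,000 first-level entries, 25 entries per block.
print(multilevel_index(50_000, 25))   # (4, 2085): 2000 + 80 + 4 + 1 blocks
```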

Let’s learn by doing:


1. How does the multilevel indexing structure improve the efficiency of searching an index file?

2. Consider a disk with block size B = 512 bytes. A block pointer is P = 8 bytes long, and a
record pointer is Pr = 9 bytes long. A file has r = 50,000 STUDENT records of fixed-size R =
147 bytes. The key field ID# has a length V = 12 bytes. Answer the following questions:

Suppose the key field ID# is the ordering field, and a primary index has been constructed.
Now if we want to make it into a multilevel index, what is the number of levels needed and
what is the total number of blocks required by the multilevel index?

Suppose the key field ID# is NOT the ordering field, and a secondary index has been built.
Now if we want to make it into a multilevel index, what is the number of levels needed and
what is the total number of blocks required by the multilevel index?

DAY 16
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: File & Record Concept, Placing file records on Disk,
Fixed and Variable sized Records, Types of Single-
Level Index (primary, secondary, and clustering),
Multilevel Indexes, Dynamic Multilevel Indexes using
B tree and B+ tree
Course Outcomes: Learn of Dynamic Multilevel Indexes using B tree and B+ tree
Topics Covered: ​Dynamic Multilevel Indexes using B tree and B+ tree
Prerequisites​: Have you Read

● Data Structure
● Entity Relationship
● File System

Objectives:​ Handling Dynamic Multilevel Indexes using B tree and B+ tree


Notes:
B+-Tree Index Files
B+-tree indices are an alternative to indexed-sequential files.
★ Disadvantage of indexed-sequential files: performance degrades as file grows, since many
overflow blocks get created. Periodic reorganization of entire file is required.
★ Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes,
in the face of insertions and deletions. Reorganization of entire file is not required to maintain
performance.
★ Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
★ Advantages of B+-trees outweigh disadvantages, and they are used extensively.
A B+-tree is a rooted tree satisfying the following properties:
✔ All paths from root to leaf are of the same length
✔ Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
✔ A leaf node has between ⌈(n–1)/2⌉ and n–1 values
✔ Special cases:
✔ If the root is not a leaf, it has at least 2 children.
✔ If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1)
values.
Structure of B+ Tree
✔ Every leaf node is at equal distance from the root node. A B+ tree is of order n, where n is
fixed for every B+ tree.
Internal nodes −
● Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
● At most, an internal node can contain n pointers.
Leaf nodes −
● Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
● At most, a leaf node can contain n record pointers and n key values.
● Every leaf node contains one block pointer P to point to the next leaf node, forming a
linked list.
B+ Tree Insertion

● B+ trees are filled from the bottom, and each entry is made at a leaf node.
● If a leaf node overflows −
o Split the node into two parts.
o Partition at i = ⌊(m+1)/2⌋.
o The first i entries are stored in one node.
o The rest of the entries (i+1 onwards) are moved to a new node.
o The i-th key is duplicated at the parent of the leaf.
● If a non-leaf node overflows −
o Split the node into two parts.
o Partition the node at i = ⌈(m+1)/2⌉.
o Entries up to i are kept in one node.
o The rest of the entries are moved to a new node.
B+ Tree Deletion
● B+ tree entries are deleted at the leaf nodes.
● The target entry is searched and deleted.
o If it is in an internal node, delete and replace it with the entry from the left position.
● After deletion, underflow is tested.
o If underflow occurs, distribute the entries from the node to its left.
● If distribution is not possible from the left, then
o distribute from the node to its right.
● If distribution is not possible from left or right, then
o merge the node with the nodes to its left and right.
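For capacity questions like the exercise below, the order p is the largest node size that fits in one block. A sketch assuming the Elmasri-style B-tree node layout: p block pointers of size P plus p − 1 entries of key size V and record pointer size Pr:

```python
# Largest B-tree order p such that a node fits in one block of B bytes,
# assuming each node holds p block pointers (P bytes each) and p - 1
# <key, record pointer> entries (V + Pr bytes each):
#     p*P + (p-1)*(V + Pr) <= B

def btree_order(B, P, V, Pr):
    p = 1
    while (p + 1) * P + p * (V + Pr) <= B:   # does a node of order p+1 fit?
        p += 1
    return p

# Parameters as in the exercise below: B = 512, P = 8, V = 12, Pr = 9.
print(btree_order(512, 8, 12, 9))   # 18, since 18*8 + 17*21 = 501 <= 512
```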

Let’s learn by doing:


Consider the disk with block size B = 512 bytes. A block pointer is P = 8 bytes long, and a record
pointer is Pr = 9 bytes long. A file has r = 50,000 STUDENT records of fixed-size R = 147 bytes.
The key field is ID# whose length is V = 12 bytes. (This is the same disk file as in previous
exercises. Some of the early results should be utilised.) Suppose that the file is NOT sorted by the
key field ID# and we want to construct a B-tree access structure (index) on ID#. Answer the
following questions:
1. What is an appropriate order p of this B-tree?
2. How many levels are there in the B-tree if nodes are approximately 69% full?
3. What is the total number of blocks needed by the B-tree if they are approximately 69% full?
4. How many block accesses are required to search for and retrieve a record from the data file,
given an ID#, using the B-tree?
DAY 17
Course: Database Management System IT601

Relevant MAKAUT syllabus portion: Relational Calculus


Course Outcomes: To learn Relational Calculus

Topics Covered:​ ​Relational Calculus

Prerequisites​: Have you Read

● Memory
● Entity Relationship
● File System

Objectives:​ Learn to use Relational Calculus in DBMS

Notes:
RELATIONAL CALCULUS

⇨​ Relational Algebra is a PROCEDURAL LANGUAGE


⇨​ we must explicitly provide a sequence of operations to generate a desired output result
⇨​ Relational Calculus is a DECLARATIVE LANGUAGE
⇨​ we specify what to retrieve, not how to retrieve it
⇨​ Declarative ~ Non-Procedural

If a retrieval can be specified in the relational calculus, it can also be specified in the relational
algebra, and vice versa
→ the expressive power of the two languages is identical
A query language L is relationally complete if L can express any query that can be expressed in
the relational calculus
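As a worked illustration of this equivalence, here is one query ("the A-values of tuples in r whose B-value is 10") in all three notations; the variable names are arbitrary:

```latex
% Relational algebra (procedural: first select, then project):
\Pi_A(\sigma_{B=10}(r))

% Tuple relational calculus (declarative):
\{\, t \mid \exists s \in r \,(t[A] = s[A] \land s[B] = 10) \,\}

% Domain relational calculus (declarative):
\{\, \langle a \rangle \mid \exists b \,(\langle a, b \rangle \in r \land b = 10) \,\}
```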
Let’s learn by doing:
1. Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations. The relational
algebra expression ΠA(σB=10(r)) is equivalent to the following domain relational
calculus expression:
{<a> | ∃ b (<a, b> ∈ r ∧ b = 10)}
Give an expression in the domain relational calculus that is equivalent to each of the
following:
a) r ⋈ s
b) Πr.A((r ⋈ s) ⋈C=r2.A ∧ r.B>r2.B (ρr2(r)))
2. Consider the following relational schema.
Students(rollno: integer, sname: string)
Courses(courseno: integer, cname: string)
Registration(rollno: integer, courseno: integer, percent: real)
Express in TRC "Find the distinct names of all students who score more than 90% in the
course numbered 107"
