
A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information.

Data mostly represents recordable facts. Data aids in producing information, which is based on facts. For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and average marks.

A database management system stores data in such a way that it becomes easier
to retrieve, manipulate, and produce information.

Characteristics
Traditionally, data was organized in file formats. The DBMS was a new concept then, and much of the research was aimed at overcoming the deficiencies of the traditional style of data management. A modern DBMS has the following characteristics −

• Real-world entity − A modern DBMS is more realistic and uses real-world entities to design its architecture. It models their behaviour and attributes too. For example, a school database may use students as an entity and their age as an attribute.

• Relation-based tables − DBMS allows entities and relations among them to form tables.
A user can understand the architecture of a database just by looking at the table names.

• Isolation of data and application − A database system is entirely different from its data. A database is an active entity, whereas data is said to be passive; the database works on and organizes this data. A DBMS also stores metadata, which is data about data, to ease its own processes.

• Less redundancy − A DBMS follows the rules of normalization, which splits a relation when any of its attributes has redundant values. Normalization is a mathematically rich and scientific process that reduces data redundancy.

• Consistency − Consistency is a state where every relation in a database remains consistent. There exist methods and techniques that can detect an attempt to leave the database in an inconsistent state. A DBMS can provide greater consistency compared to earlier forms of data-storing applications such as file-processing systems.

• Query Language − A DBMS is equipped with a query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many and as varied filtering options as required to retrieve a set of data. This was traditionally not possible with file-processing systems.

• ACID Properties − A DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally shortened to ACID). These concepts are applied to transactions, which manipulate data in a database. ACID properties help the database stay healthy in multi-transactional environments and in case of failure.

• Multiuser and Concurrent Access − A DBMS supports a multi-user environment and allows users to access and manipulate data in parallel. Though there are restrictions on transactions when users attempt to handle the same data item, users are always unaware of them.

• Multiple views − A DBMS offers multiple views for different users. A user in the Sales department will have a different view of the database than a person working in the Production department. This feature enables users to have a concentrated view of the database according to their requirements.

• Security − Features like multiple views offer security to some extent, as users are unable to access data of other users and departments. A DBMS offers methods to impose constraints while entering data into the database and retrieving it at a later stage. A DBMS offers many different levels of security features, which enable multiple users to have different views with different features. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, it can also be managed how much data of the Sales department should be displayed to the user. Since a DBMS is not saved on disk the way traditional file systems are, it is very hard for miscreants to break the code.

Users
A typical DBMS has users with different rights and permissions who use it for
different purposes. Some users retrieve data and some back it up. The users of a
DBMS can be broadly categorized as follows –

• Administrators − Administrators maintain the DBMS and are responsible for administrating the database. They look after its usage and decide who should use it. They create access profiles for users and apply limitations to maintain isolation and enforce security. Administrators also look after DBMS resources like system licenses, required tools, and other software- and hardware-related maintenance.

• Designers − Designers are the group of people who actually work on the designing part of
the database. They keep a close watch on what data should be kept and in what format.
They identify and design the whole set of entities, relations, constraints, and views.

• End Users − End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.
3-tier Architecture
• A 3-tier architecture separates its tiers from each other based on the
complexity of the users and how they use the data present in the database. It
is the most widely used architecture to design a DBMS.

• Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this
level.

• Application (Middle) Tier − At this tier reside the application server and the programs
that access the database. For a user, this application tier presents an abstracted view of
the database. End-users are unaware of any existence of the database beyond the
application. At the other end, the database tier is not aware of any other user beyond the
application tier. Hence, the application layer sits in the middle and acts as a mediator
between the end-user and the database.

• User (Presentation) Tier − End-users operate on this tier and they know nothing about
any existence of the database beyond this layer. At this layer, multiple views of the
database can be provided by the application. All views are generated by applications that
reside in the application tier.

Data models define how the logical structure of a database is modelled. Data models are fundamental entities for introducing abstraction in a DBMS. Data models define how data elements are connected to each other and how they are processed and stored inside the system.

Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. While formulating a real-world scenario into the database model, the ER Model creates an entity set, relationship set, general attributes, and constraints.

ER Model is best used for the conceptual design of a database.

ER Model is based on −

• Entities and their attributes.

• Relationships among entities.

These concepts are explained below.


• Entity − An entity in an ER Model is a real-world entity having properties
called attributes. Every attribute is defined by its set of values called domain. For
example, in a school database, a student is considered as an entity. Student has various
attributes like name, age, class, etc.

• Relationship − The logical association among entities is called a relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of associations between two entities.

Mapping cardinalities −

o one to one

o one to many

o many to one

o many to many

Relational Model
The most popular data model in DBMS is the Relational Model. It is a more scientific model than the others. This model is based on first-order predicate logic and defines a table as an n-ary relation.

The main highlights of this model are −

• Data is stored in tables called relations.

• Relations can be normalized.

• In normalized relations, the values saved are atomic.

• Each row in a relation is unique.

• Each column in a relation contains values from the same domain.

Database Schema
• A database schema is the skeleton structure that represents the logical view of
the entire database. It defines how the data is organized and how the relations
among them are associated. It formulates all the constraints that are to be
applied on the data.
• A database schema defines its entities and the relationship among them. It
contains a descriptive detail of the database, which can be depicted by means
of schema diagrams. It’s the database designers who design the schema to
help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories −

• Physical Database Schema − This schema pertains to the actual storage of data and its form of storage, like files, indices, etc. It defines how the data will be stored in secondary storage.

• Logical Database Schema − This schema defines all the logical constraints that need to
be applied on the data stored. It defines tables, views, and integrity constraints.
Database Instance
A database instance is a state of an operational database with data at any given time. It contains a snapshot of the database. Database instances tend to change with time. A DBMS ensures that every instance (state) is valid by diligently following all the validations, constraints, and conditions that the database designers have imposed.
A database schema is the skeleton of the database. It is designed when the database doesn't exist at all. Once the database is operational, it is very difficult to make any changes to it. A database schema does not contain any data or information.

The ER model defines the conceptual view of a database. It works around real-
world entities and the associations among them. At view level, the ER model is
considered a good option for designing databases.

Entity
An entity can be a real-world object, either animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities with attributes sharing similar values. For example, a Students set may
contain all the students of a school; likewise, a Teachers set may contain all the
teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes
Entities are represented by means of their properties, called attributes. All
attributes have values. For example, a student entity may have name, class, and age
as attributes.

There exists a domain or range of values that can be assigned to attributes. For
example, a student's name cannot be a numeric value. It has to be alphabetic. A
student's age cannot be negative, etc.

Types of Attributes
• Simple attribute − Simple attributes are atomic values, which cannot be divided further.
For example, a student's phone number is an atomic value of 10 digits.

• Composite attribute − Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first name and last name.

• Derived attribute − Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average salary in a department should not be saved directly in the database,
instead it can be derived. For another example, age can be derived from date of birth.
• Single-value attribute − Single-value attributes contain a single value. For example, Social Security Number.

• Multi-value attribute − Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email address, etc.

These attribute types can combine in the following ways (a relational mapping sketch follows this list) −

• simple single-valued attributes

• simple multi-valued attributes

• composite single-valued attributes

• composite multi-valued attributes
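
As a rough, hedged illustration of how these attribute types are commonly mapped to relations, the sketch below models a hypothetical student entity: the composite name is split into simple columns, the derived age is not stored, and the multi-valued phone number moves into its own table. All table and column names here are assumptions made for the example.

-- Hypothetical mapping of attribute types to tables (names are illustrative only).
CREATE TABLE student (
    roll_no       number(5) PRIMARY KEY,  -- simple, single-valued attribute
    first_name    char(20),               -- part of the composite attribute "name"
    last_name     char(20),               -- part of the composite attribute "name"
    date_of_birth DATE                    -- age is a derived attribute, so it is not stored
);

-- A multi-valued attribute (phone number) is kept in a separate table,
-- one row per value, linked back to the owning entity.
CREATE TABLE student_phone (
    roll_no  number(5) REFERENCES student(roll_no),
    phone_no char(10)
);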

Entity-Set and Keys


A key is an attribute or a collection of attributes that uniquely identifies an entity within an entity set. A SQL sketch of these key types follows the list below.

For example, the roll number of a student makes him/her identifiable among
students.

• Super Key − A set of attributes (one or more) that collectively identifies an entity in an
entity set.

• Candidate Key − A minimal super key is called a candidate key. An entity set may have
more than one candidate key.

• Primary Key − A primary key is one of the candidate keys chosen by the database
designer to uniquely identify the entity set.
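
Relating these key types to the hedged student sketch above: any attribute set containing roll_no is a super key; {roll_no} is minimal, so it is a candidate key, and it is the one chosen as the primary key. The statement below, purely as an assumption for illustration, adds a second candidate key.

-- Hypothetical: add a unique email column to the student sketch above.
ALTER TABLE student ADD email char(40) UNIQUE;
-- {roll_no} and {email} are now both candidate keys;
-- {roll_no, email} is a super key, but it is not minimal;
-- roll_no remains the chosen primary key.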

Relationship
The association among entities is called a relationship. For example, an
employee works at a department, a student enrols in a course. Here, Works at and
Enrols are called relationships.

Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive
attributes.

Degree of Relationship
The number of participating entities in a relationship defines the degree of the
relationship.

• Binary = degree 2

• Ternary = degree 3

• n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set that can be associated with entities of another set via a relationship set. A SQL sketch of how a one-to-many cardinality is typically enforced follows the list below.

• One-to-one − One entity from entity set A can be associated with at most one entity of
entity set B and vice versa.

• One-to-many − One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity from entity set A.

• Many-to-one − More than one entity from entity set A can be associated with at most one
entity of entity set B, however an entity from entity set B can be associated with more
than one entity from entity set A.
• Many-to-many − One entity from A can be associated with more than one entity from B
and vice versa.
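
As a hedged sketch of how such cardinalities are usually enforced in the relational model (all names here are hypothetical and simplified), a one-to-many association places a foreign key on the "many" side:

-- One department is associated with many employees; the foreign key sits on the "many" side.
CREATE TABLE department (
    dept_id number(5) PRIMARY KEY,
    dname   char(20)
);

CREATE TABLE employee (
    emp_id  number(5) PRIMARY KEY,
    ename   char(20),
    dept_id number(5) REFERENCES department(dept_id)  -- each employee belongs to at most one department
);

-- A many-to-many association would instead use a separate linking table
-- whose primary key combines the foreign keys of both entity sets.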

Entity
Entities are represented by means of rectangles. Rectangles are named with the
entity set they represent.

Attributes
Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity
(rectangle).

If an attribute is composite, it is further divided in a tree-like structure; every node is then connected to its attribute. That is, composite attributes are represented by ellipses that are connected to an ellipse.

Multivalued attributes are depicted by a double ellipse.

Derived attributes are depicted by a dashed ellipse.


A weak entity set is one which does not have any primary key associated with it.

Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written inside the diamond box. All the entities (rectangles) participating in a relationship are connected to it by a line.

Binary Relationship and Cardinality


A relationship where two entities are participating is called a binary relationship. Cardinality is the number of instances of an entity from a relation that can be associated with the relation.

• One-to-one − When only one instance of an entity is associated with the relationship, it is marked as '1:1'. Only one instance of each entity can be associated with the relationship; this depicts a one-to-one relationship.

• One-to-many − When more than one instance of an entity is associated with a relationship, it is marked as '1:N'. Only one instance of the entity on the left and more than one instance of the entity on the right can be associated with the relationship; this depicts a one-to-many relationship.

• Many-to-one − When more than one instance of an entity is associated with the relationship, it is marked as 'N:1'. More than one instance of the entity on the left and only one instance of the entity on the right can be associated with the relationship; this depicts a many-to-one relationship.

• Many-to-many − More than one instance of the entity on the left and more than one instance of the entity on the right can be associated with the relationship; this depicts a many-to-many relationship.

Generalization
The process of combining entities, where the generalized entity contains the properties of all the entities it generalizes, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow, and dove can all be generalized as Birds.

Specialization
Specialization is the opposite of generalization. In specialization, a group of entities
is divided into sub-groups based on their characteristics. Take a group ‘Person’ for
example. A person has a name, date of birth, gender, etc. These properties are
common to all persons. But in a company, persons can be identified
as employee, employer, customer, or vendor, based on what role they play in the
company.
Similarly, in a school database, persons can be specialized as teacher, student, or a
staff, based on what role they play in school as entities.

Inheritance
We use all the above features of the ER Model in order to create classes of objects in object-oriented programming. The details of entities are generally hidden from the user; this process is known as abstraction.

Inheritance is an important feature of generalization and specialization. It allows lower-level entities to inherit the attributes of higher-level entities.

For example, the attributes of a Person class such as name, age, and gender can be
inherited by lower-level entities such as Student or Teacher.

The relational data model is the primary data model, used widely around the world for data storage and processing. This model is simple, and it has all the properties and capabilities required to process data with storage efficiency.
Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.

Tuple − A single row of a table, which contains a single record for that relation, is called a tuple.

Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.

Relation schema − A relation schema describes the relation name (table name),
attributes, and their names.

Relation key − Each row has one or more attributes, known as relation key, which
can identify the row in the relation (table) uniquely.

Attribute domain − Every attribute has some pre-defined value scope, known as
attribute domain.
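
Tying these terms to the hedged employee sketch above: its CREATE TABLE statement is the relation schema, emp_id is a relation key, each declared data type approximates an attribute domain, every inserted row is a tuple, and the set of rows present at any moment is the relation instance. The values below are assumptions made for illustration.

-- Populating the hypothetical employee relation sketched earlier.
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'Ann', 10);  -- one tuple; emp_id identifies it uniquely
INSERT INTO employee VALUES (2, 'Bob', 10);  -- another tuple; all current rows form the relation instance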

Constraints
Every relation has some conditions that must hold for it to be a valid relation. These
conditions are called Relational Integrity Constraints. There are three main
integrity constraints −

• Key constraints

• Domain constraints

• Referential integrity constraints

Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, they are called candidate keys.

Key constraints enforce that −

• in a relation with a key attribute, no two tuples can have identical values for the key attributes.

• a key attribute cannot have NULL values.

Key constraints are also referred to as Entity Constraints.

Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same kinds of constraints are employed on the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
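
In SQL, such domain constraints are commonly expressed through data types and CHECK constraints. The sketch below is illustrative only; the table, its columns, and the use of REGEXP_LIKE (available in Oracle-style SQL) are assumptions.

-- Hypothetical table whose columns carry domain constraints.
CREATE TABLE person (
    id    number(5) PRIMARY KEY,
    age   number(3) CHECK (age >= 0),                          -- age cannot be negative
    phone char(10)  CHECK (REGEXP_LIKE(phone, '^[0-9]{10}$'))  -- exactly 10 digits, 0-9 only
);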

Referential integrity Constraints


Referential integrity constraints work on the concept of foreign keys. A foreign key is a key attribute of a relation that can be referred to in another relation.

The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then that key element must exist.
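
Continuing the hedged department/employee sketch from the mapping-cardinalities section, the statements below show the constraint in action; the names and values are assumptions, and the rejected insert illustrates the typical behaviour of an RDBMS.

-- Department 10 ('Sales') was inserted in the earlier sketch, so this row is accepted:
INSERT INTO employee VALUES (3, 'Carol', 10);

-- No department with dept_id 99 exists, so the DBMS rejects this row
-- with a referential integrity (foreign key) violation:
INSERT INTO employee VALUES (4, 'Dave', 99);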

SQL is a programming language for relational databases. It is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major distributions of RDBMS.

Data Definition Language

CREATE - Creates new databases, tables, and views in the RDBMS.

ALTER - Modifies the database schema. ALTER can
1) Add, drop, and modify table columns
2) Add and drop constraints
3) Enable and disable constraints

DROP - Drops views, tables, and databases from the RDBMS.

RENAME - Renames databases, tables, and views in the RDBMS.

TRUNCATE - Removes all the data from a table but retains its structure.

DDL Commands: Syntax and Examples

CREATE TABLE
Syntax:
CREATE TABLE <table_name>
( column_name1 datatype,
  column_name2 datatype,
  ...
  column_nameN datatype );
Example:
CREATE TABLE employee
( id number(5),
  name char(20),
  dept char(10),
  age number(2),
  salary number(10),
  location char(10) );

CREATE TABLE ... AS SELECT
Syntax:
CREATE TABLE <table_name> AS
SELECT <columns_list>
FROM <table_name>
[WHERE condition];
Example:
CREATE TABLE employee AS
SELECT * FROM emp;

ALTER TABLE ... ADD
Syntax:
ALTER TABLE table_name
ADD column_name datatype;
Example:
ALTER TABLE employee
ADD experience number(3);

ALTER TABLE ... DROP
Syntax:
ALTER TABLE table_name
DROP COLUMN column_name;
Example:
ALTER TABLE employee
DROP COLUMN location;

ALTER TABLE ... MODIFY
Syntax:
ALTER TABLE table_name
MODIFY column_name datatype;
Example:
ALTER TABLE employee
MODIFY salary number(15,2);

DROP TABLE
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE employee;

RENAME
Syntax:
RENAME old_table_name TO new_table_name;
Example:
RENAME employee TO my_employee;

TRUNCATE TABLE
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE employee;

Data Manipulation Language


SQL is equipped with a data manipulation language (DML). DML modifies the database instance by inserting, updating, and deleting its data. DML is responsible for all forms of data modification in a database. SQL contains the following set of commands in its DML section −

• INSERT INTO/VALUES - The INSERT statement is used to add new rows of data to a table.

• UPDATE/SET/WHERE - The UPDATE statement is used to modify the existing rows in a table.

• DELETE FROM/WHERE - The DELETE statement is used to delete rows from a table.

DML Commands: Syntax and Examples

INSERT INTO ... VALUES
Syntax:
INSERT INTO table_name
VALUES (value1, value2, value3, ... valueN);
Examples:
INSERT INTO employee
(id, name, dept, age, salary)
VALUES (105, 'ABC', 'MBA', 25, 33000);

INSERT INTO employee
VALUES (106, 'XYZ', 'CSE', 23, 30000, 'Mysore');

INSERT INTO ... SELECT
Syntax:
INSERT INTO table_name
[(column1, column2, ... columnN)]
SELECT column1, column2, ... columnN
FROM table_name
[WHERE condition];
Example:
INSERT INTO employee
SELECT * FROM temp_employee;

UPDATE
Syntax:
UPDATE table_name
SET column_name1 = value1,
    column_name2 = value2, ...
[WHERE condition];
Examples:
UPDATE employee
SET location = 'Mysore'
WHERE id = 101;

UPDATE employee
SET salary = salary + (salary * 0.2);

DELETE
Syntax:
DELETE FROM table_name
[WHERE condition];
Examples:
DELETE FROM employee
WHERE id = 100;

DELETE FROM employee;

Data Retrieval Language


• SELECT/FROM/WHERE - The SQL SELECT statement is used to query or retrieve data from a table in the database. A query may retrieve information from specified columns or from all of the columns in the table.

DRL Command: Syntax and Examples

SELECT
Syntax:
SELECT column_list
FROM table_name
[WHERE clause]
[GROUP BY clause]
[HAVING clause]
[ORDER BY clause];
Examples:
SELECT * FROM employee;

SELECT id, name, dept
FROM employee
WHERE dept = 'MBA';

SELECT dept, SUM(salary)
FROM employee
GROUP BY dept;

SELECT dept, SUM(salary)
FROM employee
GROUP BY dept
HAVING SUM(salary) > 25000;

SELECT name, salary
FROM employee
ORDER BY salary;

SQL AGGREGATE (GROUP) Functions: Group functions are built-in SQL functions that operate on groups of rows and return one value for the entire group. These functions are: COUNT, MAX, MIN, AVG, SUM, DISTINCT.

GROUP Function Examples

COUNT    − SELECT COUNT(*) FROM employee WHERE dept = 'MBA';
DISTINCT − SELECT DISTINCT dept FROM employee;
MIN      − SELECT MIN(salary) FROM employee;
MAX      − SELECT MAX(salary) FROM employee;
SUM      − SELECT SUM(salary) FROM employee;
AVG      − SELECT AVG(salary) FROM employee;

Transaction Control Language

SQL provides the following transaction control commands; a brief usage sketch follows this list.

• COMMIT - makes the changes of the current transaction permanent.

• ROLLBACK - undoes the uncommitted changes of the current transaction.
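
A minimal, hedged usage sketch of these commands, reusing the employee rows assumed in the earlier DML examples:

UPDATE employee SET salary = salary + 1000 WHERE id = 105;
COMMIT;      -- the salary change is now permanent

DELETE FROM employee WHERE id = 106;
ROLLBACK;    -- the uncommitted delete is undone; row 106 is restored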

SQL Integrity Constraints


Integrity Constraints are used to apply business rules for the database tables.

Constraints can be defined in two ways


1) The constraints can be specified immediately after the column definition. This is
called column-level definition.
2) The constraints can be specified after all the columns are defined. This is called
table-level definition.
The constraints available in SQL are Primary Key, Foreign Key, Unique,
Not Null and Check.

SYNTAX
column_name datatype [CONSTRAINT]

EXAMPLE
CREATE TABLE depts
( deptid char(10),
  dname char(20)
);

ALTER TABLE depts ADD CONSTRAINT
  dept_id_pk PRIMARY KEY (deptid);

CREATE TABLE employee
( id number(5) PRIMARY KEY,
  name char(20) NOT NULL,
  dept char(10) REFERENCES depts(deptid),
  age number(2),
  salary number(10) CHECK (salary > 0),
  location char(10) UNIQUE
);

SQL JOINS
A SQL Join statement is used to combine data or rows from two or more tables based on a
common field between them. Different types of Joins are:
• SELF JOIN
• INNER JOIN
• LEFT JOIN
• RIGHT JOIN
• FULL JOIN

SELF JOIN: A self join is nothing but joining a table with itself.

Syntax:
SELECT column_name1, column_name2, ... column_nameN
FROM table_name A, table_name B
WHERE A.column_name = B.column_name;

Example:
SELECT E.Employee_id, E.Name AS "Employee Name", F.Name AS "Manager Name"
FROM Employee E, Employee F
WHERE E.Emp_id = F.Mgr_id;

INNER JOIN: The INNER JOIN keyword selects all rows from both tables as long as the join condition is satisfied. It creates the result set by combining all rows from both tables where the condition is satisfied, i.e., the value of the common field is the same.

LEFT JOIN: This join returns all the rows of the table on the left side of the join and the matching rows of the table on the right side of the join. For rows that have no matching row on the right side, the result set will contain NULL. LEFT JOIN is also known as LEFT OUTER JOIN.

RIGHT JOIN: RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and the matching rows of the table on the left side of the join. For rows that have no matching row on the left side, the result set will contain NULL. RIGHT JOIN is also known as RIGHT OUTER JOIN.

FULL JOIN: FULL JOIN creates the result set by combining the results of both LEFT JOIN and RIGHT JOIN. The result set will contain all the rows from both tables. For rows that have no match, the result set will contain NULL values.
SYNTAX (the accompanying join diagrams are omitted)
SELECT table1.column1,
table1.column2,
table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column =
table2.matching_column;

SELECT table1.column1,
table1.column2,
table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column =
table2.matching_column;

SELECT table1.column1,
table1.column2,
table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column =
table2.matching_column;

SELECT table1.column1,
table1.column2,
table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column =
table2.matching_column;
Example of JOINS:

SELF JOIN
SELECT E.Employee_id, E.Name AS "Employee Name", F.Name AS "Manager Name"
FROM Employee E, Employee F
WHERE E.Emp_id = F.Mgr_id;

INNER JOIN
Type 1: Using a WHERE clause
SELECT E.Employee_name, D.Department_name
FROM Employee E, Department D
WHERE E.Emp_no = D.Emp_no;

Type 2: Using a JOIN clause
SELECT E.Employee_name, D.Department_name
FROM Employee E JOIN Department D   -- join clause
ON E.Emp_no = D.Emp_no;             -- on condition

Type 3: Using NATURAL JOIN
SELECT E.Employee_name, D.Department_name, E.Salary
FROM Employee E NATURAL JOIN Department D;   -- natural join clause

Type 4: Using USING
SELECT E.Employee_name, D.Department_name, E.Salary
FROM Employee E JOIN Department D   -- join
USING (Emp_no);                     -- using condition

LEFT JOIN (or LEFT OUTER JOIN)
SELECT a.Employee_name, b.Department_Name
FROM Employee a LEFT JOIN Department b
ON a.Department_ID = b.Department_ID;
(or)
SELECT a.Employee_name, b.Department_Name
FROM Employee a, Department b
WHERE a.Department_ID = b.Department_ID(+);

RIGHT JOIN (or RIGHT OUTER JOIN)
SELECT a.Employee_name, b.Department_Name
FROM Employee a RIGHT JOIN Department b
ON a.Department_ID = b.Department_ID;
(or)
SELECT a.Employee_name, b.Department_Name
FROM Employee a, Department b
WHERE a.Department_ID(+) = b.Department_ID;

FULL JOIN (or FULL OUTER JOIN)
SELECT a.Employee_name, b.Department_Name
FROM Employee a FULL JOIN Department b
ON a.Department_ID = b.Department_ID;

STORED PROCEDURES:

A stored procedure is a set of Structured Query Language (SQL) statements with an assigned name, stored in a relational database management system as a group so that it can be reused and shared by multiple programs.

Stored procedures can access or modify data in a database. A stored procedure in Oracle
follows the basic PL/SQL block structure, which consists of declarative, executable and
exception-handling parts. Declarative Part is an optional part which contains declarations of
types, cursors, constants, variables, exceptions, and nested subprograms. Executable
Part is a mandatory part and contains statements that perform the designated action.
Exception-handling is an optional part that contains the code that handles run-time errors.

SYNTAX:

CREATE [OR REPLACE] PROCEDURE procedure_name

[(parameter_name [IN | OUT | IN OUT] type [, ...])]

{IS | AS}

BEGIN

< procedure_body >

END procedure_name;

Where,

• procedure-name specifies the name of the procedure.

• [OR REPLACE] option allows the modification of an existing procedure.

• The optional parameter list contains name, mode and types of the
parameters. IN represents the value that will be passed from outside and OUT
represents the parameter that will be used to return a value outside of the procedure.

• procedure-body contains the executable part.


• The AS keyword is used instead of the IS keyword for creating a standalone
procedure.

Example of Stored Procedure:

CREATE OR REPLACE PROCEDURE greetings

AS

BEGIN

dbms_output.put_line('Hello World!');

END;

We execute the procedure as

SQL> EXECUTE greetings;

The output we get is:

Hello World!

PL/SQL procedure successfully completed.
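
As a further, hedged illustration of the IN and OUT parameter modes described above, the hypothetical procedure below takes an employee id through an IN parameter and returns that employee's salary through an OUT parameter (the employee table with id and salary columns is assumed from the earlier examples).

CREATE OR REPLACE PROCEDURE get_salary (
    p_id     IN  number,
    p_salary OUT number
)
AS
BEGIN
    -- Look up the salary of the requested employee and hand it back via the OUT parameter.
    SELECT salary INTO p_salary
    FROM employee
    WHERE id = p_id;
END get_salary;

One possible way to call it is from an anonymous PL/SQL block:

DECLARE
    v_salary number(10);
BEGIN
    get_salary(105, v_salary);
    dbms_output.put_line('Salary: ' || v_salary);
END;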

NORMALIZATION:

Normalization is a database design technique that organizes tables in a manner that reduces redundancy and dependency of data.

It divides larger tables into smaller tables and links them using relationships.

Database Normal Forms

1NF (First Normal Form) Rules


• Each table cell should contain a single value.
• Each record needs to be unique.

Before we proceed to 2NF, we need to know about primary keys, non-prime attributes, candidate keys, and functional dependency.

A KEY is a value used to identify a record in a table uniquely. A KEY could be a single column
or combination of multiple columns. Columns in a table that are NOT used to identify a
record uniquely are called non-key columns.

A Primary Key is a single column value used to identify a database record uniquely. It has the following attributes -

• A primary key cannot be NULL

• A primary key value must be unique

• The primary key values should rarely be changed

• The primary key must be given a value when a new record is inserted.

A Composite Key is a primary key composed of multiple columns used to identify a record
uniquely.

Functional Dependency:

Functional dependency (FD) is a set of constraints between two attributes in a relation.


Functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same values for attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→), that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.

A transitive functional dependency exists when changing a non-key column might cause another non-key column to change.

2NF (Second Normal Form) Rules


• Rule 1- Be in 1NF

• Rule 2- Every non-prime attribute should be fully functionally dependent on each candidate key, i.e., there should be no partial dependency.

3NF (Third Normal Form) Rules


• Rule 1- Be in 2NF

• Rule 2- Has no transitive functional dependencies, i.e., the transitive functional dependency of a non-prime attribute on any super key should be removed.

A table is in 3NF if it is in 2NF and for each functional dependency X-> Y at least one of the
following conditions hold:
• X is a super key of table

• Y is a prime attribute of table

Boyce- Codd Normal Form (BCNF)


Even when a database is in third normal form, anomalies can still result if it has more than one candidate key. BCNF is sometimes also referred to as 3.5 normal form.

4NF (Fourth Normal Form) Rules


If no database table instance contains two or more independent, multi-valued facts describing the relevant entity, then it is in 4th normal form.

5NF (Fifth Normal Form) Rules


A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any
number of smaller tables without loss of data.

6NF (Sixth Normal Form) Proposed


6th Normal Form is not yet standardized.

Examples of Normal Forms

Example of 1NF:

Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987

This table is not in 1NF, as the rule says “each attribute of a table must have atomic (single) values”, and the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF, we should store the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987

Example of 2NF:
Suppose a school wants to store the data of teachers and the subjects they teach. They create a table that looks like this. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

teacher_id subject teacher_age


111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, as the rule says “no non-prime attribute is dependent on a proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can break it into two tables like this:

teacher_details table:

teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:

teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).

Example of 3NF:
Suppose a company wants to store the complete address of each employee, they create a
table named employee_details that looks like this:

emp_id emp_name emp_zip emp_state emp_city emp_district


1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…etc
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of
any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id emp_name emp_zip


1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:

emp_zip emp_state emp_city emp_district


282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan

Boyce Codd normal form (BCNF)


It is an advanced version of 3NF; that is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X->Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF, as neither emp_id nor emp_dept alone is a key. To make the table comply with BCNF, we can break it into three tables as follows:

emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:

emp_dept dept_type dept_no_of_emp


Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:

emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF, as the left-hand side of both functional dependencies is a key.

TRANSACTION MANAGEMENT

A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity. A hedged SQL sketch after the list below illustrates atomicity with a funds transfer.

• Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be defined either
before the execution of the transaction or after the execution/abortion/failure of the
transaction.

• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.

• Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each is the only transaction in the system. No transaction will affect the existence of any other transaction.

• Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be updated
once the system springs back into action.
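
A hedged sketch of atomicity using a classic funds-transfer scenario; the accounts table and its values are hypothetical:

-- Hypothetical table used for the transfer example.
CREATE TABLE accounts (
    acc_no  number(5) PRIMARY KEY,
    balance number(10)
);
INSERT INTO accounts VALUES (1001, 5000);
INSERT INTO accounts VALUES (2002, 3000);

-- Transfer 500 from account 1001 to account 2002: both updates form one transaction.
UPDATE accounts SET balance = balance - 500 WHERE acc_no = 1001;
UPDATE accounts SET balance = balance + 500 WHERE acc_no = 2002;
COMMIT;      -- either both changes become permanent together ...

-- ... or, if anything fails in between, the partial work is undone instead:
-- ROLLBACK;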

States of Transactions
A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed. This is the initial state of every
transaction.

• Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.

• Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.

• Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select one
of the two operations after a transaction aborts −

o Re-start the transaction

o Kill the transaction

• Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.

Deadlock :
In a multi-process system, deadlock is an unwanted situation that arises in a shared
resource environment, where a process indefinitely waits for a resource that is held by
another process.

For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to
complete its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is
held by T2. T2 is waiting for resource Z, which is held by T0. Thus, all the processes wait
for each other to release resources. In this situation, none of the processes can finish
their task. This situation is known as a deadlock.
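
A hedged, two-session sketch of how such a deadlock can arise in SQL, reusing the hypothetical accounts table from the atomicity example:

-- Session 1 (transaction T0)
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1001;  -- locks row 1001

-- Session 2 (transaction T1), running at the same time
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 2002;  -- locks row 2002

-- Session 1 now needs row 2002, which Session 2 holds, so it waits:
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2002;

-- Session 2 now needs row 1001, which Session 1 holds, so it waits too:
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 1001;

-- Neither session can proceed; the DBMS detects the cycle and aborts one of the transactions.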

Two-Phase Locking (2PL)

This locking protocol divides the execution phase of a transaction into three parts. In the
first part, when the transaction starts executing, it seeks permission for the locks it
requires. The second part is where the transaction acquires all the locks. As soon as the
transaction releases its first lock, the third phase starts. In this phase, the transaction
cannot demand any new locks; it only releases the acquired locks.
Two-phase locking has two phases, one is growing, where all the locks are being
acquired by the transaction; and the second phase is shrinking, where the locks held by
the transaction are being released.

To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock
and then upgrade it to an exclusive lock.
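
A hedged sketch of the two phases in terms of SQL row locks: SELECT ... FOR UPDATE, available in many RDBMSs, acquires locks explicitly during the growing phase, and COMMIT releases them all in the shrinking phase. The accounts table is again the hypothetical one from the earlier examples.

-- Growing phase: acquire every lock the transaction needs.
SELECT balance FROM accounts WHERE acc_no = 1001 FOR UPDATE;  -- lock row 1001
SELECT balance FROM accounts WHERE acc_no = 2002 FOR UPDATE;  -- lock row 2002

UPDATE accounts SET balance = balance - 100 WHERE acc_no = 1001;
UPDATE accounts SET balance = balance + 100 WHERE acc_no = 2002;

-- Shrinking phase: no new locks may be requested from here on;
-- COMMIT releases all the locks held by the transaction.
COMMIT;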

Data Warehousing - Overview


A data warehouse provides us generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, a data warehouse also provides us Online Analytical Processing (OLAP) tools. These tools help us in interactive and effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.

• A data warehouse is a database, which is kept separate from the organization's operational
database.

• There is no frequent updating done in a data warehouse.

• It possesses consolidated historical data, which helps the organization to analyze its
business.

• A data warehouse helps executives to organize, understand, and use their data to take
strategic decisions.

• Data warehouse systems help in the integration of a diversity of application systems.

• A data warehouse system helps in consolidated historical data analysis.

Metadata
Metadata is simply defined as data about data. The data that is used to represent other data is known as metadata. For example, the index of a book serves as metadata for the contents of the book. In other words, we can say that metadata is the summarized data that leads us to the detailed data.

In terms of data warehouse, we can define metadata as following −

• Metadata is a road-map to data warehouse.

• Metadata in data warehouse defines the warehouse objects.

• Metadata acts as a directory. This directory helps the decision support system to locate the
contents of a data warehouse.

Data Warehouse Features


The key features of a data warehouse are discussed below −

• Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be product, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations; rather, it focuses on modelling and analysis of data for decision making.

• Integrated − A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.

• Time Variant − The data collected in a data warehouse is identified with a particular time
period. The data in a data warehouse provides information from the historical point of
view.

• Non-volatile − Non-volatile means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.

Data Warehouse Applications


Data warehouse helps business executives to organize, analyze, and use their data
for decision making. Data warehouses are widely used in the following fields −

• Financial services

• Banking services

• Consumer goods

• Retail sectors

• Controlled manufacturing
Types of Data Warehouse
Information processing, analytical processing, and data mining are the three types of
data warehouse applications.

• Information Processing − A data warehouse allows us to process the data stored in it. The data can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.

• Analytical Processing − A data warehouse supports analytical processing of the information stored in it. The data can be analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.

• Data Mining − Data mining supports knowledge discovery by finding hidden patterns and
associations, constructing analytical models, performing classification and prediction.
These mining results can be presented using the visualization tools.

Functions of Data Warehouse Tools and Utilities


The following are the functions of data warehouse tools and utilities −

• Data Extraction − Involves gathering data from multiple heterogeneous sources.

• Data Cleaning − Involves finding and correcting the errors in data.

• Data Transformation − Involves converting the data from legacy format to warehouse
format.

• Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

• Refreshing − Involves updating from data sources to warehouse.
