
UNIT -I

Concept of DBMS

1. Define Database System:


A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data.
The collection of data, usually referred to as the database, contains information
relevant to an enterprise.
The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.

2. Advantages of Database: Using a DBMS to manage data has many advantages:
Data independence: Application programs should be as independent as
possible from details of data representation and storage. The DBMS can
provide an abstract view of the data to insulate application code from such
details.
Efficient data access: A DBMS utilizes a variety of sophisticated
techniques to store and retrieve data efficiently.
Data integrity and security: If data is always accessed through the
DBMS, the DBMS can enforce integrity constraints on the data. The DBMS
can enforce access controls that govern what data is visible to different
classes of users.
Data administration: When several users share the data, centralizing
the administration of data can offer significant improvements. Experienced
professionals, who understand the nature of the data being managed, and
how different groups of users use it, can be responsible for organizing the
data representation to minimize redundancy and for fine-tuning the
storage of the data to make retrieval efficient.
Concurrent access and crash recovery: A DBMS schedules concurrent
accesses to the data in such a manner that users can think of the data as
being accessed by only one user at a time. Further, the DBMS protects
users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports
many important functions that are common to many applications
accessing data stored in the DBMS. This, in conjunction with the high-level
interface to the data, facilitates quick development of applications. Such
applications are also likely to be more robust than applications developed
from scratch because many important tasks are handled by the DBMS
instead of being implemented by the application.

3. Data Abstraction: The data in a DBMS is described at three levels of
abstraction. The database description consists of a schema at each of these
three levels of abstraction: the conceptual, physical, and external/view level.
External/ View level - This is the highest level of data abstraction. At this level users
see the data in the form of rows and columns, presented as tables and relations,
without any detail of how it is stored. Users view full or partial data based on
the business requirement. Different users have different views, based on their
access rights. For example, a student will not have access to lecturers'
salary details, and one employee will not have access to another employee's details
unless he is a manager. At this level, one can access the data from the database and
perform calculations on it, for example calculate the tax from the
salary of an employee, calculate the CGPA of a student, or calculate the age of a person
from his date of birth. These users can be real users or programs.
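The idea of a restricted external view can be sketched with an SQL view. This is a minimal illustration using Python's built-in sqlite3 module; the table lecturer, the view lecturer_for_students and the sample rows are hypothetical. SQLite has no user accounts, so the view only shows the "partial data" idea, not real access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lecturer (
        lecturer_id   INTEGER PRIMARY KEY,
        lecturer_name TEXT NOT NULL,
        salary        REAL NOT NULL
    );
    INSERT INTO lecturer VALUES (1, 'Rose', 50000), (2, 'Mathew', 62000);

    -- External view intended for students: the salary column is left out.
    CREATE VIEW lecturer_for_students AS
        SELECT lecturer_id, lecturer_name FROM lecturer;
""")

cursor = conn.execute("SELECT * FROM lecturer_for_students ORDER BY lecturer_id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

A program reading only the view never sees the salary column, even though it is stored in the underlying table.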

Logical/ Conceptual level - This is the next level of abstraction. It describes the
actual data stored in the database in the form of tables and relates them by means of
mappings. This level has no information on what a user views at the external
level. It describes all the data in the database. Ideally, changes made at this level
do not affect the external or physical levels. That is, a change to the table
structure or to a relation should not modify the data that the user is viewing in the
external view, or the existing storage at the physical level. For example, suppose we
add a new column Skills. This does not modify the external view in which
the user was viewing the ages of the students. Similarly, space will be allocated for
Skills in physical memory, but this does not modify the space or address of Date of
Birth (from which Age is derived). Hence external and physical
independence is achieved.

Physical level - This is the lowest level of data abstraction. This level describes how
the data is actually stored on physical storage such as magnetic tapes, hard disks etc.
File organization methods like hashing, sequential files and B+ trees come into the
picture at this level. The developer designing this level should know the requirements,
size and access frequency of the records clearly, which makes the design of this level
less complex.

4. Data Models
Depending on the level of data being modeled, data models are divided into 3
categories: Object Based, Physical and Record Based data models. The physical data
model represents the data at the data layer or internal layer. Object based and record based
data models are modeled on the data at the application and user level. They are
responsible for designing the various objects of the database and their mappings. They are
further divided into different categories, as shown in the diagram below.

4.1 Object Based Data Model:


It is designed using the entities in the real world, the attributes of each entity and their
relationships. It picks up each thing/object in the real world which is involved in the
requirement.
There are two types of object based data models: the Entity Relationship model and the Object
oriented data model. The ER data model is one of the important data models and forms the basis
for most designs in the database world. It defines the mapping between the entities in the
database. The object oriented data model, along with the mapping between the entities, describes
the state of each entity and the tasks performed by it.
4.1.1 Entity Relationship Data Models
Entities or real world objects are represented by rectangular boxes. Their attributes are
represented by ovals. Primary keys of entities are underlined. Relationships between entities
are represented by diamonds. This is one of the methods of representing the ER model. There are
many different forms of representation.
4.1.2 Object Oriented Data Models
This data model is another method of representing real world objects. It considers each thing
in the world as an object and isolates objects from each other. It groups related functionality
together and allows related sub-groups to inherit that functionality. Since each class
binds its attributes and its functionality, it is the same as representing the real world object. We
can see each object as a real entity. Hence it is more understandable. It is an approach for
solving the requirement, not a technology. Hence it is difficult to implement directly in database
management systems.

4.2 Record based Data Models


These data models are based on the application and user levels of data. They are modeled
considering the logical structure of the objects in the database. These data models define the
actual relationships between the data in the entities.
There are 3 types of record based data models: Hierarchical, Network and
Relational. The most widely used record based data model is the relational data model.
The other two are not widely used.
4.2.1 Hierarchical Data Models
In this data model, the entities are represented in a hierarchical fashion. Here we identify a
parent entity and its child entities. Again we drill down to identify the next level of child
entities, and so on. This model can be imagined as folders inside a folder.
In our example above, it is diagrammatically represented as below:
It can also be imagined as a root like structure. This model has only one main root. It then
branches into sub-roots, each of which branches again. This type of structure is best
suited to 1:N relationships, e.g. one company has multiple departments (1:N), one
company has multiple suppliers (1:N), one department has multiple employees (1:N), each
department has multiple projects (1:N). It fails to handle many-to-many relationships
efficiently, which results in redundancy and confusion. It can handle only parent-child
relationships.
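The single-root, parent-child structure can be sketched as a nested dictionary. This is only an illustration in Python; the entity names (Dept A, Project X and so on) are hypothetical and not taken from the original diagram.

```python
# Each entity has exactly one parent; the company is the single root.
# All names below are hypothetical illustration data.
company = {
    "name": "Company",
    "children": [
        {"name": "Dept A", "children": [
            {"name": "Employee 1", "children": []},
            {"name": "Project X", "children": []},
        ]},
        {"name": "Dept B", "children": [
            {"name": "Employee 2", "children": []},
        ]},
        {"name": "Supplier S", "children": []},
    ],
}


def paths(node, prefix=()):
    """Yield the root-to-node path of every entity in the hierarchy."""
    here = prefix + (node["name"],)
    yield here
    for child in node["children"]:
        yield from paths(child, here)


all_paths = list(paths(company))
for p in all_paths:
    print(" / ".join(p))
```

Because every node appears under exactly one parent, a project shared by two departments would have to be stored twice, which is the redundancy mentioned above.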
4.2.2 Network Data Models
This is an enhanced version of the hierarchical data model. It is designed to address the
drawbacks of the hierarchical model, in particular M:N relationships. This data model is
also drawn like a hierarchy, but it does not have the single-parent restriction. Any child
in the structure can have multiple parents.
Let us revisit our company example. A company has different projects, and departments in the
company own those projects. Suppliers of the company also give input to the projects. Here
Project has multiple parents, and each department and supplier has multiple projects. This is
represented as shown below. Basically, it forms a network like structure between the entities,
hence the name.

It can be a little difficult to design the relationships between the entities, since all the
entities are related in some way. It requires thorough practice and knowledge of the design.
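A child with several parents can be sketched as a simple ownership graph. The department, supplier and project names below are hypothetical; the sketch only shows that one project can be reached from more than one owner.

```python
from collections import defaultdict

# Owner -> owned records. All names are hypothetical illustration data.
owns = {
    "Dept A": ["Project X", "Project Y"],
    "Dept B": ["Project Y"],
    "Supplier S": ["Project X", "Project Y"],
}

# Invert the links to find every parent of each child.
parents = defaultdict(list)
for owner, members in owns.items():
    for member in members:
        parents[member].append(owner)

for project, owners in sorted(parents.items()):
    print(project, "<-", owners)
```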
4.2.3 Relational Data Models
This model is based on the mathematical concepts of set theory. It treats each relation as a
two dimensional table with rows and columns. It is not concerned with the physical storage
of structure and data in memory. It considers only the data, how it can be represented
in the form of rows and columns, and how relations can be established with other
tables.
A relational data model revolves around 5 important rules.
1. The order of rows / records in the table is not important. For example, displaying the
records for Joseph is independent of displaying the records for Rose or Mathew in the
Employee table. It does not change their meaning. Each record in
the table is independent of the others. Similarly, the order of columns in the table is not
important. That means the value in each column of a record is independent of
the others. For example, placing DEPT_ID at the end or at the beginning of the
employee table does not have any effect.
2. Each record in the table is unique. That is, no duplicate record exists in the
table. This is achieved by the use of a primary key or unique constraint.
3. Each column/attribute has a single value in a row. For example, in the Department
table, the DEPT_NAME column cannot have Accounting and Quality together in a
single cell. They have to be in two different rows as shown above.
4. All values of an attribute should be from the same domain. That means each column
should have meaningful values. For example, an Age column cannot have dates in it. It should
contain only valid numbers representing an individual's age. Similarly, name columns
should have valid names, and date columns should have proper dates.
5. Table names in the database should be unique. The same schema cannot
contain two or more tables with the same name. Two tables with different names
can have the same column names, but the same column name cannot be used twice in the same
table.
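Rules 2 and 5 can be observed directly in a small SQLite session. This is a minimal sketch using Python's built-in sqlite3 module; the employee table layout and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,   -- rule 2: each record is unique
        emp_name TEXT NOT NULL,
        dept_id  INTEGER
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Joseph', 10)")

# Rule 2: a second row with the same primary key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Rose', 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Rule 5: a second table with the same name in the same schema is rejected.
try:
    conn.execute("CREATE TABLE employee (x INTEGER)")
    same_name_rejected = False
except sqlite3.OperationalError:
    same_name_rejected = True

print(duplicate_rejected, same_name_rejected)
```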
Examine the table structure below for Employee, Department and Project and see if it satisfies
the relational data model rules.

4.3 Physical Data Model: The physical data model describes how
data are stored in computer memory, how they are distributed and ordered in memory, and
how they are retrieved from memory. The physical data model represents the data
at the data layer or internal layer.

5. Instances and Schemas


Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the
database. The overall design of the database is called the database schema. Schemas
are changed infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to a
program written in a programming language. A database schema corresponds to the
variable declarations (along with associated type definitions) in a program. Each
variable has a particular value at a given instant. The values of the variables in a
program at a point in time correspond to an instance of a database schema.
Database systems have several schemas, partitioned according to the levels of
abstraction. The physical schema describes the database design at the physical level,
while the logical schema describes the database design at the logical level. A database
may also have several schemas at the view level, sometimes called subschemas, that
describe different views of the database.
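The difference between schema and instance can be sketched with SQLite: the column list plays the role of the schema, and the rows present at a given moment form the instance. The student table and the helper names schema and instance are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")


def schema(connection):
    """Return the column names of the student table (the schema)."""
    return [row[1] for row in connection.execute("PRAGMA table_info(student)")]


def instance(connection):
    """Return the rows currently stored (the instance at this moment)."""
    return connection.execute(
        "SELECT * FROM student ORDER BY student_id"
    ).fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO student VALUES (1, 'Mira')")

schema_after = schema(conn)
instance_after = instance(conn)

print(schema_before == schema_after)
print(instance_before, instance_after)
```

Inserting a row changes the instance but leaves the schema untouched, which matches the variable-declaration analogy: the declaration stays fixed while the value changes.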

6. Data Independence
The three-schema architecture can be used to explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:
Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), or
to reduce the database (by removing a record type or data item). In the latter case,
external schemas that refer only to the remaining data should not be affected. Only the
view definition and the mappings need be changed in a DBMS that supports logical
data independence.
Physical data independence is the capacity to change the internal schema without
having to change the conceptual (or external) schemas. Changes to the internal
schema may be needed because some physical files had to be reorganized (for
example, by creating additional access structures) to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not
have to change the conceptual schema.
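Both kinds of independence can be tried out in a small SQLite session. This is a sketch under stated assumptions: the student table, the view student_names and the index name are hypothetical, and an index stands in for "additional access structures".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT NOT NULL
    );
    INSERT INTO student VALUES
        (1, 'Mira', '2001-05-14'),
        (2, 'Rose', '2000-11-02');
    CREATE VIEW student_names AS SELECT student_id, name FROM student;
""")

QUERY = "SELECT * FROM student_names ORDER BY student_id"
before = conn.execute(QUERY).fetchall()

# Logical change: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE student ADD COLUMN skills TEXT")
after_logical_change = conn.execute(QUERY).fetchall()

# Physical change: add an access structure (an index) for faster lookup.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after_physical_change = conn.execute(QUERY).fetchall()

print(before == after_logical_change == after_physical_change)
```

The query against the external view is never rewritten, and it returns the same rows after each change.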

7. Database Languages: A database system provides a data definition language to specify
the database schema and a data manipulation language to express database queries and
updates. In practice, the data definition and data manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the widely
used SQL language.

7.1 Data Definition Language (DDL): As the name suggests, this language is used to
define the various types of data in the database and their relationships with each other. The
basic functions performed by DDL are:
Create tables, files, databases and data dictionaries.
Specify the storage structure of each table on disk.
Specify integrity constraints on various tables.
Specify security and authorization information for each table.
Specify the structure of each table.
Define the overall design of the database.

7.2 Data Manipulation Language (DML): A language that enables users to access or
manipulate data (retrieve, insert, update, delete) as organized by a certain Data Model is
called the Data Manipulation Language (DML). It can be of two types: -
Procedural DML - It describes what data is needed and how to get it. For example: -
Relational Algebra.
Non Procedural DML - It describes what data is needed without specifying how to
get it. For example: - Relational calculus.
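The contrast between the two styles can be sketched in Python with sqlite3. Note the assumptions: SQL is used here as the non-procedural language (the text above names relational calculus), a hand-written loop stands in for a procedural, relational-algebra-like plan, and the employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- DDL: define the schema.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER
    );
    -- DML: insert data.
    INSERT INTO employee VALUES
        (1, 'Joseph', 10), (2, 'Rose', 20), (3, 'Mathew', 10);
""")

# Non-procedural: state WHAT is wanted; the DBMS decides how to get it.
declarative = [
    name for (name,) in conn.execute(
        "SELECT name FROM employee WHERE dept_id = ? ORDER BY emp_id", (10,)
    )
]

# Procedural style: spell out HOW, as a scan, a selection and a projection.
procedural = []
for emp_id, name, dept_id in conn.execute(
    "SELECT * FROM employee ORDER BY emp_id"
):
    if dept_id == 10:            # selection
        procedural.append(name)  # projection

print(declarative, procedural)
```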

8. Database Users: Database users are the ones who actually use and benefit from the
database. There are different types of users depending on their needs and way of accessing
the database.
1. Application Programmers - They are the developers who interact with the
database by means of DML queries. These DML queries are written inside
application programs in languages like C, C++, Java, Pascal etc. These queries are
converted into object code to communicate with the database. For example, a C
program that generates a report of the employees working in a particular
department will involve a query to fetch the data from the database. It will include an
embedded SQL query in the C program.
2. Sophisticated Users - They are database developers who write SQL queries to
select/insert/delete/update data. They do not use any application or program to
query the database. They interact with the database directly by means of a query
language like SQL. These users are scientists, engineers and analysts who
thoroughly study SQL and DBMS concepts to apply them to their requirements. In
short, this category includes designers and developers of DBMS and
SQL.
3. Specialized Users - These are also sophisticated users, but they write special
database application programs. They are the developers who develop complex
programs to meet the requirement.
4. Naive Users - These are the users who use existing applications to interact with
the database, for example online library systems, ticket booking systems and ATMs.
Users operate these applications, which in turn interact with the database to
fulfill their requests.

9. Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include:
Schema definition. The DBA creates the original database schema by executing a set
of data definition statements in the DDL.
Storage structure and access-method definition.
Schema and physical-organization modification. The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access. The authorization information is kept in a special system
structure that the database system consults whenever someone attempts to access the
data in the system.
Routine maintenance. Examples of the database administrator's routine maintenance
activities are: Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding. Ensuring that
enough free disk space is available for normal operations, and upgrading disk space as
required. Monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.

10. Overall System Structure


A DBMS is very large and is typically divided into modules. Some of the services are also
provided by the operating system of the host computer. The following is an example of what
the structure might be:
Query processor
o DML compiler: Translates data manipulation language statements into query engine
instructions. It might also optimize the query.
o Embedded DML precompiler: Converts the DML statements in the application
program to normal procedure calls in the host language.
o DDL interpreter: Interprets DDL statements and records them in a set of tables
containing metadata.
o Query evaluation engine
Storage manager
o Authorization and integrity manager: Tests for the satisfaction of integrity
constraints and checks the authority of users to perform various actions.
o Transaction manager: Ensures the database remains in a consistent (correct)
state despite system failures.
o File manager: Responsible for the allocation of space on the disk storage
system.
o Buffer manager: Manages the data coming into and out of the system,
including the caching of data.
Data structures
o Data files: The database itself.
o Data dictionary: The metadata about the structure of the database. This
is a critical element in the DBMS.
o Indices: Used to provide fast access to the data.
o Statistical data: The query processor uses this to optimize queries.
11. Entity
An entity is a real-world object, either animate or inanimate, that is easily
identifiable. For example, in a school database, students, teachers, classes, and courses
offered can be considered as entities. All these entities have some attributes or properties that
give them their identity.

Entity Sets
An entity set is a collection of similar types of entities. An entity set may contain entities
whose attributes share similar values. For example, a Students set may contain all the students
of a school; likewise a Teachers set may contain all the teachers of a school from all faculties.
Entity sets need not be disjoint.

Attributes
Entities are represented by means of their properties, called attributes. All attributes have
values. For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute: Simple attributes are atomic values, which cannot be divided
further. For example, a student's phone number is an atomic value of 10 digits.
Composite attribute: Composite attributes are made of more than one simple
attribute. For example, a student's complete name may have first_name and
last_name.
Derived attribute: Derived attributes are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database. For example, average_salary in a department should not be saved directly in
the database; instead it can be derived. As another example, age can be derived from
date_of_birth.
Single-value attribute: Single-value attributes contain a single value. For example,
Social_Security_Number.
Multi-value attribute: Multi-value attributes may contain more than one value.
For example, a person can have more than one phone number, email_address, etc.
These attribute types can come together in a way like
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
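A derived attribute such as age can be computed on demand instead of being stored. The following is a minimal Python sketch; the function name age_on and the sample student record are hypothetical.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from date_of_birth; nothing is stored."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical student record: age is not a stored field.
student = {"name": "Mira", "date_of_birth": date(2001, 5, 14)}
print(age_on(student["date_of_birth"], date(2024, 5, 13)))  # 22
print(age_on(student["date_of_birth"], date(2024, 5, 14)))  # 23
```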

12. Relationship
The association among entities is called a relationship. For example, an employee works_at a
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
13. Mapping Cardinalities
Cardinality defines the number of entities in one entity set that can be associated with
entities of another set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity
of entity set B and vice versa.

One-to-many: One entity from entity set A can be associated with more than one
entity of entity set B; however, an entity from entity set B can be associated with at
most one entity of entity set A.

Many-to-one: More than one entity from entity set A can be associated with at
most one entity of entity set B; however, an entity from entity set B can be associated
with more than one entity from entity set A.
Many-to-many: One entity from A can be associated with more than one entity
from B and vice versa.
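The four cases can be told apart by counting partners on each side of a relationship set. The sketch below is an illustration only: the function name mapping_cardinality and the sample pairs are hypothetical, and it reports the tightest classification consistent with the given data, whereas a real cardinality is a design constraint that holds for all possible data.

```python
def mapping_cardinality(pairs):
    """Classify a relationship set given as (a, b) pairs between sets A and B."""
    b_per_a = {}
    a_per_b = {}
    for a, b in set(pairs):
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    # Does some A entity have several B partners, and the other way round?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"


dept_employs = [("dept1", "emp1"), ("dept1", "emp2"), ("dept2", "emp3")]
enrolls = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]

print(mapping_cardinality(dept_employs))  # one-to-many
print(mapping_cardinality(enrolls))       # many-to-many
```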

14. Entity Relationship Diagram:


An entity relationship model, also called an entity-relationship (ER) diagram, is a graphical
representation of entities and their relationships to each other, typically used in computing in
regard to the organization of data within databases or information systems.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.

Attributes
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every
ellipse represents one attribute and is directly connected to its entity (rectangle).

If the attributes are composite, they are further divided in a tree like structure. Every node is
then connected to its attribute. That is, composite attributes are represented by ellipses that
are connected with an ellipse.
Multivalued attributes are depicted by double ellipse.

Derived attributes are depicted by dashed ellipse.


Relationship
Relationships are represented by a diamond-shaped box. The name of the relationship is written
inside the diamond box. All the entities (rectangles) participating in a relationship are
connected to it by a line.
Binary Relationship and Cardinality
A relationship in which two entities participate is called a binary relationship.
Cardinality is the number of instances of an entity that can be associated with
the relationship.
One-to-one: When only one instance of an entity is associated with the relationship,
it is marked as '1:1'. The following image reflects that only one instance of each entity
should be associated with the relationship. It depicts a one-to-one relationship.

One-to-many: When more than one instance of an entity is associated with a
relationship, it is marked as '1:N'. The following image reflects that only one instance
of the entity on the left and more than one instance of the entity on the right can be
associated with the relationship. It depicts a one-to-many relationship.
Many-to-one: When more than one instance of an entity is associated with the
relationship, it is marked as 'N:1'. The following image reflects that more than one
instance of the entity on the left and only one instance of the entity on the right can be
associated with the relationship. It depicts a many-to-one relationship.

Many-to-many: The following image reflects that more than one instance of the
entity on the left and more than one instance of the entity on the right can be
associated with the relationship. It depicts a many-to-many relationship.

Participation Constraints
Total participation: Each entity is involved in the relationship. Total participation is
represented by double lines.
Partial participation: Not all entities are involved in the relationship. Partial
participation is represented by single lines.

15. Super Key: A super key is an attribute or a composite attribute which functionally
determines all of the entity's attributes. In other words, a superkey uniquely identifies each
entity in a table.
Considering the above STUDENT table, any of the following can be identified as the
table's superkey:
1. STU_NUM, as it determines all other attributes in the table.
2. STU_NUM, STU_LNAME, as the combination also determines all remaining attributes.
3. STU_NUM, STU_LNAME, STU_FNAME, as this combination also determines all
remaining attributes.
In fact, STU_NUM, whether considered alone or in any possible combination
with other attributes, forms a superkey. This is true even if the additional attributes
have redundant values in the table's records.

Candidate Key: A candidate key is a minimal superkey: its values are not repeated in the
table's records, and no attribute can be removed from it without losing that uniqueness.
Considering the above STUDENT table, the following facts can be stated:
The superkey attribute STU_NUM can also be termed a candidate key because the values for
STU_NUM are not repeated in the given example.
The composite superkey (STU_NUM, STU_LNAME) cannot be considered a candidate key
because STU_NUM by itself is a candidate key, so STU_LNAME is redundant.
The combination (STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE)
can also be considered a candidate key provided that the values under the combination are
not repeated in the table's records and no smaller subset of it is unique.

Primary Key: A primary key is a candidate key chosen to identify rows. It has no repeated
values and never takes a NULL value. A primary key uniquely identifies each row in the
table, so it is mainly used for record searching.
A primary key in any table is both a superkey and a candidate key.
It is possible to have more than one choice of candidate key in a particular table.
In that case, the selection of the primary key would be driven by the
designer's choice or by end user requirements.
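The difference between a superkey and a candidate key can be checked mechanically on sample rows. The sketch below is illustrative: the rows are hypothetical (the original STUDENT table is not reproduced here), and testing on one set of rows only shows what holds in that instance, while a real key is a constraint on every possible instance.

```python
from itertools import combinations

# Hypothetical STUDENT rows for illustration.
COLUMNS = ("STU_NUM", "STU_LNAME", "STU_FNAME")
ROWS = [
    (1001, "Smith", "Anne"),
    (1002, "Smith", "John"),
    (1003, "Bowser", "John"),
]


def is_superkey(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [COLUMNS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in ROWS}
    return len(seen) == len(ROWS)


def candidate_keys():
    """Return the minimal superkeys, smallest first."""
    keys = []
    for size in range(1, len(COLUMNS) + 1):
        for attrs in combinations(COLUMNS, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if is_superkey(attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


print(is_superkey(("STU_NUM", "STU_LNAME")))  # True, but not minimal
print(candidate_keys())
```

With these rows, (STU_NUM, STU_LNAME) is a superkey but not a candidate key, because STU_NUM alone is already unique.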

16. Transform ER Diagram into Tables


An ER diagram gives us a good understanding of the requirement and of the mapping of
the entities in it, so we can easily convert it into tables and columns, i.e. using ER diagrams
one can easily create the relational data model, which is nothing but the logical view of the
database.
There are various steps involved in converting it into tables and columns. Each type
of entity, attribute and relationship in the diagram has its own representation here. Consider
the ER diagram below and see how it is converted into tables, columns and mappings.
The basic rules for converting ER diagrams into tables are:
Convert all the entities in the diagram to tables.
All the entities represented by rectangular boxes in the ER diagram become independent
tables in the database. In the diagram below, STUDENT, COURSE, LECTURER and
SUBJECTS form individual tables.
All single valued attributes of an entity are converted to columns of the table.
All the attributes that hold a single value at any instant of time are considered columns of
that table. In the STUDENT entity, STUDENT_ID and STUDENT_NAME form the columns of the
STUDENT table. Similarly, LECTURER_ID and LECTURER_NAME form the columns of the
LECTURER table, and so on.
The key attribute in the ER diagram becomes the primary key of the table.
In the diagram above, STUDENT_ID, LECTURER_ID, COURSE_ID and SUB_ID are the key
attributes of the entities. Hence we consider them the primary keys of the respective tables.
Declare the foreign key column, if applicable.
In the diagram, the attribute COURSE_ID in the STUDENT entity comes from the COURSE entity.
Hence add COURSE_ID to the STUDENT table and assign it a foreign key constraint.
COURSE_ID and SUBJECT_ID in the LECTURER table form the foreign key columns. By
declaring the foreign key constraints, the mapping between the tables is established.
Any multi-valued attribute is converted into a new table.
Hobby in the STUDENT table is a multivalued attribute. Any student can have any number of
hobbies, so we cannot represent multiple values in a single column of the STUDENT table. We
need to store them separately, so that we can store any number of hobbies and so that adding or
removing hobbies does not create any redundancy or anomalies in the system. Hence we
create a separate table STUD_HOBBY with STUDENT_ID and HOBBY as its columns. We
create a composite key using both columns.
Any composite attribute is merged into the same table as different columns.
In the diagram above, Student Address is a composite attribute. It has Door#, Street, City,
State and Pin. These attributes are merged into the STUDENT table as individual columns.
One can ignore a derived attribute, since it can be calculated at any time.
In the STUDENT table, Age can be derived at any point of time by calculating the difference
between DateOfBirth and the current date. Hence we need not create a column for this attribute.
This reduces duplication in the database.
These are the very basic rules for converting an ER diagram into tables and columns, and
assigning the mapping between the tables. The table structure at this stage would be as below:
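The conversion rules can be sketched as SQLite table definitions for part of the diagram. This is a sketch under stated assumptions, not the original table structure: the column types, COURSE_NAME, the flattened address column names and the sample rows are hypothetical, and only STUDENT, COURSE and STUD_HOBBY are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE COURSE (
        COURSE_ID   INTEGER PRIMARY KEY,
        COURSE_NAME TEXT                          -- hypothetical column
    );
    CREATE TABLE STUDENT (
        STUDENT_ID    INTEGER PRIMARY KEY,        -- key attribute
        STUDENT_NAME  TEXT NOT NULL,
        DATE_OF_BIRTH TEXT,                       -- Age is derived, so not stored
        DOOR_NO TEXT, STREET TEXT, CITY TEXT,     -- composite Address, flattened
        STATE TEXT, PIN TEXT,
        COURSE_ID INTEGER REFERENCES COURSE (COURSE_ID)   -- foreign key
    );
    CREATE TABLE STUD_HOBBY (                     -- multi-valued attribute
        STUDENT_ID INTEGER NOT NULL REFERENCES STUDENT (STUDENT_ID),
        HOBBY      TEXT NOT NULL,
        PRIMARY KEY (STUDENT_ID, HOBBY)           -- composite key
    );
""")

conn.execute("INSERT INTO COURSE VALUES (1, 'Databases')")
conn.execute(
    "INSERT INTO STUDENT (STUDENT_ID, STUDENT_NAME, COURSE_ID) VALUES (100, 'Mira', 1)"
)
conn.executemany(
    "INSERT INTO STUD_HOBBY VALUES (?, ?)", [(100, "Chess"), (100, "Music")]
)

hobbies = [h for (h,) in conn.execute(
    "SELECT HOBBY FROM STUD_HOBBY WHERE STUDENT_ID = 100 ORDER BY HOBBY"
)]

try:
    # COURSE 99 does not exist, so the foreign key constraint rejects this row.
    conn.execute(
        "INSERT INTO STUDENT (STUDENT_ID, STUDENT_NAME, COURSE_ID) VALUES (101, 'Rose', 99)"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(hobbies, fk_enforced)
```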

17. Generalization
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all the students. The entity shall be a student, and further, the student
is a person. The reverse is called specialization, where a person is a student, and that student
is Mira.
Generalization is thus the process of combining entities into a generalized entity that
contains the properties they have in common. In
generalization, a number of entities are brought together into one generalized entity based on
their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be
generalized as Birds.

Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is
divided into sub-groups based on their characteristics. Take a group Person for example. A
person has a name, date of birth, gender, etc. These properties are common to all persons.
But in a company, persons can be identified as employee, employer, customer, or vendor,
based on what role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or staff,
based on what role they play in the school.

Aggregation
Aggregation is a process in which the relation between two entities is treated as a single entity.
Consider the ER diagram below of STUDENT, COURSE and SUBJECTS. A student attends the
course and has some subjects to study. At the same time, the course offers some subjects.
Here a relation is defined on a relation. But an ER diagram does not allow such a relation. It
supports mapping between entities, not between relations.

If we look at STUDENT and COURSE from the point of view of SUBJECTS, it does not
differentiate between them. It offers its subjects to both of them. So what we can do here is
merge STUDENT and COURSE into one entity. This process of merging is called aggregation.
It is completely different from generalization. In generalization, we merge entities of the same
domain into one entity. In this case we merge related entities into one entity.
Here we have merged STUDENT and COURSE into one entity STUDENT_COURSE. This
new entity forms the mapping with SUBJECTS. The new entity STUDENT_COURSE in
turn contains the two entities STUDENT and COURSE with the Attends relationship.

18. Functional Dependencies


Functional dependency is a relationship that exists when one attribute uniquely determines
another attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes is
represented as X->Y, which specifies that Y is functionally dependent on X. Here X is the
determinant and Y is the dependent attribute. Each value of X is associated with exactly
one value of Y.
Functional dependency in a database serves as a constraint between two sets of attributes.
Defining functional dependencies is an important part of relational database design and
contributes to normalization.
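The definition above can be checked mechanically against sample data. The following is a minimal sketch in Python, not part of the original notes; the function name holds and the sample rows are made up for illustration. It reports whether every value of X is paired with exactly one value of Y.

```python
def holds(rows, x, y):
    """Return True if the functional dependency x -> y holds in rows.

    rows is a list of dicts; x and y are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        # The first Y value seen for an X value is remembered; a later,
        # different Y value for the same X value violates the dependency.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical sample rows, used only for illustration.
students = [
    {"student": "Adam", "age": 15, "subject": "Biology"},
    {"student": "Adam", "age": 15, "subject": "Maths"},
    {"student": "Alex", "age": 14, "subject": "Maths"},
]

age_depends_on_student = holds(students, ("student",), ("age",))
subject_depends_on_student = holds(students, ("student",), ("subject",))
```

Note that such a check can only refute a dependency on the rows it is given; whether a dependency holds in general is a design decision about the data.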

19. Normalization of Database


Database normalization is a technique for organizing the data in a database. Normalization
is a systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like insertion, update and deletion anomalies. It is a multi-step process that
puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.

20. First normal form (1NF): 1NF is a property of a relation in a relational database. A
relation is in first normal form if and only if the domain of each attribute contains
only atomic (indivisible) values, and the value of each attribute contains only a single value
from that domain.
For example, consider a table which is not in First Normal Form.
Student Table :
Student   Age   Subject
Adam      15    Biology, Maths
Alex      14    Maths
Stuart    17    Maths
In First Normal Form, no row may have a column in which more than one value is saved, for
example as a comma-separated list. Instead, we must separate such data into multiple rows.
Student Table following 1NF will be :
Student   Age   Subject
Adam      15    Biology
Adam      15    Maths
Alex      14    Maths
Stuart    17    Maths
With First Normal Form, data redundancy increases, as the same column values are now
repeated in multiple rows, but each row as a whole is unique.
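The conversion described above can be sketched in a few lines of Python. This is only an illustration using the Student table from the example; the function name to_1nf is an assumption, not something defined in these notes.

```python
# The Student table before normalization: Subject holds a comma-separated list.
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]


def to_1nf(rows):
    """Emit one row per (student, subject) so every column holds a single value."""
    result = []
    for student, age, subjects in rows:
        for subject in subjects.split(","):
            result.append((student, age, subject.strip()))
    return result


first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```

The Age value for Adam is now stored twice, which is the redundancy that the next normal form removes.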

Second normal form (2NF)


A table is in 2NF if it is in 1NF and every non-prime attribute of the table is dependent on the
whole of every candidate key. A non-prime attribute of a table is an attribute that is not a
part of any candidate key of the table.
In the First Normal Form example there are two rows for Adam, to include the multiple
subjects that he has opted for. While this is searchable and follows First Normal Form, it is an
inefficient use of space. Also, in that table the candidate key is {Student, Subject}, but Age
depends only on the Student column, that is, on part of the key. This violates Second Normal
Form. To achieve Second Normal Form, we split the subjects out into an independent table
and match them up using the student names as foreign keys.
New Student Table following 2NF will be :
Student   Age
Adam      15
Alex      14
Stuart    17
In the Student Table the candidate key is the Student column, because the only other column,
Age, is dependent on it.
New Subject Table introduced for 2NF will be :
Student   Subject
Adam      Biology
Adam      Maths
Alex      Maths
Stuart    Maths
In the Subject Table the candidate key is {Student, Subject}. Both of the above tables now
qualify for Second Normal Form and avoid the update anomaly described above. In some more
complex cases a table in Second Normal Form can still suffer from update anomalies; Third
Normal Form handles those cases.
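As a rough illustration of the decomposition, the sketch below builds the two 2NF tables in an in-memory SQLite database through Python's sqlite3 module. SQLite is used here only as a convenient stand-in for an RDBMS, and the lower-case table names student and subject are assumptions. Because Age is stored once per student, a single UPDATE changes it for every subject row that a join produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student TEXT PRIMARY KEY, age INTEGER);
    CREATE TABLE subject (
        student TEXT REFERENCES student(student),
        subject TEXT,
        PRIMARY KEY (student, subject)
    );
    INSERT INTO student VALUES ('Adam', 15), ('Alex', 14), ('Stuart', 17);
    INSERT INTO subject VALUES
        ('Adam', 'Biology'), ('Adam', 'Maths'),
        ('Alex', 'Maths'), ('Stuart', 'Maths');
""")

# Age lives in exactly one row per student, so one UPDATE is enough.
conn.execute("UPDATE student SET age = 16 WHERE student = 'Adam'")

# Joining the two tables reproduces the shape of the original 1NF table.
joined = conn.execute("""
    SELECT s.student, s.age, j.subject
    FROM student AS s JOIN subject AS j ON j.student = s.student
    ORDER BY s.student, j.subject
""").fetchall()
```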

Third normal form (3NF): Third normal form is used in normalizing a database design to
reduce the duplication of data and ensure referential integrity by ensuring that (1) the entity
is in second normal form, and (2) all the attributes in a table are determined only by the
candidate keys of that table and not by any non-prime attributes.

For example, consider a table with the following fields.


Student_Detail Table :
Student_id   Student_name   DOB   Street   City   State   Zip
In this table Student_id is the primary key, but Street, City and State depend on Zip. Since
Zip is not a key, these fields depend on Student_id only through Zip; this is called a transitive
dependency. Hence, to apply 3NF, we need to move Street, City and State to a new table, with
Zip as primary key.
New Student_Detail Table :
Student_id   Student_name   DOB   Zip
Address Table :
Zip   Street   City   State
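The split can be sketched as below. The sample rows and the function name to_3nf are hypothetical, and the sketch follows the example's own assumption that Zip determines Street, City and State.

```python
# Hypothetical rows: (student_id, student_name, dob, street, city, state, zip)
student_detail = [
    (1, "Adam", "2001-04-12", "Main St", "Springfield", "IL", "62701"),
    (2, "Alex", "2002-09-30", "Main St", "Springfield", "IL", "62701"),
    (3, "Stuart", "1999-01-05", "Oak Ave", "Madison", "WI", "53703"),
]


def to_3nf(rows):
    """Split rows into a student table and an address table keyed by zip."""
    students = []
    addresses = {}
    for sid, name, dob, street, city, state, zip_code in rows:
        students.append((sid, name, dob, zip_code))
        # Each zip is stored once, however many students share it.
        addresses[zip_code] = (street, city, state)
    return students, addresses


students, addresses = to_3nf(student_detail)
```

Two students share the zip 62701, yet its street, city and state are stored only once, so a correction to that address is made in a single place.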

21. E. F. Codd's rules for RDBMS:


Dr Edgar F. Codd, after his extensive research on the Relational Model of database
systems, came up with twelve rules which, according to him, a database must
obey in order to be regarded as a true relational database.
The foundation rule (often numbered Rule 0) states that a system must manage its stored
data using only its relational capabilities. It acts as the base for all the other rules.
Rule 1: Information Rule
The data stored in a database, may it be user data or metadata, must be a value of some table
cell. Everything in a database must be stored in a table format.
Rule 2: Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column value). No
other means, such as pointers, can be used to access data.
Rule 3: Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a
very important rule because a NULL can be interpreted as one of the following: data is
missing, data is not known, or data is not applicable.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known
as data dictionary, which can be accessed by authorized users. Users can use the same
query language to access the catalog which they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without
any help of this language, then it is considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by
the system.
Rule 7: High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, update, and deletion. This must not be
limited to a single row; that is, it must also support union, intersection and minus operations
to yield sets of data records.
Rule 8: Physical Data Independence
The data stored in a database must be independent of the applications that access the
database. Any change in the physical structure of a database must not have any impact on
how the data is being accessed by external applications.
Rule 9: Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any
change in logical data must not affect the applications using it. For example, if two tables are
merged or one is split into two different tables, there should be no impact or change on the
user application. This is one of the most difficult rules to apply.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints
can be independently modified without the need of any change in the application. This rule
makes a database independent of the front-end application and its interface.
Rule 11: Distribution Independence
The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must
not be able to subvert the system and bypass security and integrity constraints.

Unit- II
Concept of SQL
SQL (Structured Query Language) is a special-purpose programming
language designed for managing data held in a relational database management
system (RDBMS), or for stream processing in a relational data stream management
system (RDSMS).
Originally based upon relational algebra and tuple relational calculus, SQL consists of
a data definition language, data manipulation language, and Data Control Language. The
scope of SQL includes data insert, query, update and delete, schema creation and
modification, and data access control. Although SQL is often described as, and to a great
extent is, a declarative language (4GL), it also includes procedural elements.
SQL was one of the first commercial languages for Edgar F. Codd's relational model,
as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became
the most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986,
and of the International Organization for Standardization (ISO) in 1987. Since then, the
standard has been revised to include a larger set of features. Despite the existence of such
standards, most SQL code is not completely portable among different database systems
without adjustments.
SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F.
Boyce in the early 1970s. This version, initially called SEQUEL (Structured English Query
Language), was designed to manipulate and retrieve data stored in IBM's original
quasi-relational database management system, System R, which a group at IBM San Jose
Research Laboratory had developed during the 1970s. The acronym SEQUEL was later changed
to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.
In the late 1970s, Relational Software, Inc. (now Oracle Corporation) saw the potential of the
concepts described by Codd, Chamberlin, and Boyce, and developed their own SQL-based
RDBMS with aspirations of selling it to the U.S. Navy, Central Intelligence Agency,
and other U.S. government agencies. In June 1979, Relational Software, Inc. introduced the
first commercially available implementation of SQL, Oracle V2 (Version 2),
for VAX computers.
After testing SQL at customer test sites to determine the usefulness and practicality of
the system, IBM began developing commercial products based on their System R prototype
including System/38, SQL/DS, and DB2, which were commercially available in 1979, 1981,
and 1983, respectively.

1. Benefits of SQL:
1. SQL is an English-like language. It uses words such as select, insert, delete as part of its
command set.
2. SQL is a non-procedural language: we specify what information we require without
specifying how to get it. All SQL statements use the query optimizer, a part of the RDBMS, to
determine the fastest means of retrieving the specified data.
3. SQL processes sets of records rather than a single record at a time. The most common form
of a set of records is a table.
4. SQL can be used by a range of users including DBAs, application programmers,
management personnel and many other types of end users.
5. SQL provides commands for a variety of tasks including
- Querying Data
- Inserting, updating, deleting rows in a table
- Creating, modifying and deleting database objects
- Controlling access to the database and database objects
- Guaranteeing database consistency.
The important benefits of SQL are portability and the capability to support a wide range of
commands so as to perform a variety of tasks.

2. Embedded SQL
1. SQL provides a powerful declarative query language. However, access to a database
from a general-purpose programming language is required because,
o SQL is not as powerful as a general-purpose programming language. There are
queries that cannot be expressed in SQL, but can be programmed in C,
Fortran, Pascal, Cobol, etc.
o Nondeclarative actions -- such as printing a report, interacting with a user, or
sending the result to a GUI -- cannot be done from within SQL.
2. The SQL standard defines embedding of SQL as embedded SQL, and the language in
which SQL queries are embedded is referred to as the host language.
3. The result of the query is made available to the program one tuple (record) at a time.
4. To identify embedded SQL requests to the preprocessor, we use the EXEC SQL
statement:
EXEC SQL embedded SQL statement END-EXEC

Note: A semicolon is used instead of END-EXEC when SQL is embedded in C or Pascal.
5. Embedded SQL statements: declare cursor, open, and fetch statements.
EXEC SQL
declare c cursor for
select customer.cname, ccity
from deposit, customer
where deposit.cname = customer.cname and deposit.balance > :amount
END-EXEC

where amount is a host-language variable.


EXEC SQL open c END-EXEC. This statement causes the DB system to
execute the query and to save the results within a temporary relation.
A series of fetch statements is executed to make tuples of the result available
to the program.
EXEC SQL fetch c into :cn, :cc END-EXEC. The program can then
manipulate the variables cn and cc using the features of the host programming
language.
A single fetch request returns only one tuple. We need to use a while loop (or
equivalent) to process each tuple of the result until no further tuples remain
(signalled when a variable in the SQLCA is set).
We need to use the close statement to tell the DB system to delete the temporary
relation that held the result of the query.

6. Embedded SQL can execute any valid update, insert, or delete statements.
7. Dynamic SQL component allows programs to construct and submit SQL queries at
run time.
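The declare, open, fetch and close steps described in item 5 have close counterparts in most host-language database interfaces. The sketch below uses Python's sqlite3 module rather than a precompiler with EXEC SQL, so it is an analogy and not embedded SQL itself; the sample rows are invented. The named parameter :amount plays the role of the host-language variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cname TEXT, ccity TEXT);
    CREATE TABLE deposit (cname TEXT, balance REAL);
    INSERT INTO customer VALUES ('Jones', 'Harrison'), ('Smith', 'Rye');
    INSERT INTO deposit VALUES ('Jones', 500), ('Smith', 1500);
""")

amount = 1000  # host-language variable, bound to :amount below

# "declare" and "open": create the cursor and run the query.
c = conn.cursor()
c.execute(
    """SELECT customer.cname, ccity
       FROM deposit, customer
       WHERE deposit.cname = customer.cname AND deposit.balance > :amount""",
    {"amount": amount},
)

# "fetch": one tuple per request, until no further tuples remain.
results = []
while True:
    row = c.fetchone()
    if row is None:  # plays the role of the end-of-data flag in the SQLCA
        break
    cn, cc = row
    results.append((cn, cc))

# "close": release the cursor and its result set.
c.close()
conn.close()
```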

3. Naming the Objects and Parts and referring them:


Object Naming Rules
This section lists rules for the names of objects and their parts.
1. Names must be from 1 to 30 bytes long.
2. Names cannot contain quotation marks.
3. Names are not case-sensitive.
4. A name must begin with an alphabetic character.
5. Names can only contain alphanumeric characters and the characters _,$,and #. Oracle
Corporation discourages the use of $ and #.
6. A name cannot be an ORACLE reserved word. The list of reserved words can be
found in Appendix A.
7. The word DUAL should not be used as a name for an object or part. DUAL is the
name of a dummy table frequently accessed by SQL*Plus and SQL*Forms.
8. The ORACLE SQL language contains other keywords that have special meanings and
should not be used. The list of these keywords can be found in Appendix B.
9. A name must be unique across its namespace. Objects in the same namespace must
have different names. Refer to page 2-8 of "ORACLE7 Server SQL Language
Reference Manual" for a description of the namespaces. For example, tables, views
and packages share the same namespace, but indexes are in a different namespace
from tables and views. Each schema in the database has its own namespaces for the
objects it contains. This means that two tables in different schemas are in different
namespaces and can have the same name.
10. A name can be enclosed in double quotes. Such names can contain any combination
of characters, ignoring rules 3 through 7 in this list. Such names can also include
spaces. Once you have given an object a name enclosed in double quotes, you must
use double quotes whenever you refer to the object.
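Some of the rules above lend themselves to a mechanical check. The following Python sketch covers only rules 1, 2, 4, 5 and 6 for unquoted names; it assumes single-byte (ASCII) characters, and the reserved-word set is a tiny, deliberately incomplete sample rather than the real list from Appendix A. The function name is hypothetical.

```python
import re

# Rules 1, 4 and 5 for unquoted names: a letter first, then letters, digits,
# _, $ or #, with 1 to 30 characters in total (one byte each in ASCII).
# Rule 2 follows automatically, because a quotation mark is not in the set.
_UNQUOTED_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")

# A tiny, deliberately incomplete stand-in for the reserved-word list.
_SAMPLE_RESERVED = {"SELECT", "TABLE", "FROM", "WHERE"}


def is_valid_unquoted_name(name):
    """Return True if name passes the checks sketched above."""
    if _UNQUOTED_NAME.fullmatch(name) is None:
        return False
    # Rule 3: names are not case-sensitive, so compare in upper case.
    return name.upper() not in _SAMPLE_RESERVED
```

For example, is_valid_unquoted_name("emp_2") is accepted, while "2emp" (starts with a digit) and "select" (reserved word in the sample set) are rejected. Uniqueness within a namespace (rule 9) needs the data dictionary and is outside this sketch.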

4. Referring to Objects and Parts


When you refer to an object in a SQL statement, ORACLE considers the context of the SQL
statement and locates the object in the appropriate namespace. ORACLE always attempts to
resolve an object reference within the namespaces in your own schema before considering
namespaces outside your schema. You refer to the object with the following syntax:
[schema.]object[.part]
Where object is the name of the object. schema is the schema containing the object. The
schema qualifier allows you to refer to an object in a schema other than your own. Note that
you must be granted privileges to refer to objects in other schemas. If you omit this qualifier,
ORACLE assumes that you are referring to an object in your own schema. part is a part of
the object. This identifier allows you to refer to a part of a schema object, such as a column of
a table. Note that not all types of objects have parts. For example, this statement drops the
EMP table in the schema SCOTT: DROP TABLE scott.emp

5. Literals
The terms literal and constant value are synonymous and refer to a fixed data value. For
example, 'JACK', 'BLUE ISLAND', and '101' are all character literals; 5001 is a numeric
literal. Note that character literals are enclosed in single quotation marks, which enable
Oracle to distinguish them from schema object names.
Text
Text specifies a text or character literal. We can specify character literals with the 'text'
notation, national character literals with the N'text' notation.
A text literal must be enclosed in single quotation marks.
A text literal can have a maximum length of 4000 bytes.
Here are some valid text literals:
'Hello', 'ORACLE.dbs', 'Jackie''s raincoat', '09-MAR-98', N'nchar literal'
Integer
We must use the integer notation to specify an integer whenever integer appears in
expressions, conditions, SQL functions, and SQL statements.
An integer can store a maximum of 38 digits of precision.
Here are some valid integers: 7, +255
Number
We must use the number notation to specify values whenever number appears in
expressions, conditions, SQL functions, and SQL statements.
A number can store a maximum of 38 digits of precision.
If we have established a decimal character other than a period (.) with the
initialization parameter NLS_NUMERIC_CHARACTERS, we must specify numeric
literals with 'text' notation. In such cases, Oracle automatically converts the text literal
to a numeric value.
For example, if the NLS_NUMERIC_CHARACTERS parameter specifies a decimal
character of comma, specify the number 5.123 as follows: '5,123'
Here are some valid representations of number: 25, +6.34, 0.5, 25e-03
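Two of the points above can be tried out quickly. The sketch below uses Python's sqlite3 module as a stand-in for Oracle: the doubled single quote inside a text literal and the exponent form of a number behave the same way in SQLite. The helper parse_with_decimal_char is hypothetical and only imitates, in Python, the kind of conversion described for a comma decimal character.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")

# Two single quotes inside a text literal stand for one quote character.
(text,) = conn.execute("SELECT 'Jackie''s raincoat'").fetchone()

# A numeric literal in exponent form.
(number,) = conn.execute("SELECT 25e-03").fetchone()


def parse_with_decimal_char(literal, decimal_char=","):
    """Hypothetical helper: read '5,123' as 5.123 when ',' is the decimal character."""
    return Decimal(literal.replace(decimal_char, "."))


converted = parse_with_decimal_char("5,123")
```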

6. Datatypes
Each literal or column value manipulated in ORACLE has a datatype.
A value's datatype associates a fixed set of properties with the value.
These datatypes define the domain of values that each column can contain or each
argument can have.

Character Datatypes
char(size)
The CHAR datatype specifies a fixed length character string.
When you create a table with a CHAR column, you can supply the column length in
bytes. ORACLE subsequently ensures that all values stored in that column have this
length.
If you insert a value that is shorter than the column length, ORACLE blank-pads the
value to column length.
If you try to insert a value that is too long for the column, ORACLE returns an error.
The default length for a CHAR column is 1 byte. The maximum length of CHAR data
is 255 bytes.
ORACLE compares CHAR values using blank-padded comparison semantics.
varchar2(size)
The VARCHAR2 datatype specifies a variable length character string.
When you create a VARCHAR2 column, you can supply the maximum number of
bytes of data that it can hold.
ORACLE subsequently stores each value in the column exactly as you specify it,
provided it does not exceed the column's maximum length.
If you try to insert a value that exceeds this length, ORACLE returns an error.
You must specify a maximum length for a VARCHAR2 column.
The maximum length of VARCHAR2 data is 2000 bytes.
ORACLE compares VARCHAR2 values using non-padded comparison semantics.
varchar(size)
The VARCHAR datatype is currently synonymous with the VARCHAR2 datatype.
Oracle Corporation recommends that you use VARCHAR2 rather than VARCHAR.
In a future version of ORACLE, VARCHAR might be a separate datatype used for
variable length character strings compared with different comparison semantics.
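The difference between blank-padded and non-padded comparison semantics can be imitated in plain Python. This is only a sketch of the behaviour described above, not Oracle code; the function names are made up, and len counts characters, which equals bytes only for single-byte character sets.

```python
def store_char(value, size):
    """Mimic storing into CHAR(size): reject long values, blank-pad short ones."""
    if len(value) > size:
        raise ValueError("value too long for column")
    return value.ljust(size)


def blank_padded_equal(a, b):
    """Blank-padded semantics: pad the shorter value with blanks, then compare."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def non_padded_equal(a, b):
    """Non-padded semantics: trailing blanks are significant."""
    return a == b


stored = store_char("ab", 4)  # the value as a CHAR(4) column would hold it
```

Under these definitions 'ab' and the stored value compare equal with blank-padded semantics (the CHAR case) but unequal with non-padded semantics (the VARCHAR2 case).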

Number Datatypes
number(p,s) - where p is the precision, or the total number of digits, and s is the scale, or
the number of digits to the right of the decimal point.
You can use number(p) which is a fixed point number with precision p and scale 0, or
number which is a floating point number with precision 38.
If the scale is negative, the actual data is rounded to the specified number of places to
the left of the decimal point.
For example, a specification of (10,-2) means to round to hundreds - 7456123.89
would be stored as 7456100.
float(b) - where b specifies a floating point number with binary precision b.
The precision b can range from 1 to 126 with a default value of 126.
To convert from binary to decimal precision, multiply b by 0.30103.
To convert from decimal to binary precision, multiply the decimal precision by
3.32193.
The maximum of 126 digits of binary precision is roughly equivalent to 38 digits of
decimal precision.
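The arithmetic in this section can be reproduced with Python's decimal module. The sketch below is an illustration only: apply_scale is a hypothetical helper that assumes half-up rounding, and the two conversion functions simply apply the multipliers quoted above.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_scale(value, scale):
    """Round value to `scale` digits right of the decimal point.

    A negative scale rounds to the left of the decimal point.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale 2 -> 0.01, scale -2 -> 1E+2
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


def binary_to_decimal_precision(b):
    """Multiply binary precision by 0.30103, as described above."""
    return b * 0.30103


def decimal_to_binary_precision(d):
    """Multiply decimal precision by 3.32193, as described above."""
    return d * 3.32193


stored = int(apply_scale("7456123.89", -2))       # scale -2 rounds to hundreds
max_float_digits = binary_to_decimal_precision(126)
```

With scale -2 the value 7456123.89 becomes 7456100, matching the example, and 126 binary digits work out to just under 38 decimal digits.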
Long Datatypes
LONG columns store variable length character strings containing up to 2 gigabytes, or
2**31-1 bytes.
LONG columns have many of the characteristics of VARCHAR2 columns.
You can use LONG columns to store long text strings.
ORACLE uses LONG columns in the data dictionary to store the text of view
definitions.
The length of LONG values may also be limited by the memory available on your
computer.
You can reference LONG columns in SQL statements in these places:
o SELECT lists
o SET clauses of UPDATE statements
o VALUES clauses of INSERT statements
The use of LONG values is subject to some restrictions:
o A table cannot contain more than one LONG column.
o LONG columns cannot appear in integrity constraints (except for NULL and NOT
NULL constraints).
o LONG columns cannot be indexed.
o A procedure or stored function cannot accept a LONG argument.
Also, LONG columns cannot appear in certain parts of SQL statements:
o WHERE, GROUP BY, ORDER BY, or CONNECT BY clauses or with the
DISTINCT operator in SELECT statements.
o SQL functions (such as SUBSTR or INSTR).
o expressions or conditions.
o select lists of queries containing GROUP BY clauses.
o select lists of subqueries or queries combined by set operators.
o select lists of CREATE TABLE AS SELECT statements.

Date Datatype
The DATE datatype is used to store date and time information. Although date and
time information can be represented in both CHAR and NUMBER datatypes, the
DATE datatype has special associated properties. For each DATE value the following
information is stored: century, year, month, day, hour, minute, second.
You cannot specify a date literal. To specify a date value, you must convert a character
or numeric value to a date value with the TO_DATE function.
ORACLE automatically converts character values that are in the default date format
into date values when they are used in date expressions.
The default date format is specified by the initialization parameter
NLS_DATE_FORMAT and is a string such as 'DD-MON-YY'. This example date
format includes a two-digit number for the day of the month, an abbreviation of the
month name, and the last two digits of the year.
If you specify a date value without a time component, the default time is 12:00:00 a.m.
(midnight). If you specify a date value without a date, the default date is the first day
of the current month.
The date function SYSDATE returns the current date and time.
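For comparison, the same kind of conversion can be tried in Python, where the format string '%d-%b-%y' is the closest counterpart to 'DD-MON-YY'. This is an analogy only, not Oracle's TO_DATE: Python applies its own rule for expanding a two-digit year, which need not match Oracle's, and month abbreviations are read according to the current locale (English by default).

```python
from datetime import datetime

# '%d-%b-%y' reads a two-digit day, an abbreviated month name and a
# two-digit year, much like the 'DD-MON-YY' format described above.
parsed = datetime.strptime("09-MAR-98", "%d-%b-%y")

# No time component was supplied, so the time part defaults to midnight.
time_part = (parsed.hour, parsed.minute, parsed.second)

# The counterpart of SYSDATE: the current date and time.
now = datetime.now()
```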

Raw and Long Raw Datatype


The RAW and LONG RAW datatypes are used for byte-oriented data (for example,
binary data or byte strings) to store character strings, floating point data, and binary
data such as graphics images and digitized sound.
ORACLE returns RAW values as hexadecimal character values. RAW data can only
be stored and retrieved.
RAW is equivalent to VARCHAR2 and LONG RAW to LONG except that there is no
conversion between database and session character set.

7. Nulls: If a column in a row has no value, then the column is said to be null, or to contain a
null. Nulls can appear in columns of any datatype that are not restricted by NOT NULL
or PRIMARY KEY integrity constraints. Use a null when the actual value is not known
or when a value would not be meaningful.
Do not use null to represent a value of zero, because they are not equivalent. (Oracle
currently treats a character value with a length of zero as null. However, this may not
continue to be true in future releases, and Oracle recommends that you do not treat empty
strings the same as NULLs.) Any arithmetic expression containing a null always evaluates to
null. For example, null added to 10 is null. In fact, all operators (except concatenation) return
null when given a null operand.
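The arithmetic rule can be observed directly. The sketch below runs against SQLite through Python's sqlite3 module, used here as a stand-in for Oracle; SQL NULL comes back to Python as None. Only behaviour that the two systems share is shown, so concatenation and empty strings are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 10 + NULL        -> NULL: arithmetic with a null operand yields null
# NULL = NULL      -> NULL: a comparison with null is neither true nor false
# NULL IS NULL     -> 1 (true): IS NULL is the way to test for null
# 0 IS NULL        -> 0 (false): zero is a value, not a null
row = conn.execute(
    "SELECT 10 + NULL, NULL = NULL, NULL IS NULL, 0 IS NULL"
).fetchone()
```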

8. Pseudocolumns
A pseudocolumn behaves like a table column, but is not actually stored in the table. You can
select from pseudocolumns, but you cannot insert, update, or delete their values.
Rowid: For each row in the database, the ROWID pseudocolumn returns a row's address.
Rownum: For each row returned by a query, the ROWNUM pseudocolumn returns a number
indicating the order in which Oracle selects the row from a table or set of joined rows.
The first row selected has a ROWNUM of 1, the second has 2, and so on. You can use
ROWNUM to limit the number of rows returned by a query, as in this example:
SELECT * FROM emp WHERE ROWNUM < 10;
User: returns your current userid
Sysdate: returns current date and time.
Null: returns a null value

9. Comments Within SQL Statements:


Comments within SQL statements do not affect the statement execution, but they may make
your application easier for you to read and maintain. You may want to include a comment in a
statement that describes the statement's purpose within your application.
A comment can appear between any keywords, parameters or punctuation marks in a
statement. You can include a comment in a statement using either of these means:
Begin the comment with /*. Proceed with the text of the comment. This text can span
multiple lines. End the comment with */.
Begin the comment with -- (two hyphens). Proceed with the text of the
comment. This text cannot extend to a new line. End the comment with a line break.
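Both comment styles can be seen in one statement. The sketch below submits it to SQLite through Python's sqlite3 module, which accepts the same two styles; it is an illustration rather than an Oracle session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

statement = """
SELECT /* a block comment can sit between
          any two tokens and span lines */
       1 + 1   -- a line comment ends at the line break
"""

# The comments do not affect execution; only SELECT 1 + 1 is evaluated.
(result,) = conn.execute(statement).fetchone()
```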

Comments on Schema Objects:


We can also add comments to a table or columns using the comment command. This is useful
especially for large database where we want others to understand some specific information
about a table such as type of information stored in the table.
Eg: SQL> comment on table emp is
2 'this is a table containing employee details';
Eg: SQL> comment on column emp.empno is
2 'identifier for employee';

10. Operators
An operator is used to manipulate individual data items and return a result. These data items
are called operands or arguments. Some of the operators will be listed below
Arithmetic Operators
unary                  + -
arithmetic             * /
binary                 + -
Logical Operators
not                    NOT
and                    AND
or                     OR

Character Operators
concatenate            ||
Comparison Operators
equality               =
inequality             !=, <>
greater                >, >=
less                   <, <=
equal to any member    IN, =ANY
equal to no member     NOT IN, !=ALL
any                    =ANY, !=ANY, >ANY, <ANY, <=ANY, >=ANY
all                    =ALL, !=ALL, >ALL, <ALL, <=ALL, >=ALL
between                [NOT] BETWEEN x AND y
exists                 EXISTS
like                   x [NOT] LIKE y [ESCAPE z]
null                   IS [NOT] NULL
Note: In a LIKE pattern, % matches any string of zero or more characters except null. The
character '_' matches any single character.

Set Operators: Set operators combine the results of two queries into a single result:
UNION - All distinct rows selected by either query.
UNION ALL - All rows selected by either query, including all duplicates.
INTERSECT - All distinct rows selected by both queries.
MINUS - All distinct rows selected by the first query but not the second.
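The four operators can be compared side by side on two small tables. The sketch below uses SQLite through Python's sqlite3 module; SQLite spells Oracle's MINUS as EXCEPT, and the tables a and b with their contents are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def run(operator):
    """Combine the two queries with the given set operator, sorted by x."""
    sql = f"SELECT x FROM a {operator} SELECT x FROM b ORDER BY x"
    return [x for (x,) in conn.execute(sql)]


union = run("UNION")          # distinct rows from either query
union_all = run("UNION ALL")  # all rows from either query, duplicates kept
intersect = run("INTERSECT")  # distinct rows found by both queries
minus = run("EXCEPT")         # SQLite's spelling of Oracle's MINUS
```

Here union gives 1, 2, 3, 4; union_all keeps every copy of 2 and 3; intersect gives 2, 3; and minus leaves only 1.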

11. Functions
A function is similar to an operator in that it manipulates data items and returns a result.
Functions differ from operators in the format in which they appear with their arguments. If
you call a function with an argument of a datatype other than the datatype expected by the
function, ORACLE implicitly converts the argument to the expected datatype before
performing the function. If you call a function with a null argument, the function
automatically returns null. The only functions that do not follow this rule are CONCAT,
REPLACE, DUMP and NVL.
There are two general types of functions:
single row (or scalar) functions
group (or aggregate) functions
These functions differ in the number of rows upon which they act. A single row function
returns a single result row for every row of a queried table or view, while a group function
returns a single result row for a group of queried rows.
Single row functions can appear in select lists (provided the SELECT statement does not
contain a GROUP BY clause), WHERE clauses, START WITH clauses, and CONNECT BY
clauses.
Group functions can appear in select lists and HAVING clauses. If you use the GROUP BY
clause in a SELECT statement, ORACLE divides the rows of a queried table or view into
groups. In a query containing a GROUP BY clause, all elements of the select list must be
either expressions from the GROUP BY clause, expressions containing group functions, or
constants. ORACLE applies the group functions in the select list to each group of rows and
returns a single result row for each group.
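The contrast between the two kinds of function shows up in the number of rows returned. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for Oracle; the emp table and its three rows are sample data made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ename TEXT, deptno INTEGER, sal REAL);
    INSERT INTO emp VALUES
        ('smith', 10, 800), ('allen', 10, 1600), ('ward', 20, 1250);
""")

# Single row function: one result row for every queried row.
per_row = conn.execute(
    "SELECT UPPER(ename) FROM emp ORDER BY ename"
).fetchall()

# Group functions: one result row for each group of rows.
per_group = conn.execute(
    "SELECT deptno, COUNT(*), MAX(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()
```

Three employee rows give three rows from UPPER, but only two rows from the grouped query, one per department.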
Single Row Functions
Numeric Functions
ABS(n)          Returns the absolute value of n.
CEIL(n)         Returns the smallest integer greater than or equal to n.
COS(n)          Returns the cosine of n (angle expressed in radians).
COSH(n)         Returns the hyperbolic cosine of n.
EXP(n)          Returns e raised to the nth power.
FLOOR(n)        Returns the largest integer equal to or less than n.
LN(n)           Returns the natural logarithm of n (for n > 0).
LOG(m,n)        Returns the logarithm, base m, of n (m must not be 0 or 1).
MOD(m,n)        Returns the remainder of m divided by n.
POWER(m,n)      Returns m raised to the nth power (m**n).
ROUND(n[,m])    Returns n rounded to m places right of the decimal point; m defaults to 0.
SIGN(n)         Returns -1 if n<0; 0 if n=0; 1 if n>0.
SIN(n)          Returns the sine of n.
SINH(n)         Returns the hyperbolic sine of n.
SQRT(n)         Returns the square root of n.
TAN(n)          Returns the tangent of n.
TANH(n)         Returns the hyperbolic tangent of n.
TRUNC(n[,m])    Returns n truncated to m decimal places; m defaults to 0.

Character Functions
CHR(n) Returns the character whose binary equivalent is n.
CONCAT(c1,c2) Returns c1 concatenated with c2.
INITCAP(char) Returns char, with the first letter of each word in uppercase, all other letters
in lowercase.
LOWER(char) Returns char, with all letters lowercase.
LPAD(c1,n[,c2]) Returns c1, left-padded to length n with the sequence of characters in c2; c2
defaults to a single blank.
LTRIM(c1[,set]) Removes characters from the left of c1, with initial characters removed up to
the first character not in set; set defaults to a single blank.
REPLACE(c1,s1[,r1]) Returns the string c1 with every occurrence of the search string s1
replaced with the replacement string r1. If r1 is omitted or null, all occurrences of s1 are
removed. If s1 is null, c1 is returned.
RPAD(c1,n[,c2]) Returns c1, right-padded to length n with c2, replicated as many times as
necessary; c2 defaults to a single blank. If c1 is longer than n, this function returns the
portion of c1 that fits in n.
RTRIM(c1[,set]) Returns c1, with final characters removed after the last character not in set;
set defaults to a single blank.
SOUNDEX(char) Returns a character string containing the phonetic representation of char. This
function allows you to compare words that are spelled differently, but sound alike in English.
SUBSTR(c1,m[,n]) Returns a portion of c1, beginning at character m, n characters long. If m is
positive, ORACLE counts from the beginning of c1 to find the first character. If m is negative,
ORACLE counts backwards from the end of c1. The value m cannot be 0. If n is omitted, ORACLE
returns all characters to the end of c1. The value n cannot be less than 1.
SUBSTRB(c1,m[,n]) The same as SUBSTR, except that the arguments m and n are expressed in
bytes, rather than in characters.
TRANSLATE(c1,from,to) Returns c1 with all occurrences of each character in from replaced by
its corresponding character in to. Characters in c1 that are not in from are not replaced. The
argument from can contain more characters than to. In this case, the extra characters at the
end of from have no corresponding characters in to. If these extra characters appear in c1,
they are removed from the return value. You cannot use an empty string for to in order to
remove all characters in from from the return value. ORACLE interprets the empty string as
null, and if this function has a null argument, it returns null.
UPPER(char) Returns char, with all letters uppercase.
ASCII(char) Returns the decimal representation in the database character set of the first byte
of char.
INSTR(c1,c2[,n[,m]]) Searches c1 beginning with its nth character for the mth occurrence of c2
and returns the position of the character in c1 that is the first character of this occurrence.
If n is negative, ORACLE counts and searches backward from the end of c1. The value of m must
be positive. The default values of both n and m are 1, meaning ORACLE begins searching at the
first character of c1 for the first occurrence of c2.
LENGTH(char) Returns the length of char in characters.
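The character functions above can be tried out with literal strings. This is a minimal sketch that assumes an ORACLE session; the string literals are chosen only for illustration.

```sql
SELECT INITCAP('the soap'),                -- The Soap
       LPAD('Page 1', 15, '*.'),           -- *.*.*.*.*Page 1
       LTRIM('xyxXxyLAST WORD', 'xy'),     -- XxyLAST WORD
       REPLACE('JACK and JUE', 'J', 'BL'), -- BLACK and BLUE
       SUBSTR('ABCDEFG', 3, 4),            -- CDEF
       SUBSTR('ABCDEFG', -5, 4),           -- CDEF (m is counted from the end)
       TRANSLATE('abc', 'ab', 'x'),        -- xc ('a' becomes 'x', 'b' is removed)
       INSTR('CORPORATE FLOOR', 'OR', 3, 2), -- 14
       LENGTH('CANDIDE')                   -- 7
FROM DUAL;
```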

Date Functions
ADD_MONTHS(d,n) Returns the date d plus n months. The argument n can be any integer. If d is
the last day of the month or if the resulting month has fewer days than the day component of d,
then the result is the last day of the resulting month. Otherwise, the result has the same day
component as d.
LAST_DAY(d) Returns the date of the last day of the month that contains d. You might use this
function to determine how many days are left in the current month.
MONTHS_BETWEEN(d1,d2) Returns number of months between dates d1 and d2. If d1 is later than
d2, result is positive; if earlier, negative. If d1 and d2 are either the same days of the
month or both last days of months, the result is always an integer; otherwise ORACLE calculates
the fractional portion of the result based on a 31-day month and also considers the difference
in time components of d1 and d2.
NEW_TIME(d,z1,z2) Returns the date and time in time zone z2 when date and time in time zone z1
are d. The arguments z1 and z2 can be any of these text strings:
'AST' or 'ADT'   Atlantic Standard or Daylight Time
'BST' or 'BDT'   Bering Standard or Daylight Time
'CST' or 'CDT'   Central Standard or Daylight Time
'EST' or 'EDT'   Eastern Standard or Daylight Time
'GMT'            Greenwich Mean Time
'HST' or 'HDT'   Alaska-Hawaii Standard Time or Daylight Time
'MST' or 'MDT'   Mountain Standard Time or Daylight Time
'NST'            Newfoundland Standard Time
'PST' or 'PDT'   Pacific Standard or Daylight Time
'YST' or 'YDT'   Yukon Standard or Daylight Time
NEXT_DAY(d,char) Returns the date of the first weekday named by char that is later than the
date d. The argument char must be a day of the week in your session's date language. The return
value has the same hours, minutes, and seconds component as the argument d.
ROUND(d[,fmt]) Returns d rounded to the unit specified by the format model fmt. If you omit
fmt, d is rounded to the nearest day.
SYSDATE Returns the current date and time. Requires no arguments. In distributed SQL
statements, this function returns the date and time of your local database. You cannot use this
function in the condition of a CHECK constraint.
TRUNC(d[,fmt]) Returns d with the time portion of the day truncated to the unit specified by
the format model fmt. If you omit fmt, d is truncated to the nearest day.
Format Model                          Rounding or Truncating Unit
CC, SCC                               Century
SYYYY, YYYY, YEAR, SYEAR, YYY, YY, Y  Year (rounds up on July 1)
IYYY, IYY, IY, I                      ISO Year
Q                                     Quarter (rounds up on the sixteenth day of the second
                                      month of the quarter)
MONTH, MON, MM, RM                    Month (rounds up on the sixteenth day)
WW                                    Same day of the week as the first day of the year
IW                                    Same day of the week as the first day of the ISO year
W                                     Same day of the week as the first day of the month
DDD, DD, J                            Day
DAY, DY, D                            Starting day of the week
HH, HH12, HH24                        Hour
MI                                    Minute
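The date functions can be illustrated with fixed dates. In this sketch the dates are built with TO_DATE and an explicit format so that the statement does not depend on the session's default date format; how the results are displayed does depend on that default, so the comments describe the dates in words.

```sql
SELECT ADD_MONTHS(TO_DATE('31-01-1996', 'DD-MM-YYYY'), 1),  -- 29 February 1996 (last day of the resulting month)
       LAST_DAY(TO_DATE('10-02-1996', 'DD-MM-YYYY')),       -- 29 February 1996
       MONTHS_BETWEEN(TO_DATE('01-03-1996', 'DD-MM-YYYY'),
                      TO_DATE('01-01-1996', 'DD-MM-YYYY')), -- 2 (same day of the month, so an integer)
       TRUNC(TO_DATE('27-10-1992', 'DD-MM-YYYY'), 'YEAR'),  -- 1 January 1992
       ROUND(TO_DATE('27-10-1992', 'DD-MM-YYYY'), 'YEAR')   -- 1 January 1993 (on or after July 1 rounds up)
FROM DUAL;
```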
Conversion Functions
CHARTOROWID(char) Converts a value from CHAR or VARCHAR2 datatype to ROWID datatype.
HEXTORAW(char) Converts char containing hexadecimal digits to a raw value.
RAWTOHEX(raw) Converts raw to a character value containing its hexadecimal equivalent.
ROWIDTOCHAR(rowid) Converts a ROWID value to VARCHAR2 datatype. The result of this conversion
is always 18 characters long.
TO_CHAR(d [,fmt]) Converts d of DATE datatype to a value of VARCHAR2 datatype in the format
specified by the date format fmt. If you omit fmt, d is converted to a VARCHAR2 value in the
default date format.
TO_CHAR(n [,fmt]) Converts n of NUMBER datatype to a value of VARCHAR2 datatype, using the
optional number format fmt. If you omit fmt, n is converted to a VARCHAR2 value exactly long
enough to hold its significant digits.
TO_DATE(char [,fmt]) Converts char of CHAR or VARCHAR2 datatype to a value of DATE datatype.
The fmt is a date format specifying the format of char. If you omit fmt, char must be in the
default date format. If fmt is 'J', for Julian, then char must be an integer.
TO_NUMBER(char [,fmt]) Converts char, a value of CHAR or VARCHAR2 datatype containing a number
in the format specified by the optional format model fmt, to a value of NUMBER datatype.
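A short sketch of the conversion functions, assuming an ORACLE session with the usual period and comma as decimal and group characters. The literals are illustrative only.

```sql
SELECT TO_CHAR(TO_DATE('27-10-1992', 'DD-MM-YYYY'), 'YYYY/MM/DD'), -- 1992/10/27
       TO_NUMBER('1,234.50', '9,999.99'),                          -- 1234.5
       RAWTOHEX(HEXTORAW('7D'))                                    -- 7D
FROM DUAL;
```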

Other Functions
GREATEST(expr [,expr] ...) Returns the greatest of the list of exprs. All exprs after the first
are implicitly converted to the datatype of the first prior to the comparison. ORACLE compares
the exprs using non-padded comparison semantics.
LEAST(expr [,expr] ...) Returns the least of the list of exprs. All exprs after the first are
implicitly converted to the datatype of the first prior to the comparison. ORACLE compares the
exprs using non-padded comparison semantics.
NVL(expr1,expr2) If expr1 is null, returns expr2; if expr1 is not null, returns expr1. The
arguments expr1 and expr2 can have any datatype. If their datatypes are different, ORACLE
converts expr2 to the datatype of expr1 before comparing them.
UID Returns an integer that uniquely identifies the current user.
USER Returns the current ORACLE user with the datatype VARCHAR2.
USERENV(option) Returns information of VARCHAR2 datatype about the current session. This
information can be useful for writing an application-specific audit trail table or for
determining the language-specific characters currently used by your session.
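NVL, GREATEST, and LEAST can be sketched as below. The emp table and its comm column are hypothetical and stand for any column that may contain nulls.

```sql
-- Hypothetical table: emp(ename, sal, comm); comm may be null.
SELECT ename,
       NVL(comm, 0),                  -- comm if it is not null, otherwise 0
       sal + NVL(comm, 0)             -- avoids a null total when comm is null
FROM emp;

SELECT GREATEST('HARRY', 'HARRIOT', 'HAROLD'),  -- HARRY
       LEAST('HARRY', 'HARRIOT', 'HAROLD'),     -- HAROLD
       USER                                     -- name of the current ORACLE user
FROM DUAL;
```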
Group Functions: Many group functions accept these options:
DISTINCT This option causes a group function to consider only distinct values of the argument
expression.
ALL This option causes a group function to consider all values, including all duplicates.
All group functions except COUNT(*) ignore nulls. You can use NVL in the argument to a group
function to substitute a value for a null. If a query with a group function returns no rows or
only rows with nulls for the argument to the group function, the group function returns null
(COUNT returns 0 in this case).

AVG([DISTINCT|ALL] n) Returns average value of n.
COUNT({* | [DISTINCT|ALL] expr}) Returns the number of rows in the query. If you specify expr,
this function returns the number of rows where expr is not null. You can count either all rows,
or only distinct values of expr. If you specify the asterisk (*), this function returns the
number of all rows, including duplicates and nulls.
MAX([DISTINCT|ALL] expr) Returns maximum value of expr.
MIN([DISTINCT|ALL] expr) Returns minimum value of expr.
STDDEV([DISTINCT|ALL] x) Returns the standard deviation of x, a number. ORACLE calculates the
standard deviation as the square root of the variance defined for the VARIANCE group function.
SUM([DISTINCT|ALL] n) Returns sum of values of n.
VARIANCE([DISTINCT|ALL] x) Returns variance of x, a number. For the variance formula see page
3-48 of "ORACLE7 Server SQL Language Reference Manual".
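The treatment of nulls by group functions can be sketched as follows. The emp table and its columns are assumptions made for illustration.

```sql
-- Hypothetical table: emp(ename, deptno, sal, comm); comm may be null.
SELECT COUNT(*),                -- all rows, including rows where comm is null
       COUNT(comm),             -- only rows where comm is not null
       COUNT(DISTINCT deptno),  -- number of different deptno values
       AVG(comm),               -- average over the non-null comm values only
       AVG(NVL(comm, 0))        -- average with each null counted as 0
FROM emp;
```

The last two columns differ whenever some comm values are null, because AVG(comm) divides by the number of non-null values while AVG(NVL(comm, 0)) divides by the number of rows.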

12. Format Models


A format model is a character literal that describes the format of DATE or NUMBER data
stored in a character string. You can use a format model as an argument of the TO_CHAR or
TO_DATE function for these purposes:
to specify the format for ORACLE to use to return a value from the database to you
to specify the format for a value you have specified for ORACLE to store in the
database
Note that a format model does not change the internal representation of the value in the
database.

Number Format Models


You can use number format models in these places:
in the TO_CHAR function to translate a value of NUMBER datatype to VARCHAR2
datatype
in the TO_NUMBER function to translate a value of CHAR or VARCHAR2 datatype
to NUMBER datatype
All number format models cause the number to be rounded to the specified number of
significant digits. If a value has more significant digits to the left of the decimal place than
are specified in the format, pound signs (#) replace the value.
A number format model is composed of one or more number format elements. The table below lists
the elements of a number format model.
Element     Example     Description
9           9999        Number of "9"s specifies number of significant digits returned. Blanks
                        are returned for leading zeroes and for a value of zero.
0           0999, 9990  Returns a leading zero or a value of zero in this position as a 0,
                        rather than as a blank.
$           $9999       Prefixes values with dollar sign.
B           B9999       Returns zero value as blank, regardless of "0"s in the format model.
MI          9999MI      Returns "-" after negative values. For positive values, a trailing
                        space is returned.
S           S9999       Returns "+" for positive values and "-" for negative values in this
                        position.
PR          9999PR      Returns negative values in <angle brackets>. For positive values, a
                        leading and trailing space is returned.
D           99D99       Returns the decimal character in this position, separating the integral
                        and fractional parts of a number.
G           9G999       Returns the group separator in this position.
C           C999        Returns the ISO currency symbol in this position.
L           L999        Returns the local currency symbol in this position.
, (comma)   9,999       Returns a comma in this position.
. (period)  99.99       Returns a period in this position, separating the integral and
                        fractional parts of a number.
V           999V99      Multiplies values by 10**n, where n is the number of "9"s after the
                        "V".
EEEE        9.999EEEE   Returns value in scientific notation.
RN or rn    RN          Returns upper- or lower-case Roman numerals. Value can be an integer
                        between 1 and 3999.
The MI and PR format elements can only appear in the last position of a number format
model. The S format element can only appear in the first or last position.
If a number format model does not contain the MI, S, or PR format elements, negative values
automatically contain a leading negative sign and positive values automatically contain a
leading space.
A number format model can contain only a single decimal character (D) or period (.), but it
can contain multiple group separators (G) or commas (,). A group separator or comma cannot
appear to the right of a decimal character or period in a number format model.
The characters returned by some of these format elements are specified by initialization
parameters. The table below lists these elements and parameters.

Element Description Initialization Parameter


D Decimal character NLS_NUMERIC_CHARACTERS
G Group separator NLS_NUMERIC_CHARACTERS
C ISO currency symbol NLS_ISO_CURRENCY
L Local currency symbol NLS_CURRENCY
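Some of the number format elements above, as a minimal sketch. It assumes an ORACLE session with default NLS settings; the numbers are illustrative only.

```sql
SELECT TO_CHAR(1234.5, '9,999.99'),  -- comma as group mark, two decimal places
       TO_CHAR(-1234, '9999MI'),     -- 1234-   (trailing minus sign)
       TO_CHAR(-1234, '9999PR'),     -- <1234>  (negative value in angle brackets)
       TO_CHAR(1234, 'S9999'),       -- +1234   (explicit sign)
       TO_CHAR(12345, '999'),        -- pound signs: the value has too many digits for the format
       TO_CHAR(0, 'B9999')           -- blanks instead of a zero
FROM DUAL;
```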

Date Format Models


You can use date format models in these places:
in the TO_CHAR function to translate a DATE value that is in a format other than the
default date format
in the TO_DATE function to translate a character value that is in a format other than
the default date format
A date format model is composed of one or more date format elements. The table below lists
the elements of a date format model.

Element           Meaning
SCC or CC         Century; "S" prefixes BC dates with "-".
YYYY or SYYYY     4-digit year; "S" prefixes BC dates with "-".
IYYY              4-digit year based on the ISO standard.
YYY or YY or Y    Last 3, 2, or 1 digit(s) of year.
IYY or IY or I    Last 3, 2, or 1 digit(s) of ISO year.
Y,YYY             Year with comma in this position.
SYEAR or YEAR     Year, spelled out; "S" prefixes BC dates with "-".
RR                Last 2 digits of year; for years in other centuries.
BC or AD          BC/AD indicator.
B.C. or A.D.      BC/AD indicator with periods.
Q                 Quarter of year (1,2,3,4; JAN-MAR = 1).
MM                Month (01-12; JAN = 01).
RM                Roman numeral month (I-XII; JAN = I).
MONTH             Name of month, padded with blanks to length of 9 characters.
MON               Abbreviated name of month.
WW                Week of year (1-53) where week 1 starts on the first day of the year and
                  continues to the seventh day of the year.
IW                Week of year (1-52 or 1-53) based on the ISO standard.
W                 Week of month (1-5) where week 1 starts on the first day of the month and
                  ends on the seventh.
DDD               Day of year (1-366).
DD                Day of month (1-31).
D                 Day of week (1-7).
DAY               Name of day, padded with blanks to length of 9 characters.
DY                Abbreviated name of day.
J                 Julian day; the number of days since January 1, 4712 BC. Numbers specified
                  with 'J' must be integers.
AM or PM          Meridian indicator.
A.M. or P.M.      Meridian indicator with periods.
HH or HH12        Hour of day (1-12).
HH24              Hour of day (0-23).
MI                Minute (0-59).
SS                Second (0-59).
SSSSS             Seconds past midnight (0-86399).
-/,.;:"text"      Punctuation and quoted text is reproduced in the result.
The RR date format element is similar to the YY date format element, but it provides
additional flexibility for storing date values in other centuries. The RR date format element
allows you to store twenty-first century dates in the twentieth century by specifying only the
last two digits of the year. It will also allow you to store twentieth century dates in the
twenty-first century in the same way if necessary.
If you use the TO_DATE function with the YY date format element, the date value returned is
always in the current century. If you use the RR date format element instead, the century of
the return value varies according to the specified two-digit year and the last two digits of the
current year.
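The difference between RR and YY can be sketched with a two-digit year. The results in the comments assume the statement is run in a year from 2000 to 2049; in another period the centuries would differ.

```sql
SELECT TO_CHAR(TO_DATE('27-10-98', 'DD-MM-RR'), 'YYYY'),  -- 1998 (RR picks the previous century)
       TO_CHAR(TO_DATE('27-10-98', 'DD-MM-YY'), 'YYYY')   -- 2098 (YY always uses the current century)
FROM DUAL;
```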
The following suffixes can be added to date format elements:
Suffix         Meaning                  Example Element   Example Value
TH             Ordinal number           DDTH              4TH
SP             Spelled number           DDSP              FOUR
SPTH or THSP   Spelled, ordinal number  DDSPTH            FOURTH
Capitalization in a spelled-out word, abbreviation, or Roman numeral follows capitalization
in the corresponding format element. For example, the date format model 'DAY' produces
capitalized words like 'MONDAY'; 'Day' produces 'Monday'; and 'day' produces 'monday'.
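Date format elements, suffixes, and quoted text can be combined in one format model. This is a minimal sketch using SYSDATE, so the actual output depends on the day the statement is run and on the session's date language.

```sql
SELECT TO_CHAR(SYSDATE, 'DAY'),                       -- day name in upper case
       TO_CHAR(SYSDATE, 'Day'),                       -- day name with an initial capital
       TO_CHAR(SYSDATE, 'DD/MM/YYYY HH24:MI:SS'),     -- numeric date and 24-hour time
       TO_CHAR(SYSDATE, 'DDSPTH "of" MONTH, YYYY')    -- spelled ordinal day, quoted text, month name
FROM DUAL;
```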

13. Data Definition Language (DDL) SQL Commands


The SQL commands that are used to create database objects, alter their structure, and delete
them from the database are collectively called DDL. Examples include CREATE, ALTER, DROP,
TRUNCATE, and RENAME.
The Create Table Command:
The CREATE TABLE command defines each column of the table uniquely. Each column has the
following attributes:
Name
Data type
Size (column width), for data types that take one; DATE, for example, takes no size.
Each table column definition is a single clause in the CREATE TABLE syntax. Each column
definition is separated from the next by a comma. Finally, the SQL statement is terminated
with a semicolon.
The Structure of Create Table Command
Table name is Student
Column name Data type Size
Reg_no varchar2 10
Name char 30
DOB date
Address varchar2 50
Example:
CREATE TABLE Student
(Reg_no varchar2(10),
Name char(30),
DOB date,
Address varchar2(50));

The DROP Command


Syntax:
DROP TABLE <table_name>
Example:
DROP TABLE Student;
This removes the table definition together with all the data stored in it.

The TRUNCATE Command


Syntax:
TRUNCATE TABLE <Table_name>
Example:
TRUNCATE TABLE Student;
This removes all rows from the table but keeps the table definition. Unlike DELETE, it cannot
be rolled back.

The RENAME Command


Syntax:
RENAME <OldTableName> TO <NewTableName>
Example:
RENAME Student TO Stu;
The table that was named Student is now named Stu.

The ALTER Table Command


The ALTER TABLE command is used to modify an existing table.
Adding New Columns
Syntax:
ALTER TABLE <table_name>
ADD (<NewColumnName> <Data_Type>(<size>),......n)
Example:
ALTER TABLE Student ADD (Age number(2), Marks number(3));
The Student table already exists; this command adds two more columns to it, Age and Marks.
Dropping a Column from the Table
Syntax:
ALTER TABLE <table_name> DROP COLUMN <column_name>
Example:
ALTER TABLE Student DROP COLUMN Age;
This command drops the named column, here Age, from the table.
Modifying Existing Table
Syntax:
ALTER TABLE <table_name> MODIFY (<column_name>
<NewDataType>(<NewSize>))
Example:
ALTER TABLE Student MODIFY (Name Varchar2(40));
The Name column already exists in the Student table as char with size 30; this command changes
it to Varchar2 with size 40.
Restriction on the ALTER TABLE
The following tasks cannot be performed using the ALTER TABLE command:
Change the name of the table
Change the name of the column
Decrease the size of a column if table data exists

14. Data Manipulation Language (DML) SQL Commands


The SQL commands that are used to insert data into the database, modify the data in the
database, and delete data from the database are collectively called DML. Examples include
INSERT, UPDATE, DELETE, and SELECT.
Insert Operation: To insert data into a table.
Eg: INSERT INTO student (reg_no, first_name, last_name, dob, address, pincode)
VALUES ('A101', 'Mohd', 'Imran', '01-MAR-89', 'Allahabad', 211001);
Eg: INSERT INTO student VALUES ('A101', 'Mohd', 'Imran', '01-MAR-89', 'Allahabad', 211001);
Note: Character expressions placed within the INSERT INTO statement must be enclosed in
single quotes (').
Inserting data into a table from another table:
In addition to inserting data one row at a time, you can populate a table with data that
already exists in another table.
Eg: To insert data from the course table into the university table:
INSERT INTO university SELECT course_id, course_name FROM course;
The rows returned by the SELECT on the course table are inserted into the university table. You
can also restrict the rows that are copied by adding a WHERE clause to the SELECT.
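An INSERT ... SELECT restricted by a WHERE clause might look like the sketch below. The duration column is hypothetical, and the statement assumes that the university table has course_id and course_name columns.

```sql
-- Copy only the courses that last three years.
-- duration is an assumed column of the course table, used only for illustration.
INSERT INTO university (course_id, course_name)
SELECT course_id, course_name
FROM course
WHERE duration = 3;
```

Listing the target columns explicitly keeps the statement valid even if the university table has more columns than the two being filled.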

DELETE Operation:
The DELETE command can remove all the rows from a table or only a set of rows.
eg: DELETE FROM student; This deletes all the rows from the student table.

eg: DELETE FROM student WHERE reg_no='A101'; This deletes only the rows that satisfy the
condition, here the row with register number A101.

UPDATE Operation:
The UPDATE command is used to change data values in a table. It can update all the rows of the
table or only a set of rows.
eg : UPDATE Student SET course='MCA';
Here course is a column name. Use this type of query when every student in the Student table
should have the course MCA: it updates all the rows in the table.
To update a particular row, add a WHERE clause:
UPDATE Student SET course='MCA' WHERE reg_no='A101'; This updates only the row with register
number A101.
Different kinds of conditions can be used in the WHERE clause, and the SET clause can compute
the new value from the old one. For example, when someone's salary has increased, the salary
column can be multiplied by a factor or have an amount added to it.
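The salary updates mentioned above can be sketched as follows. The emp table, its columns, and the values used are hypothetical.

```sql
-- Hypothetical table: emp(empno, deptno, salary)
-- Raise the salary of everyone in department 10 by 10 percent.
UPDATE emp SET salary = salary * 1.10 WHERE deptno = 10;

-- Add a fixed amount to the salary of one employee.
UPDATE emp SET salary = salary + 500 WHERE empno = 7369;
```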

Select: The SELECT command is used to view the data in a table. Once data has been inserted
into a table, the next logical operation is to view what has been inserted. The SELECT SQL verb
is used to achieve this.
All Rows and All Columns
Syntax: SELECT * FROM Table_name;
eg: SELECT * FROM Student; This shows all the records in the table.
eg: SELECT First_name, DOB FROM STUDENT WHERE Reg_no = 'S101'; Enclose the value in single
quotes if the column's datatype is varchar or char.
This command shows one row, because the condition matches only one record. If the condition
given in the WHERE clause is true for a row, that row is fetched; if no row satisfies it, the
query reports that no rows were selected.

Eliminating Duplicates:
A table could hold duplicate rows. In such a case, you can eliminate duplicates.
Syntax: SELECT DISTINCT col, col, ... FROM table_name;
eg : SELECT DISTINCT * FROM Student;
or : SELECT DISTINCT first_name, city, pincode FROM Student;
The query scans all rows and eliminates rows that have exactly the same contents in each
selected column.
Sorting DATA:
The rows retrieved from the table can be sorted in either ascending or descending order by
using the ORDER BY keyword in the SELECT statement.
SELECT * FROM Student
ORDER BY First_Name;
This shows the records in ascending alphabetical order, from A to Z. For descending order,
from Z to A, add the DESC keyword at the end.
eg : SELECT first_name, city,pincode FROM Student
ORDER BY First_name DESC;

15. Transaction Control Language (TCL)


The SQL commands that are used to control the transactions made against the database are
collectively called TCL. Examples include COMMIT, ROLLBACK, and SAVEPOINT.
Commit: COMMIT makes changes permanent. Once a transaction is committed, the changes it made
are permanent and visible to other users, and they can no longer be rolled back.
Rollback: ROLLBACK undoes the changes made by the commands of the current transaction, but only
those that have not yet been committed. Data that has been committed cannot be rolled back.
Savepoint: SAVEPOINT marks a point within a transaction to which a later ROLLBACK can return.
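The three TCL commands can be sketched in one transaction against the Student table used earlier. The savepoint name after_update and the register number A102 are chosen only for illustration.

```sql
UPDATE Student SET course = 'MCA' WHERE reg_no = 'A101';

SAVEPOINT after_update;                -- mark a point inside the transaction

DELETE FROM Student WHERE reg_no = 'A102';

ROLLBACK TO SAVEPOINT after_update;    -- undoes the DELETE only; the UPDATE is kept

COMMIT;                                -- makes the UPDATE permanent
```

After the COMMIT, a plain ROLLBACK would have nothing left to undo.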

UNIT III
Schema Objects

1. Introduction to Schema Objects


A schema is a collection of logical structures of data, or schema objects. A schema is
owned by a database user and has the same name as that user. Each user owns a single
schema. Schema objects can be created and manipulated with SQL and include the following
types of objects:
Clusters
Database links
Database triggers
Dimensions
External procedure libraries
Indexes and index types
Java classes, Java resources, and Java sources
Materialized views and materialized view logs
Object tables, object types, and object views
Operators
Sequences
Stored functions, procedures, and packages
Synonyms
Tables and index-organized tables
Views
Other types of objects are also stored in the database and can be created and manipulated with
SQL but are not contained in a schema:
Contexts
Directories
Profiles
Roles
Tablespaces
Users
Rollback segments

2. Features of Schema Objects:


i) Schema objects are logical data storage structures.
ii) Schema objects do not have a one-to-one correspondence to physical files on disk that
store their information.
iii) RDBMS stores a schema object logically within a tablespace of the database.
iv) The data of each object is physically contained in one or more of the tablespace's
datafiles.
v) For some objects, such as tables, indexes, and clusters, you can specify how much disk
space Oracle allocates for the object within the tablespace's datafiles.
vi) There is no relationship between schemas and tablespaces: a tablespace can contain
objects from different schemas, and the objects for a schema can be contained in different
tablespaces.
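The independence of schemas and tablespaces can be sketched with the TABLESPACE clause. The tablespace names users_ts and index_ts and the Student_Archive table are hypothetical; the tablespaces would have to exist already.

```sql
-- Both objects belong to the same schema (the current user's),
-- but each is stored in a different tablespace.
CREATE TABLE Student_Archive
(Reg_no varchar2(10),
 Name varchar2(30))
TABLESPACE users_ts;

CREATE INDEX student_archive_idx ON Student_Archive (Reg_no)
TABLESPACE index_ts;
```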

3. Guidelines for Managing Schema Objects:


Before a new database is created, planning its physical structure is crucial for the database
administrator. In addition, the application developer must carefully plan and create the
application's database objects. The guidelines for managing schema objects include:
1. Planning the database tables in an application schema.
2. Setting integrity rules and defining table relationships in an application schema.
3. Using views effectively for an application.
4. Creating and managing tables and views.
5. Using indexes and clusters to speed the retrieval of specific rows in a table.
6. Using sequences to automatically create unique primary keys for tables without contention
among concurrent users who are inserting new rows.

4. Sequences:
Sequences are database objects from which multiple users can generate unique integers. The
sequence generator generates sequential numbers, which can help to generate unique primary
keys automatically, and to coordinate keys across multiple rows or tables.
Without sequences, sequential values can only be produced programmatically. A new primary key
value can be obtained by selecting the most recently produced value and incrementing it. This
method requires a lock during the transaction, so when several users insert rows at the same
time, each must wait for the others before getting the next value of the primary key.
Syntax:
CREATE SEQUENCE sequence_name
[START WITH start_num]
[INCREMENT BY increment_num]
[ { MAXVALUE maximum_num | NOMAXVALUE } ]
[ { MINVALUE minimum_num | NOMINVALUE } ]
[ { CYCLE | NOCYCLE } ]
[ { CACHE cache_num | NOCACHE } ]
[ { ORDER | NOORDER } ];
1. The default start_num is 1.
2. The default increment number is 1.
3. minimum_num must be less than or equal to start_num, and minimum_num must be
less than maximum_num.
4. NOMINVALUE specifies the minimum is 1 for an ascending sequence or -10^26 for
a descending sequence.
5. NOMINVALUE is the default.
6. maximum_num must be greater than or equal to start_num, and maximum_num must
be greater than minimum_num.
7. NOMAXVALUE specifies the maximum is 10^27 for an ascending sequence or -1 for
a descending sequence. NOMAXVALUE is the default.
8. NOCYCLE specifies the sequence cannot generate any more integers after reaching
its maximum or minimum value.
9. NOCYCLE is the default.
10. CACHE cache_num specifies the number of integers to keep in memory.
11. The default number of integers to cache is 20.
12. The minimum number of integers that may be cached is 2.
13. NOCACHE specifies no integers are to be stored.
14. ORDER guarantees the integers are generated in the order of the request.
15. NOORDER is the default.

e.g:
CREATE SEQUENCE emp_sequence
INCREMENT BY 1
START WITH 1
NOMAXVALUE
NOCYCLE
CACHE 10;
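Once created, a sequence is read through its NEXTVAL and CURRVAL pseudocolumns. The sketch below uses emp_sequence from the example above; the emp table and its columns are hypothetical.

```sql
-- Hypothetical table: emp(empno, ename)
-- NEXTVAL generates the next integer and uses it as the primary key.
INSERT INTO emp (empno, ename) VALUES (emp_sequence.NEXTVAL, 'SMITH');

-- CURRVAL returns the value most recently generated by NEXTVAL in this session.
SELECT emp_sequence.CURRVAL FROM DUAL;
```

Because each call to NEXTVAL returns a different integer to each session, concurrent users do not wait on one another for a key value.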