This module:
Summarizes the main concepts of Chapters 1 & 2 of the textbook:
Elmasri/Navathe (E&N), 5th edition (or newer), publisher: Addison-Wesley
(Note: chapter and page references in these notes refer to the 5th edition.
If you have a different edition, the topic under discussion can be found
using your book's index and chapter outlines.)
• Abbreviations:
Database (DB)
Database Management System (DBMS)
Specifically, Chapters 1 & 2 cover:
Comparison of non-DBMS with DBMS environments (Ch 1)
Software Architecture of a DBMS (Ch 2)
Introduction to DB
In general, regardless of implementation, a database (DB) has two main
components: 1) data and 2) processors of the data
Processors (usually programs) specify how data is manipulated
• Schema - how information relates to other pieces of information, and how information is
grouped/classified and structured
Ex: Named record types: Department, Student, Class section, …
and record type fields are defined and “known to” (i.e.: stored in) the DBMS;
Each ClassSection record’s student# field value represents a student enrolled in this
section, and this student# value must match a Student file record student# field value
That is, a Class section student cannot exist unless such a student exists in the Student file
(Technically, this is an example of a value Existence constraint)
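The existence constraint above is exactly what SQL foreign keys enforce. A minimal sketch using SQLite through Python's sqlite3 (not Oracle; the table and column names here are illustrative, not from the textbook):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE class_section (
                 section_id INTEGER,
                 student_no INTEGER REFERENCES student(student_no))""")

con.execute("INSERT INTO student VALUES (101, 'Ana')")
con.execute("INSERT INTO class_section VALUES (1, 101)")  # OK: student 101 exists

try:
    # a ClassSection student cannot exist unless that student exists in student
    con.execute("INSERT INTO class_section VALUES (1, 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The rejected insert shows the DBMS, not application code, policing the value's existence.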
Schema information is stored separately from data, in the catalog/data dictionary*
data dictionary (DD) ← abbreviation used in this course
(data management relies on the DD – it stores data names & their meanings)
• Data + data relationships - the actual information users store in the database
Normally, store data in structures based on schema definitions
However, theoretically, unstructured/binary data can be stored in many database systems
Several different DB models have been implemented, all with the purpose of
improving on the limitations of file management systems (FMS)
RDBMS vs. "Schemaless" processing systems vs. "Object Relational"
Ever since DBMS systems came into use (way back in the 1960s),
various DB models have been proposed and deployed
SQL "object" processing has been supported for almost 20 years in the major relational
DBMSs (IBM's DB2, Oracle, Microsoft SQL Server). The SQL:1999 standard specifies the SQL "object"
language features. Note: some systems, such as MySQL, do not support these object features.
Systems that do support them (classified as "object-relational" models, aka ORDBMS)
differ from the original relational model in that
a table column can be more than just a scalar value: a column is allowed to hold
an object or an object reference.
=> covered later in the course
Overview of Relational Database Management Systems (RDBMS)
Recall that a DB model has 2 main components: 1) Data and 2) Processors
The course focus starts with relational DBMS (RDBMS) model concepts
The fundamental schema “data container” visible to the user is called a table
Example: In RDBMS, a person occurrence is accessed (and user sees it) as a
stored row in a table. However, rows are physically stored in RDBMS-vendor
specific blocked binary format.
By contrast, in FMS, each person record is stored in a record in an ordinary OS file
using program-defined formats (that can be arbitrary)
Key DBMS characteristics:
- self-describing (all relevant schema info is stored in the DD and can be queried)
- multiple views (different users authorized to see different aspects of the same data)
- shared access to data and concurrent processing are an emphasis
Chapter 2 introduces
• The three-level architecture that forms the basis for many database architectures
- a result of the ANSI/SPARC study group on Database Management Systems (in the
mid 1970s)
- database vendors differ in how they implement the three levels
CREATE VIEW viewName AS  ← "AS" is a required keyword (it separates the view name from the SELECT)
SELECT t.c1, v.c2
FROM t, v;               ← the body can be any legal SQL query
In this example, t and v are an existing table and
view, with respective columns named c1 and c2.
Note: the view inherits the specified column names of the
tables/views in the FROM clause
Note: For any table or view (assume it is named myobj):
SQL> describe myobj   {displays its schema (= column names, Oracle data types,
and nullity constraint (NOT NULL), if any)}
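The CREATE VIEW pattern above can be tried end to end. A sketch using SQLite through Python's sqlite3 (here v is a view over a single table t with columns c1 and c2, a simplification of the two-object example above; PRAGMA table_info stands in for Oracle's describe):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1 INTEGER, c2 TEXT)")
con.execute("INSERT INTO t VALUES (1, 'x'), (2, 'y')")

# AS separates the view name from its defining SELECT, as noted above
con.execute("CREATE VIEW v AS SELECT c1, c2 FROM t WHERE c1 > 1")

print(con.execute("SELECT * FROM v").fetchall())  # → [(2, 'y')]

# SQLite's rough analog of SQL*Plus describe: column names and declared types
for _cid, name, ctype, *_rest in con.execute("PRAGMA table_info(v)"):
    print(name, ctype)
```

Note how the view's columns c1 and c2 are inherited from t, exactly as the notes state.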
Three-level Architecture E/C/I applied to RDBMSs
1. External (E) level: DD info for user views & system views
=> DB apps should use views rather than tables, as much as possible
- View query results are computed from the underlying tables that store the data
- By default, the results of a query on a view are displayed, but NOT stored anywhere
[Diagram: the three-level architecture inside the Relational Database Management
System (RDBMS) - the External layer is connected to the Conceptual layer (the data
model) by the external/conceptual mappings, and the Conceptual layer is connected
to the Internal layer (storage formats) by the conceptual/internal mapping]
From now on, unless stated otherwise, “RDBMS” means the Oracle RDBMS
(Except for SQL Standard, RDBMSs differ in operational working of transactions, storage management, etc.)
A small part of the Oracle DD is associated with each user/account.
User-specific parts of the DD are a user’s schema (contains that user’s tables, views, etc.)
External (E) Level
Ideally, RDBMS apps should access data using views rather than tables
In Oracle SQL, by default, the ONLY views user U can access are
1) views created by U
( Note: the creator of an Oracle object owns that object ), and
2) views for which U was granted access by another user
< The Oracle object access model is derived from the “Take/Grant”
protection model originally invented for OS protection in 1970s >
Important DD views used very frequently by an Oracle user:
user_views is a view queryable in each user U’s schema;
it has info about all views owned by U {for 1) above}
Example: user_tables is a view in each user U’s schema; it has info about all
tables owned by U
To repeat: the RDBMS "table" format is an illusion for end-user convenience/simplicity
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Example: all_tables is a DD view built into each user U's schema; it displays
conceptual-level info about the tables owned by OR accessible to U.
The following query on this view filters its results by eliminating all non-answer rows:

SELECT table_name, owner
FROM all_tables
WHERE table_name LIKE 'X%';   ← the condition that "filters" the query results

Result: all <table_name, owner> pairs for
tables whose names start with "X"   ← upper case
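The same catalog-filtering idea can be exercised against SQLite's sqlite_master, its rough analog of Oracle's all_tables (the table names below are hypothetical; note that unlike Oracle, SQLite's LIKE is case-insensitive for ASCII, which does not matter here since all names are upper case):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE XSALES (id INTEGER)")
con.execute("CREATE TABLE XSTAFF (id INTEGER)")
con.execute("CREATE TABLE BUDGET (id INTEGER)")

# Filter the catalog the way the all_tables query above does:
# keep only tables whose names start with 'X'
rows = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'X%'"
).fetchall()
print(sorted(rows))   # → [('XSALES',), ('XSTAFF',)]
```

BUDGET is eliminated as a "non-answer row"; only the schema metadata, never the table data, is read.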
Internal (I) Level
• Defines mappings of C-level objects --> their physical DB representations
Oracle-specific notes (another RDBMS system might have different such rules):
Each table’s data is physically stored in OS files named “xxx.dbf, yyy.dbf, …”
owned by Oracle; data is normally accessed when using SQL as an
RDBMS table or view or other type of Oracle object
User U can access an item only if U has the required privileges for
attempted operations on the item
More about mappings between schema layers
• The conceptual/internal mapping:
– specifies mapping from conceptual level objects to their stored counterparts
Ex: Each Oracle table "T" is implemented (i.e., physically represented)
as its row data stored in part of an OS file "f" owned by OS user "oracle";
the DD and storage-block header info keep track of the storage location of T's
row data
{the unit of storage is a 'database block', whose size is somewhat larger than an OS block}
DepartmentNumber char(2));   ← final line of the employee table definition
Note: a cast operator on SSN_last4 can produce the desired display format for SSN_last4
Illustration of physical data independence
At some time after the employee table was created, a DBA (or table admin.) does:
create index e_index on employee(SSN);
1) the employee table and 2) the employee_view definitions do not need to change;
Benefit: many queries on employee & employee_view involving SSN will run faster
Suitable Mappings between the 3 schema levels provide Data
Independence
Data Independence reduces Impact from Changes across schema layers
• A change to the storage structure definition might mean that the
conceptual/internal mapping must be changed. However,
IF the conceptual schema remains invariant/unchanged after a storage
structure change, we achieve
physical data independence (PDI)
Ex: When the DBA adds another access method (AM) for table rows (such as an index),
no logical schema or external schema changes are needed,
thus, PDI is achieved
Each prominent large scale RDBMS (Oracle, DB2, SQL Server, Informix, …)
can be considered a giant software engine running on the OS.
(later in the course, we study data normalization, a way to create a collection of table
schemas satisfying important RDBMS design objectives)
-- The DBA must know how to analyze and create appropriate physical structures (indexes,
other structures), and must handle recovery from failure, archiving (i.e., permanently
saving a record of particular transactions), and other special requirements, etc.
DBA cont...
n Liaising with users, e.g. to ensure that the required data is available and to
design the necessary external schemas and conceptual/external mapping (using
DDL);
For example, create and configure additional tablespaces, user schemas, solve
“outOfStorage” or “Quota exceeded” storage issues
n Giving technical help when users have advanced questions or problems with their
applications
Example: for a web app with a FrontEnd + App + DBMS structure,
App-tier (Java/Python/C++/C# …) help is the app developer's job, not a DBA duty
n Recovery of transactions: most RDBMSs use a log file where each log
record contains the values of database items before and after changes,
per transaction; the log is used for recovery purposes
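The before/after-image idea can be sketched in a few lines of plain Python. This is a hypothetical toy format, not any real RDBMS log layout:

```python
# Each log record stores (transaction id, item, before-image, after-image),
# so recovery can UNDO (restore the before-image) or REDO (reapply the
# after-image) the updates of a given transaction.
db = {"A": 100, "B": 200}
log = []   # list of (txn_id, item, before, after)

def write(txn, item, new_value):
    log.append((txn, item, db[item], new_value))
    db[item] = new_value

write("T1", "A", 150)
write("T1", "B", 50)

def undo(txn):
    # scan the log backwards, restoring before-images of txn's updates
    for t, item, before, _after in reversed(log):
        if t == txn:
            db[item] = before

undo("T1")          # T1 failed; roll its changes back
print(db)           # → {'A': 100, 'B': 200}
```

Redo would be the mirror image: a forward scan reapplying after-images of committed transactions.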
Data replication (keeping copies of the data at multiple sites)
Advantages:
+ faster retrievals from multiple copies:
a local access (for example, on the server being used) is faster than a
non-local access to a remote server (which involves a network transfer)
Disadvantages:
- slows down updates (the copy at each site must be changed)
- duplicates storage
- temporary data inconsistency is possible among sites during update
Minimizing/Reducing Data Redundancy
• In non-DBMS systems, each application has private files
-- Often results in redundancy in stored data, thus, wasted storage space
-- Early systems (from many years ago) did not share data easily
• In a DBMS, data is “integrated”
the database is a unification of many otherwise distinct data files;
a user accessing an RDBMS table does not need to "see", or program against,
the files that actually contain the physical data
Look again at the section in the SQLPLUS204_s19.doc handout –
a tablespace storage map can involve any number of x.dbf files
(Ex: table X can be in tablespace ts1’s datafile f1.dbf and table Y can be in
tablespace ts1’s datafile f3.dbf)
– The view construct is a prominent technique for integrating data;
allows a table, “T”, to be visible in the schema that owns T
AND also visible to other authorized schemas
→ Using views reduces redundancy because, instead of creating new
tables containing copies of columns from existing tables, define views
that use the data in existing (aka “base”) tables
Schema design principle: store each value in one PLACE, if possible
An example of inconsistency from redundancy, and how to "hide" it
• Data redundancy can lead to inconsistency in the database unless redundant data is carefully processed
One way to implement hiding – provide a “consistent view” of each data set:
Suppose: - At time t1, there are 2 copies, c1 and c2, of value v3
- At time t2 > t1, c1 is set to v4   ← {the hardware does NOT do the two updates
- At time t3 > t2, c2 is set to v4       simultaneously}
Assume user U1 can access only c1 and user U2 can access only c2
Then at any time t between t1 and t3, any other user U (different from U1 and U2) accesses v3 (but not v4)
And, after time t3, all users (U1, U2, and U) see the updated value v4
--------------------------------------------------------------------------
The point of the above example: between times t2 and t3, the current values of the cj are
unequal, i.e., 'inconsistent' DB values, and these inconsistent values are masked/hidden
from the users
Note – the value in c2 is "stale" between times t2 and t3;
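The t1/t2/t3 timeline above can be written out as a toy Python simulation:

```python
# Two copies of the same value; the updates to v4 cannot happen
# simultaneously, so between them the copies disagree, yet each user
# (bound to one copy) still sees a single internally consistent value.
copies = {"c1": "v3", "c2": "v3"}     # time t1: both copies hold v3

copies["c1"] = "v4"                   # time t2: first copy updated
assert copies["c1"] != copies["c2"]   # inconsistent window: c2 is stale

copies["c2"] = "v4"                   # time t3: second copy updated
assert copies["c1"] == copies["c2"] == "v4"   # all users now see v4
```

The inner assertion is exactly the "inconsistent DB values" window the notes describe; the outer one is the state after t3.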
As in FMS, incorrect data can enter the DB:
n DB utilities load incorrect data and/or data entry clerks manually key in incorrect data
Input data checks (and correction of bad input) must be used to prevent incorrect data entry
n Integrity constraints must be enforced – these are assertions to be satisfied by each table update
(in Oracle, a component of the SQL processor deals with enforcing integrities)
Ex: an employee table’s manager column value should never reference a non-existent manager:
employee( id, name, . . . , manager, . . . )
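The manager rule above is naturally enforced with a self-referencing foreign key. A sketch using SQLite via Python's sqlite3 (the column names follow the example; the data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this per connection
con.execute("""CREATE TABLE employee (
                 id      INTEGER PRIMARY KEY,
                 name    TEXT,
                 manager INTEGER REFERENCES employee(id))""")

con.execute("INSERT INTO employee VALUES (1, 'boss', NULL)")
con.execute("INSERT INTO employee VALUES (2, 'worker', 1)")  # manager 1 exists

try:
    # manager 42 does not exist, so the integrity check rejects the row
    con.execute("INSERT INTO employee VALUES (3, 'ghost', 42)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The NULL manager for the top employee is the usual way to terminate a self-referencing chain.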
n Converting data across different data types is another common cause of incorrect data:
rounding/approximation errors, an illegal target data type,
unexpected result values from a conversion that go unnoticed and are carried forward, etc.
( If different DB app languages access the same DB, those languages that do cast/type conversion might
convert the same data differently )
n Possibility of corrupted data from accidental causes (accidental delete, for example)
Thus, most integrity checks involve data Values and data Types
Note - some of the more serious corruptions (as side-effects of security breaches, for example)
result in sudden inaccessibility of data that should be accessible;
==> it is also possible that the DD itself has been compromised by a security attack
Integrity cont... – more advanced forms
• A record type may have constraints on:
- the total number of allowable occurrences
(Ex: max number of authorized medical procedures per patient/year – depends on insurance policy)
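A constraint on the total number of allowable occurrences usually needs a trigger rather than a simple column constraint. A sketch in SQLite via Python's sqlite3, with a hypothetical 3-procedures-per-patient-per-year policy (real policies would come from the insurance data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE procedure_auth (
                 patient_id INTEGER, proc_code TEXT, yr INTEGER)""")

# Hypothetical policy: at most 3 authorized procedures per patient per year
con.execute("""
  CREATE TRIGGER max_procs BEFORE INSERT ON procedure_auth
  WHEN (SELECT COUNT(*) FROM procedure_auth
        WHERE patient_id = NEW.patient_id AND yr = NEW.yr) >= 3
  BEGIN
    SELECT RAISE(ABORT, 'procedure limit reached for this patient/year');
  END""")

for code in ("P1", "P2", "P3"):
    con.execute("INSERT INTO procedure_auth VALUES (7, ?, 2024)", (code,))

try:
    con.execute("INSERT INTO procedure_auth VALUES (7, 'P4', 2024)")  # 4th: over limit
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Counting occurrences requires looking at other rows, which is why a trigger (or an Oracle equivalent) is needed instead of a per-row CHECK.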
Constraints in general
• Constraints have been studied as a topic in their own right for many years
• They occur implicitly/explicitly in every software system except the simplest ones
• Object Constraint Language (OCL), created in the mid 1990s looks at all aspects of constraints:
pre/post-conditions, invariants, various OO-related constraints such as inheritance, associations, etc.
• Various parts of OCL have been incorporated into various UML diagramming systems
• As with many modern language environments, SQL has incorporated some (but not all) of OCL
DBMS Integrity - summary
2. Value, aka range, checks (disallow a value not in a specified (min, max) range, or
not in a specified list of values (LOV))
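Both kinds of value checks map directly onto SQL CHECK constraints. A sketch using SQLite via Python's sqlite3 (the grade table and its bounds are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE grade (
      score  INTEGER CHECK (score BETWEEN 0 AND 100),        -- range check
      letter TEXT    CHECK (letter IN ('A','B','C','D','F')) -- list of values (LOV)
    )""")

con.execute("INSERT INTO grade VALUES (95, 'A')")   # passes both checks

for bad in [(120, 'A'), (50, 'Z')]:   # out-of-range score; letter not in the LOV
    try:
        con.execute("INSERT INTO grade VALUES (?, ?)", bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", bad, e)
```

Unlike the occurrence-count constraint earlier, these checks need only the row being inserted, so a declarative CHECK suffices.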
The term “DD” has a much broader scope and meaning than
just physical DB storage of user data.
{E&N textbook does NOT make this clear in beginning of book}
On large DB projects, regardless of the DB classification, extensive Analysis must be done before
loading user data content. Incomplete or poor analysis will result in DB Design and Implementation
issues that result in much redo/re-work
Server functionality involves the core DBMS facilities such as shared data access,
SQL processing, transaction processing, DD services, etc.