
P0_DBMS_intro204_s19.ppt - Introduction to DBMS environments


spring 2019
This course
Assumes: Some exposure to database systems and SQL
Does not assume: Detailed knowledge of relational database systems and
how applications use DBMS capabilities

This module:
Summarizes main concepts of Chapters 1 & 2 of textbook:
Elmasri/Navathe (E&N), Edition#5 (or newer), publisher: Addison-Wesley
(Note: Chapter + Page references in these notes refer to Edition#5
If you have a different edition, the topic under discussion can then be found
using your book’s index and chapter outlines)
• Abbreviations:
Database (DB)
Database Management System (DBMS)
Specifically, Chapters 1 & 2 cover:
Comparison of non-DBMS with DBMS environments (Ch 1)
Software Architecture of a DBMS (Ch 2)
Introduction to DB
In general, regardless of implementation, a database (DB) has two main
components: 1) data and 2) processors of the data
Processors (usually programs) specify how data is manipulated

DB, as historically implemented in the late 1960s through the end of the 1970s:


• Each application had independent, non-shared files
- Frequent data duplication across applications → data inconsistencies
- No standardized security for directories, files, file protection levels, etc.

– Subroutine libraries provided general routines to manipulate data

– BUT, using subroutines required low-level data & I/O programming knowledge;
the subroutines used an early compiled language such as FORTRAN, C, or
COBOL, or a machine-specific assembly language
- The only way to access data was by programming in the above kinds of
languages (data access required programming ability + technical skills)

The environment described above is referred to as the


“File Management System” (FMS) approach to DB
FMS proved too costly to 1) develop and 2) maintain: different
programmers repeatedly developed similar code for each new app
⇒ The above problems motivated the invention of DBMSs
Database Management System (DBMS) Approach
• DBMS and FMS-based DBs are very different environments

• Information in a DBMS system is visibly subdivided into two subsets:

– Schema (or metadata): the schema contains structure & DB definitions
for everything about users & the DBMS itself

– Data + data relationships: user data and its structures

• Schema - how information relates to other pieces of information, and how information is
grouped/classified and structured
Ex: Named record types: Department, Student, Class section, …
and record type fields are defined and “known to” (i.e.: stored in) the DBMS;
Each ClassSection record’s student# field value represents a student enrolled in this
section, and this student# value must match a Student file record student# field value
That is, a Class section student cannot exist unless such a student exists in the Student file
(Technically, this is an example of a value Existence constraint)
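The existence constraint just described can be sketched with a foreign key. This course uses Oracle, but a minimal runnable illustration is possible with Python's built-in sqlite3 module; the table and column names below are hypothetical, not taken from the course DB.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("CREATE TABLE Student (student_no INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE ClassSection (
                   section_id INTEGER,
                   student_no INTEGER REFERENCES Student(student_no))""")

con.execute("INSERT INTO Student VALUES (101, 'Ana')")
con.execute("INSERT INTO ClassSection VALUES (1, 101)")  # OK: student 101 exists

try:
    # Violates the existence constraint: no Student row with student_no 999
    con.execute("INSERT INTO ClassSection VALUES (1, 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The DBMS, not application code, rejects the enrollment row whose student# has no matching Student row.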
Schema information is stored separately from data, in the catalog/data dictionary
data dictionary (DD) ← abbreviation used in this course
(data management relies on the DD – it stores data names & their meanings)
• Data + data relationships - the actual information users store in the database
Normally, store data in structures based on schema definitions
However, theoretically, unstructured/binary data can be stored in many database systems
Several different DB models have been implemented, all of them having the purpose of
improving on limitations of FMS
RDBMS vs.“Schemaless” processing systems vs. “Object Relational”
Even BEFORE DBMS systems came into use (way back in the 1960s),
various DB models had been proposed and deployed

Relational DBMSs have dominated as a core software technology for


data/information processing since the middle of the 1980s
1. They work well on many standard kinds of applications
2. Millions of development hours were invested in optimizing these systems in order to provide
acceptable speed/performance
3. IF a DBMS closely conforms to the relational model, it has extensive/elaborate DDs

A fundamental distinction among DB models is presence/absence of the explicit use of schemas


A "schemaless" DB is one having no explicit description (thus, no schema) for definitions of
stored items. In fact, such a system is not actually a DB at all! More on this at the end of this course.
* has advantages for processing some forms of unstructured data
* each stored item, in simplest terms, is a “pair” <key value, object>

SQL "object" processing has been supported for almost 20 years in the major relational
DBMSs (IBM's DB2, Oracle, Microsoft SQL Server). The SQL:1999 standard specifies the SQL "object"
language. Note: a system such as MySQL (that is NOT an RDBMS) has no object capability.
In such systems (classified as "object-relational models" (aka ORDBMS)),
and unlike the original relational model,
a table column can be more than just a scalar value, by allowing a column to be
an object or an object reference.
=> covered later in course
Overview of Relational Database Management Systems (RDBMS)
Recall that a DB model has 2 main components: 1) Data and 2) Processors
Start course focus on relational DBMS (RDBMS) model concepts

The fundamental schema “data container” visible to the user is called a table
Example: In RDBMS, a person occurrence is accessed (and user sees it) as a
stored row in a table. However, rows are physically stored in RDBMS-vendor
specific blocked binary format.
By contrast, in FMS, each person record is stored in a record in an ordinary OS file
using program-defined formats (that can be arbitrary)

Name          Address        Date of Birth   Salary
Jim Smith     1 Apple Lane   1/3/1991        11000
Jon Greg      5 Pear St      7/9/1992        13000
Bob Roberts   2 Plumb Road   3/2/1990        12000

Schema info = person schema
Stored representation of person occurrences data
{In this example, there are 3 person data rows}
Table schemas
• Rows: represent different object occurrences as objects with
the same schema (same type)
- FMS file records correspond to table "rows"
• Columns: related facts about each object occurrence;
- Schema info about a column includes the column’s
name and its data type (as well as various other
properties, constraints, covered later)
The schema for an RDBMS table fully describes the structure of the columns
• person “table” schema in the previous slide includes pairs
<ColumnName, data type>: ← each RDBMS table column must have a type
– NAME – string holding 12 characters, cannot be empty string
– ADDRESS – string holding 12 characters
– Date of Birth – a date, with constraint: 18 < age <100
– SALARY – a number >= 0 (in some DBMS-defined numeric type)
• Other inter-data semantics (meanings) supported in RDBMS, including
- structure & processing info about tables, rows, columns, relationships
- constraints such as existence, time-related processing, security, etc.
DBMS responsibilities/conceptual roles
There are three broad classes of DBMS users:
1. application programmer, responsible for developing apps in some
high-level language(s) such as COBOL, C, Java, C#, Perl, etc.;
- programs interface with the DBMS data using published interfaces,
via high-level-language API-based calls to the DBMS
2. end-user, who usually accesses the database via a DB query language
(a language well-suited for accessing data more easily than with 3GL code);
Such users are in many cases not technically sophisticated =>
the simpler the DB/user interface, the better
Might also develop their own moderately-sophisticated apps.
3. database administrator (DBA), administers and controls all system-wide
aspects of the database; a large DB might deploy several people, each with
DBA authority (when DBA task is too much for one person)
- installs, upgrades, configures DB, might help configure user
schemas, etc.;
- consults with, and helps, users having technical problems
(lost-password, poor performing application, out-of-storage space, etc.)
Database Architecture
Different commercial DBMSs do not use the same internal architecture
Moreover, two different RDBMSs, even from the same corporation (such as Oracle
and MySQL) are very different, Oracle having many more capabilities
We use a multi-programmed virtual server running Oracle in CSC204
Chapter 1, section 1.3 identifies 3 (of many) Characteristics of DBMSs vs. FMS

- self-describing (all relevant schema info is stored in the DD and can be queried)
- multiple views (different users authorized to see different aspects of the same data)
- shared access to data and concurrent processing is an emphasis

Chapter 2 introduces
• The three-level architecture that forms the basis for many database architectures

- a result of the ANSI/SPARC study group on Database Management Systems (in the
mid 1970s)
- database vendors differ in how they implement the three levels

• According to the three-level architecture, the DD is divided into


3 general levels/layers
(analogous to the seven ISO network layers in computer networking):
1. External (E)
2. Conceptual (C)
3. Internal (I)
Built-in Language components for RDBMS
A DBMS provides two languages:
Data definition language (DDL) interacts with DD
DDL statements by a user can create/delete/modify schema structures
(This means creating & changing schemas themselves)
DDL by an administrator can manipulate ANY object in the DB
Examples:
create table TableName … ; (user/administrator creates a table)
-- creating other types of Oracle objects covered soon
There is NO need for CSC204 students to issue commands like:
create database databaseName … ; This is because one database that houses all
student schema/account info already exists, and moreover,
this DB does not need to be referenced any time during CSC204
Data manipulation language (DML) to query and/or modify data
(Note: A non-SQL system might not even have a built-in query language)
Example: (Using the Oracle SQL command-line processor named SQL*Plus)
-- Display the salary of each employee ← SQL*Plus comment
SELECT salary FROM employee; ← query table employee in current/default schema
SQL is the name of the international RDBMS language standard
SQL includes the sub-languages DDL & DML (& DCL covered later)
SQL also has constraint, security, and other commands included in the DDL
CREATE table and CREATE view commands
Simplest form of Oracle CREATE TABLE and CREATE view syntax –
- IF a table with the specified name does not exist in the schema,
creates a new, empty table (no rows are initially stored; CREATE runs as its own independent transaction)
OTHERWISE, an error msg is displayed indicating that a table (or other object in the same DB
namespace) with the same name currently exists in this schema
CREATE TABLE tableName
(column1Name type, ← 'type' is pre-defined by Oracle OR user-defined
 :                   (number, date are pre-defined type examples)
columnjName type);   (comma required between 'type' and the next columnName)
Oracle view creation – a view is a "virtual" table: it can be queried like an ordinary table
- As with table creation, the named view is assumed to not exist prior to creation,
OTHERWISE, an error msg is displayed indicating that a view with the specified name
currently exists in this schema

CREATE VIEW viewName AS ← "AS" is a required keyword (it separates the SELECT)
SELECT t.c1, v.c2
FROM t, v; ← can be any legal SQL query;
in this example, t and v are an existing table and
view, with respective columns named c1 and c2.
Note: the view inherits the specified column names of the
tables/views in the FROM clause
Note: For any table or view (assume it is named myobj)
SQL> describe myobj {displays its schema: column names, Oracle data types,
and nullity constraint (NOT NULL), if any}
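The create-table/create-view behavior above can be tried out in a lightweight engine. Below is a sketch using Python's sqlite3 module rather than Oracle (types and error messages differ); the person/well_paid names are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

con.execute("CREATE TABLE person (name TEXT, salary NUMERIC)")
con.execute("INSERT INTO person VALUES ('Jim Smith', 11000)")

# A view is a stored query; it is queried exactly like a table
con.execute("CREATE VIEW well_paid AS SELECT name FROM person WHERE salary > 10000")
print(con.execute("SELECT name FROM well_paid").fetchall())  # → [('Jim Smith',)]

try:
    # Creating an object whose name already exists in the schema is an error
    con.execute("CREATE TABLE person (x TEXT)")
except sqlite3.OperationalError as e:
    print("error:", e)
```

As the slide says, the duplicate-name CREATE is rejected with an error message rather than silently replacing the existing object.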
Three-level Architecture E/C/I applied to RDBMSs
1. External (E) level: DD info for user views & system views
=> DB apps should use views rather than tables, as much as possible
- View query results are accessed from tables that store query data
- By default, results of a query on a view are displayed, but NOT stored anywhere

2. Conceptual (C) level : DD definitions of all data of interest to the organization


and users, independent of physical storage details;
In RDBMS, tables, constraints, etc., are defined at this level
3. Internal (I) level: DD descriptions of how DB data is stored physically on disk.
Storage definitions for conceptual items are defined in the DD at this level.
Examples: physical file formats, and space management for
- storing rows ← how much storage is allocated, and where it is allocated
(An Oracle tablespace defines/describes much of this physical detail)
Also, search structures, such as indexes ← called "access methods" (AMs)
Default object Naming: When an Oracle database object “o” is created, by default,
the name for o entered into the DD uses Upper Case for alphabetical characters
Example: create table xyZW27 . . . stores the table’s name as: XYZW27 into the DD
Later in course, mention quoted identifiers of form “x Y3”; such names stored in DD with
(lower vs. upper) case retained. Oracle default is “regular” identifiers not surrounded
by quotes. Note: regular identifier xY3 is stored in DD as XY3 (and blanks are illegal)
Three-level Architecture cont…
DD maintains the translation mappings shown in the diagram:

[Diagram: Users 1-4 access External Schemas (Views A, B, C) at the External layer;
External/Conceptual Mappings connect the External Schemas to the Data Model
(Conceptual layer); the Conceptual/Internal Mapping connects the Data Model to
Storage formats (Internal layer). All mappings are maintained in the DD by the
Relational Database Management System (RDBMS).]

From now on, unless stated otherwise, “RDBMS” means the Oracle RDBMS
(Except for SQL Standard, RDBMSs differ in operational working of transactions, storage management, etc.)
A small part of the Oracle DD is associated with each user/account.
User-specific parts of the DD are a user’s schema (contains that user’s tables, views, etc.)
External (E) Level
Ideally, RDBMS apps should access data using views rather than tables
In Oracle SQL, by default, the ONLY views user U can access are
1) views created by U
( Note: the creator of an Oracle object owns that object ), and
2) views for which U was granted access by another user

{ Access rules 1) and 2) above also are enforced for tables }

< The Oracle object access model is derived from the "Take/Grant"
protection model originally invented for OS protection in the 1970s >
Important DD views used very frequently by an Oracle user:
user_views is a view queryable in each user U’s schema;
it has info about all views owned by U {for 1) above}

all_views also queryable in each U’s schema: includes


user_views content, and also includes info about all views for
which U has been granted SELECT access -- 2) above
Conceptual (C) Level
• An abstract representation of information content of DB
• The definitions of conceptual records involve information and structure
definitions about both system and user objects in the DB
-----------------------------------------------------------------------------------------------------------------------------------

Example: user_tables is a view in each user U’s schema; it has info about all
tables owned by U
To repeat: RDBMS “table” format is an illusion for end-user convenience/simplicity
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Example: a view in each user U’s schema displaying conceptual level table
info about tables owned OR accessible by U;
The following view query filters results by eliminating all non-answer rows:
SELECT table_name, owner
FROM all_tables
WHERE table_name LIKE 'X%';
DD view all_tables is built into each user's schema
= all tables in the DB that user U can access
Condition that "filters" query results:
Result is: all <table_name, owner> pairs for
tables whose name starts with "X" ← upper case
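SQLite's rough analogue of the DD is its sqlite_master catalog table. As a hedged sketch (Oracle's all_tables has different columns and case rules), the same "filter on names starting with X" idea looks like this in Python's sqlite3; the table names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE XEMP (id INTEGER)")
con.execute("CREATE TABLE XDEPT (id INTEGER)")
con.execute("CREATE TABLE OTHER (id INTEGER)")

# Query the catalog itself, filtering on names that start with 'X'
rows = con.execute(
    "SELECT name, type FROM sqlite_master "
    "WHERE type = 'table' AND name LIKE 'X%' ORDER BY name").fetchall()
print(rows)  # → [('XDEPT', 'table'), ('XEMP', 'table')]
```

The key idea carries over: the catalog is itself queryable with ordinary SELECT statements.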
Internal (I) Level
• Defines mappings of C level objects-- > their physical DB representations

• It is one level above physical disk (the ‘on-disk storage’ format)

• The internal schema level:


– defines the various types of the stored row columns
– defines search indexes and, more generally, the available AMs
– specifies placement, sequence, cluster details for stored rows

• Examples: user_indexes and user_objects are


DD views in each user schema for its indexes, objects, etc.
----------------------------------------------------------------------------

Oracle-specific notes (another RDBMS system might have different such rules):
Each table’s data is physically stored in OS files named “xxx.dbf, yyy.dbf, …”
owned by Oracle; data is normally accessed when using SQL as an
RDBMS table or view or other type of Oracle object
User U can access an item only if U has the required privileges for
attempted operations on the item
More about mappings between schema layers
• The conceptual/internal mapping:
– specifies mapping from conceptual level objects to their stored counterparts
Ex: Each Oracle table "T" is implemented (meaning physically represented)
as its row data stored in part of an OS file "f" owned by OS user "oracle";
the DD and storage-block header info keep track of the storage location of T's
row data
{the unit of storage is a 'database block', somewhat larger in size than an OS block}

• The external/conceptual mapping:


– For a given view “V”, the map specifies the conceptual level items, such as
tables and columns, that implement each column of V
Ex: A view named V with column pname is defined on the p_name column of table
schema person(p_name, address, bdate, . . .)
At execution time, the view query “Qv”
SQL> select pname from v;
returns the p_name column value of each row in table person

An RDBMS query optimizer (qo) will automatically use the above mappings:

Example: a query execution plan (QEP) for the above view query Qv can determine
a) the physical table to be accessed to solve Qv
b) the data AMs used to find the exact data solution
Data Independence – the concept
• In FMS, application programs must program file structure details
> Suppose file "f" contains a record definition "r_declare":
r_declare: r{fld1,ty1, … fldn,tyn} /* Field names (fldi) and types (tyi) */
a) Assume many application programs appl1, …, applm access f
b) A field named fld_new, with type ty_new, is added to r_declare

App Changes involve maintenance activity (aka re-programming)


Each app must be changed by re-coding (details vary by language):
1. r_declare must be changed
2. source code logic that computes field locations might change,
depending on where fld_new is located in r_declare
3. other changes, depending on individual app might be needed

These maintenance activities are EXPENSIVE to do, and


take developers away from working on new applications, etc.
• In RDBMS, a DBA has flexibility to change storage structure in response to
changing requirements => reduces need to modify existing tables/views

= > DD content and schema-layer mappings reduce maintenance costs


Illustration of PDI for a view
Given: 1. view employee_view based on existing employee table, and
2. column SSN_last4 based on a substring of column employee.SSN

select SSN_last4, DateofBirth
from employee_view              ← SSN_last4 values are
where SSN_last4 > 3287;           based on the SSN column of employee

create table employee (SSN char(9),
                        DateofBirth date,
                        :
                        DepartmentNumber char(2));   ← employee table schema
Note: A cast operator on SSN_last4 can define SSN_last4 desired format
Illustration of physical data independence
At some time after the employee table was created, a DBA (or table admin.) does:
create index e_index on employee(SSN);
1) employee table and 2) employee_view definitions do not need to change;
Benefit: many queries on employee & employee_view involving SSN will run faster
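The PDI point above, that adding an index changes neither the table nor the view definitions, can be sketched with sqlite3 (an illustrative stand-in for Oracle; names and data are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (ssn TEXT, dob TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("123456789", "1990-03-02"), ("987654321", "1991-01-03")])
con.execute("CREATE VIEW employee_view AS SELECT ssn, dob FROM employee")

before = con.execute("SELECT ssn FROM employee_view ORDER BY ssn").fetchall()

# Internal-level change only: no table or view definition is touched
con.execute("CREATE INDEX e_index ON employee(ssn)")

after = con.execute("SELECT ssn FROM employee_view ORDER BY ssn").fetchall()
assert before == after  # same logical answer; the index may only make it faster
print(after)
```

Queries return the same answer before and after; only the internal access path can differ.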
Suitable Mappings between the 3 schema levels provide Data
Independence
Data Independence reduces Impact from Changes across schema layers
• A change to the storage structure definition might mean that the
conceptual/internal mapping must be changed. However,
IF the conceptual schema remains invariant/unchanged after a storage
structure change, we achieve
physical data independence (PDI)
Ex: When DBA adds another AM for table rows (such as an index),
no logical schema or external schema changes are needed,
thus, PDI is achieved

• A change to the conceptual definition might mean that the


conceptual/external mapping must be changed accordingly. However,
IF the external schema remains invariant/unchanged after a change in the
conceptual schema, we achieve
logical data independence (LDI)
Ex: A DBA increases the maximum length of the 'HomeAddress' column of a table "T".
Any view v that refers to T.HomeAddress would not need to be re-coded →
LDI is achieved
(However, there are some situations for which a view needs re-definition
and re-compilation -- > later in course)
Examples of conceptual schema changes preserving LDI

An Oracle table, assume it is named tb, can be altered/changed in 2 ways,

neither of which affects existing view definitions on tb

Oracle syntax rules:


• Add a column “cl” to tb (by default, existing row cl values are set to NULL)
• Modify an existing column “ecl” by:
§ Increasing a character column width/length (string formatting)
§ Increasing the number of digits in a NUMBER column (numeric sizing)
§ Increasing (but maybe NOT decreasing) the number of decimal places in
a NUMBER column (numeric precision)

/* Add a new column */ ← comment

alter table tb add ← alter table tb by adding a new column
( new_col varchar2(8) ); ← with the specified data type
{ More later on character data. char(8) is always length 8 (blank padded, if needed), but
each varchar2(8) item is stored as 1-8 bytes, with no padding, using only the bytes needed;
PS: there is no advantage to char(someInt), none at all }
Note: Some column changes do not preserve LDI: for example, you cannot rename a
dependent table column without re-coding & re-compiling the views whose definitions use
the original column names
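The add-a-column case above can be sketched with sqlite3 (SQLite's ALTER TABLE is far more limited than Oracle's, but ADD COLUMN behaves similarly: existing rows get NULL for the new column, and existing views need no re-coding). Table and view names are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tb (id INTEGER, name TEXT)")
con.execute("INSERT INTO tb VALUES (1, 'a')")
con.execute("CREATE VIEW tb_names AS SELECT name FROM tb")

# Conceptual-level change: existing rows get NULL for the new column
con.execute("ALTER TABLE tb ADD COLUMN new_col TEXT")

# The view needed no re-coding: LDI is preserved
print(con.execute("SELECT name FROM tb_names").fetchall())  # → [('a',)]
print(con.execute("SELECT new_col FROM tb").fetchall())     # → [(None,)]
```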
Summary of independence among schema levels

Physical Independence (PDI)

External          Conceptual          Internal
← No changes needed here →            IF change here

Logical Independence (LDI)

External              Conceptual          Internal
No changes needed     IF change           Might need to coordinate
here                  here                a change here
The translation & execution of a DBMS query
Each Oracle SQL statement does following processing steps:
{ For simplicity, query (rather than an arbitrary SQL statement) is used for illustration }
1. A user issues a query Qt (in source text representation)
(Ex: Qt: SELECT lastname from employee where salary > 5000;)
2. Qt is parsed and tested for legal syntax,
if errors, abort Q FootNote1
3. The DBMS inspects the external schema, the external/conceptual mapping,
the conceptual schema, the conceptual/ internal mapping, and the storage
structure definition (if schema map info is currently cached, this will be fast)

IF the query is well-formed, does not reference non-existent items, and
does not violate any constraints, security checks, etc.,
generate a query plan using the qo,FootNote2 and go to 4.
ELSE issue error msgs and abort Q
4. An executable form of Qt (call it Qe) is built and DB server executes it
(Oracle caches Qe, hoping that Qe is re-executed in near future)
5. The DBMS performs the physical operations specified by Qe on the
physical database, and returns results = (a list of lastname values)
FootNote1 - During ‘abort Q’, the SQL processor issues one or more error messages; an
application/script/tool/command-language that issued Q might or might not continue
(depends on the nature of the error)
FootNote2 – "qo" abbreviates query optimizer, the Oracle SQL processing component that
determines the fastest way to access the target data for Qt. qo is enabled by default
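The processing steps above can be glimpsed in any SQL engine. A sketch with sqlite3: a syntax error aborts the statement (step 2), while a well-formed query is planned and executed (steps 3-5). The plan output is engine-specific, not Oracle's, and the employee data is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (lastname TEXT, salary INTEGER)")
con.execute("INSERT INTO employee VALUES ('Smith', 6000)")

# Step 2: a syntax error aborts the statement before any planning happens
try:
    con.execute("SELEC lastname FROM employee")
except sqlite3.OperationalError as e:
    print("aborted:", e)

# Steps 3-5: a well-formed query is planned, then executed against the data
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT lastname FROM employee WHERE salary > 5000").fetchall()
print(plan)  # engine-specific plan rows (e.g., a scan of employee)
result = con.execute(
    "SELECT lastname FROM employee WHERE salary > 5000").fetchall()
print(result)  # → [('Smith',)]
```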
A wider viewpoint - DBMS processing functionality
The database management system (DBMS) is the software that:

• handles all access to the database
(processes queries in the fastest and/or most efficient way;
fast processing is critical for multi-GB or multi-TB DBs)

• handles transaction processing (tp) for each schema/user


By definition, a DB transaction is
{one or more DB operations processed as one logical unit by the RDBMS}

• implements concurrent tp among schemas/users; much more on this later

• is responsible for enforcing user authorization & security checks

Each prominent large scale RDBMS (Oracle, DB2, SQL Server, Informix, …)
can be considered a giant software engine running on the OS.

In general, no special hardware is needed to run RDBMS software – that is,


RDBMSs usually run on commodity OSs and hardware; one exception would
be special mass storage hardware for extremely large data sets
+ an Oracle database can be completely virtualized
(As is the case for the CSC204 course RDBMS server named cscoracle)
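The transaction definition above ("one or more DB operations processed as one logical unit") can be sketched with sqlite3; the account-transfer scenario and names are hypothetical:

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are delimited explicitly with BEGIN/COMMIT/ROLLBACK
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

# A transfer is one logical unit: both updates commit, or neither does
try:
    con.execute("BEGIN")
    con.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    con.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    con.execute("COMMIT")
except sqlite3.Error:
    con.execute("ROLLBACK")  # undo the partial unit on any failure

print(con.execute("SELECT balance FROM account ORDER BY id").fetchall())
# → [(70,), (80,)]
```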
Database Administrator (DBA)
DBA is responsible for overall control of database system; DBA responsibilities include

n Deciding the information content


of the database, i.e. identifying the entities of interest to the enterprise and the information
to be recorded about those entities. This should take into account individual application
requirements as well as opportunities for sharing schemas;
-- Must know how to properly build a schema, so that it is relatively easy to
modify/extend it, as application requirements change

(later in the course, we study data normalization, a way to create a collection of table
schemas satisfying important RDBMS design objectives)

n Policies for storage structure, access strategy, and operations


i.e., how the data is to be represented by coding the storage structure definition.
The internal/conceptual schema is specified using the DDL

-- DBA must know how to analyze and create appropriate physical structures (indexes,
other structures, recovery from failure, archiving (meaning saving some particular
transaction records, forever), other special requirements, etc.)
DBA cont...
n Liaising with users, i.e. to ensure that the data required is available and to
design the necessary external schemas and conceptual/external mapping (using
DDL);
For example, create and configure additional tablespaces, user schemas, solve
“outOfStorage” or “Quota exceeded” storage issues

n Give technical help when users have advanced questions or problems with their
applications
Example: a web app having FrontEnd + App + DBMS structure
(App-tier Java/Python/C++/C# … help is the app developer's job, not a DBA duty)

n Defining authorization checks and validation procedures. Authorization checks


and validation procedures are extensions to the conceptual schema and can be
specified using the DDL sub-language of SQL
For example, GRANT SELECT on tableName to userName;
gives user userName ability to query (but not update or delete) rows of tableName
DBA cont...
n Defining a strategy for backup. For example periodic dumping of the
database to a backup storage, and procedures for reloading the
database from backup

n Recovery of transactions: most RDBMSs use a log file where each log
record contains the values for database items before and after changes,
per transaction; the log is used for recovery purposes

There is an increasing number of applications that must save


( for years, or forever! ) all records that ever existed in the DB
→ a huge data archiving requirement …
{Examples: insurance, medical, corporate, government, etc. records}

n monitoring performance and responding to changes in


requirements, i.e. changing details of storage and access;
re-organising the system to get the performance that is
‘best for the enterprise’

A DB server in an IT center often supports many simultaneous

1) DB connections and 2) active user and internal DBMS transactions

→ Diagnose and solve user Access, Function & Performance issues


DBMS Capabilities and Facilities
Facilities offered by different DBMSs vary. A DBMS should provide ALL
(not just some) of the following capabilities. If a system does not implement
ALL of these capabilities, it might be a DB, but it is not a DBMS at all:

✓ PDI and LDI
✓ sophisticated view capabilities decoupled from conceptual schema
✓ efficient data sharing and non-redundancy of data
✓ transaction processing (tp) and efficient concurrent data access
✓ integrity control (correct data and changes to data)
✓ centralised or distributed architecture, depending on requirements
✓ security policies specified by DBA and enforced automatically at runtime
✓ performance/efficiency tools
✓ automatic backup and recovery from failure

==> There should be Tools and Utilities available to DBAs/Users for
dealing with all of the above facilities
DB Analysis - Stored data properties
Def - Redundancy, in a DB, of a user datum (a single data item) is

• Direct if the value is a copy of another existing value

• Indirect if the value can be derived from other existing values


Ex: explicit storage of employee age (in years) is not needed,
because it can be computed by: int of (CurrentDate – DateOfBirth), expressed in years

⇒ Storing age values in a large DB of person records is a bad idea →
⇒ every midnight, birthdays necessitate: +1 to some subset of user ages
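The age example above can be sketched with sqlite3: store only the date of birth and derive age at query time. A fixed "current" date is used so the output is deterministic; the person data is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, dob TEXT)")  # dob as ISO date string
con.execute("INSERT INTO person VALUES ('Jim Smith', '1991-01-03')")

# Indirectly redundant value computed on demand:
# no nightly "+1 birthday" batch job is ever needed
rows = con.execute(
    "SELECT name, "
    "CAST((julianday('2019-01-15') - julianday(dob)) / 365.25 AS INTEGER) AS age "
    "FROM person").fetchall()
print(rows)  # → [('Jim Smith', 28)]
```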

Redundancy using copies – why do it?


Answer: Commonly done in large distributed databases

Advantages:
+ faster retrievals from multiple copies:
a local access (for example, on the server being used) is faster than a
non-local access to a remote server (which involves the network)
Disadvantages:
- slows down updates (the copy at each site must be changed)
- duplicates storage
- temporary data inconsistency is possible among sites during an update
Minimizing/Reducing Data Redundancy
• In non-DBMS systems, each application has private files
-- Often results in redundancy in stored data, thus, wasted storage space
-- Early (of many years ago) systems did not share data easily
• In a DBMS, data is "integrated":
the database is a unification of many otherwise distinct data files;
a user accessing an RDBMS table does not need to "see", nor program,
the files that actually contain the physical data
Look again at the section in the SQLPLUS204_s19.doc handout –
a tablespace storage map can involve any number of x.dbf files
(Ex: table X can be in tablespace ts1’s datafile f1.dbf and table Y can be in
tablespace ts1’s datafile f3.dbf)
– The view construct is a prominent technique for integrating data;
allows a table, “T”, to be visible in the schema that owns T
AND also visible to other authorized schemas
→ Using views reduces redundancy because, instead of creating new
tables containing copies of columns from existing tables, define views
that use the data in existing (aka “base”) tables
Schema design principle: store each value in one PLACE, if possible
An example of inconsistency in Redundancy and How-to “hide” it
• Data redundancy can lead to inconsistency in the database unless redundant data is carefully processed

– The system must be aware of data duplication:


Many modern DB systems “hide” transient data inconsistencies during updates
(hiding means: make the inconsistencies ‘transparent’)

One way to implement hiding – provide a "consistent view" of each data set:
Suppose: - At time t1, there are 2 copies c1 and c2 of value v3
- At time t2 > t1, c1 is set to v4   ← {The hardware does NOT do the two updates
- At time t3 > t2, c2 is set to v4      simultaneously}

Assume user U1 can access only c1 and user U2 can access only c2
Then at any time t between t1 and t3, any other user U (different from U1 and U2) accesses v3 (but not v4)
And, after time t3, all users U1, U2, and U see the updated value v4
--------------------------------------------------------------------------
The point of the above example is: between times t2 and t3, the current values of the cj are
unequal, aka 'inconsistent' DB values, and these inconsistent values are masked/hidden
from user U
Note – the value c2 is "stale" between times t2 and t3;

vs. FMS, where >= 2 copies of file f (stored on the same or different servers)
might be inconsistent during a significant interval of time (possibly minutes or hours)
Data Integrity
Many different ways that data values can be incorrect:
n Data copies are inconsistent

n DB utilities load incorrect data and/or data entry clerks manually key in incorrect data
Input data checks (and correction of bad input) must be used to prevent incorrect data entry

n Integrity constraints must be enforced – these are assertions that each table update must satisfy
(in Oracle, a component of the SQL processor enforces the integrity constraints)
Ex: an employee table’s manager column value should never reference a non-existent manager:
employee( id, name, . . . , manager, . . . )
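The manager-column rule above is exactly what a foreign-key constraint enforces. A minimal runnable sketch using sqlite3 (the `id`/`name`/`manager` columns are from the slide; the rest of the schema is invented here):

```python
import sqlite3

# The DBMS, not the application, rejects a row whose manager does not exist.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # sqlite3 requires enabling FK enforcement
con.execute("""
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT,
        manager INTEGER REFERENCES employee(id)  -- must reference an existing employee
    )""")
con.execute("INSERT INTO employee VALUES (1, 'CEO', NULL)")
con.execute("INSERT INTO employee VALUES (2, 'Ann', 1)")       # OK: manager 1 exists
try:
    con.execute("INSERT INTO employee VALUES (3, 'Bob', 99)")  # 99: non-existent manager
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

This shows the general pattern: the constraint is declared once in the schema and is then checked on every update, by every application, automatically.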

n Converting data across different data types is another common cause of incorrect data:
rounding/approximation errors, an illegal target data type,
unexpected conversion results that go unnoticed and are carried forward, etc.
( If different DB application languages access the same DB, languages that perform cast/type conversion might
convert the same data differently )

n Programming errors causing incorrect updates

n Corrupted data from accidental causes (an accidental delete, for example)

n Intentionally destroyed or fabricated (i.e., fake data substituted) by security attacks

Thus, most integrity checks involve data Values and data Types
Note - some of the more serious corruptions (side-effects of security breaches, for example)
result in sudden inaccessibility of data that should be accessible;
==> it is also possible that the DD (Data Dictionary) itself has been compromised by a security attack
Integrity cont... – more advanced forms
• A record type may have constraints on:
- the total number of allowable occurrences
(Ex: max number of authorized medical procedures per patient/year – depends on insurance policy)

• Who can insert /delete/modify a given record ?


Enforcement requires authorization policies to be Defined, and then strictly and continually Enforced

• Who “owns” specific data


Ownership also carries the responsibility of managing the data

• Hierarchy relationships are another form of constraint:


- EMPLOYEE type has FIELD ENGINEER as a subtype
(thus, a field engineer is, automatically, also an employee)
→ This has consequences for implementing operations, e.g.: what happens if a parent object is deleted?

• A different kind of constraint originates from business rules:


Ex: Many medical, financial, and government transactions must be kept, essentially, “forever”
(for various legal, insurance, medical, etc. reasons);
these are retention integrities ==> the original motivation for data warehousing
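The "max number of occurrences" constraint mentioned above can be enforced by the DBMS itself, for example with a trigger. A hedged sketch in sqlite3 (the table, trigger, and the limit of 2 procedures per patient per year are all invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE procedure_log (patient_id INTEGER, proc_year INTEGER)")

# Hypothetical business rule: at most 2 authorized procedures per patient per year.
cur.execute("""
    CREATE TRIGGER max_procs BEFORE INSERT ON procedure_log
    WHEN (SELECT COUNT(*) FROM procedure_log
          WHERE patient_id = NEW.patient_id
            AND proc_year  = NEW.proc_year) >= 2
    BEGIN
        SELECT RAISE(ABORT, 'procedure limit exceeded');
    END
""")
cur.execute("INSERT INTO procedure_log VALUES (1, 2019)")
cur.execute("INSERT INTO procedure_log VALUES (1, 2019)")
try:
    cur.execute("INSERT INTO procedure_log VALUES (1, 2019)")  # third insert: over the limit
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True
```

As with declarative constraints, the rule lives in the DB schema, so every application that updates the table is subject to it.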

Constraints in general

• Constraints have been studied as a topic in their own right for many years
• They occur implicitly/explicitly in every software system except the simplest ones
• Object Constraint Language (OCL), created in the mid-1990s, looks at all aspects of constraints:
pre/post-conditions, invariants, various OO-related constraints such as inheritance, associations, etc.
• Various parts of OCL have been incorporated into various UML diagramming systems
• As with many modern language environments, SQL has incorporated some (but not all) of OCL
DBMS Integrity - summary

Summary of: Data Integrity enforcement issues in RDBMS =


things that must be checked/tested when DB values are processed:

1. Redundancy checks (prevent or minimize: adding duplicate info)

2. Value, aka Range, checks (disallow a value not in a specified (min, max) range, or
not in a specified List of Values (LOV))

3. Loss of precision when data is stored in machine representation

4. Model constraints (the RDBMS model has 3 fundamental constraints that, simply put,


deal with identifying/referencing table rows – much more on this next week)

5. Type checks (e.g., storing a decimal number into a string location – what happens?)

6. Access checks and data corruption detection
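Points 3 and 5 above can be demonstrated in a few lines of Python (the `to_int_checked` helper is an invented name, used only to illustrate a type check):

```python
from decimal import Decimal

# Point 3: machine (binary float) representation silently loses precision.
print(0.1 + 0.2 == 0.3)  # False: 0.1 has no exact binary-float representation
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True: exact decimal arithmetic

# Point 5: a type check on conversion, instead of storing garbage.
def to_int_checked(s):
    try:
        return int(s)
    except ValueError:
        return None  # flag the value for correction instead of silently coercing it

print(to_int_checked("42"))    # 42
print(to_int_checked("12.5"))  # None: "12.5" is not a legal integer literal
```

This is why DBMSs offer exact DECIMAL/NUMERIC column types for money-like data, rather than relying on binary floating point.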


DBMS analysis

The term “DD” (Data Dictionary) has a much broader scope and meaning than
just the physical DB storage of user data.
{The E&N textbook does NOT make this clear at the beginning of the book}

On large DB projects, regardless of the DB classification, extensive Analysis must be done before
loading user data content. Incomplete or poor analysis will cause DB Design and Implementation
problems that require much redo/re-work

Part of Analysis is: Naming the items to be stored

* Scope of the DB analysis problem


-- The information that must be gathered about data:
For EVERY data field, one must collect several of its properties --
the field’s meaning, how and when it is created,
and details about how each data item is accessed and processed

* The meaning of a datum might depend on a user’s context/viewpoint


-- In a large distributed DB
(data is stored at several separate servers that can be geographically separated),
Users at different servers might refer to a datum D by
different names (and, even worse, mean different things by it)
Example: The meaning of the identifier “cost” for item w –
Manufacturer on server s1 might mean cost C1 to produce w
Wholesaler on server s2 might mean cost C1 + (distribution markup C2)
Retailer on server s3 might mean cost C1 + C2 + (retail margin markup) = C1 + C2 + C3
Thus, w.cost has several meanings ==> potential for confusion/error
Integrity as a DB design issue
• Centralized control of the database helps maintain integrity
– permits the DBA to define validation procedures to be carried out
whenever any update operation is attempted by any user and any
application
(update includes modification, creation and deletion)

• Integrity is crucial in DB use for any mission-critical application


– an application run without validation procedures can produce
erroneous data that can affect other applications using that data
… medical, airline-passenger, financial, … ← human-critical applications

Untrained/unaware/careless DB developers produce


DB project disasters: poor Design and/or Implementation.
Ignoring integrity is perhaps “understandable” - it is difficult to think
about and design for, and there is runtime overhead for its
enforcement (meaning: an execution-time penalty)

DB Analysis tools: ER, UML, table-formatted DD, etc.


Client/Server Architecture
Client/Server refers to a “logical” division of functionalities –

Client functionality involves user interaction with the DBMS:


The user interface, and the software and processing available to the user;
runs on the client’s machine

Server functionality involves the core DBMS facilities such as shared data access,
SQL processing, transaction processing, DD services, etc.

Client & Server functionalities referred to above can be implemented physically


in many ways (some approaches are outlined in Chapter 2).
For example, data storage and data processing:
- are most simply implemented with one server and a collection of disks
- an opposite extreme would be processing with n servers and a data storage system
involving RAID disks or some other storage system such as a SAN
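The logical client/server split can be sketched with a toy socket program: the "server" owns the data and answers requests; the "client" only sends a request and displays the result. The one-key protocol and the in-memory dict standing in for the database are invented for this sketch:

```python
import socket
import threading

DATA = {"X": 1, "Y": 2}   # stands in for server-side shared data

def serve_one(listener):
    # Server side: accept one connection, "process the query", send the answer.
    conn, _ = listener.accept()
    key = conn.recv(64).decode()
    conn.sendall(str(DATA.get(key, "NULL")).encode())
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
t = threading.Thread(target=serve_one, args=(listener,))
t.start()

# Client side: no data-access logic at all, just request and display.
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(b"X")
reply = client.recv(64).decode()
client.close()
t.join()
listener.close()
print(reply)  # 1
```

Note the division matches the text above: all data access happens on the server; the client could run on a different machine simply by changing the address it connects to.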

FROM Oracle Corp documentation (Oracle 7, Server Concepts Manual, A7673-01):


“In the Oracle client/server architecture, the database application and the database are
separated into two parts: a front-end or client portion, and a back-end or server portion. The
client executes the database application that accesses database information and interacts with
a user through the keyboard, screen, and pointing device such as a mouse. The server
executes the Oracle software and handles the functions required for concurrent, shared data
access to an Oracle database. Although the client application and Oracle can be executed on
the same computer, it may be more efficient when the client portion and server portion are
executed by different computers connected via a network.”
