
Character Field Record File Database

The most basic logical data element is the character, which consists of a single alphabetic, numeric, or other symbol. From a user's point of view, a character is the most basic element of data that can be observed and manipulated.

The next higher level of data is the field, or data item. A field consists of a grouping of characters. For example, the grouping of alphabetic characters in a person's name forms a name field, and the grouping of numbers in a sales amount forms a sales amount field. Specifically, a data field represents an attribute (a characteristic or quality) of some entity (object, person, place or event). For example, an employee's salary is an attribute that is a typical data field used to describe an entity who is an employee of a business.

Related fields of data are grouped to form a record. Thus, a record represents a collection of attributes that describe an entity. An example is the payroll record for a person, which consists of data fields describing attributes such as the person's name, Social Security Number, and rate of pay. Fixed-length records contain a fixed number of fixed-length data fields. Variable-length records contain a variable number of fields and field lengths.
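As a minimal sketch of these levels, the Python below models a payroll record as a group of fields and packs it into a fixed-length layout. The field widths (20-byte name, 11-byte Social Security Number, 8-byte rate of pay) and the names PayrollRecord, RECORD_FORMAT and payroll_file are assumptions made for illustration only.

```python
import struct
from dataclasses import dataclass

# Assumed fixed-length layout: 20-byte name, 11-byte SSN, 8-byte float rate.
# "<" disables alignment padding, so every record has the same size.
RECORD_FORMAT = "<20s11sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


@dataclass
class PayrollRecord:
    """A record: related fields describing one entity (an employee)."""
    name: str
    ssn: str
    rate_of_pay: float

    def pack(self) -> bytes:
        # Short text fields are padded with null bytes up to the fixed width.
        return struct.pack(
            RECORD_FORMAT,
            self.name.encode("ascii"),
            self.ssn.encode("ascii"),
            self.rate_of_pay,
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "PayrollRecord":
        name, ssn, rate = struct.unpack(RECORD_FORMAT, raw)
        return cls(
            name.rstrip(b"\x00").decode("ascii"),
            ssn.rstrip(b"\x00").decode("ascii"),
            rate,
        )


records = [
    PayrollRecord("Ann Lee", "123-45-6789", 18.5),
    PayrollRecord("Bob Ray", "987-65-4321", 22.25),
]

# A file is a group of related records, here stored back to back.
payroll_file = b"".join(record.pack() for record in records)
```

Because every record has the same length, the n-th record can be read by slicing payroll_file at n * RECORD_SIZE, which is the practical appeal of fixed-length records.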

A group of related records is a data file, or table. Thus, an employee file would contain the records of the employees of a firm. Files are frequently classified by the application for which they are primarily used, such as a payroll file or an inventory file, or by the type of data they contain, such as a document file or a graphical image file. Files are also classified by their permanence, for example, a payroll master file versus a payroll weekly transaction file.

A transaction file, therefore, would contain records of all transactions occurring during a period and might be used periodically to update the permanent records contained in a master file. A history file is an obsolete transaction or master file retained for backup purposes or for long-term historical storage, called archival storage.
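The periodic update of a master file from a transaction file can be sketched as follows. This is an illustrative model only: the record layout, the field names (employee_id, ytd_pay, amount) and the function apply_transactions are assumptions, and real systems would read both files from storage.

```python
def apply_transactions(master, transactions):
    """Return an updated copy of the master file.

    The old master is left untouched, so it can be retained as a history file.
    """
    updated = {key: dict(record) for key, record in master.items()}
    for txn in transactions:
        updated[txn["employee_id"]]["ytd_pay"] += txn["amount"]
    return updated


# Permanent records, keyed by employee id (hypothetical data).
master_file = {
    101: {"name": "A. Rivera", "ytd_pay": 1000.0},
    102: {"name": "B. Chen", "ytd_pay": 2000.0},
}

# All transactions that occurred during one period.
weekly_transactions = [
    {"employee_id": 101, "amount": 250.0},
    {"employee_id": 102, "amount": 300.0},
    {"employee_id": 101, "amount": 50.0},
]

new_master = apply_transactions(master_file, weekly_transactions)
```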

A database is an integrated collection of logically related records or objects. An object consists of data values describing the attributes of an entity, plus the operations that can be performed upon the data. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data stored in a database are independent of the application programs using them and of the type of secondary storage devices on which they are stored.

Entity: A person, place, object, event or concept in the user environment about which the organization wishes to maintain data.

Entity type: A collection of entities that share common properties or characteristics.

Relationship: An association representing an interaction among the instances of one or more entity types that is of interest to the organization.

Attribute: A property or characteristic of an entity type that is of interest to the organization.

Primary Key: An attribute (or combination of attributes) that uniquely identifies each row in a relation.

Secondary Key: One field or combination of fields for which more than one record may have the same combination of values, also called a non-unique key.

Normalization: The process of decomposing relations with anomalies to produce smaller, well-structured relations.

Foreign Key: An attribute in a relation of a database that serves as the primary key of another relation in the same database.

Schema: A collection of tables that are grouped together for a particular purpose.

Data previously stored in separate files have been integrated into a single database structure. The metadata that describe these data reside in the same structure. The DBMS provides the interface between the various database applications for organizational users and the database. The DBMS allows users to share the data and to query, access and update the stored data.

[Figure: The Sales, Accounting and Personnel departments use database applications, which reach the database through the DBMS. The database holds metadata together with Customers, Employees, Products and Orders data.]

Shared: Data in a database are shared among different users and applications.
Persistence: Data in a database exist permanently, in the sense that the data can live beyond the scope of the process that created them.
Validity/Integrity/Correctness: Data should be correct with respect to the real-world entity that they represent.
Security: Data should be protected from unauthorized access.
Consistency: Whenever more than one data element in a database represents related real-world values, the values should be consistent with respect to the relationship.
Non-redundancy: No two data items in a database should represent the same real-world entity.
Independence: The three levels in the schema (internal, conceptual and external) should be independent of each other, so that changes in the schema at one level do not affect the other levels.

[Figure: Components of the database environment. Data and database administrators, system developers and end users work through the user interface with the CASE tools, application programs, repository, DBMS and database.]

CASE (Computer-Aided Software Engineering) Tools: Automated tools used to design databases and application programs.

Repository: Centralized knowledge base for all data definitions, data relationships, screen and report formats and other system components. A repository contains an extended set of metadata important for managing databases as well as other components of an information system.

DBMS: Software that is used to create, maintain and provide controlled access to user databases.

Database: An organized collection of logically related data, usually designed to meet the information needs of multiple users in an organization. It is important to distinguish between the database and the repository. The repository contains definitions of data, whereas the database contains occurrences of data.

Application programs: Computer programs that are used to create and maintain the database and provide information to users.

User interface: Languages, menus, and other facilities by which users interact with various system components, such as CASE tools, application programs, the DBMS, and the repository.

End users:
Persons throughout the organization who add, delete and modify data in the database and who request or receive information from it. All user interaction with the database must be routed through the DBMS.

In summary, the DBMS operational environment is an integrated system of hardware, software and people that is designed to facilitate the storage, retrieval, and control of the information resource and to improve the productivity of the organization.

Data and database administrators:


Data administrators are persons who are responsible for the overall management of data resources in an organization. Database administrators are responsible for physical database design and for managing technical issues in the database environment.

System Developers:
Persons such as system analysts and programmers who design new application programs. System developers often use CASE tools for system requirements analysis and program design.

Program-data independence
Minimal data redundancy
Improved data consistency
Increased productivity of application development
Improved data quality
Improved data accessibility and responsiveness
Reduced program maintenance
Improved decision support

New specialized personnel are needed
Installation complexity and management costs occur
Conversion costs are high
Need for explicit backup and recovery
Organizational conflict

To provide greater independence between programs and data, thereby reducing maintenance costs.
To manage increasingly complex data types and structures.
To provide easier and faster access to data for users who have neither a background in programming languages nor a detailed understanding of how data are stored in databases.
To provide more powerful platforms for decision support applications.

The most common database model for new systems defines simple tables for each relation and for many-to-many relationships. Cross-reference keys link the tables together, representing the relationships between entities. Primary and secondary key indexes provide rapid access to data based upon qualifications. Most new applications are built using relational DBMSs, and many relational DBMS products exist.

RELATION 1 (PRIMARY KEY, ATTRIBUTES)

RELATION 2 (PRIMARY KEY, FOREIGN KEY, ATTRIBUTES)
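The two relation templates above can be made concrete with a small sketch using Python's built-in sqlite3 module. The tables customer and orders, their columns and the sample rows are hypothetical; SQLite is used only because it ships with Python, and other relational DBMSs differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- RELATION 1 (PRIMARY KEY, ATTRIBUTES)
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

-- RELATION 2 (PRIMARY KEY, FOREIGN KEY, ATTRIBUTES)
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL
);

INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 20.0), (12, 2, 5.0);
""")

# The foreign key links each order back to its customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
```

Here orders.customer_id plays the role of the cross-reference key: it is the primary key of customer reused as a foreign key in orders.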



Relation: A two-dimensional table containing rows and columns of data.
Relational Data Model: A data model representing data in the form of tables, each consisting of rows and columns of data. It was introduced in 1970 by E.F. Codd and is based on the mathematical notion of a relation.
Columns are called attributes.
Rows are called tuples.

Advantages:
Structural independence
Improved conceptual simplicity
Easier database design, implementation, management and use
A powerful DBMS

Disadvantages:
Substantial hardware and system software overhead
Poor design and implementation

Continuing developments in information technology and its business applications have resulted in the evolution of several major types of databases. There are six major conceptual categories of databases that may be found in computer-using organizations.

Operational databases
Analytical databases
Data warehouses
Distributed databases
End user databases
External databases


Operational databases store detailed data needed to support the operations of the entire organization. They are also called subject area databases (SADB), transaction databases, and production databases. Examples are a customer database, personnel database, inventory database and other databases containing data generated by business operations.

Analytical databases store data and information extracted from selected operational and external databases. They consist of summarized data and information most needed by an organization's managers and other end users. They are also called managerial databases or information databases. They may also be called multidimensional databases, since they frequently use a multidimensional database structure to organize data. These are the databases accessed by online analytical processing (OLAP) systems, decision support systems and executive information systems.

A data warehouse stores data from current and previous years that have been extracted from the various operational databases of an organization. It is a central source of data that have been screened, edited, standardized, and integrated so they can be used by managers and other end user professionals for a variety of forms of business analysis, market research and decision support. Data warehouses may be subdivided into data marts, which hold specific subsets of data from the warehouse. A major use of data warehouse databases is data mining. In data mining, the data in a data warehouse are processed to identify key factors and trends in historical patterns of business activity. This can be used to help managers make decisions about strategic changes in business operations to gain competitive advantage in the marketplace.

Many organizations replicate and distribute copies of parts of databases to network servers at a variety of sites. These distributed databases can reside on network servers on the World Wide Web, on corporate intranets or extranets, or on other company networks. Distributed databases may be copies of operational or analytical databases, hypermedia or discussion databases, or any other type of database. Replication and distribution of databases is done to improve database performance and security. Ensuring that all of the data in an organization's distributed databases are consistently and concurrently updated is a major challenge of distributed database management.

End user databases consist of a variety of data files developed by end users at their workstations. For example, users may have their own electronic copies of documents they download from the World Wide Web, generate with word processing packages, or receive by electronic mail. Or they may have their own data files generated from using spreadsheet and DBMS packages.

Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the Internet, especially the World Wide Web. Web sites provide an endless variety of hyperlinked pages of multimedia documents in hypermedia databases for you to access. Data are available in the form of statistics on economic and demographic activity from statistical data banks. Or you can view or download abstracts or complete copies of hundreds of newspapers, magazines, newsletters, research papers, and other periodicals and published material from bibliographic and full-text databases.

Data Definition Language is used to create, alter and delete database objects. The commands used are CREATE, ALTER and DROP.
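A minimal sketch of the three DDL commands, run through Python's built-in sqlite3 module. The employee table and its columns are hypothetical, and the two helper functions rely on SQLite-specific catalog features (sqlite_master and PRAGMA table_info); other DBMSs expose the same information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List the tables currently defined in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def column_names(connection, table):
    """List the columns of one table (second field of PRAGMA table_info)."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# CREATE: define a new database object.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
after_create = column_names(conn, "employee")

# ALTER: change the definition of an existing object.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
after_alter = column_names(conn, "employee")

# DROP: delete the object and its definition.
conn.execute("DROP TABLE employee")
after_drop = table_names(conn)
```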


Data Manipulation Language commands let users insert, modify and delete the data in the database. The three data manipulation statements are INSERT, UPDATE and DELETE.
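The three DML statements can be sketched with Python's built-in sqlite3 module as follows. The product table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("A1", 10.0), ("B2", 20.0), ("C3", 30.0)],
)

# UPDATE: modify existing rows that match a condition.
conn.execute("UPDATE product SET price = price * 2 WHERE sku = ?", ("B2",))

# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM product WHERE sku = ?", ("C3",))

remaining = conn.execute(
    "SELECT sku, price FROM product ORDER BY sku"
).fetchall()
```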


The Data Control Language consists of commands that control user access to the database objects. Thus DCL is mainly related to security issues, that is, determining who has access to the database objects and what operations they can perform on them. The task of the DCL is to prevent unauthorized access to data. The Database Administrator (DBA) has the power to give privileges to a specific user and to take them away, thus giving or denying access to the data. The DCL commands are GRANT and REVOKE.

Features of RDBMS
Relational algebra
Data dictionary
Normalization
Integrity
Relational database languages
Database administration
Indexing

Relational algebra operations manipulate relations. That is, these operations use one or two existing relations to create a new relation. This new relation may then be used as input to a new operation. This powerful concept, the creation of new relations from old ones, makes possible an infinite variety of data manipulations. It also makes the solution of queries considerably easier, since we can experiment with partial solutions until we find an approach that will work.

Relational algebra consists of nine operations: union, intersection, difference, product, select, project, join, divide and assignment. The first four of these operations are taken from mathematical set theory and are largely the same as the operations found there. The next four are new operations that apply specifically to the relational data model. The last operation, assignment, is the standard computer language operation of giving a name to a value.

Union: The union operation (∪) allows us to combine the data from two relations.
Intersection: The intersection operation (∩) allows us to identify the rows that are common to two relations.
Difference: The difference operation (indicated by a minus sign, −) allows us to identify rows that are in one relation and not in another.

Product: The product operation, indicated by the * symbol, creates the Cartesian product of two relations.
Select: The select operation is used to create a relation from another relation by selecting only those rows from the original relation that satisfy a specified condition.

Project: If the select operation may be thought of as eliminating unwanted rows, the project operation can be thought of as eliminating unwanted columns.
Join: The join operation is used to connect data across relations, perhaps the most important function in any database language.

Divide: Divide is a relational algebra operation that creates a new relation by selecting the rows in one relation that match every row in another relation.
Assignment: Assignment is a relational algebra operation that gives a name to a relation.
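The operations described above can be sketched in plain Python. This is a teaching model under stated assumptions, not how a DBMS implements them: a relation is represented as a pair (attribute names, set of row tuples), the join shown is a natural join on shared attribute names, and all relation names and sample rows are hypothetical. Assignment is simply binding a result to a Python name.

```python
from itertools import product as cartesian

# Assumed representation: relation = (tuple of attribute names, set of rows).


def union(r, s):
    return (r[0], r[1] | s[1])


def intersection(r, s):
    return (r[0], r[1] & s[1])


def difference(r, s):
    return (r[0], r[1] - s[1])


def product(r, s):
    # Every row of r paired with every row of s.
    return (r[0] + s[0], {a + b for a, b in cartesian(r[1], s[1])})


def select(r, predicate):
    # Keep only the rows that satisfy the condition.
    attrs, rows = r
    return (attrs, {row for row in rows if predicate(dict(zip(attrs, row)))})


def project(r, keep):
    # Keep only the named columns; duplicates collapse because rows are a set.
    attrs, rows = r
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), {tuple(row[i] for i in idx) for row in rows})


def natural_join(r, s):
    # Combine rows that agree on all attributes the two relations share.
    shared = [a for a in r[0] if a in s[0]]
    extra = [a for a in s[0] if a not in shared]
    out = set()
    for a in r[1]:
        da = dict(zip(r[0], a))
        for b in s[1]:
            db = dict(zip(s[0], b))
            if all(da[k] == db[k] for k in shared):
                out.add(a + tuple(db[k] for k in extra))
    return (r[0] + tuple(extra), out)


def divide(r, s):
    # Rows of r (minus s's attributes) that are paired with every row of s.
    attrs, rows = r
    q_attrs = tuple(a for a in attrs if a not in s[0])
    result = set()
    for cand in project(r, q_attrs)[1]:
        c = dict(zip(q_attrs, cand))
        if all(
            tuple({**c, **dict(zip(s[0], b))}[a] for a in attrs) in rows
            for b in s[1]
        ):
            result.add(cand)
    return (q_attrs, result)


a = (("x",), {(1,), (2,)})
b = (("x",), {(2,), (3,)})
c = (("y",), {("p",), ("q",)})
enrolled = (("student", "course"), {("Ann", "DB"), ("Ann", "OS"), ("Bob", "DB")})
teaches = (("course", "teacher"), {("DB", "Kim"), ("OS", "Lee")})
required = (("course",), {("DB",), ("OS",)})

# Assignment: each result is itself a relation and can feed another operation.
db_students = project(select(enrolled, lambda row: row["course"] == "DB"), ("student",))
with_teacher = natural_join(enrolled, teaches)
took_everything = divide(enrolled, required)
```

Note how db_students is built by feeding the output of select into project, which is the "new relations from old ones" idea in practice.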


An effective database system will allow growth and modification in the database without compromising the integrity of its data. The data dictionary / directory (DD/D) aids the accomplishment of this objective by allowing the definitions of data to be maintained separately from the data itself. This allows changes to be made to the data definitions with no effect on the stored data. For example, the subschema used by a particular program could be modified without in any way affecting the stored data.
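As a loose illustration of definitions being kept apart from stored data, the sketch below uses SQLite's catalog table sqlite_master as a stand-in for a data dictionary and a view as a stand-in for a program's subschema. Both mappings are assumptions made for the example, as are the names employee and staff_view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 5000.0)")
conn.execute("CREATE VIEW staff_view AS SELECT emp_id, name FROM employee")


def definitions(connection):
    """Read object definitions from the catalog, not from the data rows."""
    return dict(
        connection.execute("SELECT name, type FROM sqlite_master ORDER BY name")
    )


catalog = definitions(conn)
stored_before = conn.execute("SELECT * FROM employee").fetchall()

# Redefine the view (the "subschema") without touching the stored rows.
conn.execute("DROP VIEW staff_view")
conn.execute("CREATE VIEW staff_view AS SELECT emp_id, name, salary FROM employee")

stored_after = conn.execute("SELECT * FROM employee").fetchall()
view_rows = conn.execute("SELECT * FROM staff_view").fetchall()
```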

Normalization is the process of decomposing relations with anomalies to produce smaller, well-structured relations. The main goals of normalization are:
Minimize data redundancy, thereby avoiding anomalies and conserving storage space.
Simplify the enforcement of integrity constraints.
Make it easier to maintain data (insert, update and delete).
Provide a better design that is an improved representation of the real world and a stronger basis for future growth.

Normalization can be accomplished and understood in stages, each of which corresponds to a normal form. A normal form is a state of a relation that results from applying simple rules regarding functional dependencies to that relation.

First normal form: Any multivalued attributes have been removed, so there is a single value at the intersection of each row and column of the table.

Second normal form: Any partial functional dependencies have been removed.

Third normal form: Any transitive dependencies have been removed.

Boyce-Codd normal form: Any remaining anomalies that result from functional dependencies have been removed.

Fourth normal form: Any multivalued dependencies have been removed.

Fifth normal form: Any remaining anomalies have been removed.
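As a small worked illustration of the step to third normal form, the sketch below removes a transitive dependency. The relation and its attributes are hypothetical: order_id determines customer_id, and customer_id in turn determines customer_city, so the city is repeated for every order by the same customer.

```python
# Flat relation with a transitive dependency:
# order_id -> customer_id -> customer_city
flat = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Pune"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Pune"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "Delhi"},
]

# Decompose into two smaller relations.
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"]}
    for row in flat
]
customers = {row["customer_id"]: row["customer_city"] for row in flat}

# Joining the two relations on customer_id reproduces the original rows,
# so no information was lost by the decomposition.
rejoined = [
    {**order, "customer_city": customers[order["customer_id"]]}
    for order in orders
]
```

After the decomposition each city is stored once per customer, so changing a customer's city is a single update rather than one per order.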


Integrity is concerned with making certain that operations performed by users are correct and maintain database consistency. A condition or restriction that is applied to a particular set of data is commonly termed an integrity control or constraint. We will consider the following constraints. The integrity constraints provide a logical basis for maintaining the validity of data values in the database, thus preventing errors in database updating and information processing.
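A minimal sketch of constraints rejecting invalid updates, using Python's built-in sqlite3 module. The particular constraints shown (primary key, foreign key, NOT NULL, CHECK) are a common selection chosen for illustration and are not taken from the original material; the tables and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    salary  REAL CHECK (salary >= 0)
);
INSERT INTO department VALUES (1, 'Sales');
""")


def try_insert(params):
    """Attempt one insert; report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1, 1, 3000.0)),   # valid row
    try_insert((2, 99, 3000.0)),  # department 99 does not exist
    try_insert((3, 1, -5.0)),     # negative salary violates the CHECK
    try_insert((1, 1, 100.0)),    # duplicate primary key
]
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```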

Structured Query Language (SQL) is the standard command set used to communicate with relational database management systems. All tasks related to relational data management (creating tables, querying the database for information, modifying the data in the database, deleting them, granting access to users and so on) can be done using SQL.

Database administration is basically concerned with ensuring that accurate and consistent information is available to users and applications when needed and in the form required. The DBA interacts with both the system and users. Some organizations have split the responsibility for managing information system resources between a data administrator (DA) and a database administrator (DBA).


There are four types of indexes:
Unique Primary Index (UPI), which is an index on a unique field, possibly the primary key of the table, and which is used not only to find table rows based on this field value but also by the DBMS to determine where to store a row based on the primary index field value.
Non-unique Primary Index (NUPI), which is an index on a non-unique field and which is used not only to find table rows based on this field value but also by the DBMS to determine where to store a row based on the primary index field value.

Unique Secondary Index (USI), which is an index on a unique field and which is used only to find table rows based on this field value.
Non-unique Secondary Index (NUSI), which is an index on a non-unique field and which is used only to find table rows based on this field value.
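The difference between a unique and a non-unique secondary index can be sketched with Python's built-in sqlite3 module. The primary-index variants, which also decide where a row is stored, have no direct SQLite equivalent and are not modeled here. The table, index names and rows are hypothetical, and PRAGMA index_list is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, ssn TEXT, dept TEXT)"
)

# Unique secondary index: at most one row per ssn value.
conn.execute("CREATE UNIQUE INDEX idx_employee_ssn ON employee (ssn)")
# Non-unique secondary index: many rows may share one dept value.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "111", "Sales"), (2, "222", "Sales")],  # same dept twice is fine
)

try:
    conn.execute("INSERT INTO employee VALUES (3, '111', 'HR')")
    duplicate_ssn = "accepted"
except sqlite3.IntegrityError:
    duplicate_ssn = "rejected"

# Map each index name to whether it is unique (fields 1 and 2 of index_list).
index_uniqueness = {
    row[1]: bool(row[2])
    for row in conn.execute("PRAGMA index_list(employee)")
}
```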


Data warehousing is the process whereby organizations create and maintain data warehouses and extract meaning and inform decision making from their informational assets through these data warehouses.

Subject-Oriented: A data warehouse is organized around the key subjects of the enterprise. Major subjects may include customers, patients, students, products and time.

Integrated: The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures and related characteristics gathered from several internal systems of record and also often from sources external to the organization.

Time-Variant: Data in the data warehouse contain a time dimension, so that they may be used to study trends and changes.

Non-updatable: Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.

Need for a company-wide view
Need to separate operational and informational systems

No single system of record
Multiple systems are not synchronized
Organizations want to analyze the activities in a balanced way
Customer relationship management
Supplier relationship management

A data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications. A properly designed data warehouse adds value to data by improving their quality and consistency. A separate data warehouse eliminates much of the contention for resources that results when informational applications are confounded with operational processing.


Building the architecture requires four basic steps.
Data are extracted from the various internal and external source system files and databases. In a large organization, there may be dozens or even hundreds of such files and databases.
The data from the various source systems are transformed and integrated before being loaded into the data warehouse. Transactions may be sent to the source systems to correct errors discovered in data staging.
The data warehouse is a database organized for decision support. It contains both detailed and summary data.
Users access the data warehouse by means of a variety of query languages and analytical tools. Results may be fed back to the data warehouse and operational databases.
Extraction and loading happen on a periodic basis, sometimes daily, weekly or monthly.
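The extract, transform and load steps above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the two source systems, the choice of email as the matching key, and the function names extract, transform and load. A production staging area would handle far more cases.

```python
def extract(sources):
    """Pull rows from every source system, tagging each with its origin."""
    return [
        dict(row, source=name)
        for name, rows in sources.items()
        for row in rows
    ]


def transform(rows):
    """Clean, standardize and de-duplicate rows in the staging area."""
    seen, cleaned = set(), []
    for row in rows:
        email = row["email"].strip().lower()   # standardize the matching key
        if not email or email in seen:         # drop blanks and duplicates
            continue
        seen.add(email)
        cleaned.append({
            "email": email,
            "name": row["name"].strip().title(),
            "source": row["source"],
        })
    return cleaned


def load(warehouse, rows):
    """Append the staged rows to the warehouse store."""
    warehouse.extend(rows)
    return warehouse


# Hypothetical internal and external source systems.
sources = {
    "internal_crm": [
        {"name": " ann lee ", "email": "Ann@Example.com"},
        {"name": "Bob Ray", "email": "bob@example.com"},
    ],
    "external_list": [
        {"name": "ANN LEE", "email": "ann@example.com "},
        {"name": "No Email", "email": ""},
    ],
}

warehouse = load([], transform(extract(sources)))
```

In this sketch the first source to supply a given email wins; which source should take precedence is a design decision the original material does not address.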


[Figure: Data warehouse architecture. Data are extracted from internal and external source data systems into the data staging area (processing: clean, match, combine, remove duplicates, standardize, transform, export to the data warehouse), loaded into the data and metadata storage area (the data warehouse), and loaded from there into end-user presentation tools (ad hoc query tools, report writers, end-user applications, modeling and mining tools, visualization tools). Model and query results are fed back.]

More cost-effective decision making
Better enterprise intelligence
Enhanced customer service
Business reengineering
Information system reengineering

Data mining uses a variety of techniques to find hidden patterns and relationships in large pools of data and infer rules from them that can be used to predict future behavior and guide decision making. Data mining is often used to provide information for targeted marketing in which personalized or individual messages can be created based on individual preferences. These systems can perform high level analyses of patterns or trends, but they can also drill down to provide more detail when needed.
