
Unit - 2

Online Analytical Processing (OLAP)


G. K Gupta, Second Edition

OLAP
On-line Analytical Processing (OLAP) is a technique for providing management decision support using historical and summarized data that is consolidated in the data warehouse.

A dimension is an attribute or an ordinate within a multidimensional structure, with a list of values.

A fact is determined by combining dimension values.

The fact table is multidimensional and is a way to flatten a cube; it holds the values of the measures.
The SQL command GROUP BY is used as the aggregation operator.
OLAP systems are data warehouse front-ends that produce aggregate data.
DW and OLAP are based on a multidimensional conceptual view of enterprise data.

University student data with the dimensions degree, country, scholarship and year is given below (multidimensional view):

Degree    Australia   India   Malaysia
BSc           5         10        5
LLB          20          0        1
MBBS         15         15       10
B.com        50         25       12
BIT          11         17       23
ALL         101         67       51

The above table is for the year 2000. Data has been collected for other years as well.

Spreadsheets have a scalability problem: it is difficult to represent millions of rows or thousands of formulas. Data cubes generalize spreadsheets to any number of dimensions.

1. Definition - E. F. Codd
OLAP is the dynamic enterprise analysis required to create, manipulate, animate and synthesize information from exegetical, contemplative and formulaic data analysis models. Information is manipulated from the point of view of a manager (exegetical), from the point of view of someone who has thought about it (contemplative) and according to some formula (formulaic).

OLAP Features

Four enterprise data models:

Categorical - comparison of historical values

Exegetical - discovering reasons for what the categorical model found

Contemplative - what-if analysis of the data

Formulaic - how to reach a desired goal

2. Characteristics of OLAP systems


1. Users: OLTP systems are designed for office workers, while OLAP systems are designed for decision makers.

2. Functions: OLTP systems are mission-critical and support the enterprise's day-to-day operations; OLAP systems are management-critical and support the enterprise's decision-support functions.

3. Nature: OLTP systems are designed to process one record at a time; OLAP systems deal with many records at a time to provide summary or aggregate data to a manager.

4. Design: OLTP systems are application-oriented and view enterprise data as a collection of tables; OLAP systems are subject-oriented and view enterprise data as multidimensional.

5. Data: OLTP systems deal with the current status of information, e.g. information about an employee who left three years ago may no longer be available; OLAP requires historical data over several years.

6. Kind of use: OLTP systems are used for read and write operations; OLAP systems normally do not update the data.

                      OLTP                               OLAP
users                 clerk, IT professional             knowledge worker
function              day to day operations              decision support
DB design             application-oriented               subject-oriented
data                  current, up-to-date, detailed,     historical, summarized, multidimensional,
                      flat relational, isolated          integrated, consolidated
usage                 repetitive                         ad-hoc
access                read/write,                        lots of scans
                      index/hash on prim. key
unit of work          short, simple transaction          complex query
# records accessed    tens                               millions
# users               thousands                          hundreds
DB size               100MB-GB                           100GB-TB
metric                transaction throughput             query throughput, response

FASMI Characteristics
The name FASMI is derived from the first letters of the following characteristics of OLAP systems:

Fast: OLAP queries should be answered quickly, as with a search engine. Such performance is difficult to achieve, so a good data structure and hardware are needed, and the most commonly queried aggregates are precomputed.

Analytic: OLAP queries should be answered without any programming. The vendor tool should cope with any query relevant to the application and the user.

Shared: An OLAP system is a shared resource, but it is not shared by many people; it is accessed only by a group of managers and used by selected users. Concurrency control is needed if users write or update data in the database.

Multidimensional: OLAP should provide a multidimensional conceptual view of the data, which treats the data as a cube whose dimensions are shown as parent/child relationships.

Information: OLAP obtains its information from a data warehouse, so it must handle a large amount of input data.

Codd's OLAP Characteristics

Codd, in his 1993 paper, lists 12 rules for evaluating OLAP products. Ten characteristics are given here:

1. Multidimensional conceptual view: makes a variety of manipulations (e.g. slice and dice) relatively easy.

2. Accessibility: the OLAP system should sit between the data sources and an OLAP front-end.

3. Batch extraction vs interpretive: the OLAP system should provide multidimensional data staging plus partial pre-calculation of aggregates.

4. Multi-user support: should provide normal database operations such as retrieval, update, concurrency control, integrity and security.

5. Storing OLAP results: results should be kept separate from source data. Read-write OLAP applications should not be implemented directly on transaction data.

6. Extraction of missing values: should distinguish missing values from zero values; otherwise aggregates will be computed incorrectly. Large cubes may have a large number of zeroes.

7. Treatment of missing values: should ignore all missing values, regardless of their source.

8. Uniform reporting performance: reporting performance should stay consistent as the number of dimensions grows.

9. Generic dimensionality: different dimensions should not be treated differently.

10. Unlimited dimensions and aggregation levels: some applications need as many as 15-20 dimensions, so the system should allow unlimited dimensions.
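The difference between a missing value and a zero (characteristics 6 and 7) can be illustrated with a small sketch. The country counts below, including the Thailand entry, are hypothetical and serve only to show how an aggregate changes when a missing value is wrongly treated as zero.

```python
from statistics import mean

# Hypothetical enrolment counts per country (illustrative data only).
# None marks a missing value; 0 is a genuine "no students" fact.
counts = {"Australia": 20, "India": 0, "Malaysia": 1, "Thailand": None}

# Missing values are ignored, genuine zeros are kept.
known = [v for v in counts.values() if v is not None]
avg_ignoring_missing = mean(known)

# Treating the missing value as a zero adds a fact that was never recorded
# and pulls the average down.
as_zero = [0 if v is None else v for v in counts.values()]
avg_missing_as_zero = mean(as_zero)

print(avg_ignoring_missing, avg_missing_as_zero)
```

The two averages differ only because of how the single missing entry is handled, which is why an OLAP system needs to keep the two cases apart.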

3. Multidimensional view and Data Cube


A data warehouse is based on a multidimensional data model which views data in the form of a data cube. Consider the following database:
Student(sid, name1, stu_name, country, DOB, address)
Enrolment(sid, Degree_id, SSemester)
Degree(Degree_id, Degree_name, Degree_length, Fee, Dept)
Detailed example on page 413.
We first consider a two-dimensional view, i.e. country X degree.

For year 2000

Degree    Australia   India   Malaysia
BSc           5         10        5
LLB          20          0        1
MBBS         15         15       10
B.com        50         25       12
BIT          11         17       23
ALL         101         67       51

For year 2001

Degree    Australia   India   Malaysia
BSc           7          9        5
LLB          10          0        1
MBBS         16         17       19
B.com        53         22       19
BIT          10         13       20
ALL          96         61       64

Aggregates for both semesters

Degree    Australia   India   Malaysia
BSc          12         19       10
LLB          30          0        2
MBBS         31         32       29
B.com       103         47       31
BIT          21         30       43
ALL         197        128      115
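The aggregate table and the ALL rows can be reproduced from the two yearly tables. The following is a minimal sketch in Python; the function names `add_tables` and `all_row` are illustrative and do not come from the source.

```python
COUNTRIES = ("Australia", "India", "Malaysia")

# Counts per degree in the column order given by COUNTRIES (from the tables above).
year_2000 = {
    "BSc": (5, 10, 5),
    "LLB": (20, 0, 1),
    "MBBS": (15, 15, 10),
    "B.com": (50, 25, 12),
    "BIT": (11, 17, 23),
}
year_2001 = {
    "BSc": (7, 9, 5),
    "LLB": (10, 0, 1),
    "MBBS": (16, 17, 19),
    "B.com": (53, 22, 19),
    "BIT": (10, 13, 20),
}

def add_tables(a, b):
    """Cell-by-cell sum of two degree x country tables."""
    return {deg: tuple(x + y for x, y in zip(a[deg], b[deg])) for deg in a}

def all_row(table):
    """Column totals, i.e. the ALL row of a table."""
    return tuple(sum(col) for col in zip(*table.values()))

both = add_tables(year_2000, year_2001)
for degree, row in both.items():
    print(degree, row)
print("ALL", all_row(both))
```

Each ALL row is simply the sum over the degree dimension, and the third table is the sum over the time dimension.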

4. Data Cube
Number of students as a function of country, degree and semester.
Dimensions: country, degree, sem
Hierarchical summarization paths:
country: continent > region > country
degree: school > ug/pg > degree
semester: year > semester

Each edge of the cube is called a dimension. A user therefore has a multidimensional conceptual view of the data, which is represented by the cube. The points inside a cube provide aggregations. For example, a point may provide the number of students from Malaysia admitted to BCom in year 1998. The cube is not always three-dimensional.

Each dimension may be associated with a table that describes the dimension. For example, a dimension table for country would contain the country names and could contain other information, e.g. category. Other dimensions, like time, do not naturally have such a table of information.

A cube is represented in three dimensions: country X degree X semester, for any country (x), any degree (y) and any start semester (z).

[Figure: data cube with axes Country (Australia, India, Malaysia, All), Degree (BSc, LLB, MBBS, B.com, BIT, All) and year (2000, 2001, 2002), with sums along each axis]

For the query:
SELECT degree_id, count(*) FROM enrolment GROUP BY degree_id

the result is:

degree_id   count
B.Sc           71
LLB            68
MBBS          192
B.Com         315
BIT           217
sum           863
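The GROUP BY query can be tried on a small in-memory table. This is only a sketch using Python's standard sqlite3 module; the six enrolment rows are made up for illustration and do not reproduce the counts shown in the slides, and an ORDER BY clause is added so that the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment (sid INTEGER, degree_id TEXT, ssemester TEXT)"
)

# Hypothetical rows: (student id, degree, start semester).
rows = [
    (1, "BSc", "2000"),
    (2, "BSc", "2001"),
    (3, "LLB", "2000"),
    (4, "BIT", "2000"),
    (5, "BIT", "2001"),
    (6, "BIT", "2001"),
]
conn.executemany("INSERT INTO enrolment VALUES (?, ?, ?)", rows)

# One output row per degree, with the number of enrolments in that degree.
result = conn.execute(
    "SELECT degree_id, count(*) FROM enrolment "
    "GROUP BY degree_id ORDER BY degree_id"
).fetchall()
print(result)
```

The query collapses the student and semester detail and keeps only the degree dimension, which is exactly one face of the cube.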

Each edge of the cube represents a dimension; for example, the degree dimension has the members B.Sc, LLB, MBBS, B.com and BIT.

The All cells give the total number of students who joined each course in the respective country. Measures that cannot be combined in this way are called semi-additive or non-additive.

A data cube allows data to be modeled and viewed in multiple dimensions.

Dimension tables describe the dimensions, such as item (item_name, brand, type) or time (day, week, month, quarter, year).

The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.

Eight types of aggregations or queries are possible: null, degrees, semester, country, degrees & semester, semester & country, degrees & country, and all. In general, 2^n aggregations are possible in n dimensions.
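The count of 2^n follows from choosing, for each dimension, whether it is grouped on or aggregated away. A short sketch that enumerates the eight combinations for the three dimensions named above:

```python
from itertools import combinations

dimensions = ("degree", "semester", "country")

# Every subset of the dimensions is one possible GROUP BY,
# from the empty set (null) up to all three dimensions.
group_bys = [
    combo
    for k in range(len(dimensions) + 1)
    for combo in combinations(dimensions, k)
]

for combo in group_bys:
    print(combo if combo else "(null)")
print(len(group_bys), "aggregations")
```

With 15 to 20 dimensions the same enumeration yields tens of thousands to about a million group-bys, which is why storing every aggregate quickly becomes expensive.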

5. Data cube implementations


Solutions to aggregate and store the data are:

1. Pre-compute and store all: millions of aggregates need to be computed and stored, and indexing such large amounts of data is also expensive.

2. Pre-compute (and store) none: aggregates are computed when a query is executed. This needs no extra space for storing the cube, but query response time is very poor.

3. Pre-compute and store some: pre-compute and store the most frequently queried aggregates.

Let a be the degree dimension, b the country and c the starting semester. The queries will then be based on:
(ALL, ALL, ALL), (a, ALL, ALL), (ALL, ALL, c), (ALL, b, ALL), (a, ALL, c), (ALL, b, c), (a, b, ALL), (a, b, c)

Data cube implementations use many techniques for pre-computing aggregates and storing them.
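The third option can be sketched as a small cache of stored group-bys with a fallback to the base data. Everything here is illustrative: the fact rows are hypothetical, the choice of which three group-bys to store is an assumption, and `aggregate` and `query` are names invented for this sketch rather than part of any OLAP product.

```python
from collections import Counter

ALL = "ALL"

# Hypothetical fact rows: (degree, country, semester), one row per student.
facts = [
    ("BSc", "India", "2000"),
    ("BSc", "India", "2001"),
    ("BSc", "Malaysia", "2000"),
    ("LLB", "Australia", "2000"),
    ("BIT", "India", "2001"),
]

def aggregate(rows, keep):
    """Count rows grouped on the positions in `keep`; other positions become ALL."""
    counts = Counter()
    for row in rows:
        key = tuple(v if i in keep else ALL for i, v in enumerate(row))
        counts[key] += 1
    return counts

# Pre-compute and store only the group-bys assumed to be queried most often.
stored = {
    frozenset(): aggregate(facts, set()),          # (ALL, ALL, ALL)
    frozenset({0}): aggregate(facts, {0}),         # (a, ALL, ALL)
    frozenset({0, 1}): aggregate(facts, {0, 1}),   # (a, b, ALL)
}

def query(cell):
    """Answer a cell such as ("BSc", ALL, ALL); compute on demand if not stored."""
    keep = frozenset(i for i, v in enumerate(cell) if v != ALL)
    table = stored.get(keep)
    if table is None:
        table = aggregate(facts, keep)  # fall back to scanning the base data
    return table[cell]

print(query(("BSc", ALL, ALL)))    # answered from a stored aggregate
print(query((ALL, "India", ALL)))  # not stored, computed at query time
```

Stored group-bys are answered by a dictionary lookup, while the others pay the cost of a scan, which mirrors the space versus response-time trade-off between options 1 and 2.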

Example of fact & dimension table


Sales Fact Table: time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales
Measures: units_sold, dollars_sold, avg_sales

Dimension tables:
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_street, country
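How a fact table and a dimension table work together can be shown with a cut-down version of this schema. The sketch below uses Python's sqlite3 module with only the item dimension and two measures; the item names, brands and sales figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (
        item_key  INTEGER PRIMARY KEY,
        item_name TEXT,
        brand     TEXT
    );
    CREATE TABLE sales (
        item_key     INTEGER REFERENCES item(item_key),
        units_sold   INTEGER,
        dollars_sold REAL
    );
    -- Hypothetical rows, for illustration only.
    INSERT INTO item VALUES (1, 'pen', 'Acme'), (2, 'pencil', 'Acme'), (3, 'eraser', 'Zed');
    INSERT INTO sales VALUES (1, 10, 20.0), (2, 5, 5.0), (3, 2, 3.0), (1, 1, 2.0);
    """
)

# The fact table supplies the measures; the dimension table supplies
# the attribute (brand) used for grouping.
by_brand = conn.execute(
    """
    SELECT i.brand, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales AS s
    JOIN item AS i ON s.item_key = i.item_key
    GROUP BY i.brand
    ORDER BY i.brand
    """
).fetchall()
print(by_brand)
```

The fact table stays narrow (keys and measures), and descriptive attributes live once in the dimension table.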
November 13

OLAP implementation models


Relational OLAP (ROLAP)
Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support the missing pieces.
Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services.
The data warehouse provides multidimensional capabilities by representing data in a fact table and dimension tables.
Advantage: it is more easily used with an existing RDBMS, and data can be stored efficiently in tables since zero-valued facts need not be stored.
Disadvantage: poor query performance.

Multidimensional OLAP (MOLAP)
Based on a multidimensional DBMS (top-down approach).
There is no standard approach to storing and maintaining the data.
Uses an array-based multidimensional storage engine (sparse matrix techniques).
Provides fast indexing to pre-computed summarized data.

Hybrid OLAP (HOLAP)
Gives the user flexibility, e.g. low level: relational, high level: array.

Specialized SQL servers

Data Cube
In the two-dimensional situation, we do not just want to find out the number of students for any country (x) and any degree (y). We may have many other queries, e.g.
1. How many students are doing MIT?
2. How many students are from Thailand?
3. How many Asian students are doing Law degrees?
Thus there is a kind of hierarchy that we wish to use, for example the world, the continents, the regions, the countries, etc. In degrees, we may want a hierarchy of university, schools, UG and PG, and individual degrees.

Data Cube operations


A number of operations may be applied to data cubes. The common ones are:
- roll-up (increasing the level of abstraction)
- drill-down (increasing detail)
- slice and dice (selection and projection)
- pivot (re-orienting the view)

Roll-up (less detail)


Roll-up zooms out of the data cube, i.e. it performs further aggregation on the data. It is used for further abstraction (i.e. less detail), e.g. moving from single degree programs to all programs offered by a school, from single countries to continents, or from three dimensions to two dimensions.

Drill-down (increasing detail)

Drill-down is the reverse of roll-up. It is used when we wish to partition more finely or want to focus on some particular values of certain dimensions. Drill-down adds more detail to the data and may involve adding another dimension.

Slice and dice (selection and projection)


The slice operation performs a selection on one dimension of the cube (e.g. degree = MIT). The dice operation performs a selection on two or more dimensions (e.g. degree = BIT and country = Australia or India).

Pivot (re-orienting the view)

Pivot gives an alternate presentation of the data, e.g. by rotating the axes in a 3-D cube.
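The four operations can be sketched on a small cube held as a Python dictionary. The cells below are the BSc and BIT counts for 2000 and 2001 from the tables earlier in this unit; the representation and the function names (`roll_up`, `slice_cube`, `dice`, `pivot`) are assumptions made for this illustration, not the interface of any OLAP tool.

```python
from collections import defaultdict

# Cells keyed by (country, degree, year).
cube = {
    ("Australia", "BSc", 2000): 5, ("India", "BSc", 2000): 10, ("Malaysia", "BSc", 2000): 5,
    ("Australia", "BIT", 2000): 11, ("India", "BIT", 2000): 17, ("Malaysia", "BIT", 2000): 23,
    ("Australia", "BSc", 2001): 7, ("India", "BSc", 2001): 9, ("Malaysia", "BSc", 2001): 5,
    ("Australia", "BIT", 2001): 10, ("India", "BIT", 2001): 13, ("Malaysia", "BIT", 2001): 20,
}

def roll_up(cells, axis):
    """Aggregate one dimension away (e.g. three dimensions to two)."""
    out = defaultdict(int)
    for key, n in cells.items():
        out[key[:axis] + key[axis + 1:]] += n
    return dict(out)

def slice_cube(cells, axis, value):
    """Selection on a single dimension."""
    return {k: n for k, n in cells.items() if k[axis] == value}

def dice(cells, allowed):
    """Selection on two or more dimensions; `allowed` maps axis -> set of values."""
    return {
        k: n for k, n in cells.items()
        if all(k[axis] in values for axis, values in allowed.items())
    }

def pivot(cells, order):
    """Re-orient the view by permuting the axes."""
    return {tuple(k[i] for i in order): n for k, n in cells.items()}

by_country_degree = roll_up(cube, 2)                 # drop the year dimension
bit_only = slice_cube(cube, 1, "BIT")                # degree = BIT
bit_aus_ind = dice(cube, {1: {"BIT"}, 0: {"Australia", "India"}})
degree_first = pivot(cube, (1, 0, 2))                # (degree, country, year)
```

Drill-down is the reverse of `roll_up`: it goes back from `by_country_degree` to the more detailed `cube`, which is only possible if the detailed cells are still available.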


Guidelines for OLAP


1. Vision: the OLAP team should consult with users and develop a clear vision, including clearly defined and understood business objectives that are shared by the stakeholders.

2. Senior management support: the project should be supported by senior managers.

3. Selecting an OLAP tool: the team should be familiar with the ROLAP and MOLAP tools required for the enterprise. Sometimes a combination of ROLAP and MOLAP is the most cost-effective choice.

4. Corporate strategy: the OLAP strategy should fit with the enterprise strategy and business objectives.

5. Focus on the users: the project should be based on the technical or non-technical users, according to their personal skills and information needs.

6. Joint management: the project should be jointly managed by IT and business professionals. A committee of people should be involved to provide ideas.

7. Review and adapt: regular reviews of the project are required to ensure that it meets the current needs of the enterprise.

Consider a university spread across 5 countries, with the number of students admitted to courses such as BSc, LLB, MBBS and B.com recorded for 3 years from 2010. Construct the two-dimensional view of course entry for each year and the aggregates after three years. Finally, develop a data cube with dimensions country X degree X year.
