OLAP
A dimension is an attribute or an ordinate within a multidimensional structure, consisting of a list of values. On-line Analytical Processing (OLAP) is a technique for providing management decision support using historical, summarized data that is consolidated in the data warehouse.
University student data with the dimensions degree, country, scholarship and year is given below (multidimensional view):
Degree \ Country   Australia   India   Malaysia
BSc                        5      10          5
LLB                       20       0          1
MBBS                      15      15         10
B.com                     50      25         12
BIT                       11      17         23
ALL                      101      67         51
The above table is for the year 2000. We have collected data for other years as well.
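As a minimal sketch of how the ALL row relates to the detail rows, the year-2000 table can be held as a nested dictionary and summed per country. The names `students_2000`, `totals_by_country` and `totals_by_degree` are illustrative and do not come from the slides.

```python
# Year-2000 student counts from the table above: degree -> country -> count.
students_2000 = {
    "BSc":   {"Australia": 5,  "India": 10, "Malaysia": 5},
    "LLB":   {"Australia": 20, "India": 0,  "Malaysia": 1},
    "MBBS":  {"Australia": 15, "India": 15, "Malaysia": 10},
    "B.com": {"Australia": 50, "India": 25, "Malaysia": 12},
    "BIT":   {"Australia": 11, "India": 17, "Malaysia": 23},
}


def totals_by_country(table):
    """Sum over all degrees to rebuild the ALL row."""
    totals = {}
    for row in table.values():
        for country, count in row.items():
            totals[country] = totals.get(country, 0) + count
    return totals


def totals_by_degree(table):
    """Sum over all countries to get one total per degree."""
    return {degree: sum(row.values()) for degree, row in table.items()}


all_row = totals_by_country(students_2000)
print(all_row)  # {'Australia': 101, 'India': 67, 'Malaysia': 51}
```

The computed totals match the ALL row of the table, which is a quick consistency check on the data.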
1. Definition - E. F. Codd
OLAP is the dynamic enterprise analysis required to create, manipulate, animate and synthesize information from exegetical, contemplative and formulaic data analysis models. Information is manipulated from the point of view of a manager (exegetical), from the point of view of someone
OLAP Features
4. Design: An OLTP system is application-oriented and views enterprise data as a collection of tables; OLAP system is
                     OLTP                                   OLAP
users                clerk, IT professional                 knowledge worker
function             day to day operations                  decision support
DB design            application-oriented                   subject-oriented
data                 current, up-to-date, detailed,         historical, summarized, multidimensional,
                     flat relational, isolated              integrated, consolidated
usage                repetitive                             ad-hoc
access               read/write, index/hash on prim. key    lots of scans
unit of work         short, simple transaction              complex query
# records accessed   tens                                   millions
# users              thousands                              hundreds
DB size              100MB-GB                               100GB-TB
metric               transaction throughput                 query throughput, response
FASMI Characteristics
Derived from the first letters of the characteristics of OLAP systems:
Fast: OLAP queries should be answered quickly, like a search
2. Accessibility: The OLAP system should sit in between the data sources and an OLAP front-end.
3. Batch extraction vs interpretive: The OLAP system should provide multidimensional data staging plus partial pre-calculation of aggregates.
5. Storing OLAP results: Results should be kept separate from the source data. Read-write OLAP applications should not be implemented directly on transaction data.
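Recoverable rows of three Degree \ Country tables (the remaining rows were lost in extraction):

For year 2000:
Degree \ Country   Australia   India   Malaysia
B.com                     50      25         12
BIT                       11      17         23
ALL                      101      67         51

For a second year (year label not recovered):
Degree \ Country   Australia   India   Malaysia
B.com                     53      22         19
BIT                       10      13         20
ALL                       96      61         64

Both years combined:
Degree \ Country   Australia   India   Malaysia
ALL                      197     128        115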
4. Data Cube
Number of students as a function of country, degree and semester
Dimensions: country, degree, semester
Hierarchical summarization paths:
continent > region > country
school > degree
year > semester
Each edge of the cube is called a dimension. A user therefore has a multidimensional conceptual view of the data, which is represented by the cube. The points inside the cube provide aggregations; for example, a point may provide the number of students from Malaysia admitted to BCom in the year 1998. The cube is not always three-dimensional. Each dimension may be associated with a table that describes the dimension. For example, a dimension table for country would contain the country
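A cube is represented in three dimensions, country x degree x semester, for any country (x), any degree (y) and any start semester (z).
[Figure: data cube with axes Country (Australia, India, Malaysia, All), Degree (BSc, LLB, ..., All) and year (2000, 2001, 2002). Values recovered for the "sum" face:]
Degree \ Country   Australia   India   Malaysia
BSc                       12      19         10
LLB                       30       0          2
MBBS                      31      32         29
B.com                    103      47         31
BIT                       21      30         43
All                      197     128        115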
degree_id.
Each edge of the cube represents a dimension; for example, the degree dimension has the members B.Sc, LLB, MBBS, B.com and BIT.
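The "All" cells give the total number of students who joined each course in the respective country. Measures that cannot be meaningfully summed are called semi-additive (they can be summed across only some dimensions) or non-additive (they cannot be summed across any dimension).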
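To illustrate why some measures cannot simply be combined, here is a small sketch using an average, which is a non-additive measure. The branch names and dollar amounts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical dollars_sold values per transaction, grouped by branch.
sales = {
    "north": [100.0, 200.0, 300.0],
    "south": [50.0],
}


def mean(values):
    return sum(values) / len(values)


# Average per branch: north -> 200.0, south -> 50.0
branch_avg = {branch: mean(values) for branch, values in sales.items()}

# Wrong: combining the pre-computed averages ignores the group sizes.
avg_of_avgs = mean(list(branch_avg.values()))

# Right: go back to the detail rows ...
true_avg = mean([x for values in sales.values() for x in values])

# ... or carry additive measures (sum and count) and divide at the end.
total = sum(sum(values) for values in sales.values())
count = sum(len(values) for values in sales.values())
combined_avg = total / count

print(avg_of_avgs, true_avg, combined_avg)  # 125.0 162.5 162.5
```

A common design choice is therefore to store the additive parts (sum and count) in the cube and derive the average when it is queried.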
The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables. With three dimensions, eight types of aggregations or queries are possible: null, degree, semester, country, degree & semester, semester & country, degree & country, and all three. In general, 2^n aggregations are possible in n dimensions.
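The count of 2^n follows from choosing, for each dimension, whether or not to group by it. A short sketch that enumerates the groupings for the three dimensions named above (the function name `all_groupings` is illustrative):

```python
from itertools import combinations


def all_groupings(dimensions):
    """Return every subset of the dimensions, from the empty
    grouping (null) up to the grouping on all dimensions."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result


dims = ("degree", "semester", "country")
groupings = all_groupings(dims)

for g in groupings:
    print(g if g else "(null)")

print(len(groupings) == 2 ** len(dims))  # True
```

With a fourth dimension such as scholarship, the same function would return 16 groupings.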
Sales Fact Table
  keys: time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales
item (dimension table)
  item_key, item_name, brand, type, supplier_type
branch (dimension table)
  branch_key, branch_name, branch_type
location (dimension table)
  location_key, street, city, province_or_street, country
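The schema above can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: all rows are hypothetical, the table name `sales_fact` is chosen here, time_key is kept as a plain integer because no time dimension table is listed, and avg_sales is left out because it can be derived from the other measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,
                   brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
                     branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city TEXT, province_or_street TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER,
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER,
    dollars_sold REAL);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (1, "pen", "Acme", "stationery", "wholesale"),
    (2, "lamp", "Brite", "home", "retail"),
])
conn.execute("INSERT INTO branch VALUES (1, 'B1', 'city')")
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
    (1, "1 Main St", "Sydney", "NSW", "Australia"),
    (2, "2 High St", "Delhi", "Delhi", "India"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 10, 20.0),
    (1, 2, 1, 1, 1, 30.0),
    (2, 1, 1, 2, 5, 10.0),
])

# Join the fact table to one dimension table and aggregate the measures.
rows = conn.execute("""
    SELECT l.country, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    GROUP BY l.country
    ORDER BY l.country
""").fetchall()
print(rows)  # [('Australia', 11, 50.0), ('India', 5, 10.0)]
```

The fact table holds only keys and measures, while the descriptive attributes used for grouping come from the dimension tables.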
November 13
GKGupta
Data Cube
But in the two-dimensional situation, we don't just want to find out the number of students for any country (x) and any degree (y). We may have many other queries, e.g.
1. How many students are doing MIT?
2. How many students are from Thailand?
3. How many Asian students are doing Law degrees?
Thus there is a kind of hierarchy that we wish to use, for example the world, the continents, the regions, the countries, etc. For degrees, we may want a hierarchy of university, schools, UG and PG, and individual degrees.
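A roll-up along such hierarchies can be sketched by mapping each member to its parent and re-summing. The example reuses the year-2000 counts, so it covers Australia, India and Malaysia rather than Thailand or MIT. The degree-to-school mapping `school_of` is a hypothetical grouping made up for illustration.

```python
# Year-2000 student counts: degree -> country -> count.
students_2000 = {
    "BSc":   {"Australia": 5,  "India": 10, "Malaysia": 5},
    "LLB":   {"Australia": 20, "India": 0,  "Malaysia": 1},
    "MBBS":  {"Australia": 15, "India": 15, "Malaysia": 10},
    "B.com": {"Australia": 50, "India": 25, "Malaysia": 12},
    "BIT":   {"Australia": 11, "India": 17, "Malaysia": 23},
}

# One level up in each hierarchy.
continent_of = {"Australia": "Oceania", "India": "Asia", "Malaysia": "Asia"}
school_of = {  # hypothetical schools, not taken from the slides
    "BSc": "Science", "LLB": "Law", "MBBS": "Medicine",
    "B.com": "Business", "BIT": "IT",
}


def roll_up(table, degree_map, country_map):
    """Aggregate counts to the parent level of both dimensions."""
    out = {}
    for degree, row in table.items():
        for country, count in row.items():
            key = (degree_map[degree], country_map[country])
            out[key] = out.get(key, 0) + count
    return out


by_school_continent = roll_up(students_2000, school_of, continent_of)

# "How many Asian students are doing Law degrees?" for this data set.
asian_law = by_school_continent[("Law", "Asia")]
print(asian_law)  # 1
```

The answer comes from India (0) plus Malaysia (1); the same function works for any other level once the corresponding parent mapping is supplied.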
year.