
Indexing OLAP Data

Sunita Sarawagi

Monowar Hossain
York University

Agenda
Requirements on Indexing methods
Existing indexing methods
Optimization of R-Tree for OLAP data
R-Tree VS Bit-mapped Indices
Conclusion

Requirements on Indexing methods


Symmetric partial match queries
  Continuous, e.g. time between Jan and July 94
  Discontinuous, e.g. the first month of each year
Indexing at multiple levels of aggregation
  Pre-computed group-bys
  Indexing summary data
Handling multiple traversal orders
Efficient batch update
Handling sparse data efficiently
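
As an illustration of the first requirement, here is a minimal sketch of a partial match query over a tiny in-memory fact table. The table contents, the row layout and the helper name `partial_match` are assumptions made for this example, and the linear scan only shows what the query means, not how an index would answer it.

```python
from datetime import date

# Hypothetical fact table: (product, store, sale_date, amount).
FACTS = [
    ("soap", "toronto", date(1994, 1, 15), 10),
    ("soap", "ottawa", date(1994, 3, 2), 20),
    ("tea", "toronto", date(1994, 8, 9), 30),
    ("tea", "ottawa", date(1995, 1, 20), 40),
]


def partial_match(facts, product=None, store=None, date_pred=None):
    """Return the rows that satisfy every restriction that was supplied.

    Any dimension may be left unrestricted (None), which is what makes the
    query a partial match; no dimension is treated specially.
    """
    rows = []
    for row in facts:
        p, s, d, _amount = row
        if product is not None and p != product:
            continue
        if store is not None and s != store:
            continue
        if date_pred is not None and not date_pred(d):
            continue
        rows.append(row)
    return rows


def jan_to_july_94(d):
    # Continuous restriction: one contiguous range on the time dimension.
    return date(1994, 1, 1) <= d <= date(1994, 7, 31)


def first_month(d):
    # Discontinuous restriction: January of every year.
    return d.month == 1


continuous = partial_match(FACTS, date_pred=jan_to_july_94)
discontinuous = partial_match(FACTS, store="ottawa", date_pred=first_month)
```

The continuous query selects one contiguous slice of the time dimension, while the discontinuous one selects scattered slices, which is harder for an index that relies on ordering.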

Existing methods
Multidimensional array-based methods
  Work efficiently when data is dense

Essbase's schema
  E.g. a four-dimensional cube: product and store (sparse), time and scenarios (dense)
  B-tree on product and store
  Two-dimensional array on time and scenarios

Evaluation of Essbase's schema
  May cause multiple searches, e.g. searching for store = some value on the product-store index
  Performance depends on the ability to find enough dense dimensions
  Efficient batch update
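
The sparse/dense split described above can be sketched roughly as follows. This is not Essbase code: the class name `SparseDenseCube`, the dimension values and the use of a Python dict in place of a B-tree are all assumptions chosen to keep the example short.

```python
# Hypothetical dense dimensions: every block has one cell per (time, scenario).
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]


class SparseDenseCube:
    def __init__(self):
        # A dict stands in for the B-tree on the sparse (product, store) key.
        self.blocks = {}

    def put(self, product, store, time, scenario, value):
        block = self.blocks.get((product, store))
        if block is None:
            # The dense 2-D array is allocated only for combinations that occur.
            block = [[None] * len(SCENARIOS) for _ in TIMES]
            self.blocks[(product, store)] = block
        block[TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def get(self, product, store, time, scenario):
        block = self.blocks.get((product, store))
        if block is None:
            return None
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def by_store(self, store):
        # The key is ordered product-first, so a store-only restriction cannot
        # be answered with one lookup. A real B-tree would need one search per
        # product value; this sketch simply scans every key.
        return {key: blk for key, blk in self.blocks.items() if key[1] == store}


cube = SparseDenseCube()
cube.put("soap", "toronto", "Q1", "actual", 100)
cube.put("tea", "toronto", "Q2", "budget", 80)
cube.put("tea", "ottawa", "Q1", "actual", 60)
```

A fully specified lookup touches one block, whereas `by_store` shows the "multiple searches" problem noted in the evaluation.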

Existing methods (cont.)

Bit-mapped indices

Pros:
  For low-cardinality data, bit-maps are both space- and retrieval-efficient
  Support bitwise operations
  Access to data is clustered
  All dimensions are handled symmetrically

Cons:
  Range queries are costly
  Increased space overhead of storing the bit-maps, especially for high-cardinality data
  Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
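
The trade-offs listed above can be seen in a minimal sketch of a bit-mapped index. The class `BitmapIndex`, the helper `row_ids` and the sample rows are assumptions for illustration; Python integers serve as uncompressed bit vectors.

```python
class BitmapIndex:
    """One bit-map per distinct column value; bit i is set if row i has it."""

    def __init__(self):
        self.bitmaps = {}  # value -> int used as a bit vector
        self.num_rows = 0

    def append(self, value):
        # Set the bit for the new row in the bit-map of `value`.
        self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << self.num_rows)
        self.num_rows += 1

    def eq(self, value):
        return self.bitmaps.get(value, 0)

    def between(self, low, high):
        # A range restriction ORs together one bit-map per value in the range,
        # so its cost grows with the number of distinct values covered.
        result = 0
        for value, bits in self.bitmaps.items():
            if low <= value <= high:
                result |= bits
        return result


def row_ids(bits):
    """Decode a bit vector into an ascending list of row numbers."""
    ids = []
    i = 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids


rows = [
    ("soap", "toronto", 1),
    ("tea", "toronto", 2),
    ("soap", "ottawa", 3),
    ("tea", "ottawa", 1),
]
product_idx, store_idx, month_idx = BitmapIndex(), BitmapIndex(), BitmapIndex()
for product, store, month in rows:
    # Every index is touched for every inserted row.
    product_idx.append(product)
    store_idx.append(store)
    month_idx.append(month)

# Restrictions on different dimensions combine with a single bitwise AND.
soap_in_toronto = row_ids(product_idx.eq("soap") & store_idx.eq("toronto"))
months_1_to_2 = row_ids(month_idx.between(1, 2))
```

Without compression, one index needs about (number of distinct values) x (number of rows) bits, which is why space grows with cardinality.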

Existing methods (cont.)

Bit-mapped index variants
  Compression
  Hybrid
  Dynamic bit-maps
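
The slides do not say which compression scheme is meant. As one possible illustration, the sketch below uses plain run-length encoding, which is an assumption of this example, as are the function names `rle_encode` and `rle_decode`.

```python
def rle_encode(bits):
    """Encode a list of 0/1 values as [bit, run_length] pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([bit, 1])  # start a new run
    return runs


def rle_decode(runs):
    """Expand [bit, run_length] pairs back into the original list."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


# A sparse bit-map: 32 positions, only three of them set.
sparse_bitmap = [0] * 12 + [1] + [0] * 7 + [1, 1] + [0] * 10
encoded = rle_encode(sparse_bitmap)
```

Long runs of zeros, typical for a high-cardinality column, collapse into a handful of pairs; here 32 bits become 5 runs.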

Existing methods (cont.)

Hierarchical indices

Example: Product - Store
  Index product first, and also store summaries at the product level
  For each product value, create an index on store and store summaries at the product-store level

Pros:
  Allow faster access to higher-level data
  Dimensions are symmetrically handled

Cons:
  Widely used, but with index storage overhead
  The average retrieval efficiency can suffer because of the large indexing structure
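
The Product - Store example can be sketched as a two-level structure. The class name `HierarchicalIndex`, the nested dicts and the choice of a running sum as the stored summary are assumptions made for this example.

```python
class HierarchicalIndex:
    """Two-level index: product first, then store within each product."""

    def __init__(self):
        # product -> {"total": product-level summary,
        #             "stores": {store -> product-store-level summary}}
        self.products = {}

    def add(self, product, store, amount):
        node = self.products.setdefault(product, {"total": 0, "stores": {}})
        node["total"] += amount  # summary kept at the product level
        node["stores"][store] = node["stores"].get(store, 0) + amount

    def product_total(self, product):
        # Answered at the first level, without visiting any store entries.
        node = self.products.get(product)
        return node["total"] if node else 0

    def product_store_total(self, product, store):
        node = self.products.get(product)
        return node["stores"].get(store, 0) if node else 0


index = HierarchicalIndex()
for product, store, amount in [
    ("soap", "toronto", 10),
    ("soap", "ottawa", 20),
    ("tea", "toronto", 30),
    ("soap", "toronto", 5),
]:
    index.add(product, store, amount)
```

Keeping a summary at every level is what gives fast access to higher-level data, and it is also the source of the extra storage.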

Existing methods (cont.)

Multidimensional indices
  Use of index methods designed for spatial data
  E.g. R-trees, grid files, etc.

Optimized R-Tree for OLAP data

Rectangular dense regions (only the boundaries are stored) that contain more than a threshold number of points
  Each contains a pointer to a variable-length array of TIDs or of the tuples themselves
Points in sparse regions

Finding dense regions
  Ask an expert?
  Use a clustering algorithm (similar algorithms exist in image analysis)
  Needs evaluation!
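
The dense-region idea can be sketched as below. Two simplifications are assumed: the entries are kept in flat lists rather than in a real R-Tree, and dense regions are found by counting points per grid cell, which stands in for the clustering step the slides leave open. All function names and sample points are hypothetical.

```python
from collections import defaultdict


def build_entries(points, cell_size, threshold):
    """Split 2-D points into dense-region entries and sparse point entries.

    Points are bucketed into square grid cells. A cell holding more than
    `threshold` points becomes one dense entry that keeps only its bounding
    box plus an array of its tuples; all other points stay individual.
    """
    cells = defaultdict(list)
    for x, y in points:
        cells[(x // cell_size, y // cell_size)].append((x, y))

    dense, sparse = [], []
    for members in cells.values():
        if len(members) > threshold:
            xs = [p[0] for p in members]
            ys = [p[1] for p in members]
            box = (min(xs), min(ys), max(xs), max(ys))
            dense.append({"box": box, "tuples": members})
        else:
            sparse.extend(members)
    return dense, sparse


def in_range(point, query):
    x, y = point
    x_lo, y_lo, x_hi, y_hi = query
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


def overlaps(box, query):
    # Boxes and queries are (x_lo, y_lo, x_hi, y_hi) with inclusive bounds.
    return not (box[2] < query[0] or box[0] > query[2]
                or box[3] < query[1] or box[1] > query[3])


def range_query(dense, sparse, query):
    hits = [p for p in sparse if in_range(p, query)]
    for entry in dense:
        # Test the boundary first; open the tuple array only on overlap.
        if overlaps(entry["box"], query):
            hits.extend(p for p in entry["tuples"] if in_range(p, query))
    return sorted(hits)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (9, 9), (14, 3)]
dense, sparse = build_entries(points, cell_size=4, threshold=3)
result = range_query(dense, sparse, (1, 0, 9, 9))
```

Five clustered points collapse into a single entry with one bounding box, so the index holds three entries instead of seven.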

R-Tree VS Bit-mapped indices


R-Tree pros:
  Allows range queries
  Smaller space overhead
  More efficient update

Bit-mapped pros:
  Faster bitwise operations
  Efficient for low cardinality, few restricted dimensions, and sparse data

Conclusion
High-level overview
Recommended readings
  MOLAP vs. OLAP
  R-Tree and variants
  R-Tree alternatives
  Computation of multidimensional aggregates
  And more...
