Tom McKinley (mac2@us.ibm.com) IBM Systems and Technology Group Lab Services
What's in a name?
The industry refers to Grouping Sets and Super Groups (Rollup and Cube) collectively as Grouping Sets
Grouping Sets and Super Groups are sometimes referred to as Analytical SQL Grouping Functions
For clarity, this presentation will use the term Grouping Functions when collectively describing Grouping Sets, Rollups, and Cubes
SQL GROUP BY: specifies that an SQL SELECT statement returns rows grouped by one or more columns
EXAMPLE: For each quarter in 2006, show me the
Qtr  Units
4    7
2    19
4    83
3    1
3    14
4    3
4    30
1    680
4    110
Selection can be performed using any available method, including parallel-enabled methods
[Figure: hash table with one bucket per quarter: Qtr1, Qtr2, Qtr3, Qtr4]
Collect all common input values into a common bucket (by hash value)
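The hash grouping step can be sketched in a few lines of Python. This is a minimal illustration of the idea, not how DB2 for i builds its hash table: a Python dict stands in for the hash table, and the sample rows and the helper name `hash_group_sum` are hypothetical.

```python
from collections import defaultdict


def hash_group_sum(rows, key, value):
    """Sum `value` per distinct `key`, using a dict as the hash table."""
    buckets = defaultdict(int)  # one hash bucket per distinct key -> running SUM
    for row in rows:
        buckets[row[key]] += row[value]
    return dict(buckets)


# Hypothetical rows that already passed local selection (e.g. Year = 2006)
rows = [
    {"Qtr": 4, "Units": 7},
    {"Qtr": 2, "Units": 19},
    {"Qtr": 4, "Units": 83},
    {"Qtr": 3, "Units": 1},
    {"Qtr": 3, "Units": 14},
    {"Qtr": 1, "Units": 680},
]

totals = hash_group_sum(rows, "Qtr", "Units")
print(totals)
```

Rows can arrive in any order; each one is routed to its bucket by hashing the grouping value, which is why selection may run in parallel ahead of this step.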
2010 IBM Corporation
Given an index on (Part, Year, Qtr), apply local selection and gather the RRNs for each group of quarters
[Figure: index entries (Part, Year, Qtr) with their RRNs pointing to table rows with columns Part, Units, Year, Qtr]
Use the RRNs (relative record numbers) to access the table rows to gather the Units information
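The index-based alternative can be sketched the same way, under loudly stated assumptions: the list position of each row stands in for its RRN, a sorted list of `((Part, Year, Qtr), rrn)` entries stands in for the radix index, and the table contents and helper names are hypothetical. Because index entries arrive in key order, each quarter forms one contiguous run, so no hash table is needed.

```python
from itertools import groupby

# Hypothetical table; list position stands in for the RRN.
# Columns: (Part, Units, Year, Qtr)
table = [
    ("P1", 7, 2006, 4),
    ("P1", 19, 2006, 2),
    ("P1", 83, 2006, 4),
    ("P1", 1, 2006, 3),
    ("P1", 14, 2005, 3),
    ("P1", 680, 2006, 1),
]


def build_index(tbl):
    """Keyed 'index' on (Part, Year, Qtr): sorted list of (key, rrn) entries."""
    return sorted(((part, year, qtr), rrn)
                  for rrn, (part, _units, year, qtr) in enumerate(tbl))


def index_group_sum(tbl, index, part, year):
    """Apply local selection on the index keys, then SUM(Units) per Qtr via RRNs."""
    # A real index would probe the (part, year) key range; a scan keeps this short.
    selected = [(key, rrn) for key, rrn in index if key[:2] == (part, year)]
    totals = {}
    # Entries are in key order, so each quarter is one contiguous run.
    for qtr, run in groupby(selected, key=lambda entry: entry[0][2]):
        rrns = [rrn for _key, rrn in run]
        totals[qtr] = sum(tbl[rrn][1] for rrn in rrns)  # fetch Units by RRN
    return totals


index = build_index(table)
print(index_group_sum(table, index, "P1", 2006))
```

The 2005 row never reaches the grouping step because selection is applied to the index keys first.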
Grouping Sets and Super Groups Defined
Grouping Functions Syntax and Examples
Grouping Functions Performance Considerations and Examples
Syntax dictates whether multiple sets in a query act in an additive or multiplicative fashion
SELECT COUNTRY, REGION, STORE, SUM(SALES) FROM TRANS GROUP BY GROUPING SETS ((COUNTRY, REGION), (COUNTRY, STORE))
[Result table: Region, Store, and SUM(Sales) columns; one row per (Country, Region) group and one per (Country, Store) group, with a dash (-) in the column that is not part of the row's grouping set. Stores: Bobs, Barbs, Caining, Menes, Mills, Pensk, Shell, Targe, Toms, Wally]
The rows for the (COUNTRY, STORE) set are logically produced by: SELECT COUNTRY, NULL, STORE, SUM(SALES) FROM TRANS GROUP BY COUNTRY, NULL, STORE
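The logical equivalence between GROUPING SETS and a UNION ALL of ordinary GROUP BY queries can be modeled in Python. This sketches the semantics only, assuming hypothetical TRANS rows and hypothetical helper names (`group_by`, `grouping_sets`); columns outside a grouping set come back as `None`, mirroring the NULLs in the result.

```python
from collections import defaultdict

COLS = ("COUNTRY", "REGION", "STORE")


def group_by(rows, grouping_set, measure):
    """One GROUP BY; columns outside the grouping set are returned as None (NULL)."""
    sums = defaultdict(int)
    for row in rows:
        key = tuple(row[c] if c in grouping_set else None for c in COLS)
        sums[key] += row[measure]
    return list(sums.items())


def grouping_sets(rows, sets, measure):
    """GROUP BY GROUPING SETS (...) modeled as a UNION ALL of one GROUP BY per set."""
    result = []
    for s in sets:
        result.extend(group_by(rows, s, measure))
    return result


# Hypothetical TRANS rows (not the data behind the slides)
trans = [
    {"COUNTRY": "U.S.A.", "REGION": "NW", "STORE": "Toms", "SALES": 100},
    {"COUNTRY": "U.S.A.", "REGION": "NW", "STORE": "Wally", "SALES": 50},
    {"COUNTRY": "U.S.A.", "REGION": "SE", "STORE": "Toms", "SALES": 25},
    {"COUNTRY": "Canada", "REGION": "NW", "STORE": "Bobs", "SALES": 10},
]

result = grouping_sets(trans, [("COUNTRY", "REGION"), ("COUNTRY", "STORE")], "SALES")
for key, total in result:
    print(key, total)
```

A row whose REGION were genuinely NULL would be indistinguishable here from a row of the (COUNTRY, STORE) set; that ambiguity is what the GROUPING function resolves.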
Rollup Example
The ROLLUP operation generates a result set showing aggregates for a hierarchy of values in the selected columns
SELECT COUNTRY, REGION, SUM(SALES) FROM TRANS GROUP BY ROLLUP (COUNTRY, REGION) Logical equivalent SQL UNION syntax:
SELECT * FROM (SELECT COUNTRY, REGION, SUM(SALES) FROM TRANS GROUP BY COUNTRY, REGION UNION ALL SELECT COUNTRY, NULL, SUM(SALES) FROM TRANS GROUP BY COUNTRY, NULL UNION ALL SELECT NULL, NULL, SUM(SALES) FROM TRANS)
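The expansion of ROLLUP into grouping sets follows one rule: drop columns from the right, one at a time, down to the empty set, so n columns yield n + 1 sets. A hypothetical Python helper illustrating that rule:

```python
def rollup(cols):
    """ROLLUP(c1, ..., cn) -> (c1..cn), (c1..cn-1), ..., (c1), ()  (n + 1 sets)."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


print(rollup(["COUNTRY", "REGION"]))
# [('COUNTRY', 'REGION'), ('COUNTRY',), ()]
```

The three sets correspond one-for-one to the three branches of the UNION ALL form.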
[Figure: rows contributed by each component query (GROUP BY COUNTRY, REGION; GROUP BY COUNTRY, NULL; and the grand total) for Canada and U.S.A.]
SELECT COUNTRY, REGION, SUM(SALES) FROM TRANS GROUP BY ROLLUP (COUNTRY, REGION) ORDER BY 1 ASC, 2 ASC
Country  Region  Sum(Sales)
Canada   NW        100,000
Canada   -         100,000
U.S.A.   NE        450,000
U.S.A.   NW        940,000
U.S.A.   SE        550,000
U.S.A.   SW      1,310,000
U.S.A.   -       3,250,000
-        -       3,350,000
A dash (-) marks the NULL that ROLLUP generates for a rolled-up column. Because DB2 sorts NULLs high, each subtotal follows its detail rows in ascending order.
Cube Example
The CUBE operation is a shorthand way to specify grouping over every possible combination (subset) of a given set of grouping columns
SELECT COUNTRY, REGION, SUM(SALES) from TRANS GROUP BY CUBE (COUNTRY, REGION)
Is equivalent to the following Grouping Sets syntax:
SELECT COUNTRY, REGION, SUM(SALES) FROM TRANS GROUP BY GROUPING SETS ((COUNTRY, REGION), (COUNTRY), (REGION), ())
SELECT COUNTRY, REGION, SUM(SALES) from TRANS GROUP BY CUBE (COUNTRY, REGION) Logical equivalent SQL UNION syntax:
SELECT * FROM (SELECT COUNTRY, REGION, SUM(SALES) FROM TRANS GROUP BY COUNTRY, REGION UNION ALL SELECT COUNTRY, NULL, SUM(SALES) FROM TRANS GROUP BY COUNTRY, NULL UNION ALL SELECT NULL, REGION, SUM(SALES) FROM TRANS GROUP BY NULL, REGION UNION ALL SELECT NULL, NULL, SUM(SALES) FROM TRANS)
[Figure: rows contributed by each component query (GROUP BY NULL, REGION; GROUP BY COUNTRY, NULL; GROUP BY COUNTRY, REGION; and the grand total)]
SELECT COUNTRY, REGION, SUM(SALES) from TRANS GROUP BY CUBE (COUNTRY, REGION)
Country  Region  Sum(Sales)
-        NE      450000
-        NW      1040000
-        SE      550000
-        SW      1310000
-        -       3350000
Canada   -       100000
U.S.A.   -       3250000
Canada   NW      100000
U.S.A.   NE      450000
U.S.A.   NW      940000
U.S.A.   SE      550000
U.S.A.   SW      1310000
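The CUBE result can be cross-checked with a small Python model. The sketch assumes one hypothetical TRANS row per (Country, Region) pair carrying the detail totals from this slide; the real TRANS rows are not shown. CUBE expands to every subset of its columns (2^n grouping sets, in the same order as the GROUPING SETS form), and each set is evaluated as a separate GROUP BY. Helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

COLS = ("COUNTRY", "REGION")


def cube(cols):
    """CUBE(c1, ..., cn) -> every subset of the columns, 2**n grouping sets."""
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]


def evaluate(rows, sets, measure):
    """One GROUP BY per grouping set; assumes no genuine NULLs in grouping columns."""
    result = {}
    for s in sets:
        sums = defaultdict(int)
        for row in rows:
            sums[tuple(row[c] if c in s else None for c in COLS)] += row[measure]
        result.update(sums)
    return result


# Hypothetical rows: one per (Country, Region) carrying that pair's total
trans = [
    {"COUNTRY": "Canada", "REGION": "NW", "SALES": 100000},
    {"COUNTRY": "U.S.A.", "REGION": "NE", "SALES": 450000},
    {"COUNTRY": "U.S.A.", "REGION": "NW", "SALES": 940000},
    {"COUNTRY": "U.S.A.", "REGION": "SE", "SALES": 550000},
    {"COUNTRY": "U.S.A.", "REGION": "SW", "SALES": 1310000},
]

print(cube(COLS))
cube_rows = evaluate(trans, cube(COLS), "SALES")
print(cube_rows[(None, "NW")], cube_rows[("U.S.A.", None)], cube_rows[(None, None)])
```

The 12 resulting keys match the 12 rows of the CUBE result: 5 detail rows, 2 country subtotals, 4 region subtotals, and 1 grand total.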
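For the groups produced by ROLLUP(A, B), numbered 1: (A,B), 2: (A,-) and 3: (-,-):
GROUPING(B) returns a value of 0 for group 1: (A,B).
GROUPING(B) returns a value of 1 for groups 2 and 3: (A,-) and (-,-).
This differentiates group 1 rows in which B is genuinely NULL from the rows of groups 2 and 3, where B is NULL because it was rolled up.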
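A Python sketch of why the GROUPING indicator matters, assuming hypothetical data in which one row has a genuine NULL in B. The function `rollup_ab_with_grouping` is a hypothetical model of ROLLUP(A, B) with SUM(V) that also emits GROUPING(B) for every result row.

```python
from collections import defaultdict


def rollup_ab_with_grouping(rows):
    """ROLLUP(A, B) with SUM(V); each output row also carries GROUPING(B)."""
    out = []
    for cols in (("A", "B"), ("A",), ()):  # groups 1, 2 and 3
        sums = defaultdict(int)
        for r in rows:
            key = (r["A"] if "A" in cols else None,
                   r["B"] if "B" in cols else None)
            sums[key] += r["V"]
        grouping_b = 0 if "B" in cols else 1  # 1 = B was rolled up
        out.extend((a, b, grouping_b, total) for (a, b), total in sums.items())
    return out


# Hypothetical data: the first row has a genuine NULL in B
rows = [
    {"A": "x", "B": None, "V": 5},
    {"A": "x", "B": "y", "V": 7},
]

result = rollup_ab_with_grouping(rows)
for row in result:
    print(row)  # (A, B, GROUPING(B), SUM(V))
```

Without the GROUPING(B) column, the group 1 row ('x', None, 5) and the group 2 row ('x', None, 12) would look alike.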
The GROUPING SETS clause is used in both forms.
One set of parentheses around all of the sets specifies an additive association; parenthesizing each set individually also specifies an additive association. A GROUPING SETS clause with a single set of parentheses is therefore additive.
CUBE(A,B) expands to GROUPING SETS ((A,B), (A), (B), ())
ROLLUP(C,D) expands to GROUPING SETS ((C,D), (C), ())
The GROUP BY combines these in an additive fashion to form:
GROUPING SETS ((A, B), (A), (B), (), (C, D), (C), (), (E))
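The additive expansion, and the multiplicative combination it is contrasted with, can be sketched in Python. The sketch follows standard SQL semantics: elements inside one GROUPING SETS clause are concatenated, while comma-separated GROUP BY elements are combined by cross product. The plain column E as a third element is an assumption inferred from the expanded result, and all helper names are hypothetical.

```python
from itertools import combinations, product


def rollup(cols):
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(cols):
    return [s for k in range(len(cols), -1, -1) for s in combinations(cols, k)]


def additive(*elements):
    """Elements inside one GROUPING SETS clause: concatenate their sets."""
    return [s for element in elements for s in element]


def multiplicative(*elements):
    """Comma-separated GROUP BY elements: cross product of their sets.
    Columns are concatenated; this sketch assumes the elements share no columns."""
    return [tuple(c for s in combo for c in s) for combo in product(*elements)]


add = additive(cube(["A", "B"]), rollup(["C", "D"]), [("E",)])
print(add)
# [('A', 'B'), ('A',), ('B',), (), ('C', 'D'), ('C',), (), ('E',)]

mul = multiplicative(cube(["A", "B"]), rollup(["C", "D"]))
print(len(mul), mul[0])
# 12 ('A', 'B', 'C', 'D')
```

The additive form yields 4 + 3 + 1 = 8 sets, while the multiplicative form yields 4 x 3 = 12, which is why the syntax choice can change the amount of work dramatically.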
Create radix indexes for common Grouping Set and ROLLUP sets.
Run Visual Explain to view query rewrites.
Use Index Advised to find holes in the indexing strategy and temporary index creates.
A temporary index could be replaced by a permanent index if that index encapsulates the predicate selection, grouping, and aggregation columns. Index Advised can help with this.
Parallelism can greatly improve the populate time of the hash table. The hash table, too, could be replaced by a permanent index that encapsulates the predicate selection, grouping, and aggregation columns.
Memory Constraints
Can imply that the environment is memory constrained.
Grouping function queries are generally more memory intensive, as they often union multiple trees that create and interrogate temporary objects.
Use feedback tools to determine the optimizer's fair share of memory.
[Table: Query Optimization feedback tools (SQE Plan Cache, SQE Plan Cache Snapshots, Summarized Database Monitor Data, Debug Job Log Messages, PRTSQLINF (Print SQL Information) Messages) compared for grouping function queries; cell entries include:]
No direct index advice; index advice via Snapshot data or Visual Explain
Enhanced SQE index advised (3020) rows to show multiple indexes for the same table; temporary index created
UNCHANGED
35
36
The MQTs can be used as follows:
SELECT A, NULL, NULL, SUM_MQT1 FROM MQT1 UNION ALL SELECT NULL, B, C, SUM_MQT2 FROM MQT2
MQTs that provide a subset of the grouping sets asked for in the query can be unioned with a query that implements the remaining grouping sets.
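The idea of answering part of the request from MQTs and computing only the rest can be sketched in Python. Everything here is hypothetical: the T1 rows, the dict standing in for MQT1 and MQT2 (keyed by the grouping set each one holds), and the helper names. The check confirms that the combined answer contains exactly the rows a full evaluation would produce.

```python
from collections import Counter, defaultdict

COLS = ("A", "B", "C")


def group_sets(rows, sets):
    """Reference evaluation: UNION ALL of one GROUP BY per set, SUM(D) as measure."""
    out = []
    for s in sets:
        sums = defaultdict(int)
        for r in rows:
            sums[tuple(r[c] if c in s else None for c in COLS)] += r["D"]
        out.extend(sums.items())
    return out


def answer_with_mqts(rows, requested, mqts):
    """Read the sets an MQT already holds; compute only the remaining sets."""
    covered = [s for s in requested if s in mqts]
    remaining = [s for s in requested if s not in mqts]
    out = [row for s in covered for row in mqts[s]]  # precomputed MQT rows
    out.extend(group_sets(rows, remaining))          # UNION ALL the rest
    return out


# Hypothetical T1 rows and MQT contents
t1 = [
    {"A": "a1", "B": "b1", "C": "c1", "D": 10},
    {"A": "a1", "B": "b2", "C": "c1", "D": 20},
    {"A": "a2", "B": "b1", "C": "c2", "D": 5},
]
mqts = {("A",): group_sets(t1, [("A",)]),           # plays the role of MQT1
        ("B", "C"): group_sets(t1, [("B", "C")])}   # plays the role of MQT2

requested = [("A",), ("B", "C"), ("C",)]
combined = answer_with_mqts(t1, requested, mqts)
print(Counter(combined) == Counter(group_sets(t1, requested)))
```

Comparing with `Counter` treats both results as multisets, since UNION ALL gives no row-order guarantee.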
The MQT can be used as follows:
SELECT A, B, C, SUM(SUM_MQT) FROM MQT GROUP BY ROLLUP(A, B, C)
SUM(D) becomes a SUM(SUM) when the MQT is substituted. Other aggregate expressions can be handled with similar compensations:
A COUNT aggregate expression becomes a SUM of the MQT's COUNT.
MAX/MIN becomes MAX/MIN of the MQT's MAX/MIN.
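These compensations can be checked with a Python sketch, assuming a hypothetical MQT pre-aggregated at (A, B, C) that stores SUM(D), COUNT(*), and MAX(D). Each ROLLUP(A, B, C) level is derived from the MQT rows (SUM of SUM, SUM of COUNT, MAX of MAX) and compared with the same level computed from the hypothetical base rows. Helper names are hypothetical.

```python
def aggregate(rows, cols):
    """SUM(D), COUNT(*), MAX(D) grouped by cols, computed from base rows."""
    acc = {}
    for r in rows:
        key = tuple(r[c] for c in cols)
        s, n, m = acc.get(key, (0, 0, None))
        acc[key] = (s + r["D"], n + 1, r["D"] if m is None else max(m, r["D"]))
    return acc


def from_mqt(mqt, mqt_cols, cols):
    """Coarser grouping from MQT rows: SUM of SUM, SUM of COUNT, MAX of MAX."""
    acc = {}
    for key, (s, n, m) in mqt.items():
        row = dict(zip(mqt_cols, key))
        k = tuple(row[c] for c in cols)
        ps, pn, pm = acc.get(k, (0, 0, None))
        acc[k] = (ps + s, pn + n, m if pm is None else max(pm, m))
    return acc


# Hypothetical T1 rows; the "MQT" is pre-aggregated at (A, B, C)
t1 = [
    {"A": 1, "B": 1, "C": 1, "D": 4},
    {"A": 1, "B": 1, "C": 1, "D": 9},
    {"A": 1, "B": 2, "C": 1, "D": 3},
    {"A": 2, "B": 1, "C": 2, "D": 8},
]
mqt_cols = ("A", "B", "C")
mqt = aggregate(t1, mqt_cols)

# Each ROLLUP(A, B, C) level derived from the MQT matches the base-table result
for level in [("A", "B", "C"), ("A", "B"), ("A",), ()]:
    assert from_mqt(mqt, mqt_cols, level) == aggregate(t1, level)
print(from_mqt(mqt, mqt_cols, ()))  # grand total: (SUM, COUNT, MAX)
```

Note that COUNT must be summed, not counted again: counting the MQT rows would return the number of (A, B, C) groups rather than the number of base rows.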
MQT: SELECT A, B, C, E, GROUPING(E) as GE, SUM(D) as SUM_MQT FROM T1 GROUP BY ROLLUP(A, B, C, E)
The MQT can be used as follows: SELECT A, B, C, SUM_MQT FROM MQT WHERE GE = 1
Rows where GROUPING(E) = 1 are the results for ROLLUP(A, B, C).
The optimizer must be able to differentiate between the result set for GS(A,B,C,E) where E contains NULLs and GS(A,B,C,-), GS(A,B,-,-), GS(A,-,-,-), and GS(-,-,-,-). The MQT would not be usable without GROUPING(E) in the MQT definition.
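A Python sketch of why the MQT needs GROUPING(E), assuming hypothetical T1 rows in which one row has a genuine NULL in E. The hypothetical helper `rollup_rows` models ROLLUP with SUM(D) and records GROUPING(E) for every row. Filtering the modeled MQT on GE = 1 reproduces ROLLUP(A, B, C) exactly, while a naive `E IS NULL` filter also picks up the detail row whose E really is NULL.

```python
from collections import Counter, defaultdict

COLS = ("A", "B", "C", "E")


def rollup_rows(rows, cols):
    """ROLLUP over cols with SUM(D); each row is (key, GROUPING(E), total)."""
    out = []
    for i in range(len(cols), -1, -1):
        kept = cols[:i]
        sums = defaultdict(int)
        for r in rows:
            sums[tuple(r[c] if c in kept else None for c in COLS)] += r["D"]
        ge = 0 if "E" in kept else 1  # GROUPING(E)
        out.extend((key, ge, total) for key, total in sums.items())
    return out


# Hypothetical T1 rows; the first has a genuine NULL in E
t1 = [
    {"A": "a", "B": "b", "C": "c", "E": None, "D": 3},
    {"A": "a", "B": "b", "C": "c", "E": "e", "D": 4},
]

mqt = rollup_rows(t1, ("A", "B", "C", "E"))  # the MQT contents
wanted = [(k, t) for k, _ge, t in rollup_rows(t1, ("A", "B", "C"))]

via_ge = [(k, t) for k, ge, t in mqt if ge == 1]           # WHERE GE = 1
via_null = [(k, t) for k, _ge, t in mqt if k[3] is None]   # WHERE E IS NULL

print(Counter(via_ge) == Counter(wanted))    # True
print(Counter(via_null) == Counter(wanted))  # False: includes the real-NULL row
```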
Can be satisfied with MQT: SELECT A, B, C, SUM(D) AS SUM_MQT, GROUPING(A) AS GA, GROUPING(B) AS GB, GROUPING(C) AS GC FROM T1 GROUP BY GROUPING SETS((A), (B), (C))
MQTs specifying CUBE and ROLLUP tend to carry large volumes of data. Use them when disk space is plentiful and:
The MQTs contain commonly used grouping permutations
CUBE and ROLLUP queries are run very commonly
Summary
New Grouping Functions allow for multiple grouping operations in one query, usually with one pass through the data
GROUP BY GROUPING SETS((C1,C2,C3), (C1,C2,C3,C4), (C1,C2))
GROUP BY ROLLUP (C1,C2,C3)
GROUP BY CUBE(C1,C2)
Be careful not to ask for more than you need. Use Visual Explain to understand and tune the query before you run it.
Index advice is key
Consider MQTs
They solve these kinds of queries and provide great performance for expanded use of this grouping technology
CREATE ENCODED VECTOR INDEX STAR100G.YEARQTR ON STAR100G.ITEM_FACT (YEAR(ORDERDATE) AS ORDYEAR ASC, QUARTER(ORDERDATE) AS ORDQTR ASC);
CREATE INDEX STAR100G.TOTALEXTENDEDPRICE ON STAR100G.ITEM_FACT (QUANTITY * EXTENDEDPRICE AS TOTEXTPRICE ASC);
* Beware: there are some restrictions on what the query optimizer can use.
CREATE INDEX STAR100G.SHIPMODE ON STAR100G.ITEM_FACT (SHIPMODE ASC, ZIPCODE ASC) RCDFMT SHPMODEZIP ADD CUSTNAME, SHIPADDR;
CREATE INDEX STAR100G.SHIPMDMON ON STAR100G.ITEM_FACT (SHIPMODE ASC, MONTH(SHIPDATE) AS SHIPMON FOR COLUMN SHIPMON ASC) RCDFMT SHPMODEMON ADD CUSTNAME, SHIPADDR, COMMITDATE;
Limitations
Could replace some logical files with SQL indexes for use by native programs.
Modernize those objects.
Big logical page size.
In general, it's not a good idea to build indexes with the WHERE clause (sparse indexes):
They are not yet usable by the optimizer.
They may be too specific, usable by only a small number of statements.
DO NOT replace a general-purpose index with multiple sparse indexes. Use:
CREATE INDEX ORDERYEAR ON ORDERS (YEAR ASC)
NOT the following:
CREATE INDEX ORDERYEAR1 ON ORDERS (YEAR ASC) WHERE YEAR = 2006
CREATE INDEX ORDERYEAR2 ON ORDERS (YEAR ASC) WHERE YEAR = 2007
CREATE INDEX ORDERYEAR3 ON ORDERS (YEAR ASC) WHERE YEAR = 2008
Are you experiencing performance problems? Are you using SQL? Are you getting the most out of DB2 for i? IBM DB2 for i Center of Excellence
Database modernization
DB2 Web Query
Database architecture and design
DB2 SQL performance analysis and tuning
Data warehousing and Business Intelligence
DB2 for i education and training
Contact: Mike Cain Rochester, MN USA
Need help?
mcain@us.ibm.com
Thank You