with DB2
Matthias Nicola
IBM Silicon Valley Lab
mnicola@us.ibm.com
Agenda
Introduction
DB2 Scalability for OLTP and Data Warehousing
DB2's Database Partitioning Feature (DPF)
Overview
Data partitioning, clustering, placement
Join Methods
TPoX Scalability in a DPF database
Scalability vs. Performance
Benchmark configuration & results
pureScale Overview
Summary
DB2 Everyplace
[Figure: a DPF database made up of Partition 1 to Partition n, each with its own data and log, connected by the FCM (Fast Communication Manager) network; tables are spread across all partitions]
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html
8 2009 IBM Corporation
Information Management DB2
Information Management Software
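Distribution map
Each row's distribution key is hashed to an index i in the distribution map; entry p(i) names the database partition that stores the row.
i:     0  1  2  3  4  5  6  7  ...  32k
p(i):  1  2  3  4  1  2  3  4  ...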
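The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a 32,768-entry map filled round-robin for four partitions as in the p(i) row; `zlib.crc32` is only a stand-in for DB2's internal hash function, and the function names are hypothetical.

```python
import zlib

MAP_SIZE = 32 * 1024  # "32k" map entries, as on the slide


def build_distribution_map(num_partitions: int) -> list:
    """Assign map indexes to partitions 1..n round-robin, like the p(i) row."""
    return [(i % num_partitions) + 1 for i in range(MAP_SIZE)]


def partition_for(key: str, dist_map: list) -> int:
    """Hash the distribution key to an index i and return p(i).

    zlib.crc32 is a stand-in; DB2 uses its own internal hash function.
    """
    i = zlib.crc32(key.encode("utf-8")) % len(dist_map)
    return dist_map[i]


dist_map = build_distribution_map(4)
print(dist_map[:8])                           # [1, 2, 3, 4, 1, 2, 3, 4]
print(partition_for("CUST-1001", dist_map))   # one of 1..4, the same on every call
```

In this scheme rows reach a partition through the map rather than by hashing directly to a partition number, so adding partitions means reassigning some map entries (and moving only the rows behind them) while the hash function itself stays unchanged.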
[Figure: single server vs. database partitions, showing Table 1: Sales with data for January, February, and March, and Table 2: Customer]
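DB2 allows a table to be hash-distributed across database partitions (DISTRIBUTE BY HASH) and, in addition, range-partitioned within each database partition (PARTITION BY RANGE), which is one way to read the monthly Sales data in the figure. The Python sketch below models that two-level placement under stated assumptions: a hypothetical `customer_id` distribution key, a toy modulo hash instead of DB2's real one, and monthly ranges; all names and rows are illustrative.

```python
from collections import defaultdict

NUM_PARTITIONS = 2  # hypothetical number of database partitions


def db_partition(customer_id: int) -> int:
    """Toy hash distribution on customer_id (not DB2's real hash function)."""
    return customer_id % NUM_PARTITIONS + 1


def place(sales):
    """Map (database partition, month range) -> rows stored there."""
    layout = defaultdict(list)
    for row in sales:
        layout[(db_partition(row["customer_id"]), row["month"])].append(row)
    return layout


def scan_month(layout, month):
    """A query restricted to one month reads only that range, on every partition."""
    return {p: rows for (p, m), rows in layout.items() if m == month}


sales = [
    {"customer_id": 1, "month": "January", "amount": 100},
    {"customer_id": 2, "month": "March", "amount": 250},
    {"customer_id": 3, "month": "March", "amount": 75},
    {"customer_id": 4, "month": "February", "amount": 40},
]
layout = place(sales)
march = scan_month(layout, "March")
print(sorted(march))  # [1, 2]: both partitions scan their March range in parallel
```

The point of the design is that the two levels work together: hashing spreads the work of any query over all partitions, while the month ranges let each partition skip data a query does not need.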
*For simplicity, this example hashes odd key values to partition 1 and even key values to partition 2
Collocated Join
create table tab1(pk1 int, c1 int,...) distribute by hash (pk1);
create table tab2(pk2 int, c2 int,...) distribute by hash (pk2);
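The slide shows only the table definitions; a collocated join applies when the join predicate is on both distribution keys, for example `tab1.pk1 = tab2.pk2` (our assumption for this sketch). Using the footnote's simplified hashing (odd keys to partition 1, even keys to partition 2) and hypothetical rows, the Python sketch below shows that each partition can join its own rows without sending data anywhere.

```python
def part(key: int) -> int:
    """Footnote's simplified hash: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 else 2


def distribute(rows, key_col):
    """Place rows on partitions by hashing the distribution key column."""
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[key_col])].append(row)
    return parts


def local_hash_join(left, right, left_col, right_col):
    """Join two row lists that live on the same partition."""
    index = {}
    for row in right:
        index.setdefault(row[right_col], []).append(row)
    return [l + r for l in left for r in index.get(l[left_col], [])]


tab1 = [(1, 10), (2, 20), (3, 30)]     # (pk1, c1), distributed by pk1
tab2 = [(1, "a"), (2, "b"), (4, "d")]  # (pk2, c2), distributed by pk2
t1, t2 = distribute(tab1, 0), distribute(tab2, 0)

# Collocated join on pk1 = pk2: matching keys hash to the same partition,
# so each partition joins locally and no rows move between partitions.
result = []
for p in (1, 2):
    result += local_hash_join(t1[p], t2[p], 0, 0)
print(result)  # [(1, 10, 1, 'a'), (2, 20, 2, 'b')]
```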
Directed Join
select * from tab1, tab2 where tab1.c1 = tab2.pk2;
Send each row of tab1 to the partition where it can find its join matches in tab2,
i.e., redistribute tab1 by hashing its join key c1 (tab2 is already distributed on its join key pk2).
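The data movement of a directed join can be sketched in Python with the footnote's odd/even hashing and hypothetical rows. Only tab1 moves: each row is sent to the partition that owns hash(c1), where the tab2 rows with a matching pk2 already live; rows that are already on the right partition stay put.

```python
def part(key: int) -> int:
    """Footnote's simplified hash: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 else 2


def distribute(rows, key_col):
    """Place rows on partitions by hashing the distribution key column."""
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[key_col])].append(row)
    return parts


def local_hash_join(left, right, left_col, right_col):
    """Join two row lists that live on the same partition."""
    index = {}
    for row in right:
        index.setdefault(row[right_col], []).append(row)
    return [l + r for l in left for r in index.get(l[left_col], [])]


tab1 = [(1, 4), (2, 1), (3, 1)]        # (pk1, c1), distributed by pk1
tab2 = [(1, "a"), (2, "b"), (4, "d")]  # (pk2, c2), distributed by pk2
t1, t2 = distribute(tab1, 0), distribute(tab2, 0)

# Directed join on tab1.c1 = tab2.pk2: redistribute tab1 on c1, leave tab2 in place.
directed = {1: [], 2: []}
rows_sent = 0
for p, rows in t1.items():
    for row in rows:
        target = part(row[1])
        if target != p:
            rows_sent += 1  # this row crosses the network
        directed[target].append(row)

result = []
for p in (1, 2):
    result += local_hash_join(directed[p], t2[p], 1, 0)
print(sorted(result), rows_sent)  # [(1, 4, 4, 'd'), (2, 1, 1, 'a'), (3, 1, 1, 'a')] 2
```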
Repartitioned Join
select * from tab1, tab2 where tab1.c1 = tab2.c2;
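When neither table is distributed on its join column, as with `tab1.c1 = tab2.c2`, both tables are redistributed by hashing their join columns. The Python sketch below uses the footnote's odd/even hashing and hypothetical rows to show that after repartitioning, matching values meet on the same partition regardless of where the rows started.

```python
def part(key: int) -> int:
    """Footnote's simplified hash: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 else 2


def repartition(rows, join_col):
    """Send every row to the partition that owns hash(join column)."""
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[join_col])].append(row)
    return parts


def local_hash_join(left, right, left_col, right_col):
    """Join two row lists that live on the same partition."""
    index = {}
    for row in right:
        index.setdefault(row[right_col], []).append(row)
    return [l + r for l in left for r in index.get(l[left_col], [])]


tab1 = [(1, 5), (2, 6), (3, 7)]     # (pk1, c1), distributed by pk1
tab2 = [(10, 5), (11, 7), (12, 8)]  # (pk2, c2), distributed by pk2

# Repartitioned join on tab1.c1 = tab2.c2: both tables move.
r1, r2 = repartition(tab1, 1), repartition(tab2, 1)
result = []
for p in (1, 2):
    result += local_hash_join(r1[p], r2[p], 1, 1)
print(result)  # [(1, 5, 10, 5), (3, 7, 11, 7)]
```

Because both inputs travel, this is usually the most network-intensive of the hash-based methods; the collocated and directed variants avoid moving one or both tables.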
Broadcast Join
select * from tab1, tab2
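In a broadcast join, one table (typically the smaller one) is copied to every partition, and each partition joins its local rows of the other table against the full copy. Because no join key has to be hashed, this works for any join condition, including a query without a join predicate or with a non-equality predicate. A minimal Python sketch with hypothetical data follows; a real optimizer's choice between join methods depends on cost estimates not shown here.

```python
from itertools import product


def broadcast_join(local_parts, broadcast_rows, predicate):
    """Copy broadcast_rows to every partition, then join locally.

    Returns the joined rows and the number of row copies sent.
    """
    result, rows_sent = [], 0
    for _, local_rows in sorted(local_parts.items()):
        copy = list(broadcast_rows)  # each partition receives the whole table
        rows_sent += len(copy)
        result += [l + r for l, r in product(local_rows, copy) if predicate(l, r)]
    return result, rows_sent


tab1_parts = {1: [(1, 10), (3, 30)], 2: [(2, 20)]}  # tab1 stays in place
tab2 = [(100, "x"), (200, "y")]                      # tab2 is broadcast

# No join predicate, as in "select * from tab1, tab2": every pair qualifies.
rows, sent = broadcast_join(tab1_parts, tab2, lambda l, r: True)
print(len(rows), sent)  # 6 4

# A non-equality predicate cannot be collocated by hashing, but broadcast still works.
rows, sent = broadcast_join(tab1_parts, tab2, lambda l, r: l[1] > 15)
print(len(rows), sent)  # 4 4
```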
Our Test Design
TPoX Benchmark
TPoX = Transaction Processing over XML Data
Open Source Benchmark: http://tpox.sourceforge.net/
Financial transaction processing scenario: online brokerage
Realistic test for XML databases
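[Figure: TPoX data model for a brokerage house database]
CustAcc documents (CustAcc.xsd, 4-20 KB): a Customer has 1:n Accounts, an Account has 1:n Holdings
Order documents (FIXML, 41 XSD files, 1-2 KB): an Account has 1:n Orders
Security documents (Security.xsd, 2-9 KB): each Order and each Holding refers to one Security (n:1)
FIXML: Standardized Financial XML Schema for Securities Trading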
Document structures and join relationships
CustAcc: ID, Name, DateOfBirth, Address, Phone, one or more Account elements
  Account: ID, Currency, OpeningDate, Balance, one or more Holding elements
    Holding: Symbol, Name, Type, Quantity
Order: ID, OrignDt, TrdDt, Acct, Side, Qty, Sym
Security: ID, Symbol, Name, SecurityType, SecurityInformation, Price/LastTrade, Ask/Bid, 50DayAvg, 200DayAvg
  SecurityInformation/StockInformation: Sector, Industry, Category, OutstShares
  SecurityInformation/FundInformation: FundFamily, Sector, Industry, AssetGroup, FixedIncome, ExpenseRatio, TotalAssets, MinInitialInvestment, MinSubsequentInvest.
Join relationships: Order Acct = Account ID; Order Sym = Security Symbol; Holding Symbol = Security Symbol
TPoX Data & Schema
[Figure: Customer 1:n Account 1:n Holding; Account 1:n Order; Order n:1 Security; Holding n:1 Security. Orders follow FIXML, the financial industry XML Schema. Sample fragments: Account (ID, Currency, OpeningDate, Balance, Holding: Symbol, Name, Type, Quantity) and Order (ID, OrignDt, TrdDt, Acct, Side, Qty, Sym)]
* we might come up with a better name in the near future
Business Questions → Complex SQL/XML Queries
Q1: Popular Securities
Find securities that have more shares bought than sold across all orders.
List their order quantities grouped by year.
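The logic of Q1 can be sketched in plain Python over orders that have already been extracted from the XML documents. This illustrates the business question only, not the TPoX SQL/XML query: the tuple layout (symbol, side, quantity, trade date), the sample rows, and reading "order quantities grouped by year" as the summed quantity of all orders per year are assumptions.

```python
from collections import defaultdict

# Hypothetical orders: (security symbol, side, quantity, trade date)
orders = [
    ("AAA", "buy", 100, "2008-03-01"),
    ("AAA", "sell", 40, "2009-01-15"),
    ("AAA", "buy", 20, "2009-02-11"),
    ("BBB", "buy", 10, "2008-05-02"),
    ("BBB", "sell", 50, "2009-07-09"),
]


def popular_securities(orders):
    # Step 1: net shares per security (bought minus sold)
    net = defaultdict(int)
    for symbol, side, qty, _ in orders:
        net[symbol] += qty if side == "buy" else -qty
    popular = {s for s, n in net.items() if n > 0}

    # Step 2: total order quantity per popular security and year
    per_year = defaultdict(lambda: defaultdict(int))
    for symbol, _, qty, trade_date in orders:
        if symbol in popular:
            per_year[symbol][int(trade_date[:4])] += qty
    return {s: dict(sorted(y.items())) for s, y in sorted(per_year.items())}


print(popular_securities(orders))  # {'AAA': {2008: 100, 2009: 60}}
```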
TPoX DSS: Query Characteristics
Tables: C = CustAcc, O = Order, S = Security
Query | Tables | Characteristics
Q1 Popular Securities | O, S | 2 x XMLTABLE, Group By, Order By
Q2 Top 10 Most Popular Trading Weeks | O | Full scan of all orders, OLAP function rank()
Q3 Average Account Balance of Premium Customers | C | Indexed access to premium customers, Group By, Order By
Q4 Average Balance per Number of Accounts | C | Full scan of all customers
Q5 Percentage of Buy Orders per Sector and Gender | C, O, S | Aggregation, SQL OLAP functions, 3 x XMLTABLE, 2 x XMLEXISTS
Q6 Max Stock Orders for an Industry | C, O, S | 2 x XMLTABLE, 2 x XMLEXISTS
Q7 Order Amounts for Two Major Currencies | O | Several predicates, CASE expression
Q8 Order Amounts for All Currencies | O | 4 aggregation functions, Group By two XML attributes
Q9 Balance per Currency | C | Full scan of all accounts, aggregation and grouping
Q10 Sleeping Customers | C, O | Common table expression
[Figure: cluster configuration with 8 processing nodes and 8 database partitions, 250GB]
Scalability Results: Cluster
[Figure: TPoX/DSS query response times (cluster): elapsed time in seconds for Q1 to Q10 at 250GB / 8 partitions, 500GB / 16 partitions, and 1TB / 32 partitions]
Query response times for 500GB and 1TB are close to the 250GB results!
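A short calculation shows why this result is what ideal scale-out predicts: every configuration keeps the data volume per partition constant, so in the ideal case each partition does the same amount of work. The sketch assumes 1TB = 1000GB, matching the slide's 250/500/1000 progression; the efficiency function is the usual weak-scaling ratio, not a metric taken from the slides.

```python
# Configurations from the slide: (data volume in GB, number of database partitions)
configs = [(250, 8), (500, 16), (1000, 32)]

for gb, partitions in configs:
    print(f"{gb}GB / {partitions} partitions: {gb / partitions} GB per partition")
# Each line reports 31.25 GB per partition.


def weak_scaling_efficiency(base_time_s: float, scaled_time_s: float) -> float:
    """Baseline time divided by scaled time; 1.0 means no slowdown at larger scale."""
    return base_time_s / scaled_time_s
```

Applied to a query's measured 250GB and 1TB response times, a ratio close to 1.0 corresponds to the statement above.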
DB2 pureScale
Goals
Unlimited Capacity
Any transaction processing or ERP workload
Start small
Grow easily with your business
Application Transparency
Avoid the risk and cost of tuning your applications to the database topology
Continuous Availability
Maintain service across planned and unplanned events
Webcast: http://www.channeldb2.com/video/db2-purescale-a-technology
Web site: http://www.ibm.com/software/data/db2/linux-unix-windows/editions-features-purescale.html
Scale without changing applications
Efficient coherency protocols designed to scale without application change
Single Database View
Applications automatically and transparently workload balanced across members
Services provided include
Group Bufferpool (GBP)
Global Lock Management (GLM)
Shared Communication Area (SCA)
Members duplex GBP, GLM, and SCA state to both a primary and a secondary
Done synchronously
Duplexing is optional (but recommended)
Set up automatically, by default
[Figure: members, each with its own bufferpool(s) and log, sharing one database (a single database partition); the primary and the secondary each host the GBP, GLM, and SCA]
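Synchronous duplexing means an update to shared state counts as complete only after both the primary and the secondary hold it, so the secondary can take over with identical contents. The Python sketch below models only that ordering for GBP pages; the class and method names are hypothetical, and GLM and SCA state would be duplexed the same way.

```python
class SharedStateServer:
    """Hypothetical stand-in for a primary or secondary holding GBP state."""

    def __init__(self, name: str):
        self.name = name
        self.gbp = {}  # page_id -> page contents


class DuplexedGBP:
    def __init__(self, primary, secondary=None):
        self.primary = primary
        self.secondary = secondary  # None models duplexing switched off

    def write_page(self, page_id, data):
        # Synchronous: both copies are written before the call returns,
        # so a member never sees success for an update held in only one place.
        self.primary.gbp[page_id] = data
        if self.secondary is not None:
            self.secondary.gbp[page_id] = data

    def fail_over(self):
        """Primary lost: the already up-to-date secondary becomes the primary."""
        if self.secondary is None:
            raise RuntimeError("no secondary available")
        self.primary, self.secondary = self.secondary, None

    def read_page(self, page_id):
        return self.primary.gbp[page_id]


gbp = DuplexedGBP(SharedStateServer("primary"), SharedStateServer("secondary"))
gbp.write_page("P1", "v1")
gbp.fail_over()
print(gbp.read_page("P1"))  # v1
```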
GBP includes a Page Registry
Keeps track of which pages are buffered in each member and at which memory address
Used for fast invalidation of such pages when they are written to the GBP
Silent Invalidation
Informs members of page updates without consuming CPU cycles on those members
No interrupt or other message processing required
Increasingly important as the cluster grows
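The page registry and silent invalidation can be modeled in a few lines: the GBP records which members buffer each page, and when a new version is written to the GBP it clears a validity flag in every other registered member's memory instead of sending that member a message. A minimal Python sketch follows; the class and attribute names are hypothetical, and the flag assignment stands in for the direct remote memory write a real cluster interconnect would perform.

```python
from collections import defaultdict


class Member:
    def __init__(self, name: str):
        self.name = name
        self.pages = {}  # local bufferpool: page_id -> contents
        self.valid = {}  # page_id -> bool; the GBP may clear it at any time

    def read(self, page_id, gbp):
        if self.valid.get(page_id):
            return self.pages[page_id]  # local copy is still current
        data = gbp.pages[page_id]       # re-fetch after invalidation
        self.pages[page_id] = data
        gbp.register(page_id, self)
        return data


class GroupBufferpool:
    def __init__(self):
        self.pages = {}
        self.registry = defaultdict(set)  # page registry: page_id -> members buffering it

    def register(self, page_id, member):
        self.registry[page_id].add(member)
        member.valid[page_id] = True

    def write(self, page_id, data, writer):
        self.pages[page_id] = data
        for member in self.registry[page_id] - {writer}:
            # Silent invalidation: flip the flag directly; the member runs no
            # handler and spends no CPU until it next reads the page.
            member.valid[page_id] = False
        writer.pages[page_id] = data
        self.register(page_id, writer)


gbp = GroupBufferpool()
gbp.pages["P1"] = "v1"
a, b = Member("A"), Member("B")
a.read("P1", gbp)
b.read("P1", gbp)
gbp.write("P1", "v2", writer=b)
print(a.valid["P1"], a.read("P1", gbp))  # False v2
```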
of Transaction Throughput
Questions / Discussion
mnicola@us.ibm.com
Backup Slides
InfoSphere Warehouse
SQL Interface
Benefits
Target specific information hidden within text
Competitive edge by driving further business insight
Drives a greater ROI for your applications
Key Features:
Design a database, or reverse-engineer an existing database or DDL (RDA)
View/Modify the schema
Compare/Sync DB objects
Analyze the design (best practices and dependencies), Validation
DB2 Storage Modeling: Table Space, Buffer Pool, Partition
Property | Earlier TPoX version | TPoX V2.0
XML document size range | 1-20KB | 1-23KB
Account IDs of a customer | Not dense | Dense
Total XML document size at 100GB scale | Slightly less than 100GB | Slightly larger than 100GB
Average # of accounts per customer | 1.5 | 2.0
These changes have improved the performance of generating and consuming TPoX XML data in large-scale TPoX benchmarks!
NOTE: please refer to TPoX V2.0 Release Note at http://sourceforge.net/projects/tpox for more detail
More information on XML data management in DB2 for Linux, UNIX, Windows and DB2 for z/OS:
http://tinyurl.com/pureXML