with DB2
Matthias Nicola
IBM Silicon Valley Lab
mnicola@us.ibm.com
Agenda
Introduction
DB2 Scalability for OLTP and Data Warehousing
DB2's Database Partitioning Feature (DPF)
Overview
Data partitioning, clustering, placement
Join Methods
TPoX Scalability in a DPF database
Scalability vs. Performance
Benchmark configuration & results
pureScale Overview
Summary
DB2 Everyplace
[Figure: a partitioned database. Tables are spread across Partition 1, Partition 2, Partition 3, ..., Partition n; each partition has its own data and log, and the partitions are connected by the FCM network.]
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html
8 2009 IBM Corporation
Information Management DB2
Information Management Software
Distribution map
i      0  1  2  3  4  5  6  7  ...  32k
p(i)   1  2  3  4  1  2  3  4  ...
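The distribution map can be read as a lookup table: a row's distribution key is hashed to an index i, and p(i) names the partition that stores the row. The following is a minimal sketch of that lookup, not DB2's implementation. It assumes that "32k" means 32768 map entries, that the map is filled round-robin over four partitions as in the row p(i) above, and it uses `zlib.crc32` purely as a stand-in for DB2's internal hash function. The names `distribution_map`, `map_index`, and `partition_for` are hypothetical.

```python
import zlib

MAP_SIZE = 32768       # assumption: "32k" on the slide means 32768 map entries
NUM_PARTITIONS = 4     # matches the four partitions shown in p(i)

# Round-robin distribution map: p(0)=1, p(1)=2, p(2)=3, p(3)=4, p(4)=1, ...
distribution_map = [(i % NUM_PARTITIONS) + 1 for i in range(MAP_SIZE)]

def map_index(key) -> int:
    """Hash a distribution key to a map index (crc32 is only a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % MAP_SIZE

def partition_for(key) -> int:
    """Look up the partition that stores rows with this distribution key."""
    return distribution_map[map_index(key)]

for key in (1001, 1002, "ACME"):
    print(key, "-> partition", partition_for(key))
```

Because rows are placed through the map rather than directly by the hash value, the map entries could be reassigned to other partitions without changing the hash function.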
[Figure: Table 1: Sales and Table 2: Customer, shown on a Single Server and spread across Database Partitions; the data is labeled by month (January, February, March).]
*For simplicity, this example hashes odd key values to partition 1 and even key values to partition 2
Collocated Join
create table tab1(pk1 int, c1 int,...) distribute by hash (pk1);
create table tab2(pk2 int, c2 int,...) distribute by hash (pk2);
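A rough sketch of why a collocated join needs no data movement. The slide does not show the query, so the join predicate `tab1.pk1 = tab2.pk2` is an assumption here: both tables are then joined on their distribution keys, and matching rows already sit on the same partition. The sketch reuses the simplified odd/even hashing from the footnote; the rows and the helper names (`partition_of`, `distribute`, `collocated_join`) are made up for illustration.

```python
def partition_of(key: int) -> int:
    """Simplified hash: odd keys go to partition 1, even keys to partition 2."""
    return 1 if key % 2 == 1 else 2

def distribute(rows, key_pos):
    """Place each row on the partition chosen by hashing its distribution key."""
    parts = {1: [], 2: []}
    for row in rows:
        parts[partition_of(row[key_pos])].append(row)
    return parts

def collocated_join(tab1_parts, tab2_parts):
    """Join on pk1 = pk2 partition by partition; no rows cross the network."""
    result = []
    for p in tab1_parts:
        local_tab2 = {r[0]: r for r in tab2_parts[p]}   # pk2 -> row
        for r1 in tab1_parts[p]:
            if r1[0] in local_tab2:
                result.append(r1 + local_tab2[r1[0]])
    return result

tab1 = [(1, 10), (2, 20), (3, 30)]      # (pk1, c1), made-up rows
tab2 = [(1, 100), (2, 200), (4, 400)]   # (pk2, c2), made-up rows
joined = collocated_join(distribute(tab1, 0), distribute(tab2, 0))
print(sorted(joined))  # [(1, 10, 1, 100), (2, 20, 2, 200)]
```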
Directed Join
select * from tab1, tab2 where tab1.c1 = tab2.pk2;
Send rows from tab1 to those partitions where they can find join matches in tab2,
i.e. redistribution of tab1, based on hashing of the join key c1.
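The redistribution step can be sketched as follows, again with the simplified odd/even hashing. This is an illustration under stated assumptions, not how DB2 executes the plan: rows are tuples `(pk1, c1)` and `(pk2, c2)`, the data is made up, and `directed_join` is a hypothetical helper. Only tab1 moves; tab2 stays where its distribution key pk2 placed it.

```python
def partition_of(key: int) -> int:
    """Simplified hash: odd keys go to partition 1, even keys to partition 2."""
    return 1 if key % 2 == 1 else 2

def directed_join(tab1_parts, tab2_parts):
    """Join tab1.c1 = tab2.pk2 by sending tab1 rows to tab2's partitions."""
    # Step 1: redistribute tab1 by hashing the join key c1 (column 1).
    redistributed = {1: [], 2: []}
    moved = 0
    for source, rows in tab1_parts.items():
        for row in rows:
            target = partition_of(row[1])
            redistributed[target].append(row)
            moved += target != source          # count rows that change partition
    # Step 2: tab2 stays in place; each partition joins locally.
    result = []
    for p, rows in redistributed.items():
        local_tab2 = {r[0]: r for r in tab2_parts[p]}   # pk2 -> row
        result.extend(r1 + local_tab2[r1[1]] for r1 in rows if r1[1] in local_tab2)
    return result, moved

# Made-up rows, already placed by hash(pk1) and hash(pk2) respectively.
tab1_parts = {1: [(1, 2), (3, 5)], 2: [(2, 4), (4, 7)]}      # (pk1, c1)
tab2_parts = {1: [(5, 50), (7, 70)], 2: [(2, 20), (4, 40)]}  # (pk2, c2)
rows, moved = directed_join(tab1_parts, tab2_parts)
print(sorted(rows), "rows moved:", moved)
```

In this toy data two of the four tab1 rows change partition, and every row finds its join partner locally afterwards.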
Repartitioned Join
select * from tab1, tab2 where tab1.c1 = tab2.c2;
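In this query neither join column is a distribution key, so both tables have to be re-hashed on the join columns before a local join is possible. A minimal sketch of that idea, using the simplified odd/even hashing; the rows and the helpers `redistribute` and `repartitioned_join` are hypothetical, and the sketch does not model how DB2 actually ships rows between partitions.

```python
from collections import defaultdict

def partition_of(key: int) -> int:
    """Simplified hash: odd keys go to partition 1, even keys to partition 2."""
    return 1 if key % 2 == 1 else 2

def redistribute(parts, key_pos):
    """Re-hash every row on the join column and send it to that partition."""
    out = defaultdict(list)
    for rows in parts.values():
        for row in rows:
            out[partition_of(row[key_pos])].append(row)
    return out

def repartitioned_join(tab1_parts, tab2_parts):
    """Join tab1.c1 = tab2.c2 after redistributing BOTH tables on the join key."""
    t1 = redistribute(tab1_parts, 1)   # hash tab1 on c1
    t2 = redistribute(tab2_parts, 1)   # hash tab2 on c2
    result = []
    for p in (1, 2):
        by_c2 = defaultdict(list)
        for r2 in t2[p]:
            by_c2[r2[1]].append(r2)
        for r1 in t1[p]:
            result.extend(r1 + r2 for r2 in by_c2[r1[1]])
    return result

tab1_parts = {1: [(1, 8), (3, 9)], 2: [(2, 9)]}   # (pk1, c1), made-up rows
tab2_parts = {1: [(5, 9)], 2: [(6, 8), (8, 3)]}   # (pk2, c2), made-up rows
repartitioned = sorted(repartitioned_join(tab1_parts, tab2_parts))
print(repartitioned)  # [(1, 8, 6, 8), (2, 9, 5, 9), (3, 9, 5, 9)]
```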
Broadcast Join
select * from tab1, tab2
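A sketch of the broadcast idea: instead of hashing rows to specific partitions, one table is copied in full to every partition and the other table stays in place. The join predicate of the query on this slide is not shown, so the sketch assumes `tab1.c1 = tab2.c2` purely for illustration; the rows and the helper `broadcast_join` are made up.

```python
def broadcast_join(tab1_parts, tab2_parts):
    """Join tab1.c1 = tab2.c2 by copying all of tab2 to every partition."""
    # Step 1: broadcast. Every partition receives the complete tab2.
    full_tab2 = [row for rows in tab2_parts.values() for row in rows]
    rows_sent = len(full_tab2) * len(tab1_parts)
    # Step 2: tab1 rows stay where they are and join against the local copy.
    result = []
    for rows in tab1_parts.values():
        for r1 in rows:
            result.extend(r1 + r2 for r2 in full_tab2 if r1[1] == r2[1])
    return result, rows_sent

tab1_parts = {1: [(1, 8), (3, 9)], 2: [(2, 9)]}   # (pk1, c1), made-up rows
tab2_parts = {1: [(5, 9)], 2: [(6, 8)]}           # (pk2, c2), made-up rows
broadcast_rows, rows_sent = broadcast_join(tab1_parts, tab2_parts)
print(sorted(broadcast_rows), "rows sent:", rows_sent)
```

The number of rows sent grows with the number of partitions, which is why broadcasting is generally attractive only when the broadcast table is small.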
TPoX Benchmark
TPoX = Transaction Processing over XML Data
Open Source Benchmark: http://tpox.sourceforge.net/
Financial transaction processing scenario: online brokerage
Realistic test for XML databases
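[Figure: TPoX entities Customer, Account, Holding, Order, and Security, connected by 1:n and n:1 relationships]
Document types, schemas, and document sizes:
CustAcc: CustAcc.xsd, 4-20 KB
Order: FIXML (41 XSD files), 1-2 KB
Security: Security.xsd, 2-9 KB
FIXML: Standardized Financial XML Schema for Securities Trading!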
Document structures and join relationships
[Figure: element trees of the three document types; elements listed in the order shown]
CustAcc: ID, Name, DateOfBirth, Address, Phone, Account (repeating): ID, Currency, OpeningDate, Balance, Holding (repeating): Symbol, Name, Type, Quantity
Order: ID, OrignDt, TrdDt, Acct, Side, Qty, Sym
Security: Symbol, Name, SecurityType, SecurityInformation, StockInformation, Sector, Industry, Category, OutstShares, FundInformation, FundFamily, Sector, Industry, AssetGroup, FixedIncome, ExpenseRatio, TotalAssets, MinInitialInvestment, MinSubsequentInvest., Price/LastTrade, Ask/Bid, 50DayAvg, 200DayAvg
TPoX Data & Schema
FIXML: financial industry XML Schema
order table (500M rows), custacc table (50M rows)
What is TPoX-DSS*?
Decision Support workload on top of the regular XML data of the TPoX benchmark
* We might come up with a better name in the near future.
Business Questions Complex SQL/XML Queries
Q1: Popular Securities
Find securities that have more shares bought than sold across all orders.
List their order quantities grouped by year.
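To make the business question concrete, here is a small sketch of one reading of Q1 over made-up order records. It is not the TPoX-DSS query, which is written in SQL/XML against the order and security tables; it only mirrors the logic. The field names follow the Order elements (Sym, Side, Qty, TrdDt), the "buy"/"sell" labels are a simplification, and the function name `popular_securities` is hypothetical.

```python
from collections import defaultdict

# Made-up orders: (Sym, Side, Qty, TrdDt)
orders = [
    ("IBM", "buy", 100, "2008-03-01"),
    ("IBM", "sell", 40, "2009-01-15"),
    ("XYZ", "buy", 10, "2008-06-30"),
    ("XYZ", "sell", 25, "2008-07-02"),
]

def popular_securities(orders):
    """Securities with more shares bought than sold, with quantities per year."""
    bought, sold = defaultdict(int), defaultdict(int)
    for sym, side, qty, _ in orders:
        (bought if side == "buy" else sold)[sym] += qty
    popular = {s for s in bought if bought[s] > sold[s]}
    # Group the order quantities of the popular securities by trade year.
    by_year = defaultdict(int)
    for sym, _, qty, trd_dt in orders:
        if sym in popular:
            by_year[(sym, trd_dt[:4])] += qty
    return dict(sorted(by_year.items()))

q1_result = popular_securities(orders)
print(q1_result)  # {('IBM', '2008'): 100, ('IBM', '2009'): 40}
```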
TPoX DSS: Query Characteristics
Tables: C = custacc, O = order, S = security

Query                                               Tables    Characteristics
Q1  Popular Securities                              O, S      2 x XMLTABLE, Group By, Order By
Q2  Top 10 Most Popular Trading Weeks               O         Full scan of all orders, OLAP function rank()
Q3  Average Account Balance of Premium Customers    C         Indexed access to premium customers, Group By, Order By
Q4  Average Balance per Number of Accounts          C         Full scan of all customers
Q5  Percentage of buy orders per sector and gender  C, O, S   Aggregation, SQL OLAP functions, 3 x XMLTABLE, 2 x XMLEXISTS
Q6  Max Stock Orders for an Industry                C, O, S   2 x XMLTABLE, 2 x XMLEXISTS
Q7  Order Amounts for Two Major Currencies          O         Several predicates, CASE expression
Q8  Order Amounts for All Currencies                O         4 aggregation functions, Group By two XML attributes
Q9  Balance per Currency                            C         Full scan of all accounts, aggregation and grouping
Q10 Sleeping Customers                              C, O      Common table expression
8 processing nodes, 8 database partitions, 250GB
Scalability Results: Cluster
Query response times for 500GB and 1TB are close to the 250GB results!
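One common way to put a number on this kind of result is a scale-up factor: the baseline response time divided by the response time at the larger scale, where 1.0 means the response time did not change. The sketch below assumes the cluster grows together with the data, which these slides do not state explicitly. The timings are placeholders chosen for illustration, not measurements from this benchmark, and `scaleup` is a hypothetical helper.

```python
def scaleup(baseline_seconds: float, scaled_seconds: float) -> float:
    """Scale-up factor: 1.0 means the response time stayed constant
    while the data volume (and the cluster) grew."""
    return baseline_seconds / scaled_seconds

# Placeholder timings in seconds, NOT benchmark results.
placeholder_times = {"250GB": 10.0, "500GB": 10.0, "1TB": 12.5}

baseline = placeholder_times["250GB"]
for scale, seconds in placeholder_times.items():
    print(f"{scale}: scale-up factor {scaleup(baseline, seconds):.2f}")
```

A factor close to 1.0 at every scale corresponds to the observation that the larger configurations answer queries about as fast as the 250GB one.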
DB2 pureScale
Goals
Unlimited Capacity
Any transaction processing or ERP workload
Start small
Grow easily, with your business
Application Transparency
Avoid the risk and cost of tuning your applications to the database topology
Continuous Availability
Maintain service across planned and unplanned events
Webcast: http://www.channeldb2.com/video/db2-purescale-a-technology
Web site: http://www.ibm.com/software/data/db2/linux-unix-windows/editions-features-purescale.html
Single Database View
Without changing applications
Efficient coherency protocols designed to scale without application change
Applications automatically and transparently workload balanced across members
GBP (group buffer pool) includes a Page Registry
Keeps track of what pages are buffered in each member and at what memory address
Used for fast invalidation of such pages when they are written to the GBP
Force-at-Commit (FAC) protocol ensures coherent access to data across members
DB2 forces (writes) updated pages to GBP at COMMIT (or before)
GBP synchronously invalidates any copies of such pages on other members
New references to the page on other members will retrieve new copy from GBP
In-progress references to page can continue
[Figure: members M1 and M2, each with local bufferpool(s), connected to the GBP, GLM, and SCA; the GBP holds the Page Registry; arrows labeled Write Page, Read Page, Silent Invalidate]
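The interplay of the page registry and force-at-commit can be mimicked with a toy model. This is a sketch for intuition only, not DB2 code: the classes `GroupBufferPool` and `Member` and all of their methods are hypothetical, pages are plain strings, and locking, RDMA, and in-progress references are not modeled.

```python
class GroupBufferPool:
    """Toy model of the GBP with a page registry (not DB2 code)."""
    def __init__(self):
        self.pages = {}      # page_id -> latest committed page image
        self.registry = {}   # page_id -> set of members buffering the page

    def read(self, member, page_id):
        """A member fetches a page; the registry records that it buffers it."""
        self.registry.setdefault(page_id, set()).add(member)
        return self.pages.get(page_id)

    def write(self, writer, page_id, image):
        """Store the new image and invalidate the copies on other members."""
        self.pages[page_id] = image
        for member in self.registry.get(page_id, set()) - {writer}:
            member.valid.discard(page_id)     # mark the local copy stale
        self.registry[page_id] = {writer}

class Member:
    def __init__(self, name, gbp):
        self.name, self.gbp = name, gbp
        self.local = {}      # local bufferpool: page_id -> image
        self.valid = set()   # page_ids whose local copy is still current

    def get(self, page_id):
        if page_id not in self.valid:         # stale or missing: refetch
            self.local[page_id] = self.gbp.read(self, page_id)
            self.valid.add(page_id)
        return self.local[page_id]

    def update_and_commit(self, page_id, image):
        self.local[page_id] = image
        self.valid.add(page_id)
        self.gbp.write(self, page_id, image)  # forced to the GBP at COMMIT

gbp = GroupBufferPool()
m1, m2 = Member("M1", gbp), Member("M2", gbp)
m1.update_and_commit(7, "v1")
first_read = m2.get(7)           # M2 buffers page 7
m1.update_and_commit(7, "v2")    # the GBP invalidates M2's copy
second_read = m2.get(7)          # M2 refetches the new image from the GBP
print(first_read, second_read)   # v1 v2
```

In the toy model, M2 does no work when its copy is invalidated; it only notices on its next reference to the page, which is the behavior the registry is meant to enable.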
Silent Invalidation
Informs members of page updates; requires no CPU cycles on those members
No interrupt or other message processing required
Increasingly important as cluster grows
Hot pages available without disk I/O from GBP memory
RDMA and dedicated threads enable read page operations in ~10s of microseconds
[Figure: a member exchanging requests with the GBP, GLM, and SCA; labels include Read Page and New page image]
of Transaction Throughput
Questions / Discussion
mnicola@us.ibm.com
Backup Slides
InfoSphere Warehouse
SQL Interface
Benefits
Target specific information hidden within text
Competitive edge by driving further business insight
Drives a greater ROI for your applications
Key Features:
Database design, or reverse engineer an existing database or DDL (RDA)
View/Modify the schema
Compare/Sync DB objects
Analyze design (best practices and dependencies), Validation
DB2 Storage Modeling: Table Space, Buffer Pool, Partition
XML document size range                   1-20KB                     1-23KB
ACCOUNT IDs of customer                   Not dense                  Dense
Total XML document size of 100GB scale    Slightly less than 100GB   Slightly larger than 100GB
Changes have improved performance of generating and consuming TPoX XML data in large scale TPoX benchmarks!
NOTE: please refer to TPoX V2.0 Release Note at http://sourceforge.net/projects/tpox for more detail
More information on XML data management in DB2 for Linux, UNIX, Windows and DB2 for z/OS
http://tinyurl.com/pureXML