
Scalable Data Management with DB2

Matthias Nicola
IBM Silicon Valley Lab
mnicola@us.ibm.com

2009 IBM Corporation


Information Management DB2
Information Management Software

Agenda
  Introduction
  DB2 Scalability for OLTP and Data Warehousing
  DB2's Database Partitioning Feature (DPF)
    Overview
    Data partitioning, clustering, placement
    Join Methods
  TPoX Scalability in a DPF database
    Scalability vs. Performance
    Benchmark configuration & results
  pureScale Overview
  Summary


DB2 Data Server Editions

DB2 for z/OS

DB2 Enterprise Edition / IBM InfoSphere Warehouse

DB2 Workgroup Edition

DB2 Express-C (free!)

DB2 Everyplace


Business Value of Scalability

More historical data = more precise forecasts
  Data mining needs a lot of data for pattern accuracy
  OLAP needs a lot of data for forecast accuracy

Predictable costs when growth occurs
  Often the budget is the controlling factor, not technology
  Low maintenance cost is important

No forced migrations from technology limitations
  Enabling very large databases


DB2 Scalability for OLTP and Data Warehousing


Database Partitioning Feature (DPF)
DB2 pureScale
Range partitioning
Multi-Dimensional Clustering (MDC)
Compression
Self-Tuning Memory Management (STMM)
Automatic Storage
Workload Management
High Availability, Disaster Recovery
Recovery
Security and Compliance
Utilities: Load, Backup & Restore, Redistribute
Archiving
etc.

DB2's Database Partitioning Feature (DPF)


[Figure: a "select ... from table" request against the database's tables is processed by one DB2 engine per database partition (Partition 1 to Partition n). The engines communicate over the FCM network, and each partition has its own data and log.]

Database is divided into multiple database partitions


Database partitions run on same or separate servers (shared-nothing)
Each partition has its own table spaces, log, configuration, etc.
Data is spread over N database partitions
Queries are executed in parallel on all database partitions


Flexible configuration options


Possible hardware configurations
  All database partitions on a single machine (logical partitions)
    easy exploitation of multi-core systems
  All database partitions on separate machines (physical partitions)
  Hybrid: multiple machines with several logical partitions on each

[Figure: four SMP servers, each running two DB2 database partitions, connected by the FCM (Fast Communication Manager) and attached through I/O channels to two storage servers.]

Example: 4 physical machines, 2 database partitions per machine

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html

The Distribution Map

Distribution key can consist of one or multiple columns.
Avoid low cardinality columns, such as "gender", "state", etc.
Unique indexes must contain all columns of the distribution key.

[Figure: the distribution key (column name C1, column value 000120) is passed to the DB2 hash algorithm, which returns index 5 into the distribution map.]

Distribution map
i      0  1  2  3  4  5  6  7  ...  32k
p(i)   1  2  3  4  1  2  3  4  ...

Partition1 Partition2 Partition3 Partition4
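
The lookup on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC32 stands in for DB2's internal hash algorithm (which is different), the map is assumed to have 32768 entries (the slide's "32k"), and the function names are hypothetical.

```python
import zlib

MAP_SIZE = 32768  # assumption: number of distribution map entries ("32k" on the slide)

def build_distribution_map(num_partitions, map_size=MAP_SIZE):
    """Round-robin map as drawn on the slide: entry i -> partition number (1-based)."""
    return [(i % num_partitions) + 1 for i in range(map_size)]

def map_index(key_value, map_size=MAP_SIZE):
    """Stand-in hash (CRC32). DB2's real hash algorithm is internal and differs."""
    return zlib.crc32(str(key_value).encode("utf-8")) % map_size

def partition_for(key_value, dist_map):
    """Hash the distribution key value, then look up the owning partition."""
    return dist_map[map_index(key_value, len(dist_map))]

dist_map = build_distribution_map(4)
print(dist_map[:8])                        # first eight map entries
print(partition_for("000120", dist_map))   # partition chosen by the stand-in hash

# A low-cardinality key such as "gender" can reach at most as many
# partitions as it has distinct values, no matter how many rows exist.
gender_partitions = {partition_for(g, dist_map) for g in ["M", "F"] * 1000}
print(len(gender_partitions))
```

With only two distinct key values, at most two of the four partitions ever receive rows, which is why the slide advises against low-cardinality distribution keys.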


Single Server


DB2 Database Partitioning Feature = Divide Work


Database Partition 1 Database Partition 2 Database Partition 3


Range Partitioning Further Reduces I/O


[Figure: Database Partitions 1, 2 and 3, each holding separate data ranges for January, February and March.]

CREATE TABLE sales (recordID INT,
                    salesdate DATE,
                    ...
                    details XML)
DISTRIBUTE BY HASH (recordID)
PARTITION BY RANGE (salesdate) EVERY 1 MONTHS ;


Multi-Dimensional Clustering to Further Reduce I/O


[Figure: Database Partitions 1, 2 and 3, each holding data ranges for January, February and March, with the rows inside each range clustered by dimension.]

CREATE TABLE sales (recordID INT,
                    salesdate DATE,
                    productID INTEGER,
                    storeID INTEGER,
                    ...
                    details XML)
DISTRIBUTE BY HASH (recordID)
PARTITION BY RANGE (salesdate) EVERY 1 MONTHS
ORGANIZE BY (productID, storeID) ;


Compression Reduces I/O by a Factor of 3x to 4x


[Figure: Database Partitions 1, 2 and 3, each with data ranges for January, February and March.]


Data Partitioning and Placement Options


Can distribute a table across some or all database partitions.
Can replicate a table to have an identical copy on each partition.

[Figure: eight database partitions (Part. 1 to Part. 8). Table 1 (Sales) and Table 2 (Customer) are distributed across database partitions. Table 3 (Product) is replicated, with a copy on each partition.]


Join Processing - Example


create table tab1(pk1 int, c1 int,...)
  distribute by hash (pk1);

create table tab2(pk2 int, c2 int,...)
  distribute by hash (pk2);

Logical data in the tables:

tab1          tab2
pk1  c1       pk2  c2
  1   3         3   2
  2   3         4   8
  3   4         5   3
  7   7         7   4
  8  12         8  15
 11  10        10  10
 12  15        12  12
               15   7

Physical data distribution (distribute by hash*):

database partition 1          database partition 2
tab1        tab2              tab1        tab2
pk1  c1     pk2  c2           pk1  c1     pk2  c2
  1   3       3   2             2   3       4   8
  3   4       5   3             8  12       8  15
  7   7       7   4            12  15      10  10
 11  10      15   7                        12  12

*For simplicity, this example hashes odd key values to partition 1 and even key values to partition 2

Collocated Join
create table tab1(pk1 int, c1 int,...) distribute by hash (pk1);
create table tab2(pk2 int, c2 int,...) distribute by hash (pk2);

select * from tab1, tab2 where tab1.pk1 = tab2.pk2;

Both tables are partitioned by the join key
Any join matches are guaranteed to be within any given partition ("co-located")
No join matches across partitions
Allows local joins within each partition, no data movement
Best case, best performance

partition 1       partition 2
tab1   tab2       tab1   tab2
pk1    pk2        pk1    pk2
  1      3          2      4
  3      5          8      8
  7      7         12     10
 11     15                12
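
A minimal Python sketch of the collocated case, using the example rows from the join example slide and its odd/even stand-in for hashing. The helper names are hypothetical, and the nested-loop join is only for illustration, not how DB2 executes joins.

```python
# Example rows from the slides: tab1(pk1, c1) and tab2(pk2, c2)
TAB1 = [(1, 3), (2, 3), (3, 4), (7, 7), (8, 12), (11, 10), (12, 15)]
TAB2 = [(3, 2), (4, 8), (5, 3), (7, 4), (8, 15), (10, 10), (12, 12), (15, 7)]

def part(key):
    """Slide's simplification: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 == 1 else 2

def distribute(rows, key_pos):
    """Place each row on the partition chosen by hashing the given column."""
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[key_pos])].append(row)
    return parts

def join(left, right, lpos, rpos):
    """Naive nested-loop equi-join on the given column positions."""
    return [(l, r) for l in left for r in right if l[lpos] == r[rpos]]

tab1_parts = distribute(TAB1, 0)   # distribute by hash (pk1)
tab2_parts = distribute(TAB2, 0)   # distribute by hash (pk2)

# Join on pk1 = pk2: each partition joins only its own rows, nothing moves.
collocated = []
for p in (1, 2):
    collocated += join(tab1_parts[p], tab2_parts[p], 0, 0)

reference = join(TAB1, TAB2, 0, 0)  # same join computed on a single server
print(sorted(collocated) == sorted(reference))
```

Because both tables are distributed on the join key, the union of the per-partition joins equals the join over the whole tables.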


Directed Join
select * from tab1, tab2 where tab1.c1 = tab2.pk2;

Permanent storage:

partition 1            partition 2
tab1        tab2       tab1        tab2
pk1  c1     pk2        pk1  c1     pk2
  1   3       3          2   3       4
  3   4       5          8  12       8
  7   7       7         12  15      10
 11  10      15                     12

On the fly / in memory, after a directed table queue (DTQ) on tab1:

partition 1            partition 2
tab1'       tab2       tab1'       tab2
pk1  c1     pk2        pk1  c1     pk2
  1   3       3          3   4       4
  2   3       5         11  10       8
  7   7       7          8  12      10
 12  15      15                     12

Send rows from tab1 to those partitions where they can find join matches in tab2,
i.e. redistribution of tab1, based on hashing of the join key c1.
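
The directed join can be sketched in Python with the same example rows. The odd/even function stands in for hashing, as in the slides, and all helper names are hypothetical.

```python
# Example rows from the slides: tab1(pk1, c1) and tab2(pk2, c2)
TAB1 = [(1, 3), (2, 3), (3, 4), (7, 7), (8, 12), (11, 10), (12, 15)]
TAB2 = [(3, 2), (4, 8), (5, 3), (7, 4), (8, 15), (10, 10), (12, 12), (15, 7)]

def part(key):
    """Slide's simplification: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 == 1 else 2

def distribute(rows, key_pos):
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[key_pos])].append(row)
    return parts

def join(left, right, lpos, rpos):
    return [(l, r) for l in left for r in right if l[lpos] == r[rpos]]

tab2_parts = distribute(TAB2, 0)       # tab2 stays where it is (hashed on pk2)
tab1_directed = distribute(TAB1, 1)    # tab1' : tab1 re-hashed on join key c1

# Rows of tab1 that had to travel to another partition
moved = [row for row in TAB1 if part(row[0]) != part(row[1])]

# Join on c1 = pk2, executed locally on each partition after the redistribution
directed = []
for p in (1, 2):
    directed += join(tab1_directed[p], tab2_parts[p], 1, 0)

reference = join(TAB1, TAB2, 1, 0)
print(tab1_directed[1], len(moved), sorted(directed) == sorted(reference))
```

Only tab1 is sent over the network. tab2 is already distributed on its side of the join predicate, so it does not move.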

Single Partition Directed Join


select * from tab1, tab2
where tab1.c1 = 3 and tab1.c1 = tab2.pk2;

Permanent storage:

partition 1            partition 2
tab1        tab2       tab1        tab2
pk1  c1     pk2        pk1  c1     pk2
  1   3       3          2   3       4
  3   4       5          8  12       8
  7   7       7         12  15      10
 11  10      15                     12

On the fly / in memory, after a directed table queue (DTQ):

partition 1
tab1'       tab2'
pk1  c1     pk2
  1   3       3
  2   3

Value predicates are used to optimize (reduce) the data flow
and eliminate irrelevant partitions from the join processing.
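
As a rough Python illustration of this pruning, under the slides' odd/even hashing simplification and with a hypothetical function name: the predicate value alone determines the single partition that can hold matching tab2 rows, so only filtered tab1 rows are sent there.

```python
# Example rows from the slides: tab1(pk1, c1) and tab2(pk2, c2)
TAB1 = [(1, 3), (2, 3), (3, 4), (7, 7), (8, 12), (11, 10), (12, 15)]
TAB2 = [(3, 2), (4, 8), (5, 3), (7, 4), (8, 15), (10, 10), (12, 12), (15, 7)]

def part(key):
    """Slide's simplification: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 == 1 else 2

def single_partition_directed_join(c1_value):
    """Join tab1.c1 = tab2.pk2 restricted to tab1.c1 = c1_value."""
    target = part(c1_value)  # the only partition that can contain pk2 = c1_value
    # tab1 rows are filtered where they are stored, then sent to the target
    tab1_sent = [row for row in TAB1 if row[1] == c1_value]
    # tab2 rows are filtered locally on the target partition
    tab2_local = [row for row in TAB2
                  if part(row[0]) == target and row[0] == c1_value]
    return target, [(l, r) for l in tab1_sent for r in tab2_local]

target, rows = single_partition_directed_join(3)
print(target, rows)
```

The other partition takes no part in the join at all, which is the point of the slide.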

Repartitioned Join
select * from tab1, tab2 where tab1.c1 = tab2.c2;

Permanent storage:

partition 1              partition 2
tab1        tab2         tab1        tab2
pk1  c1     pk2  c2      pk1  c1     pk2  c2
  1   3       3   2        2   3       4   8
  3   4       5   3        8  12       8  15
  7   7       7   4       12  15      10  10
 11  10      15   7                   12  12

On the fly / in memory, after a directed table queue (DTQ) on each table:

partition 1              partition 2
tab1'       tab2'        tab1'       tab2'
pk1  c1     pk2  c2      pk1  c1     pk2  c2
  1   3       5   3        3   4       3   2
  2   3      15   7       11  10       7   4
  7   7       8  15        8  12       4   8
 12  15                               10  10
                                      12  12

Redistribute both tables by hashing on their join keys so that matching
rows end up on the same partition.
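
A Python sketch of the repartitioned join on the example rows. The odd/even function again stands in for hashing, and the helper names are hypothetical.

```python
# Example rows from the slides: tab1(pk1, c1) and tab2(pk2, c2)
TAB1 = [(1, 3), (2, 3), (3, 4), (7, 7), (8, 12), (11, 10), (12, 15)]
TAB2 = [(3, 2), (4, 8), (5, 3), (7, 4), (8, 15), (10, 10), (12, 12), (15, 7)]

def part(key):
    """Slide's simplification: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 == 1 else 2

def distribute(rows, key_pos):
    parts = {1: [], 2: []}
    for row in rows:
        parts[part(row[key_pos])].append(row)
    return parts

def join(left, right, lpos, rpos):
    return [(l, r) for l in left for r in right if l[lpos] == r[rpos]]

# Neither table is stored by its join column, so both are re-hashed
tab1_re = distribute(TAB1, 1)   # tab1' : hashed on c1
tab2_re = distribute(TAB2, 1)   # tab2' : hashed on c2

repartitioned = []
for p in (1, 2):
    repartitioned += join(tab1_re[p], tab2_re[p], 1, 1)

reference = join(TAB1, TAB2, 1, 1)
print(tab2_re[1], sorted(repartitioned) == sorted(reference))
```

Compared with a directed join, both inputs cross the network, which makes this the more expensive of the two when the tables are large.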

Broadcast Join
select * from tab1, tab2

Permanent storage:

partition 1       partition 2
tab1   tab2       tab1   tab2
pk1    pk2        pk1    pk2
  1      3          2      4
  3      5          8      8
  7      7         12     10
 11     15                12

On the fly / in memory, after a broadcast table queue (BTQ) on tab1:

partition 1       partition 2
tab1'  tab2       tab1'  tab2
pk1    pk2        pk1    pk2
  1      3          1      4
  3      5          3      8
  7      7          7     10
 11     15         11     12
  2                 2
  8                 8
 12                12

Broadcast a copy of one table to all database partitions.
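
The broadcast case, sketched in Python for the slide's query without a join predicate (every tab1 row pairs with every tab2 row). Odd/even hashing and the helper names are illustration-only assumptions.

```python
from itertools import product

# Key columns of the example rows from the slides
TAB1_PK = [1, 2, 3, 7, 8, 11, 12]        # tab1.pk1
TAB2_PK = [3, 4, 5, 7, 8, 10, 12, 15]    # tab2.pk2

def part(key):
    """Slide's simplification: odd keys -> partition 1, even keys -> partition 2."""
    return 1 if key % 2 == 1 else 2

# tab2 stays distributed by hash (pk2)
tab2_parts = {p: [k for k in TAB2_PK if part(k) == p] for p in (1, 2)}

# Broadcast: every partition receives a full copy of tab1
tab1_copy = {p: list(TAB1_PK) for p in (1, 2)}

# Each partition pairs the full tab1 copy with its local slice of tab2
broadcast = []
for p in (1, 2):
    broadcast += list(product(tab1_copy[p], tab2_parts[p]))

reference = list(product(TAB1_PK, TAB2_PK))   # select * from tab1, tab2
print(len(broadcast), sorted(broadcast) == sorted(reference))
```

The data volume sent is the size of tab1 times the number of partitions, so broadcasting is attractive mainly when the broadcast table is small.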


Data Placement Option: Replicated Table

Permanent storage:

partition 1       partition 2
tab1   tab2       tab1(copy)  tab2
pk1    pk2        pk1         pk2
  1      3          1           4
  3      5          3           8
  7      7          7          10
 11     15         11          12
  2                 2
  8                 8
 12                12

Good choice for small tables with infrequent insert/update/delete activity,
such as dimension tables in a star schema.


Scalability vs. Performance


Performance: Time to complete a given task with given resources
Scalability: Ability to add resources to
  complete the same task more quickly
  handle a bigger task in about the same time

Example: Mowing the lawn
  Peter does it alone in 8 hours
  Peter and Bob work together and take 4 hours
  Scalability is perfect, performance is poor!

  Jim does it alone in 1 hour
  Jim and John together do it in 1 hour 20 minutes
  Performance is great, scalability is awful!

  Mary mows the lawn in 30 minutes
  Mary and Susan together need 15 minutes
  Performance is great, scalability is also great!
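
The lawn-mowing numbers can be put into figures with a small Python sketch. Speedup and parallel efficiency are the usual textbook definitions, not something the slide defines, and the function names are hypothetical.

```python
def speedup(time_alone, time_together):
    """How many times faster the task completes with the added resources."""
    return time_alone / time_together

def efficiency(time_alone, time_together, workers):
    """Speedup per worker: 1.0 means perfect (linear) scalability."""
    return speedup(time_alone, time_together) / workers

# (name, minutes alone, minutes with a second person), taken from the slide
examples = [("Peter", 480, 240), ("Jim", 60, 80), ("Mary", 30, 15)]

for name, alone, together in examples:
    print(name, alone, speedup(alone, together), efficiency(alone, together, 2))
```

Peter and Mary both reach a speedup of 2.0 with two people, but Mary starts from 30 minutes instead of 8 hours. Jim's speedup is below 1.0, meaning the second person makes the job slower.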



Scalability Metrics
Fixed Database Size
  [Chart: query elapsed time vs. # of partitions]
  Make queries against a DB of a fixed size faster by adding partitions (speedup).
  Amount of data per partition shrinks.

Increasing Database Size
  [Chart: query elapsed time vs. database size & # of partitions]
  Hold response time constant for a growing database by adding partitions in proportion (scaleup/"scale-out").
  Amount of data per partition remains constant.

Mathematically, these two approaches are equivalent.

Basic assumption: Queries executed against a bigger database examine more data

Our Test Design

Increasing database size: 250GB / 500GB / 1TB
Increasing number of database partitions
Fixed ratio of data volume to number of partitions
Show constant query elapsed times to prove scalability

[Chart: query elapsed time for n partitions (250GB), n*2 partitions (500GB) and n*4 partitions (1 TB)]

TPoX Benchmark
TPoX = Transaction Processing over XML Data
Open Source Benchmark: http://tpox.sourceforge.net/
Financial transaction processing scenario: online brokerage
Realistic test for XML databases

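[Figure: entity-relationship diagram of the brokerage house DB. Customer, Account and Holding are combined in one CustAcc document. Order and Security are separate documents. The entities are linked by 1:n relationships.]

Schemas and document sizes:
  CustAcc.xsd: 4-20 KB
  FIXML (41 XSD files): 1-2 KB
  Security.xsd: 2-9 KB

FIXML: Standardized Financial XML Schema for Securities Trading!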

Document structures and join relationships

[Figure: tree diagrams of the three document types]

CustAcc
  ID, Name, DateOfBirth, Address, Phone
  Account (repeating): ID, Currency, OpeningDate, Balance
    Holding (repeating): Symbol, Name, Type, Quantity

Order
  ID, OrignDt, TrdDt, Acct, Side, Qty, Sym

Security
  Symbol, Name, SecurityType
  SecurityInformation
    StockInformation: Sector, Industry, Category, OutstShares
    FundInformation: FundFamily, Sector, Industry, AssetGroup, FixedIncome, ExpenseRatio, TotalAssets, MinInitialInvestment, MinSubsequentInvest.
  Price/LastTrade, Ask/Bid, 50DayAvg, 200DayAvg
TPoX Data & Schema
FIXML: financial industry XML Schema
CustAcc: modeled after a real banking system that uses XML
Security: information similar to investment web sites

[Figure: Customer, Account and Holding (CustAcc.xsd), Order (FIXML, 41 XSD files) and Security (Security.xsd), linked by 1:n relationships]

Database schema for a non-DPF DB2 database:
  create table custacc ( cadoc XML )
  create table security ( sdoc XML )
  create table order ( odoc XML )

Scale Factor M, 1 TB raw data
  500M Order documents, 50M CustAcc documents
  20,833 Securities, independent of scale factor
  3 Simple Tables + XML Indexes
TPoX Database Schema for DPF
- Extract certain XML element values into relational cols as distribution keys
- Goal: enable partitioning of both tables by a common key

[Figure: the Order and CustAcc document trees, with the values extracted into relational columns highlighted]

order table (500M rows)
  custid   integer
  secsym   varchar
  odoc     XML

custacc table (50M rows)
  custid   integer
  cdoc     XML


What is TPoX-DSS*?
Decision Support workload on top of the
regular XML data of the TPoX benchmark

A set of complex SQL/XML queries


Includes massive table scans, aggregation,
grouping, OLAP functions, etc.

Focus on single-user query response time

* we might come up with a better name in the near future
Business Questions Complex SQL/XML Queries
Q1: Popular Securities
Find securities that have more shares bought than sold across all orders.
List their order quantities grouped by year.

Q2: Top 10 Most Popular Trading Weeks, Ranked by Order Volume


For each year, find the ten most active weeks and return the buy, sell, and
total order volumes for each week.

Q3: Average Account Balance of Premium Customers


Calculate the average account balance of all premium customers, grouped
by their number of accounts.

Q4: Average Balance per Number Of Accounts


Calculate the average account balance of all customers, grouped by their
number of accounts.

Q5: Percentage of buy orders per sector and gender


For each stock in a given sector of securities, find the percentage of buy
orders placed by male vs. female clients.
Business Questions Complex SQL/XML Queries
Q6: Max Stock Orders for an Industry
List the 20% (or: x%) most expensive orders for customers in a given state and
for a given industry (subset of securities).

Q7: Order Amounts for Two Major Currencies


Calculate the min, max and avg order amount for all orders in a given
timeframe grouped by buy/sell for two major currencies.

Q8: Order Amounts for All Currencies


Calculate the min, max and avg order amount for all orders in a given
timeframe grouped by buy/sell and the order's currency.

Q9: Balance per Currency


Each account is in a specific currency. Calculate the average account balance
for each currency.

Q10: Sleeping Customers


Find all customers having fewer than x orders in a given timeframe.

TPoX DSS: Query Characteristics
Query                                                 Tables    Characteristics
Q1   Popular Securities                               O, S      2 x XMLTABLE, Group By, Order By
Q2   Top 10 Most Popular Trading Weeks                O         Full scan of all orders, OLAP Function rank()
Q3   Average Account Balance of Premium Customers     C         Indexed access to premium customers, Group By, Order By
Q4   Average Balance per Number Of Accounts           C         Full scan of all customers
Q5   Percentage of buy orders per sector and gender   C, O, S   Aggregation, SQL OLAP Functions, 3 x XMLTABLE, 2 x XMLEXISTS
Q6   Max Stock Orders for an Industry                 C, O, S   2 x XMLTABLE, 2 x XMLEXISTS
Q7   Order Amounts for Two Major Currencies           O         Several predicates, CASE expression
Q8   Order Amounts for All Currencies                 O         4 aggregation functions, Group By two XML attributes
Q9   Balance per Currency                             C         Full scan of all accounts, aggregation and grouping
Q10  Sleeping Customers                               C, O      Common table expression

(Tables: C = custacc, O = order, S = security)

All queries available upon request, in SQL/XML notation.


Q5: Percentage of buy orders per sector and gender
SELECT DISTINCT secsector, gender,
       SUM(ordqty) OVER (PARTITION BY secsector, gender) AS orderqty,
       SUM(ordqty) OVER (PARTITION BY secsector, gender) * 100
         / SUM(ordqty) OVER (PARTITION BY secsector) AS percentage
FROM security, order, custacc,
     XMLTABLE(' declare namespace s="http://tpox-benchmark.com/security";
                $SDOC/s:Security'
              COLUMNS secsector VARCHAR(30) PATH '*:SecurityInformation//*:Sector',
                      secname   VARCHAR(50) PATH '*:Name') AS T1,
     XMLTABLE(' declare default element namespace "http://www.fixprotocol.org/FIXML-4-4";
                $ODOC/FIXML/Order'
              COLUMNS ordqty BIGINT PATH '*:OrdQty/@Qty') AS T2,
     XMLTABLE(' declare namespace c="http://tpox-benchmark.com/custacc";
                $CADOC/c:Customer'
              COLUMNS gender VARCHAR(10) PATH '*:Gender') AS T3
WHERE order.secsym = security.secsym AND
      order.custid = custacc.custid AND
      XMLEXISTS(' declare namespace s="http://tpox-benchmark.com/security";
                  $SDOC/s:Security/s:SecurityInformation/*[s:Industry="OfficeSupplies" and
                  s:MinInitialInvestment=5000]')
      AND
      XMLEXISTS(' declare default element namespace "http://www.fixprotocol.org/FIXML-4-4";
                  $ODOC/FIXML/Order[@Side = "2"]')
ORDER BY secsector, gender;


Data Partitioning in a Cluster
Each node has 2 Intel Xeon 5169 dual-core CPUs, and 32GB RAM.
4 cores per node, so we use 4 database partitions per node.

8 processing nodes (Node 1 to Node 8)
  8 database partitions, 250GB
  16 database partitions, 500GB
  32 database partitions, 1TB
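
A quick arithmetic check of this configuration in Python. It assumes 1TB is counted as 1000GB (which matches the doubling from 250GB to 500GB) and uses the 4 partitions per node stated above. The variable names are illustrative only.

```python
PARTITIONS_PER_NODE = 4  # from the slide: 4 cores per node, 4 partitions per node

# (database size in GB, number of database partitions); 1TB taken as 1000GB
configs = [(250, 8), (500, 16), (1000, 32)]

for size_gb, partitions in configs:
    gb_per_partition = size_gb / partitions
    nodes = partitions // PARTITIONS_PER_NODE
    print(size_gb, partitions, gb_per_partition, nodes)

# The scaleup test design requires the same data volume per partition
# at every scale.
ratios = {size_gb / partitions for size_gb, partitions in configs}
print(len(ratios) == 1)
```

Under these assumptions every configuration holds the same amount of raw data per partition, and the three scales occupy 2, 4 and 8 nodes.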

Scalability Results: Cluster
TPoX/DSS Query Response Times (Cluster)

[Chart: elapsed time (seconds) for each query Q1 to Q10, with one bar each for 250GB / 8 partitions, 500GB / 16 partitions and 1TB / 32 partitions]

Source: IBM internally measured results, September 2009

Query response times for 500GB and 1TB are close to the 250GB results!

DB2 pureScale
Goals

Unlimited Capacity
Any transaction processing or ERP workload
Start small
Grow easily, with your business

Application Transparency
Avoid the risk and cost of tuning your applications to the database topology

Continuous Availability
Maintain service across planned and unplanned events

Webcast: http://www.channeldb2.com/video/db2-purescale-a-technology
Web site: http://www.ibm.com/software/data/db2/linux-unix-windows/editions-features-purescale.html


DB2 pureScale : Technology Overview


Clients connect anywhere, see single database
  Clients connect into any member
  Automatic load balancing and client reroute may change underlying physical member to which client is connected

DB2 engine runs on several host computers
  Co-operate with each other to provide coherent access to the database from any member

Integrated cluster services
  Failure detection, recovery automation, cluster file system
  In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)

Low latency, high speed interconnect
  Special optimizations provide significant advantages on RDMA-capable interconnects (e.g. InfiniBand)

PowerHA pureScale technology
  Efficient global locking and buffer management
  Synchronous duplexing to secondary ensures availability

Data sharing architecture
  Shared access to database
  Members write to their own logs
  Logs accessible from another host (used during recovery)

[Figure: clients see a Single Database View over four members, each with cluster services (CS) and its own log. A Cluster Interconnect links the members to the primary and secondary PowerHA pureScale servers (also running CS). All members have Shared Storage Access to the database.]


Scale with Ease

Without changing applications
  Efficient coherency protocols designed to scale without application change
  Applications automatically and transparently workload balanced across members

Without administrative complexity
  No data redistribution required

To 128 members in initial release
  Limited by testing resources

[Figure: Single Database View over five DB2 members, each with its own log]


What is a PowerHA pureScale?

Software technology that assists in global buffer coherency management and global locking
  Derived from System z Parallel Sysplex & Coupling Facility technology
  Software based

Services provided include
  Group Bufferpool (GBP)
  Global Lock Management (GLM)
  Shared Communication Area (SCA)

Members duplex GBP, GLM, SCA state to both a primary and secondary
  Done synchronously
  Duplexing is optional (but recommended)
  Set up automatically, by default

[Figure: two members, each with db2 agents & other threads, log buffer, dbheap & other heaps, bufferpool(s) and its own log. Primary and secondary PowerHA pureScale servers hold the GBP, GLM and SCA. All share one database (single database partition).]

The Role of the GBP

GBP acts as fast disk cache
  Dirty pages stored in GBP, then later, written to disk
  Provides fast retrieval of such pages when needed by other members

GBP includes a Page Registry
  Keeps track of what pages are buffered in each member and at what memory address
  Used for fast invalidation of such pages when they are written to the GBP

Force-at-Commit (FAC) protocol ensures coherent access to data across members
  DB2 forces (writes) updated pages to GBP at COMMIT (or before)
  GBP synchronously invalidates any copies of such pages on other members
    New references to the page on other members will retrieve new copy from GBP
    In-progress references to page can continue

[Figure: Client A runs "Select from T1 where C2=Y". Client B runs "Update T1 set C1=X where C2=Y" followed by "Commit". Client C runs "Select from T1 where C2=Y". Member 1 and Member 2 each have their own bufferpool(s). Below them sit the GBP, GLM and SCA, with the Page Registry tracking pages for M1 and M2.]
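
The force-at-commit idea can be modelled with a toy Python simulation. This is a conceptual sketch only, not DB2's implementation: the class and method names are hypothetical, pages are plain strings, and locking (the GLM), RDMA and castout to disk are all left out.

```python
class GroupBufferPool:
    """Toy GBP: committed pages plus a registry of who caches which page."""

    def __init__(self, disk):
        self.disk = disk        # page_id -> contents on shared storage
        self.pages = {}         # page_id -> latest committed contents
        self.registry = {}      # page_id -> set of members with a local copy

    def read(self, page_id, member):
        # Register the reader so its copy can be invalidated later
        self.registry.setdefault(page_id, set()).add(member)
        if page_id in self.pages:
            return self.pages[page_id]   # served from GBP memory, no disk I/O
        return self.disk[page_id]

    def write(self, page_id, contents, writer):
        self.pages[page_id] = contents
        # Invalidate every other member's cached copy of this page
        for member in self.registry.get(page_id, set()):
            if member is not writer:
                member.invalidate(page_id)
        self.registry.setdefault(page_id, set()).add(writer)


class Member:
    """Toy DB2 member with a local buffer pool."""

    def __init__(self, name, gbp):
        self.name = name
        self.gbp = gbp
        self.bufferpool = {}    # valid local page copies
        self.dirty = {}         # uncommitted local updates

    def invalidate(self, page_id):
        self.bufferpool.pop(page_id, None)

    def read(self, page_id):
        if page_id in self.dirty:
            return self.dirty[page_id]
        if page_id not in self.bufferpool:
            self.bufferpool[page_id] = self.gbp.read(page_id, self)
        return self.bufferpool[page_id]

    def update(self, page_id, contents):
        self.dirty[page_id] = contents

    def commit(self):
        # Force-at-commit: push every updated page to the GBP
        for page_id, contents in self.dirty.items():
            self.gbp.write(page_id, contents, self)
            self.bufferpool[page_id] = contents
        self.dirty.clear()


gbp = GroupBufferPool(disk={"P1": "C1=old"})
m1, m2 = Member("M1", gbp), Member("M2", gbp)

before = m1.read("P1")            # M1 caches the page
m2.update("P1", "C1=X")
m2.commit()                       # GBP invalidates M1's copy
stale_copy_gone = "P1" not in m1.bufferpool
after = m1.read("P1")             # M1 fetches the new copy from the GBP
print(before, stale_copy_gone, after)
```

In this model the invalidation is an ordinary method call. The slides describe the real mechanism as working through the page registry without interrupting the other members.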


Stealth System Maintenance

Goal: allow DBAs to apply system maintenance without negotiating an outage window

Procedure:
  1. Drain (aka Quiesce)
  2. Remove & Maintain
  3. Re-integrate
  4. Repeat until done

Enables continuous availability

[Figure: Single Database View over four DB2 members, each with its own log]
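
The rolling procedure can be outlined as a loop in Python. This is a schematic model with hypothetical names, not a DB2 administration script: draining and re-integrating are reduced to set operations.

```python
def rolling_maintenance(members, maintain):
    """Maintain one member at a time; return the lowest number of active members seen."""
    active = set(members)
    min_active = len(active)
    for member in members:
        active.remove(member)                      # 1. drain (quiesce) the member
        min_active = min(min_active, len(active))
        maintain(member)                           # 2. remove & maintain
        active.add(member)                         # 3. re-integrate
    return min_active                              # 4. loop repeats until done

patched = []
members = ["member1", "member2", "member3", "member4"]
lowest = rolling_maintenance(members, patched.append)
print(patched, lowest)
```

At every point at least all but one member stay active, which is why no outage window is needed.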


Achieving Efficient Scaling : Key Design Points


Deep RDMA exploitation over low latency fabric
  Enables round-trip response time ~10-15 microseconds

Silent Invalidation
  Informs members of page updates, requires no CPU cycles on those members
  No interrupt or other message processing required
  Increasingly important as cluster grows

Hot pages available without disk I/O from GBP memory
  RDMA and dedicated threads enable read page operations in ~10s of microseconds

[Figure: four members with Lock Mgr and Buffer Mgr components, connected to the GBP, GLM and SCA]


of Transaction Throughput


Questions / Discussion

mnicola@us.ibm.com


Backup Slides

Features to Minimize Planned Outages


Backup: Fast, scalable, granular
  Online or offline
  Fully parallel and scalable
  Can be throttled
  Partition-level backup
  Table space-level backup
  Full, Incremental, or Delta
  Volume snapshot support

Load: Fast, scalable and granular
  Fully parallel and scalable
  Partition-level
  Online load
  Online index rebuild

Automatic log management

Other utilities
  Online statistics collection
  Online index create and reorganization
  Online reorganization
  Online inspect

Dynamic operations
  Configuration parameters
  Buffer pool operations
  Container operations

Space management
  Online container management
  Automatic storage
  Online index reorganization


Features to Minimize Unplanned Outages


Hardware failures
  Integration with TSA cluster manager
  Built-in redundancy can't be turned off
  Consistency bits
  Log mirroring
  Automatic mirroring of critical data files
  Support for RAID

Fast recovery
  Continuous checkpointing
  Parallel recovery
  Automatic recovery tuning
  Filtered recovery

Dynamic debugging capability

High availability
  Clustering / failover support
  Integrated with TSM
  Automatic client reroute

Human and Application Errors
  Point-in-Time (POT) recovery
  Drop table recovery

Miscellaneous
  Infinite active logging
  Online container operations


OLAP Optimization Advisor


InfoSphere Warehouse will design the aggregates to support dimensional analysis for you using:
  Hybrid line
  Statistics
  Meta-data that describes the cubes
    Hierarchies, dimensions, measures, etc.
Optimizes to understand impact to load times and performance trade-off


Universal Cubing Services Access


Portals, Web Applications, Dashboards, Interactive Reports, Ad Hoc Analysis, Common Desktop Tools
  IBM Cognos 8 BI
  IBM DataQuant & DB2 QMF
  Microsoft Excel
  Cubeware Cockpit

Universal Cube Access (ODBO, XMLA)

InfoSphere Warehouse

InfoSphere Warehouse Data Mining


Data Mining Embedded into Applications and Processes

[Figure: SOA Processes, BI Analytical Tools, Web Analytical Apps and the Mining Visualizer reach DB2 InfoSphere Warehouse through a SQL Interface. Inside the warehouse: Enterprise-Level Data Mining (Modeling, Model Results), High-Speed In-Database Scoring (Scoring Functions) and In-Database SQL Data Mining over Structured & Unstructured Data.]


InfoSphere Warehouse Text Analytics


Analyze and extract structured data from text
Makes data available to normal reporting and analysis tools
From customer call center records, claim forms, etc.

Benefits
Target specific information hidden within text
Competitive edge by driving further business insight
Drives a greater ROI for your applications

Business value examples
  Better product categorization
  Early warning on customer attrition
  Fraud detection
  Product defect analysis
  Better customer profiling

Simple text analysis capabilities for text columns stored in warehouse tables
  Pattern matching rules and simple linguistics
  Enhance existing reports and data mining with insights gleaned from text
  Simple rules and dictionary editor


InfoSphere Warehouse Design Studio


Leverage and extend InfoSphere Data Architect:
Design and modify database physical models (schema & storage design, etc)
Design and model OLAP objects
Design and model warehouse transformation and mining flows

Key Features:
  Database design, or reverse engineer an existing database or DDL (RDA)
  View/Modify the schema
  Compare/Sync DB objects
  Analyze design (best practices and dependencies), Validation
  DB2 Storage Modeling: Table Space, Buffer Pool, Partition
  Generate script & Deploy: on data models, and flow models
  Impact Analysis: on data models and flow models



What's new in TPoX 2.0
TPoX 2.0 includes pervasive changes to the benchmark
TPoX 2.0 test results are not comparable to previous versions of TPoX

Data Generator
  TPoX V1.3 and Earlier                             TPoX 2.0
  Based on Toxgene                                  A single Java-based program
  3rd party tool, lack of support                   Complete rewrite
  Slow (> 5 days for 1TB data)                      Fast (6 hours for 1TB data)
  Can't generate dense account IDs for CUSTACC      Account IDs are now dense
  Large amount of small XML files                   Small amount of larger files, each contains 50K XML documents

Workload and WorkloadDriver
  TPoX V1.3 and Earlier                             TPoX 2.0
  Workload description file in proprietary          Workload description file in XML format,
    format, hard to read                              easy to read and create
  WorkloadDriver reads input documents from         WorkloadDriver reads input documents from smaller amount of
    large amount of small files                       larger files, improved performance for reading XML input documents
  Update transactions U1, U5 and U6 select          Update transactions U1, U5 and U6 select
    account for update based on customer ID           account for update based on account ID

Data Distribution
                                                    TPoX V1.3 and Earlier        TPoX 2.0
  # of CUSTACC vs # of ORDER                        1:5                          1:10
  XML document size range                           1-20KB                       1-23KB
  ACCOUNT IDs of customer                           Not dense                    Dense
  Total XML document size of 100GB scale            Slightly less than 100GB     Slightly larger than 100GB
  avg # of accounts per customer                    1.5                          2.0

Changes have improved performance of generating and consuming TPoX XML data in large scale TPoX benchmarks!

NOTE: please refer to TPoX V2.0 Release Note at http://sourceforge.net/projects/tpox for more detail
More information on XML data management in DB2 for Linux, UNIX, Windows and DB2 for z/OS

http://tinyurl.com/pureXML
