
Growing Green Databases with Oracle and Sun UltraSPARC T-series servers
Oracle Open World 2008

Glenn.Fawcett@Sun.com
Sr. Staff Engineer
Performance Technologies Group
http://blogs.sun.com/~glennf

Andrew.Holdsworth@Oracle.com
Sr. Director Real World Performance
Introduction(s)
• Andrew Holdsworth, Director Real World Performance
> Focus on real customer performance issues
> Test performance of customer databases
> Resolve customer performance issues
• Glenn Fawcett, Sr. Staff Engineer
> Performance Technologies Group
> Benchmark performance of Oracle
> Tune customer systems/databases
> Knowledge management: conferences, blog, white papers

Glenn.Fawcett@sun.com 9/22/08 2
Goals
• Introduce T-series and CMT processors
> How did it come about?
• T-series performance characteristics
> Show examples of performance
> Throughput vs Response Time
• Show Real World edge conditions
> Common pain points
> Tuning

What is a CMT processor?
• The T2000 is the grandfather of the T-series servers and
the first machine to use a CMT chip.
• CMT stands for “Chip Multi-Threading”.
• Nicknames
> CoolThreads, Niagara, CMT
• Official Names
> UltraSPARC T1: Single Socket (8 cores, 32 threads)
> UltraSPARC T2: Single Socket (8 cores, 64 threads)
> UltraSPARC T2-Plus: Multi-Socket (8 cores, 64 threads per socket)

Sun's CMT bloodline
Chip Multi-Threading (CMT) processors, servers and timeline:
• UltraSPARC T1 (8 cores, 32 threads), Nov 2005
> Servers: T1000 and T2000, up to 32 threads
• UltraSPARC T2 (8 cores, 64 threads), Oct 2007
> Servers: T5120, T5220 and Sun Blade T6320, up to 64 threads
• UltraSPARC T2-Plus, N*(8 cores, 64 threads), Apr 2008
> Servers: T5140 and T5240, 2x sockets per server, up to 128 threads
Focus on Data Center cost reduction
• Data center operational
costs are higher than ever.
> Power
> Cooling
> Space
• CMT architecture
> Less power
> More throughput
> Less space

Memory Speed Trails CPU increases
[Figure: Relative performance (log scale, 1 to 10000) from 1980 to 2005. CPU frequency: 2x every 2 years; DRAM speeds: 2x every 6 years; the gap between them widens over time.]
Source: Sun World Wide Analyst Conference Feb. 25, 2003
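A rough way to read the chart: if CPU performance doubles every 2 years and DRAM speed every 6 years (the chart's labels, treated here as an idealized exponential model, which is an assumption), the CPU/DRAM ratio after t years is 2^(t/2) / 2^(t/6) = 2^(t/3). A minimal Python sketch of that arithmetic:

```python
def relative_gap(years: float, cpu_doubling: float = 2.0,
                 dram_doubling: float = 6.0) -> float:
    """Ratio of CPU speedup to DRAM speedup after `years`.

    Each curve is modelled as pure exponential growth with the given
    doubling period (an idealization of the chart's trend lines).
    """
    cpu_gain = 2 ** (years / cpu_doubling)
    dram_gain = 2 ** (years / dram_doubling)
    return cpu_gain / dram_gain


print(relative_gap(6))   # 4.0
print(relative_gap(12))  # 16.0
print(relative_gap(24))  # 256.0
```

Under this model, two decades of the trend leave the CPU waiting on memory for two orders of magnitude more cycles, which is the stall cost the next slide is about.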

Recycling CPU cycles
• Classic CPU design
> Stalls on memory fetches: wasted cycles doing
nothing but consuming power, cooling, etc.
> Faster core speeds waste more cycles
doing nothing.
> Large caches to lessen the cost of memory
references
• CMT / Niagara design
> Don't stall on memory references
> Simple core and pipeline design
> Massive thread counts to provide more throughput

CMT Implementation
• Four threads share a single pipeline.
• Every CPU cycle, an instruction from a different thread is executed.
[Figure: Niagara processor shared pipeline. Threads 1 to 4 each alternate compute (C) and memory latency (M) over time; interleaving their compute phases gives pipeline utilization of up to 85%.]
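The interleaving above can be illustrated with a toy simulation. It is a sketch under simplifying assumptions, not a model of the real Niagara pipeline: each hypothetical thread issues `compute` instructions, then stalls for `latency` cycles on a memory reference, and every cycle the shared pipeline issues from the next ready thread in round-robin order.

```python
def pipeline_utilization(threads: int, compute: int, latency: int,
                         cycles: int = 10_000) -> float:
    """Fraction of cycles in which a shared pipeline issues an instruction.

    Toy model: each thread issues `compute` instructions back to back, then
    stalls for `latency` cycles on a memory reference, and repeats. Every
    cycle the pipeline issues from the next ready thread in round-robin order.
    """
    remaining = [compute] * threads  # instructions left before the next memory stall
    ready_at = [0] * threads         # first cycle at which each thread may issue again
    last = threads - 1               # thread that issued most recently
    busy = 0
    for cycle in range(cycles):
        # Scan the threads once, starting after the last one that issued.
        for step in range(1, threads + 1):
            t = (last + step) % threads
            if ready_at[t] <= cycle:
                busy += 1
                last = t
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Memory reference: the thread waits `latency` cycles.
                    ready_at[t] = cycle + 1 + latency
                    remaining[t] = compute
                break  # at most one instruction per cycle
    return busy / cycles


for n in (1, 2, 4):
    print(n, pipeline_utilization(n, compute=1, latency=3))
# 1 0.25
# 2 0.5
# 4 1.0
```

With one thread the pipeline is busy only a quarter of the time; in this idealized model four threads hide the memory latency completely. The model ignores other hazards and contention, so real hardware achieves less (the slide quotes up to 85%).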

Oracle view of CMT processors
• UltraSPARC T2,T2+ processors
> 8 cores
> 2 integer pipelines/core
> 4 threads per pipeline
• Solaris shows each hardware thread as a CPU.
> psrinfo and mpstat show 64 CPUs.
• Oracle sees:
> CPU_COUNT=64 by default
> Database connections running concurrently across
all CPU threads
> High connection count per CPU
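The CPU count that Solaris and Oracle report is simply the product of the chip's hardware resources. A small Python sketch of that arithmetic; the T1 figure of one pipeline per core with four threads is inferred from its 8 cores and 32 threads, and the two-socket result matches the 128 threads of the T5140/T5240:

```python
def hardware_threads(cores: int, pipelines_per_core: int,
                     threads_per_pipeline: int, sockets: int = 1) -> int:
    """Logical CPUs reported by the OS: one per hardware thread."""
    return sockets * cores * pipelines_per_core * threads_per_pipeline


print(hardware_threads(8, 1, 4))             # UltraSPARC T1: 32
print(hardware_threads(8, 2, 4))             # UltraSPARC T2: 64
print(hardware_threads(8, 2, 4, sockets=2))  # two-socket T2-Plus server: 128
```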

CMT design points
• Strengths
> High throughput with lots of database connections.
> Low power consumption
> Small footprint
• Trade-offs
> Single-user performance can be less than
traditional CPU architectures.
> Single-Threaded jobs usually run slower.
> Install, DB load, Backup

Oracle Scaling under load

Oracle scaling with BM factory
[Figure: TPS (75 to 250) vs. number of db connections (10 to 100) for the x6220 and the T2000.]
• Which is better?
• x6220 (AMD based)
> Fast single stream, shown by high TPS at 10 connections
(3x better than T2000)
> Throughput deteriorates under increasing load.
• T2000 (CMT-based server)
> Slower single stream
> 1.6x better throughput under high load!
> Better response time at peak load.
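The response-time claim follows from Little's law. Assuming a closed system in which every connection always has a request outstanding (zero think time, an assumption not stated on the slide), mean response time is R = N / X for N connections and throughput X. At the same connection count, 1.6x the throughput therefore means 1/1.6 = 0.625 of the response time. A minimal sketch, where the absolute TPS value is hypothetical and only the 1.6x ratio comes from the chart:

```python
def response_time(connections: int, tps: float) -> float:
    """Little's law for a closed system with zero think time: R = N / X (seconds)."""
    return connections / tps


x6220_tps = 100.0             # hypothetical throughput at 100 connections
t2000_tps = 1.6 * x6220_tps   # 1.6x better throughput under high load (slide figure)

r_x6220 = response_time(100, x6220_tps)
r_t2000 = response_time(100, t2000_tps)
print(f"x6220: {r_x6220:.3f}s  T2000: {r_t2000:.3f}s  ratio: {r_t2000 / r_x6220:.3f}")
# x6220: 1.000s  T2000: 0.625s  ratio: 0.625
```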

Throughput and Response Time
• Measuring application throughput with Oracle
> Metrics: xact/sec, orders/hr, queries/sec, ...
> Tools: AWR, statspack, OEM, Spotlight, etc..
• Response time
> Metrics: transactional, batch/report window
> Measure: DB level, Application level, presentation (web).
> Use the 90th percentile: users don't complain about averages.
• System statistics are not application metrics
> CPU%, syscalls, io/sec, packets/sec, ...
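Why the 90th percentile matters more than the average can be shown with a handful of response times. A minimal sketch using the nearest-rank percentile definition; the sample times are hypothetical. Against a "< 2 seconds" SLA, the mean passes while the 90th percentile does not:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical response times (seconds) for 10 transactions:
# most are fast, a few are slow enough for users to notice.
times = [0.2, 0.2, 0.3, 0.3, 0.3, 0.4, 0.4, 0.5, 2.5, 3.0]
mean = sum(times) / len(times)
print(f"mean = {mean:.2f}s, 90th percentile = {percentile(times, 90)}s")
# mean = 0.81s, 90th percentile = 2.5s
```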

Oracle performance expectations
• Most OLTP environments run great with CMT.
• Understand response time components
> DB CPU usually around 20% of response time.
> IO, network, and the application tier make up the rest.
• Bad code (high LIO) is a bottleneck on any server.
> CPU bound bad code hurts sooner on CMT servers.
• Serial Batch jobs run slower with CMT.
> Check the usual suspects:
> exec plans, indexes, business logic
> Divide and Conquer
> Use concurrency and parallelism
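Why the DB CPU share matters can be shown with a simple model: if only the DB CPU component of response time slows down, by a factor s, while IO, network, and application-tier time stay the same, then R' = R * ((1 - f) + f * s). With f = 0.2 from this slide and an assumed s = 2 (the ~2x single-thread figure quoted for edge cases), a transaction takes about 1.2x as long, not 2x. A minimal sketch:

```python
def scaled_response_time(total: float, cpu_fraction: float,
                         cpu_slowdown: float) -> float:
    """Response time when only the DB CPU share slows by `cpu_slowdown`.

    IO, network and application-tier time are assumed unchanged.
    """
    other = total * (1 - cpu_fraction)            # time outside the database CPU
    db_cpu = total * cpu_fraction * cpu_slowdown  # DB CPU time, stretched
    return other + db_cpu


# 1.0 s transaction, 20% DB CPU (slide figure), CPU path 2x slower (assumption)
print(f"{scaled_response_time(1.0, 0.2, 2.0):.2f} s")  # 1.20 s
```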
Do your Homework!
• Define Business metrics
> orders/hr, invoices/min, ...
• Define SLAs
> txn response time < 2 seconds
> Batch window requirements
• Increase Parallelism / concurrency
> Add more connections, batches, ...
> Use existing parallel features in Oracle.
• Steer clear of common pitfalls
> Performance bugs (e.g., checksum bug #6814520)
> Bad practices (buffered IO, no jumbo frames)
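Sizing parallelism against a batch window can be estimated with Amdahl's law before testing. A sketch, assuming a hypothetical batch with a known single-threaded runtime and parallelizable fraction p; real jobs rarely scale this cleanly, so treat the result as a starting degree to test rather than an answer:

```python
def min_parallel_degree(serial_minutes: float, window_minutes: float,
                        parallel_fraction: float, max_degree: int = 128):
    """Smallest degree N with serial_minutes * ((1 - p) + p / N) <= window_minutes.

    Uses Amdahl's law; returns None if the window cannot be met up to max_degree.
    """
    for n in range(1, max_degree + 1):
        runtime = serial_minutes * ((1 - parallel_fraction) + parallel_fraction / n)
        if runtime <= window_minutes:
            return n
    return None


# Hypothetical batch: 240 min single-threaded, 95% parallelizable, 60 min window.
print(min_parallel_degree(240, 60, 0.95))  # 5
```

If the serial fraction alone exceeds the window, no degree helps (the function returns None), and the job itself has to be redesigned.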
Real world edge conditions
• Single-Threaded processes are the central issue
> ~2x higher initial response time vs Xeon or SPARC64.
• Common complaints
> Install is slower
> Backup seems to take longer
> CPU utilization stays low
> Analyze runs longer
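The complaint that the CPU is not loaded follows from simple arithmetic: a single-threaded process can keep at most one of the 64 hardware threads busy, so system-wide CPU% tops out near 1.6% while the job still runs slower. A minimal sketch, assuming a 64-thread T2 chip:

```python
def max_utilization_pct(busy_threads: int, hardware_threads: int = 64) -> float:
    """Upper bound on system-wide CPU% when only `busy_threads` processes are runnable."""
    return 100.0 * min(busy_threads, hardware_threads) / hardware_threads


print(max_utilization_pct(1))   # one single-threaded job: 1.5625
print(max_utilization_pct(20))  # 20 parallel workers: 31.25
```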

Oracle Tuning for CMT servers
• Separating real issues from the noise
> Is this a one-time event?
> What are the real performance requirements?
> Other factors: query execution plan (QEP), application changes, patch level...
• Tuning single-threaded processes
> RMAN Backup
> schema analysis
> index maintenance
> DSS operations: create as select / insert as select

Tuning: Backup using RMAN
• A recent customer escalation reported RMAN backup times
twice as long on a T2000.
• The backup had been configured as single-threaded, with
compression on the backup set.
• Resolution: use multiple channels and parallelism to maximize
the throughput of your IO configuration.
Set each channel's format, piece size, and connection in one command (issuing
CONFIGURE CHANNEL again for the same channel number replaces that channel's
earlier settings rather than adding to them):
RMAN> configure channel 1 device type disk connect '/as sysdba' format
'/o6s_data/GLENNF/d2/backup_db_c1%d_S_%s_P_%p_T_%t' MAXPIECESIZE 1024 M;
RMAN> configure channel 2 device type disk connect '/as sysdba' format
'/o6s_data/GLENNF/d2/backup_db_c2%d_S_%s_P_%p_T_%t' MAXPIECESIZE 1024 M;
...
RMAN> configure channel 20 device type disk connect '/as sysdba' format
'/o6s_data/GLENNF/d2/backup_db_c20%d_S_%s_P_%p_T_%t' MAXPIECESIZE 1024 M;
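Typing twenty channel commands by hand is error-prone, so a short script can generate them. A sketch, assuming channels 3 through 19 follow the same naming pattern as channels 1, 2 and 20 shown on the slide; combining the format and connect settings into one command per channel is this sketch's choice:

```python
# Template copied from the slide's channel 1, 2 and 20 commands; channels
# 3..19 are assumed to follow the same pattern. Format, piece size and
# connection are set together in a single command per channel.
TEMPLATE = ("configure channel {n} device type disk connect '/as sysdba' format "
            "'/o6s_data/GLENNF/d2/backup_db_c{n}%d_S_%s_P_%p_T_%t' "
            "MAXPIECESIZE 1024 M;")


def channel_commands(count: int) -> list[str]:
    """One CONFIGURE CHANNEL command per channel, numbered 1..count."""
    return [TEMPLATE.format(n=n) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Print the commands so they can be pasted into an RMAN session.
    for command in channel_commands(20):
        print(command)
```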

Scaling RMAN Backup

RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 20;

8x speedup!
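One way to interpret "8x at parallelism 20" is to solve Amdahl's law, S = 1 / ((1 - p) + p / N), for the fraction p of the backup that scaled. This is a model-based interpretation, not a measured breakdown: in a backup the real ceiling may be IO bandwidth rather than a fixed serial fraction.

```python
def implied_parallel_fraction(speedup: float, degree: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / degree)


def amdahl_speedup(p: float, degree: int) -> float:
    """Amdahl's law speedup for parallel fraction p on `degree` workers."""
    return 1 / ((1 - p) + p / degree)


p = implied_parallel_fraction(8, 20)
print(f"p = {p:.3f}")                                     # p = 0.921
print(f"efficiency = {8 / 20:.0%}")                       # efficiency = 40%
print(f"ceiling with unlimited channels = {1 / (1 - p):.1f}x")  # 12.7x
```

Under this model, adding channels beyond 20 would approach but never exceed about 12.7x.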

Tuning: Schema Analysis
• Many legacy scripts for Oracle maintenance functions.
> Often single-threaded
> Often not tested as part of a proof of concept/Migration
> Gathering schema statistics is a common issue
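• Improve Gather Stats time by:
> Avoiding it altogether!
> Upgrading to 11g and using the defaults.
> Using the parallel DEGREE option for gather stats:
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'scott',
    estimate_percent => $2,   -- $2: sample percentage supplied by the calling script
    degree           => 32);  -- gather statistics with 32 parallel slaves
END;
/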

Schema Analysis – scaling
[Figure: Schema analysis runtime (secs, 1000 to 6000) vs. parallelism (4 to 32); annotated runtimes fall from 97 min to 20 min as parallelism increases.]
Tuning: Index Creation
• Traditionally index creation was single-threaded
• Parallel index create was introduced to reduce
create/recreate times.
• “unrecoverable” option further reduces create times
by eliminating the redo associated with creation of
indexes.
• “compute statistics” can be done while the index is
being created to save time... default in 11g!
• Combining the parallel option with the
unrecoverable, and “compute statistics”:
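create index gtest_c1 on gtest(idname)
parallel 16          -- build the index with 16 parallel slaves
unrecoverable        -- skip redo generation (NOLOGGING in later syntax)
compute statistics;  -- gather index statistics during the build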

Tuning: Index Creation
[Figure: Index create times. Runtime (0 to 2400) vs. parallelism (0 to 16), plotted against the required performance runtime.]
Tuning: Batch Scaling
• Common component of Batch processing
> “Create as Select” / “Insert as Select”
• Use Parallel Query
> Recent customer tuning exercise had
multiple CAS operations without parallelism.
> Using just parallel=8 provided huge gains
> Append hint can provide further gains.
-- Create as Select
SQL> alter session enable parallel dml;
SQL> create table abc
parallel (degree 8)
as
select /*+ parallel(gtest, 8) */ * from gtest;

-- Insert as Select
SQL> alter session enable parallel dml;
SQL> insert /*+ parallel(abc,8) */
into abc
select /*+ parallel(gtest,8) */ * from gtest;
-- end the parallel DML transaction before abc is queried again in this session
SQL> commit;

Tuning: create as select
[Figure: Create as Select runtime (0 to 1600) vs. parallelism (1 to 8), plotted against the required performance runtime.]
Further areas to explore parallelism
• Oracle Applications
> adadmin
> can specify number of workers
> specify parallelism for maintenance functions
> Increase # of concurrent managers
> Payroll batch can be split into ranges
• Application tier
> Use persistent connections
> manage connection pool sizes
> CMT can handle many more connections/socket
• Batch jobs are excellent places to exploit parallelism
and concurrency.
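Splitting a batch into ranges, as with the payroll example, comes down to dividing a key range into contiguous, non-overlapping chunks, one per concurrent job. A minimal sketch; the `employee_id` column and the ID range are hypothetical, and this shows only the idea, not how Oracle Applications configures payroll ranges:

```python
def split_range(low: int, high: int, workers: int) -> list[tuple[int, int]]:
    """Split the inclusive key range [low, high] into up to `workers` contiguous sub-ranges."""
    total = high - low + 1
    base, extra = divmod(total, workers)
    ranges = []
    start = low
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            break  # fewer keys than workers
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Hypothetical employee IDs 1..1000 split across 8 concurrent payroll jobs
for lo, hi in split_range(1, 1000, 8):
    print(f"WHERE employee_id BETWEEN {lo} AND {hi}")
```

Equal-sized key ranges only balance the work if the rows are evenly spread across keys; skewed data may need ranges sized by row count instead.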
Real SLA misses
• Look for the Usual suspects
> Bad SQL, QEPs, Stats => High LIO
> Performance bugs (e.g., checksum bug #6814520)
> Bad practices (buffered IO, no jumbo frames)
• OLTP Response time issues
> Measure response time at application level
> Gather Oracle event trace data to profile
> Tune using a response time profile (“Method R”).
• Application mis-configuration
> Increase parallelism and concurrency
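The profiling step can be sketched as grouping timed events and ranking them by total time, so tuning starts with the largest contributor. A minimal sketch, assuming the (event, elapsed seconds) pairs have already been extracted from the trace data; the event list is hypothetical and "CPU" stands in for CPU time recorded in the trace:

```python
from collections import defaultdict


def response_time_profile(events):
    """Rank (event_name, seconds) pairs by total time, largest first.

    Returns (name, total_seconds, count, percent_of_total) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, seconds in events:
        totals[name] += seconds
        counts[name] += 1
    grand_total = sum(totals.values())
    rows = [(name, totals[name], counts[name], 100.0 * totals[name] / grand_total)
            for name in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical timed events for one slow transaction (seconds each)
events = ([("db file sequential read", 0.005)] * 400
          + [("CPU", 1.5), ("SQL*Net message from client", 0.5)])

for name, seconds, count, pct in response_time_profile(events):
    print(f"{name:30s} {seconds:6.2f}s {count:4d} calls {pct:5.1f}%")
```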

Oracle customers running on CMT
• eBay
• http://www.sun.com/customers/servers/ebay.xml
• Scripps Networks
• http://www.sun.com/customers/servers/scripps.xml
• Customer Reference site
> 75 customers have agreed to be public
references on http://www.sun.com/customers/
> Many more use Sun's UltraSPARC CMT
technology.

Summary

• Separate myth from fact


> Throughput vs response time requirements.
> Are SLAs being met?
> Solve the right problem.
• Parallelism is your friend
> Batch
> Backup
> Admin
• Bad single-threaded code is the enemy.

QUESTIONS???
Glenn.Fawcett@Sun.com
http://blogs.sun.com/~glennf
Sr. Staff Engineer
Performance Technologies Group

Andrew.Holdsworth@Oracle.com
Sr. Director Real World Performance