
Resource Mapping

A Wait Time Based Methodology for Database Performance Analysis

Prepared for Hotsos Symposium, 2005


Presented by
Matt Larson
Chief Technology Officer
Confio Software
Presentation Agenda

 Introduction
 Conventional Tuning vs. Wait-based Tuning
 Foundation: Resource Mapping Methodology
 5 Key Steps of Applying RMM
 RMM Advantages
 Conclusion

2
Who am I?

 Former DBA consultant specializing in Oracle performance tuning
 Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed 2nd Edition, Oracle8 Server Unleashed)
 Co-author of two other database-related books
 CTO and founder of Oracle performance software company
3
Problems with Conventional Tuning
Tools: Like the Drunk Under the
Streetlight

4
Conventional Tuning

 Art, not a science
 Ratio-based (cache hit ratios, etc.)
 Sometimes fruitless
 It’s “tuned” (I guess?)
 Different tuning/investigation process for each DBA/DBA Team/Company

5
Problems with Conventional Tuning
Tools

 Optimize systems, not business results
 Conventional tools:
• V$ Views: limited visibility & granularity
• Statspack: averages across entire database
• Explain Plan: deemphasizes how non-object resources affect performance
 Incorrect data hides real results
• System-wide averages
• Event counters
• Incomplete visibility

6
What Problems are you Trying to
Solve?

• I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why?
• Will more hardware make my application run faster? By how much?
• Will the new application run efficiently on the production server?
• Why does one application keep impacting my SLA compliance?
• If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?

7
You know you are working on the
wrong thing when…
 After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with:
• A. An all-expenses-paid vacation
• B. A free lunch
• C. A stale donut
• D. Reward? Nobody even noticed!

8
You know you have a visibility
problem when…
 You measure database performance based on:
• A. Increasing trends in user response time
• B. Increasing system down time
• C. Increasing help desk calls
• D. Increasing decibel levels from irate users

9
Your role is sub-essential to the
business of your organization
when…
 Your role in the rollout of a new customer-facing application results in:
• A. Keys to drive the CEO’s Porsche
• B. Keys to use the executive restroom
• C. A mop to use in the executive restroom
• D. Your office has been moved to the restroom

10
You know you are accustomed to
measuring the wrong thing when…
 You measure the commute time to work based on:
• A. The time it takes to get there
• B. Counting the times your wheels rotate
• C. Monitoring your tachometer
• D. The number of speeding tickets

11
Wait-based Performance Tuning

 Emerging best practice for database tuning
 Proponents include leading consultants, trainers and authors
 Oracle is starting to build wait-based tuning tools into the database, particularly in 10g
 Tune by determining where processing time is spent
12
Oracle 10g - Moving towards wait-based

 Adding wait-based columns to existing views
 New wait-based views

Example:
v$session_wait_history
• Provides the last 10 wait events for a session
• Session ID, Username, Event, Wait_Time, etc.
• Used to provide wait_time for only a few events
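
The "last 10 wait events for a session" behaviour can be pictured offline. The following is a minimal Python sketch, not a query against Oracle: the session ID, event names and wait values are hypothetical sample data, and treating the wait time as hundredths of a second is an assumption to verify against the Oracle documentation for the release in use.

```python
from collections import defaultdict, deque

HISTORY_DEPTH = 10  # the slide states the view keeps the last 10 waits per session


def record_wait(history, sid, event, wait_time_cs):
    """Append one wait to a session's history; the oldest entry drops off after 10."""
    history.setdefault(sid, deque(maxlen=HISTORY_DEPTH)).append((event, wait_time_cs))


def time_by_event(history, sid):
    """Sum the retained wait time per event for one session."""
    totals = defaultdict(int)
    for event, wait_time_cs in history.get(sid, ()):
        totals[event] += wait_time_cs
    return dict(totals)


# Hypothetical sample: 12 waits for session 42, so only the last 10 are retained.
history = {}
for i in range(12):
    event = "db file scattered read" if i % 2 == 0 else "SQL*Net message from client"
    record_wait(history, sid=42, event=event, wait_time_cs=i)

print(time_by_event(history, 42))
```

Because only ten entries survive per session, a history like this is a short-term snapshot; long-term analysis needs the waits to be sampled and stored elsewhere.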

13
DBA Success Stories using RMM

 DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; solution identified in 10 minutes during hands-on training
 DBA ends “Crit Sit”. 2-week situation ends quickly after identification of library cache pin waits and load locks. Metalink identifies Oracle bug; patch successfully applied
 DBA saves $700K. 90% CPU capacity initiates expansion from 12 to 24 CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint; no new server required.
14
RMM: Confio’s Underlying
Methodology
 Resource Mapping Methodology:

Wait-Event Analysis: general approach, best practices
Resource Mapping Methodology: rigorous, complete requirements
DBFlash: packaged product implementation

 Three Key Principles of RMM
1. SQL View: All statistics at SQL statement level
2. Time View: Measure time, not the number of times a resource is utilized
3. Full View: Separately measure every resource to isolate source of problems
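
The three principles can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not DBFlash's data model: the sample tuples, SQL identifiers and resource names are hypothetical.

```python
from collections import defaultdict

# Hypothetical wait samples: (sql_id, resource, seconds waited).
samples = [
    ("sql_1", "I/O", 120.0),
    ("sql_1", "Locks", 15.0),
    ("sql_2", "Network", 300.0),
    ("sql_2", "I/O", 20.0),
    ("sql_1", "I/O", 60.0),
]


def map_resources(samples):
    """Return {sql_id: {resource: total_seconds}}."""
    mapping = defaultdict(lambda: defaultdict(float))
    for sql_id, resource, seconds in samples:
        # SQL View: statistics are keyed by statement.
        # Full View: every resource gets its own bucket.
        # Time View: seconds are summed, not occurrences counted.
        mapping[sql_id][resource] += seconds
    return {sql_id: dict(resources) for sql_id, resources in mapping.items()}


resource_map = map_resources(samples)
print(resource_map)
```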

15
Confio’s Resource Mapping
Methodology
• The principles of RMM can be illustrated by using the analogy that data processing is like an assembly line. Data goes in one end, is subject to a series of changes, and comes out the other end as a finished product
• The assembly line (or SQL Statement) must be observed at the lowest level where a unit of work is being performed (SQL View Principle)
• Measurements are made with regard to time instead of counting how often an event occurs (Time View Principle)
• All resources system-wide must be monitored to get a full view of potential bottlenecks, i.e. no blind spots (Full View Principle)

[Figure: two side-by-side panels. Counters: CPU 74%, Reads 1789327 (Blind Spot) versus CPU 38%, Reads 4955 (Blind Spot). Time: 145 seconds versus 8726 seconds. Caption: Follow a unit of work through every operation]
16
Track SQL Time,
Not System Counters
• Watching Counters leads to wrong conclusions: Time is more relevant
• Total System Counters hide information: Need breakdown to
individual SQLs

Resources              I/O                 Network        Locks                 Redo
Total System Counter   80K Reads           5K Packets     125 Attempts          216K Writes
SQL 1                  30 Minutes (5 R)    4M             6M (50 A)             4M
SQL 2                  15M (25 R)          200 Minutes    10M (35 A)            200 Minutes
SQL 3                  5M (50 Reads)       5M             100 Minutes (50 A)    5M
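
The point of the table can be checked with a short Python sketch. It assumes one reading of the flattened slide: that in the I/O column SQL 1 pairs 30 minutes with 5 reads, SQL 2 pairs 15 minutes with 25 reads, and SQL 3 pairs 5 minutes with 50 reads.

```python
# I/O column as read from the slide: (statement, wait minutes, read count).
# The pairing of minutes with read counts is an assumption about the slide layout.
io_stats = [
    ("SQL 1", 30, 5),
    ("SQL 2", 15, 25),
    ("SQL 3", 5, 50),
]

# Rank the same statements two ways: by counter and by time.
by_counter = sorted(io_stats, key=lambda row: row[2], reverse=True)
by_time = sorted(io_stats, key=lambda row: row[1], reverse=True)

top_by_counter = by_counter[0][0]
top_by_time = by_time[0][0]

print("Most reads:        ", top_by_counter)
print("Most I/O wait time:", top_by_time)
```

Under that reading the two rankings are exact opposites: the statement with the most reads spends the least time waiting on I/O, so a counter-driven investigation would start with the wrong statement.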

17
RMM-compliant Performance
Tools

 Oracle Tracing
• RMM compliant when wait events are traced
• Shows SQL level statistics (SQL View), all events (Full View) and events by time (Time View)
• Text-based, short-term technical reporting
• Primarily used for reactive tuning

 Confio DBFlash for Oracle
• RMM compliant
• 24/7 proactive monitoring
• Graphical, long-term trend reporting
• RMM-based Alerting

18
Applying RMM for Business
Results

Five Step Process focusing on what matters

1. Identify 2. Allocate 3. Quantify 4. Prioritize 5. Assign

19
Step 1: Identify

 Identify SQL Statements having largest impact
• (SQL View and Time View principles)
 Longest wait times = most significant “pain points” for customers
 Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix them?)

[Chart: SQL statements prioritized by Total Wait Time]
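
The Identify step amounts to ranking statements by accumulated wait time. A minimal Python sketch follows, assuming per-statement totals are already available; the statement names and figures are hypothetical.

```python
def rank_by_wait(wait_seconds, top_n=3):
    """Return [(sql_id, seconds, share_of_total)] for the top_n statements."""
    total = sum(wait_seconds.values())
    if not total:
        return []
    ranked = sorted(wait_seconds.items(), key=lambda item: item[1], reverse=True)
    return [(sql_id, secs, secs / total) for sql_id, secs in ranked[:top_n]]


# Hypothetical accumulated wait time per statement, in seconds.
wait_seconds = {"sql_a": 500.0, "sql_b": 4000.0, "sql_c": 250.0, "sql_d": 250.0}

top = rank_by_wait(wait_seconds, top_n=2)
for sql_id, secs, share in top:
    print(f"{sql_id}: {secs:.0f} s ({share:.0%} of total wait)")
```

Reporting each statement's share of the total shows at a glance whether one statement dominates, as it does in this sample, or whether the wait time is spread thinly across many.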
Step 2: Allocate

 Allocate impact to real customers (internal or external)
 Allocate wait time to Program, Session, Machine
• SQL View principle makes this connection
 Understanding database customer and application

[Chart: Programs prioritized by Total Wait Time]

21
Step 3: Quantify
 How much is saved in time/money if fixed?
 Enabled by Full View and Time View principles
 Soft dollar savings
• Data entry clerks
• DBA time spent in problem resolution
 Hard dollar savings
• Reduce hardware upgrades
• Meet SLAs, avoiding penalties
• Ensure business isn’t lost due to poor performing or unavailable system

[Chart: Quantifiable benefit of tuning a specific statement]
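
The Quantify step is simple arithmetic once wait time is measured. The Python sketch below estimates a soft dollar saving for the data entry clerk case; the head count, hours lost, hourly cost and number of working days are all hypothetical inputs, not figures from the presentation.

```python
def annual_savings(wait_hours_saved_per_day, hourly_cost, working_days=250):
    """Estimate yearly savings from wait time removed by a tuning fix."""
    return wait_hours_saved_per_day * hourly_cost * working_days


# Hypothetical inputs: 20 clerks each lose 0.5 hours per day waiting on one statement.
clerks = 20
hours_lost_per_clerk = 0.5
hourly_cost = 30.0  # assumed loaded cost per clerk hour, in dollars

hours_saved = clerks * hours_lost_per_clerk
savings = annual_savings(hours_saved, hourly_cost)
print(f"Estimated annual soft dollar savings: ${savings:,.0f}")
```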

22
Step 4: Prioritize

 If the last step is properly executed, this step is fairly straightforward
 Allows DBA to cut through the clutter of potential new projects, investigations, and trials.
 Better justification for priorities. (e.g. We aren’t working on your problem since this other one has a higher demonstrable business impact)

23
Step 5: Assign

 Assign the right people to the problem
• Log_buffer waits
• Network issues
• Same query 10,000/hour
 Enabled by Full View principle
 Avoid finger-pointing by assigning problems quickly and accurately

24
Resource Mapping Methodology

RMM

Wait Based Tuning Network, Storage, Application, Web, etc.

25
Silo Monitoring

Business Management    LIMITED VIEW
IT Management          LIMITED VIEW

Team                   Tool                         Software Layers
Web Team               Sitescope                    Web Server
Custom App Team        Often No Commercial Tools    Custom Biz Logic
Network Team           HP Openview                  Network
Database/OS Teams      Wait-based tuning            Database Server
Storage/OS Teams       EMC Control Center           Storage Box

Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view.
26
The Solution - Integrated Vision

Business Management RMM across the stack

IT Management

Web Team Web Server

Custom App Team Custom Biz Logic

Network Team Network

Database/OS Teams Database Server

Storage/OS Teams Storage Box

All teams see a complete picture of all layers and dependencies. Enables more efficient “Umbrella” solution.

27
RMM Achieved Business Benefits

RMM Does:                                         Business Benefit:
35% reduction in database capacity requirement    Reduce capital investment
Recovers un-used capacity                         Avoid unnecessary additions
Standardizes “expert” analysis ability across     Reduce training & consulting costs
  entire DBA team
Quantifies performance impact                     Focus tuning efforts on biggest business impacts
Identifies problem Root Cause and resolution      Assign human resources and responsibility
Anticipates + resolves performance bottlenecks    Maintain SLA and end user performance

28
Example 1: Problem Observed

 Critical situation: Secure Service Center application performance unsatisfactory
• Response time between 2400 and 9000 seconds
• Very high network traffic (3x-4x normal), indicating time-outs and user refreshes
• “CritSit” declared: major effort to resolve problem

29
Observations using Resource
Mapping Methods
 1: Identify accumulated Waits
 2: Identify specific resources used

[Chart: accumulated wait time by resource, showing Lib cache pin wait and Lib cache load lock. Notice scale: > 8000 secs]

30
Results

Library cache pin nearly unobservable
Library cache load lock no longer observable
Notice scale: < 1400 secs max

31
Results

 Response time improvement from 8000 seconds (worst case) to 900 seconds
 Variance improvement:
• Before: response time 2400 - 8000 sec
• After: response time 800 - 900 sec
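
The size of the improvement follows directly from the figures above. A small Python sketch, using only the response times stated on this slide and treating "variance" as the spread between best and worst case (an interpretation, since the slide gives ranges rather than a statistical variance):

```python
# Response time ranges in seconds, as stated on the slide.
before_min, before_max = 2400, 8000
after_min, after_max = 800, 900

# Relative reduction in worst-case response time.
worst_case_reduction = (before_max - after_max) / before_max

# Spread between best and worst case, before and after the fix.
spread_before = before_max - before_min
spread_after = after_max - after_min

print(f"Worst-case response time reduced by {worst_case_reduction:.2%}")
print(f"Spread narrowed from {spread_before} s to {spread_after} s")
```

With these inputs the worst case drops by 88.75% and the spread narrows from 5600 seconds to 100 seconds, so the fix made response time both faster and far more predictable.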

32
Example 2: Performance Drain –
Identify the Source

 Slow response reported
 DBA and database focus of delays
 Database problem?
 No – SQL*Net Message identified as source of delay
 2nd highest wait event

33
RMM Drill Down identifies source of
problem
 Single application generates all SQL*Net Messages
 App on same server as Oracle!
 Answer:
 Misconfiguration – TCP/IP used within server
 Change to IPC, eliminate NIC traffic and 30% of wait time

Solution requires knowing: Which SQL, What Wait Time, Which Resource

34
Example 3: Scattered Reads
 Situation: LINS06 database - Hourly profile identifies high wait anomaly
 3-10x higher than other periods – requires investigation

[Chart: hourly wait time profile. Wait time 42,000 seconds, 10:00-11:00]
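
Spotting an hour like this can be automated by comparing each hour against a typical one. The following Python sketch flags hours whose wait time is at least three times the median hour. Only the 42,000 second figure for 10:00-11:00 comes from the slide; the other hourly values, the use of the median as baseline and the threshold factor are assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalies(hourly_wait, factor=3.0):
    """Return the hours whose wait time is at least `factor` times the median hour."""
    baseline = median(hourly_wait.values())
    return [
        hour
        for hour, seconds in sorted(hourly_wait.items())
        if seconds >= factor * baseline
    ]


# Wait seconds keyed by starting hour; only the 10:00 value is taken from the slide.
hourly_wait = {8: 5000, 9: 6000, 10: 42000, 11: 7000, 12: 4500}

anomalies = flag_anomalies(hourly_wait)
print("Hours needing investigation:", anomalies)
```

The median is used rather than the mean so that the anomalous hour itself does not inflate the baseline it is compared against.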

35
Drill Down to Key RMM Parameters

[Chart: wait time by resource, dominated by Db file scattered reads. Notice scale: > 6000 secs]
36
Conclusion

 Look for what has an impact
 Resource Mapping is more than Wait Time – Analysis must include:
• SQL level granularity
• Full Resource granularity
 Isolating the SQL and Resource allows you to find and fix the Root Cause
 DBAs can have an impact and be heroes!

37
Thank you for coming

Matt Larson

Contact Information
• mattlarson@confio.com
• 303-938-8282 ext. 110
• Company website: www.confio.com

38
