
An Investigation of Dataserver CPU & IO Utilization to Resolve Performance Bottlenecks

By Prasanna Kirtani

July 2009
Introduction
Any database management system, be it Sybase, Oracle, SQL Server or any other DBMS, relies on the finite availability of hardware and software components. These components make up the resource pool of the system. Of interest to me when I troubleshoot database performance are disk IO, CPU and memory.

Performance is a measure of how the DBMS uses the I/O, CPU and memory to process DDL and DML requests.

• Performance is typically measured in terms of throughput and response time.
• The aim of performance tuning is to optimize the utilization of CPU, disk IO and memory.
• The goal of performance troubleshooting is to identify performance bottlenecks by examining one variable or component at a time. We have to be aware that resolving one bottleneck can, and in many cases does, uncover previously hidden problems.

Through this document, I would like to share my approach to troubleshooting performance issues on Sybase ASE.

Where Do We Start?
Unfortunately, this is a question that is faced every time
a performance problem is encountered.

There are many areas that could be responsible for degradation of performance on a DBMS, some of which are outside of the database server.

For Sybase ASE, whenever poor server performance is reported, even before I accept that there is in fact a performance issue, I do the following:

1. Check connectivity to the dataserver
2. Check the dataserver errorlog for any unfamiliar messages
3. Login to the dataserver and check for blocking, log suspend and a filled-up tempdb
4. Run a simple select query in the master database
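Most of these checks (the errorlog review is done at the OS level) can be scripted from a single session. A minimal sketch, assuming the login can read master..sysprocesses and that log-suspended sessions surface there with a "LOG SUSPEND" command value:

```sql
-- Steps 1 & 4: connectivity plus a trivial query against master
use master
go
select @@servername, getdate()
go

-- Step 3a: sessions blocked by another spid
select spid, blocked, cmd, status
from master..sysprocesses
where blocked != 0
go

-- Step 3b: sessions suspended because a transaction log is full
select spid, cmd, status
from master..sysprocesses
where cmd like "LOG SUSPEND%"
go

-- Step 3c: space remaining in tempdb
use tempdb
go
exec sp_spaceused
go
```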

Once I assure myself that the dataserver is functioning normally, I proceed with my performance investigation.

CPU, IO and Memory utilization

Each DBMS has its own flavor of commands that help isolate performance issues. There is no silver bullet that can magically pinpoint performance bottlenecks. What I have found to be most effective is to start looking at a problem by capturing CPU, IO and memory utilization within the dataserver. A few readings of these three basic variables can provide us with clues as to where to look next to pinpoint the performance bottleneck.

For Sybase ASE, I find the stored procedure “sp_monitor” to be most useful. “sp_monitor” is a well-known stored procedure, available on ASE for a very long time, that reports CPU busy and IO busy statistics accumulated since the last run of the same procedure. Sybase, like other DBMS vendors, now provides many more features that can help with performance troubleshooting, but I prefer sp_monitor for its simplicity and ease of use. “sp_monitor” provides a first insight into what is going on in the dataserver as far as IO and CPU utilization are concerned.
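A sampling session might look like the sketch below; the five-minute interval is arbitrary and should span a period of representative load:

```sql
-- First run establishes the baseline: sp_monitor figures are cumulative
-- since the previous invocation
exec sp_monitor
go

-- Let the server run under representative load
waitfor delay "00:05:00"
go

-- Second run reports cpu_busy and io_busy for just the elapsed interval
exec sp_monitor
go
```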

Note: To collect information relating to memory, I use another procedure, “sp_sysmon”. Specifically, I look for two values: “buffer hit ratio” and “spinlock contention”.
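Both values appear in sp_sysmon's data cache section, so the sample can be limited to that section. A sketch, with an illustrative two-minute interval:

```sql
-- Collect two minutes of activity and report only the data cache
-- section, which covers cache hit ratios and spinlock contention
exec sp_sysmon "00:02:00", dcache
go
```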

“Buffer hit ratio” reports the percentage of times a requested data page is found in the data cache. A buffer hit ratio of less than 97% could indicate memory starvation. If a sufficient amount of memory is not allocated to the data caches, the first thing to do would be to increase memory.

“Spinlock Contention” reports the number of times an engine encountered spinlock contention on the cache and had to wait to acquire memory. In SMP environments, splitting the data caches into cachelets will reduce spinlock contention.
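Cachelets are configured with sp_cacheconfig. A sketch, assuming the contention is on the default data cache and a partition count (a power of two, often chosen near the number of online engines) of 4:

```sql
-- Split the default data cache into 4 cachelets so engines contend
-- on separate spinlocks
exec sp_cacheconfig "default data cache", "cache_partition=4"
go
```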

Further investigation using sp_monitor
Moving on, readings of sp_monitor taken a few minutes apart provide near real-time CPU busy and I/O busy statistics. Reviewing CPU and IO utilization on the dataserver gives us four combinations of CPU and IO values.

The dataserver would be experiencing either Low CPU utilization / Low IO utilization, High CPU utilization / Low IO utilization, Low CPU utilization / High IO utilization or High CPU utilization / High IO utilization. Let’s review these four possibilities in a bit more detail.

Low CPU utilization (below 80%) / Low IO utilization (below 80%)

If the CPU and IO utilization reported by the dataserver is low, it is quite likely that Sybase is not the source of poor performance. In such a case, I work with the application team and OS administrators to investigate components outside of the ASE realm. Some of the areas that merit investigation are:

• Client application and API issues
• Network issues
• Operating system performance (output of top, vmstat, iostat etc.)
• Problems with the Sybase interfaces file
• Problems with database connectivity tools

High CPU utilization (over 95%) / Low IO utilization (below 80%)

High CPU utilization is not a bad thing in itself. However, consistently high CPU utilization of over 95-98% indicates that the dataserver may be experiencing one or more of the following situations:
• A lot of queries performing aggregation operations, i.e. sum, avg, max, min etc. These operations are CPU intensive. A long-term solution is to implement incremental aggregation.
• The data is over-normalized. As a consequence there is a need to join 5 or more tables to extract even simple result sets. Such an extent of normalization causes data to be memory bound, resulting in a lot of logical I/O that shows up as CPU cycles.
• The dataserver does not have sufficient capacity to process DML/DDL requests. The only alternative may be to increase the number of engines/CPUs or migrate to better hardware.
• Too many parallel queries running simultaneously, causing a large number of worker threads to overwhelm the dataserver engines.
• Improperly sized data cache buffer pools force the optimizer to choose an incorrect caching strategy. This can result in data being swapped in and out of memory. The swapping shows up as CPU and IO cycles.
• Improperly set configuration parameters can drastically increase CPU utilization by unnecessarily wasting CPU cycles. An example is a high ‘runnable process search count’.

Immediate steps can be taken to help decrease CPU load. These include setting parallelism to 1, reducing the number of queries running simultaneously on the dataserver, increasing the number of online engines, ensuring that the dataserver is the only principal process on its host machine, and adjusting configuration parameters to reduce CPU overhead.
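Some of these mitigations map directly onto sp_configure calls. A sketch; the values shown are illustrative, not recommendations:

```sql
-- Disable intra-query parallelism server-wide
exec sp_configure "max parallel degree", 1
go

-- Make idle engines yield the CPU sooner instead of spinning
exec sp_configure "runnable process search count", 3
go
```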

Low CPU utilization (below 80%) / High IO utilization (over 95%)

High I/O utilization may result from the following:

• The data profile has changed, rendering the index statistics useless. This results in the query optimizer selecting improper indexes or running table scans, causing IO to spike.
• Lack of proper indexes causes table scans, especially when queries involve multiple table joins, which makes I/O requirements blow out of proportion.
• DSS and OLTP applications are being hosted on the same dataserver. DSS applications typically need massive amounts of data pages to be read into memory, causing OLTP data to be swapped out by the LRU algorithm. This results in an IO-heavy dataserver.
• The data is over-normalized and hence requires more data pages to be read and written, increasing I/O utilization.
• A bad database design dictates that complex queries be written; the optimizer chooses incorrect indexes, or proper indexes do not exist, causing a lot more I/O to be performed.
• Again, improperly sized data buffer pools cause an incorrect caching strategy, resulting in data being swapped in and out of memory. For example, a small 16K buffer pool would cause the optimizer to choose the 16K pool to reduce I/O simply because it exists, but in reality this leads to high I/O due to in-and-out page swaps.
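Buffer pool sizes can be adjusted with sp_poolconfig. A sketch, assuming the 16K large-I/O pool in the default data cache is the one being resized (the 64M figure is illustrative):

```sql
-- Grow the 16K buffer pool in the default data cache to 64M;
-- the space is taken from the cache's default (smallest) pool
exec sp_poolconfig "default data cache", "64M", "16K"
go
```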

As with CPU utilization, I/O utilization can be tamed in the short term by scaling down the number of processes running on the dataserver, performing update statistics (if possible) on the most critical tables, resizing data caches, increasing the network packet size, increasing the size of buffer pools, etc.
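Refreshing statistics on a hot table, for instance, looks like the following; the table name orders is hypothetical:

```sql
-- Rebuild optimizer statistics for the table and its indexes,
-- then force dependent query plans to recompile on next execution
update statistics orders
go
exec sp_recompile "orders"
go
```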

High CPU utilization (over 95%) / High IO utilization (over 95%)

Good luck! Look at both of the sections above for relief in the short term.

Conclusion
To conclude our discussion: when we encounter performance degradation on ASE, there are a few things your DBA can do to provide immediate relief. However, there is no simple bag of tricks that can permanently and quickly fix performance issues. Application and physical database design decisions account for over 80% of your system's "tunable" performance. I once came across the comment "It is not possible to tune a bad design" in an article I read, and I find this to be very true. Project resources should be planned to include a thorough review of your database design for optimal performance. Good luck!
