Chris Lawson
Database Specialists, Inc.
www.dbspecialists.com

clawson@dbspecialists.com
Focus of Presentation
‡ Explore some "strange" database problems that have
baffled some DBAs
‡ Most of the mysteries occurred on critical production
systems, although some were on development systems
‡ ALL of the mysteries were eventually explained
‡ Depending on your personal experience, some of these
"mysteries" will seem trivial or commonplace; others
will indeed seem mysterious
‡ Most mysteries have a simple explanation
‡ Most mysteries have a simple fix
Why Spend Time on These
Database Mysteries?
‡ Each DBA has a unique set of experiences and biases.
What one DBA thinks is obvious, another will not.
‡ An Oracle "detective" is part scientist, part artist.
Many solutions require creativity, not just logic.
‡ A superior DBA will look for ways to "stretch" and
learn ways to handle difficult problems.
‡ Without working out difficult problems, you will not
advance as a DBA.
‡ You will be the "hero" if you encounter a mystery and
solve it; remember the solution--you may see it again!
A Word About Oracle Versions
‡ This presentation was originally written in
1998
‡ Most of these mysteries involve Oracle 7
databases
‡ Although some of the mysteries might not
apply directly to Oracle 8i, they still offer
insight into the problem-solving process

The Case of the
Berserk Application
‡ Using HPUX, Oracle 7.3.2.3
‡ Help desk application (Vantive) that connects to Oracle
database suddenly goes berserk, creating thousands of
connections
‡ Program had worked normally for many months
‡ DBAs watch helplessly as CPU load is driven from 1 to 50
‡ As DBAs kill extra processes, more take their place
‡ Alert log and recent trace files show nothing unusual
‡ DBAs are united in accusing the application as the culprit
Berserk Application
(continued)
‡ Running Sun Solaris, Oracle 7.3.2.3
‡ Users complain that performance has degraded in recent
months
‡ Manager states that "something must be wrong with the
network"
‡ Application is a document management/printing
application
‡ DBA investigates. Discovers that time to connect in
SQL*Plus is 30-45 seconds, even though server load is low
‡ Connect time is bad whether remote (PC) or directly on server
‡ Server load (file I/O and CPU) is generally low
Berserk Application: Solution
‡ Oracle Trace (otrace) is the culprit. It is active by default on
many 7.3 Oracle versions
‡ Oracle Corporation has published an alert on this problem
Berserk Application: Solution
(continued)
‡ Check directory ORACLE_HOME/rdbms/otrace: as
size of files process.dat and regid.dat approach 10mb,
problems arise
‡ To correct: simply remove these two files, then issue
command otrccref
Berserk Application: Solution
(continued)
‡ Add line to listener.ora for each database (after
ORACLE_HOME):
(ENVS=EPC_DISABLED=TRUE)
‡ Set and export environment variable
EPC_DISABLED=TRUE for all users.
Put standard profile in /etc directory
‡ Restart all databases and restart listener

The Case of the Reluctant Patch
‡ To correct several bugs, decision is made to
upgrade from 7.3.2.2 to 7.3.2.3 (HPUX)
‡ Patch is obtained from Oracle and applied to test
server. DBA notes that patch ran very quickly and
runs it again "just to be sure"
‡ Bug is now gone on test server
The Case of the Reluctant Patch
(continued)
‡ Patch is similarly applied to production server--
same operating system and version.
‡ Production application is tested, but bug is still
there!
‡ Another DBA reviews patch file, location, etc.
All seem correct.
The Case of the Reluctant Patch:
Solution
‡ DBA happens to notice that upon SQL*Plus
startup, database is still 7.3.2.2!
‡ The patch was really only applied on the second
run. This is apparently a quirk in the patch
file.
‡ A listing of installed patches (then grep for patch#)
can be used to determine which patches are
applied
The Case of the Sleazy SQL
‡ "Big Publisher Ltd." runs an MRP system called
"AVALON," similar to Oracle Manufacturing.
Database stores inventory, part information,
vendors, etc.
‡ Server is ATT3555, running NCR UNIX.
Database is Oracle 7.1.6
‡ Issue: Users report that certain common operations
are very slow
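The Case of the Sleazy SQL
(continued)
‡ DBA investigates and queries V$SQLAREA for statements
with high disk reads and buffer gets per execution
‡ Query yields troublesome SQL statement, with high values for
these stats:
DISK READS PER EXECUTION
BUFFER GETS PER EXECUTION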
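A query along the following lines could produce per-execution disk-read and buffer-get figures from V$SQLAREA. It is a minimal sketch, not the DBA's original query; the threshold of 1000 reads per execution is an arbitrary illustrative value.

```sql
-- Sketch only: rank cached statements by disk reads per execution.
-- SQL_TEXT, EXECUTIONS, DISK_READS and BUFFER_GETS are standard V$SQLAREA
-- columns; the 1000-read threshold is an assumption for illustration.
SELECT sql_text,
       executions,
       disk_reads,
       buffer_gets,
       ROUND(disk_reads  / DECODE(executions, 0, 1, executions)) disk_reads_per_exec,
       ROUND(buffer_gets / DECODE(executions, 0, 1, executions)) buffer_gets_per_exec
  FROM v$sqlarea
 WHERE executions > 0
   AND disk_reads > 1000 * executions   -- same test as reads/exec > 1000, without dividing
 ORDER BY 5 DESC;
```

The DECODE guards the division for statements that were parsed but never executed.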
The Case of the Sleazy SQL
(continued)
‡ Statement has been accidentally designed to ensure a
full-table scan by making index usage impossible:
SELECT * FROM ABC WHERE
NVL(COL_1, ...) = NVL(...)
AND NVL(COL_2, ...) = NVL(...)
AND NVL(COL_3, ...) = NVL(...)
AND NVL(COL_4, ...) = NVL(...)
‡ TABLE ABC SIZE = 3mb, about 25,000 rows
‡ DBA requests developers to alter statement to eliminate
NVL (COL_N) functions
‡ DBA advised that no resources available to make change
‡ If code can't be changed, what can be done to
improve performance?
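For reference, the kind of rewrite the DBA asked the developers for might look like the sketch below. The column name, bind variable, and the 'X' default are hypothetical; the original NVL arguments are not known.

```sql
-- Hypothetical names throughout; 'X' stands in for whatever default the
-- original NVL calls used.

-- Original style: NVL() wrapped around the column hides it from any
-- ordinary index on COL_1, so the table is scanned in full.
SELECT *
  FROM abc
 WHERE NVL(col_1, 'X') = NVL(:b1, 'X');

-- Equivalent predicate, assuming 'X' never occurs as a real value.
-- The column is left bare, so the equality branch is able to use an index.
SELECT *
  FROM abc
 WHERE col_1 = :b1
    OR (col_1 IS NULL AND :b1 IS NULL);
```

Whether the optimizer actually picks the index for the second form depends on the version and the available statistics.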
The Case of the Sleazy SQL:
Solution
‡ CACHE the table! For example:
ALTER TABLE ABC CACHE;
‡ Normally, blocks from full-table scans are designated for
rapid age-out; otherwise, they would "wipe out" the db
cache. Caching the table causes blocks to be treated "normally."
‡ Caching table disables rapid age-out of this table
‡ Logical reads will not be reduced, but disk reads approach
zero!
‡ Note: DB_BLOCK_BUFFERS was slightly increased to
compensate for the cached table that now consumes a few
megabytes of database cache
The Case of the
Non-Optimal Optimizer
‡ A large software company based in "Cedar Shores" has
designed a large financials application. Program has been
tuned for the Rule-Based Optimizer (RBO).
‡ The application runs very well, is a mature product, which
is used in thousands of companies around the world.
‡ Some users clamor for new features: more horns and
whistles
The Case of the
Non-Optimal Optimizer
(continued)
‡ The new development team, afraid to become obsolete,
wants to convert to Cost-Based Optimizer (CBO). They
also wisely consider that Oracle recommends using CBO
on new projects.
‡ The older developers, now nearing peaceful retirement,
predict disaster if the database is switched to CBO,
because the execution plans will change.
‡ Issue: How can Optimizer be switched to CBO
without changing the code?
The Case of the Non-Optimal
Optimizer: Solution
‡ Simply substitute a view having a "hint" for the table
needing CBO
For example:
rename DEPT to DEPT_ORIG;
create view DEPT as select /*+ ALL_ROWS */
* from DEPT_ORIG;
‡ Now, application will use the VIEW when it looks for
DEPT
‡ All queries using DEPT will use CBO
‡ Note: Upon renaming a table, the indexes and constraints
will "move" with the table; however, synonyms and grants
may need to be reset.
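The steps above, plus a quick check of the resulting plan, can be sketched as follows. The grantee APP_USER and the DEPTNO predicate are hypothetical, and the check assumes a PLAN_TABLE (created by utlxplan.sql) that includes a COST column.

```sql
-- Sketch: swap the table for a hinted view, then inspect the plan.
RENAME dept TO dept_orig;

CREATE VIEW dept AS
SELECT /*+ ALL_ROWS */ *
  FROM dept_orig;

-- Hypothetical grantee: re-issue whatever privileges the application needs
-- on the new view.
GRANT SELECT, INSERT, UPDATE, DELETE ON dept TO app_user;

-- Explain a representative query that goes through the view.
EXPLAIN PLAN SET STATEMENT_ID = 'dept_check' FOR
SELECT * FROM dept WHERE deptno = 10;

SELECT id, operation, options, object_name, cost
  FROM plan_table
 WHERE statement_id = 'dept_check'
 ORDER BY id;
```

A non-null COST on the first plan row suggests the cost-based optimizer produced the plan; under the rule-based optimizer the cost is left null.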

The Case of the
Forgetful Memory
‡ A new internet-transaction application and its
database have been installed on a Sun Ultra Enterprise Server
‡ Sun Solaris 2.5.1, Oracle 7.2.3
‡ Application appears to run smoothly for several months,
although it occasionally creates large dump files
‡ Trace files appear occasionally with ORA-4030 "Out of Process
Memory" and recommend "increase process memory quota"
‡ Server seems to hang occasionally. Server reboot fixes it
‡ SysAdmin checks kernel parameters related to memory. All
correct and match other servers. Not using any large stored
procedures
‡ Problem: What is causing memory/hang problems?
The Case of the
Forgetful Memory: Solution
‡ DBA checks /tmp (swap area on server) and notes that it
is full
‡ Investigation reveals that application occasionally goes
berserk and consumes ENTIRE SWAP area with log
files
‡ Deletion of log files does not return disk space, since
application is still "holding" the files
‡ Reboot of server cleaned up /tmp area, thereby
correcting problem
‡ Suggestion: If memory-related error messages exist,
check swap area first
The Reluctant Index Affair
‡ DBA asked to analyze and tune Australia manufacturing
database. Database is running CBO. One particularly
bothersome SQL statement is identified
‡ The WHERE condition is perfect for a new index,
because of its excellent selectivity
‡ Index is quickly created. Table is also analyzed
‡ Even though index is a "perfect" solution to the query, a
full table scan is used instead
The Reluctant Index Affair:
Solution
‡ The values in the table are very lopsided. Optimizer,
however, will assume uniform distribution, which is
incorrect in many cases
‡ Re-analyze and specify histogram:
ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 75;
‡ This creates histogram of 75 "buckets" for each indexed
column
‡ With these statistics, optimizer will "know" how values are
distributed, and will more often make right decision to use
an index or not
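A quick way to confirm that a column is lopsided before gathering a histogram is sketched below. The ORDERS table and STATUS column are hypothetical stand-ins for the real table and its indexed column.

```sql
-- Hypothetical table and column: ORDERS.STATUS, where a few values dominate.

-- Step 1: look at the actual distribution of the indexed column.
SELECT status, COUNT(*) row_count
  FROM orders
 GROUP BY status
 ORDER BY 2 DESC;

-- Step 2: gather a histogram of 75 buckets on every indexed column.
ANALYZE TABLE orders
  COMPUTE STATISTICS FOR ALL INDEXED COLUMNS SIZE 75;
```

If step 1 shows a handful of values covering most rows, the uniform-distribution assumption is badly off for the rare values, which is exactly where an index pays off. Histograms help mainly when the statement supplies literal values rather than bind variables.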
Mystery of the
Hanging Database
‡ At random intervals, a 7.3.2.3 database hangs. No trace
files, and nothing unusual in the alert log.
‡ When problem occurs, no response to new connection
requests; over 1200 existing connections "hang."
‡ Oracle Support is alerted to priority 1 problem; experts
across the world investigate for weeks
‡ DBA is using OEM Lock Manager tool and notices user
who is blocking about 25 other users. The hang occurs
soon after.
‡ Oracle Australia recommends checking indexes. This
suggestion led to the solution.
‡ How did index problems hang database?
Mystery of the
Hanging Database: Solution
‡ Application design flaw
‡ There are hundreds of foreign keys in the database; 99%
had indexes. A few did not, violating good design practice.
When batch program began updates, locking increased
rapidly.
‡ Without foreign-key index, updates on parent table completely
block updates on child (vice versa for 7.1.6)
‡ Reference: Server Application Developers Guide
‡ Although not admitted as database "bug," database was
overwhelmed by the locks
‡ Once indexes on all FKs were created, problems disappeared
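Foreign keys that lack a supporting index can be hunted down in the data dictionary. The query below is a simplified sketch, not the check used in the original investigation: it flags any foreign-key column that does not appear at the same position in some index on the same table, so it may over-report when an index leads with the key columns in a different order.

```sql
-- Sketch: list foreign-key columns with no index column at the matching
-- position. Run as the schema owner (uses the USER_ dictionary views).
SELECT c.table_name,
       c.constraint_name,
       cc.column_name,
       cc.position
  FROM user_constraints  c,
       user_cons_columns cc
 WHERE c.constraint_type = 'R'                 -- referential (foreign key)
   AND cc.constraint_name = c.constraint_name
   AND NOT EXISTS
       (SELECT 1
          FROM user_ind_columns ic
         WHERE ic.table_name      = c.table_name
           AND ic.column_name     = cc.column_name
           AND ic.column_position = cc.position)
 ORDER BY c.table_name, c.constraint_name, cc.position;
```

Each row returned is a candidate for a new index on the child table.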

The Case of the
Mysterious Package
‡ Manufacturing application was installed on a Sun Ultra
3000 server. A small database was created for testing
purposes. Oracle version 7.2.3. Shared pool size about 60mb.
‡ At first, all went well. Then, seemingly randomly, when
the users began to try new features, they would receive a
"funny" error message and the application failed.
‡ A trace file recommended increasing shared pool
‡ How can application fail with such a sizable shared pool?
‡ Aside from massive increase in shared pool, what can be
done?
The Case of the
Mysterious Package: Solution
‡ The application uses about 20 massive PL/SQL packages.
Some are 5x the size of the SYS.STANDARD package. When a
package load is attempted, it will not fit in the shared pool.
‡ Memory-intensive packages should be "pinned" or "kept"
in shared pool after database startup, using
DBMS_SHARED_POOL.KEEP
‡ But first, must find the "big" packages (will also list
SYS.STANDARD) by querying the data dictionary for package sizes
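One way to find large packages is to look at what is currently loaded in the shared pool. This is a sketch under assumptions: it uses V$DB_OBJECT_CACHE rather than whatever view the original query used, and the 100000-byte threshold is arbitrary.

```sql
-- Sketch: largest packages currently in the shared pool.
-- Only objects that have been loaded since startup appear here.
SELECT owner,
       name,
       type,
       sharable_mem,
       kept
  FROM v$db_object_cache
 WHERE type IN ('PACKAGE', 'PACKAGE BODY')
   AND sharable_mem > 100000        -- illustrative threshold, in bytes
 ORDER BY sharable_mem DESC;
```

The KEPT column shows whether an object has already been pinned.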
The Case of the
Mysterious Package: Solution
(continued)
‡ Example script to find "big" packages and generate
SQL script to "pin" them in memory: select the large packages
from the data dictionary and spool one "keep" command per package
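A SQL*Plus script in this spirit might look like the sketch below. It is not the original script: the view, the size threshold, and the spool file name pin_packages.sql are assumptions, and it requires the DBMS_SHARED_POOL package to be installed (dbmspool.sql).

```sql
-- Sketch: generate one KEEP call per large, not-yet-pinned package.
SET PAGESIZE 0
SET FEEDBACK OFF
SET HEADING OFF
SET LINESIZE 200

SPOOL pin_packages.sql

SELECT DISTINCT
       'execute sys.dbms_shared_pool.keep(''' || owner || '.' || name || ''');'
  FROM v$db_object_cache
 WHERE type IN ('PACKAGE', 'PACKAGE BODY')
   AND sharable_mem > 100000        -- illustrative threshold, in bytes
   AND kept = 'NO';

SPOOL OFF
```

The generated pin_packages.sql can then be run right after each database startup, before the shared pool becomes fragmented.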

The Case of the
Uncooperative Rollback
‡ In mid-afternoon, DBA (running "OEM Top Sessions")
notices many users "ACTIVE" but showing 0 file I/O. Lock
Manager reveals one user performing big update blocking all.
‡ Culprit tracked down--agrees to be terminated. DBA
disconnects session.
‡ Locks are not released, but user is "marked for kill."
‡ Very little file I/O activity. Alert log shows very slow
switching of redo logs.
‡ DBA performs shutdown abort then startup. Database starts
up after 2 minutes. All is well.
‡ Why did user not roll back and release locks?
The Case of the Uncooperative
Rollback: Solution
‡ If session is terminated, speed of rollback is proportional
to init.ora parameter
CLEANUP_ROLLBACK_ENTRIES
‡ If default value (20) is used, rollback of killed session
can take 50x time of original update. Alternatively,
shutdown abort/startup cleans up database much faster.
‡ Rationale: Parameter prevents rollback of one user from
hogging all the resources on a busy system
‡ Solution: Increase parameter to reduce rollback time
(since shutdown abort is usually not an option)
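The current setting can be checked from the instance before deciding on a new value. The replacement value shown in the comment is purely illustrative, not a recommendation from the presentation.

```sql
-- Check the current value of the parameter named above.
SELECT name, value
  FROM v$parameter
 WHERE name = 'cleanup_rollback_entries';

-- To change it, edit init.ora and restart the instance, for example:
--   cleanup_rollback_entries = 200    (illustrative value only)
```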

The Singular Case of the
Phantom Users
‡ A manufacturing database in Sydney, Australia, needed
performance tuning. SQL tuning on US databases had
yielded good results.
‡ The table v$sqlarea was queried to find resource-intensive
SQL statements. Several commonly-run statements were
isolated. Performance was improved through index additions.
‡ Statistics were re-examined over the next 4 hours, in order to
confirm improvements.
‡ However, repeated looks at execution statistics showed no
change.
‡ DBA puzzles over enigma for several hours, then realizes
that NOTHING is WRONG! What did he finally realize?
The Singular Case of the
Phantom Users: Solution
‡ Nothing is wrong because the users were
still asleep. It was only 5:00am in Sydney!

The Case of the Slow Physician
(Bonus Mystery)
‡ Health application is experiencing slow run times. Analysis shows
following SQL statement causing 3000 disk reads
SELECT ... FROM ... WHERE ... = 'DR. MC KENZIE';
‡ The query runs against a view that is a join of 2 tables (DOCS + COSIGN),
joined on patient_id (indexed)
‡ Search criteria 'DR. MC KENZIE' is very selective; thus,
Nested Loop IS expected choice for optimizer, with DOCS
as Driving table.
The Case of the Slow Physician
(continued)
‡ Even with index on DOCS(DOC_ID), optimizer (CBO)
insists on using hash-join, and refuses to ever use the index on
DOCS!
‡ Repeated ANALYZE TABLE commands do not correct
‡ Substituting query without a view yields expected NL
result
‡ Why does using the view cause optimizer to make the
"wrong" choice?
The Case of the Slow Physician:
Solution
‡ Everything seemed to point to a problem with the view,
because all worked normally as long as the view was
excluded
‡ Finally, DBA compared view definition (in OEM Schema
Manager) to definition seen using "describe table" syntax.
The columns did NOT match!
‡ Examining the object-create script revealed that the view
aliased its columns, so that column DOC_ID in the
view did NOT match DOC_ID in the table!
‡ Once the correct column was indexed, a Nested-Loop Join
was selected by the optimizer
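The same comparison can be made straight from the data dictionary. In this sketch the view name DOCS_VIEW is hypothetical (the real view name is not known); DOCS is the base table named above.

```sql
-- Hypothetical view name DOCS_VIEW; DOCS is the base table.

-- 1. The stored view text, to see how each view column is derived.
SET LONG 4000
SELECT text
  FROM user_views
 WHERE view_name = 'DOCS_VIEW';

-- 2. The columns the view exposes, in order.
SELECT column_name, column_id
  FROM user_tab_columns
 WHERE table_name = 'DOCS_VIEW'
 ORDER BY column_id;

-- 3. The base-table columns that are actually indexed.
SELECT index_name, column_name, column_position
  FROM user_ind_columns
 WHERE table_name = 'DOCS'
 ORDER BY index_name, column_position;
```

Matching each view column in step 2 back to its source expression in step 1 shows which base-table column really needs the index listed in step 3.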
Contact Information
Chris Lawson
clawson@dbspecialists.com

www.dbspecialists.com
Database Specialists, Inc.
388 Market Street, Suite 400
San Francisco, CA 94111
