
Deadlock Caused By Library Cache Lock

Introduction
The in-house Oracle Performance Analyzer tool is a handy and efficient way to analyze performance issues triggered by library cache locks. Here I present a real-life case showing how this type of issue can be analyzed and resolved in minutes.

The Issue

On 01/03/2014, the database sp2-stgadmdb suffered from library cache lock contention. Some queries had waited for more than 15 hours. Oracle Performance Analyzer shows the active session list together with wait events, running time, and wait time. Here I sorted the list by the SEC_WAIT column. The tool also provides a lock finder through the context menu item Find Lock Holder, which retrieves the lock queue and the lock holder.

The Lock Queue


With one click, the lock queue is displayed in another tab. The session from which I opened the context menu has requested an exclusive lock on table AMD_LB_FACT.LB_ACTIVE_DAILY_AGGR. There are 19 entries in the queue (the first row is the session from which I started the lock search, and it appears again later in the list); most of them are requesting the lock in Share mode.

The Lock Holder


Sorting by the MODE_HELD column reveals the session that holds the library cache lock: sid 625 on node 2. Why does this session hold the lock for so long? The wait event of the holder session is also displayed: it is waiting for PX Deq: Parse Reply. In other words, this is a PX operation, and the QC (query coordinator) session is waiting for the parallel slave processes to finish parsing their child cursors. Still, why does it wait so long (check WAIT_SEC)? Oracle Performance Analyzer provides context menu items to check an individual session, all sessions of a PX operation by QC, or all sessions with a similar SQL_ID. Since the holder is running a PX operation, I used Track PX Operation for further research.
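The step of picking the holder out of the lock queue can be sketched in a few lines of Python. This is only an illustration of the idea, not how the analyzer tool is implemented: the LockQueueRow type and its field names are hypothetical, and apart from sid 625 on node 2 the row values are made up.

```python
from dataclasses import dataclass


@dataclass
class LockQueueRow:
    # Hypothetical shape of one row in the lock queue tab.
    inst_id: int
    sid: int
    mode_held: str       # "None", "Share", or "Exclusive"
    mode_requested: str  # "None", "Share", or "Exclusive"
    wait_sec: int


def find_holders(rows):
    """Return the rows that actually hold the lock (MODE_HELD is not 'None')."""
    return [r for r in rows if r.mode_held != "None"]


# Illustrative rows only; the waiters and their wait times are invented.
queue = [
    LockQueueRow(1, 101, "None", "Exclusive", 50000),
    LockQueueRow(2, 625, "Share", "None", 0),
    LockQueueRow(1, 202, "None", "Share", 48000),
]

holders = find_holders(queue)
for h in holders:
    print(f"holder: sid {h.sid} on node {h.inst_id}, mode held {h.mode_held}")
```

Filtering on MODE_HELD does the same job as sorting the column in the tool: every other row in the queue is a waiter.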

The PX Operation Sessions


Now we can see all the sessions related to the PX operation. The bottom pane lets us dig into SQL-specific information such as the SQL text. From the list, we can see that all but one of the slave sessions are waiting for cursor: pin S wait on X, a typical event when a PX operation runs into trouble during parsing. Usually one session does the actual parsing, and it is either blocked or slowly loading library objects into the shared pool. Here one slave session is waiting on library cache lock instead, so this session is the parsing bottleneck. We can again use the context menu to find the lock chain.
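Spotting the one slave that waits on a different event can be expressed as a small filter. The sketch below assumes the PX session list is available as (sid, event) pairs; the sids are hypothetical and the helper name odd_ones_out is my own.

```python
from collections import Counter


def odd_ones_out(sessions):
    """Return the sessions whose wait event differs from the most common one.

    sessions is a list of (sid, event) tuples.
    """
    counts = Counter(event for _, event in sessions)
    majority_event, _ = counts.most_common(1)[0]
    return [(sid, event) for sid, event in sessions if event != majority_event]


# Hypothetical slave sessions of one PX operation.
px_sessions = [
    (701, "cursor: pin S wait on X"),
    (702, "cursor: pin S wait on X"),
    (703, "library cache lock"),
    (704, "cursor: pin S wait on X"),
]

suspects = odd_ones_out(px_sessions)
print(suspects)
```

The session that is left over is the one worth passing to Find Lock Holder.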

Recursive SQL
Note that the SQL_ID of the slave processes differs from that of the QC session. The bottom pane can be used to identify the actual SQL. The slave processes are working on a recursive SQL issued by the parsing process; basically, this SQL is used to work out partition pruning. A SQL_ID without a child number indicates that the related sessions have not finished parsing.

The Lock Chain

The Find Lock Holder function, triggered from the slave session waiting for library cache lock, displays another lock queue. The lock holder is the QC session of the query, and the lock is held in Share mode. Why can't the blocked slave sessions be granted the lock in Share mode as well? Because the SYS session (sid 13, node 2) is requesting the lock in Exclusive mode, and it made its request before the PX slave sessions did (check WAIT_SEC: it has waited for 54160 seconds).
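Combining this queue with the holder's wait on PX Deq: Parse Reply seen earlier, the wait relationships can be written down as a small wait-for map and followed until a session repeats. This is a minimal sketch of that reasoning; the dictionary below is my own summary of the screens, and the label "PX slave" stands for the slave session whose sid is not given here.

```python
def follow_chain(waits_for, start):
    """Follow 'who waits for whom' links from start.

    Returns (chain, is_cycle). is_cycle is True when the walk comes back
    to a session it has already visited.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in waits_for:
        current = waits_for[current]
        chain.append(current)
        if current in seen:
            return chain, True
        seen.add(current)
    return chain, False


# Summary of the observed waits (labels are illustrative).
waits_for = {
    "PX slave": "SYS (sid 13, node 2)",              # queued behind the exclusive request
    "SYS (sid 13, node 2)": "QC (sid 625, node 2)",  # exclusive request blocked by the Share holder
    "QC (sid 625, node 2)": "PX slave",              # PX Deq: Parse Reply
}

chain, is_cycle = follow_chain(waits_for, "PX slave")
print(" -> ".join(chain))
print("cycle:", is_cycle)
```

A walk that returns to its starting session means none of the three can make progress on its own.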

How Could That Happen?


Here is my understanding:
When the PX query started, the QC session spent too much time parsing. Possibly the shared pool was not large enough, or the required objects had not been loaded into the shared pool. Before the QC completed its parsing, the SYS session (a stats job, see next slide) needed to update the table statistics, so it requested an exclusive lock on the same table and was blocked on the library cache lock. When the QC session of the PX operation completed its parsing, the parallel slave processes could not get the library cache lock for their own parsing, because the SYS session had requested the exclusive lock before them. Since the QC session was not going to release the lock while waiting for its slaves, we were effectively deadlocked. It remains an interesting question why the slave processes cannot inherit the lock from the PQ process.
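The queueing behavior described above can be illustrated with a toy model. It assumes a strictly first-come, first-served lock queue in which a blocked exclusive request also blocks every share request behind it; this is a simplification for illustration, not a description of Oracle's actual library cache lock implementation, and the function name grantable is hypothetical.

```python
def grantable(held_modes, queue):
    """Return the waiters that could be granted right now under strict FIFO.

    held_modes: modes currently held on the object, e.g. ["S"].
    queue: waiting requests in arrival order, as (name, mode) tuples,
           where mode is "S" (share) or "X" (exclusive).
    """
    granted = []
    exclusive_held = "X" in held_modes
    any_held = bool(held_modes)
    for name, mode in queue:
        if mode == "S" and not exclusive_held:
            granted.append(name)
            any_held = True
        elif mode == "X" and not any_held:
            granted.append(name)
            break  # nothing can be granted behind an exclusive lock
        else:
            break  # strict FIFO: the first blocked waiter stops the queue
    return granted


# The QC holds the lock in Share mode; SYS queued its exclusive request first.
with_sys = grantable(["S"], [("SYS stats job", "X"),
                             ("PX slave 1", "S"),
                             ("PX slave 2", "S")])

# Same situation after the SYS request is removed from the queue.
without_sys = grantable(["S"], [("PX slave 1", "S"), ("PX slave 2", "S")])

print(with_sys)
print(without_sys)
```

In this model nobody is granted while the SYS request sits at the head of the queue, and both slaves are granted as soon as it is gone, which matches what was observed once the stats job was cleared.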

The Troublemaker
The context menu item Track This Session can be used to track the SYS session that is requesting the exclusive lock. To break the deadlock, either the SYS stats job or the query has to be killed.
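If the session is cleared by hand rather than through the tool, the statement could be assembled as sketched below. The helper name is hypothetical, and the serial number is a placeholder because the slides do not show it; it would have to be looked up for the real session before running anything.

```python
def build_kill_statement(sid: int, serial: int, inst_id: int) -> str:
    """Build an ALTER SYSTEM KILL SESSION statement for a session on a given RAC instance."""
    # int() rejects anything that is not a plain number, so no stray text
    # can end up inside the quoted session identifier.
    return (
        f"ALTER SYSTEM KILL SESSION "
        f"'{int(sid)},{int(serial)},@{int(inst_id)}' IMMEDIATE"
    )


# sid 13 on node 2 is the SYS session from the lock queue;
# serial 12345 is a made-up placeholder value.
statement = build_kill_statement(13, 12345, 2)
print(statement)
```

The sketch only builds the text of the statement; executing it requires a privileged connection and should follow the usual change procedures.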

Back To Normal
Here is the DB status minutes after I cleared the SYS stats job session. All but one of the queries have a running time longer than 1 minute. At the time I am writing this slide (within one hour), all user queries have completed and the only visible active sessions are from Oracle Performance Analyzer. As a side note, if a stats job gets killed, it is better to restart it later. Missing or incomplete stats could cause other performance issues.