Another LatchProfX use case

Tanel Poder

2009/03/20

Riyaj Shamsudeen wrote an excellent article about systematic latch contention troubleshooting.

Especially if the latch contention problem is ongoing, looking into system wide stats (like v$latch.sleep columns) is not the best idea in busy systems. This may sometimes lead you to fixing the wrong problem.

This is because sometimes the latch contention is not caused by some system wide inefficiency but rather by one or few sessions.

The right approach would be to measure the following things:

  1. Which latch (and exact child latches) the problem session is waiting for (query v$session_wait, ASH, sql_trace)
  2. Why is the problem session trying to get that latch so often (query v$sesstat counters for problem session and check execution plan if only single/few statements executed)
  3. Why this latch is busy: who’s holding it (query v$latchholder, LatchProf)
  4. Why is that someone holding the latch so much (query v$sesstat counters for that session or run LatchProfX)

Riyaj used this approach and successfully found out the troublemaker causing heavy library cache latch contention. He used my LatchProfX tool for part of the diagnosis (and I’m very happy to see that my advanced oracle troubleshooting tools find real world use in hands of other troubleshooters/tuners too!).

Here’s an excerpt from latchprofx output (from a little test case I put together), showing which session is holding which library cache latch how much, hold mode (shared/exclusive) and also the actual reason (function name) why the latch is held:

SQL> @latchprofx sid,name,laddr,ksllwnam,ksllwlbl,hmode 159 % 100000

-- LatchProfX 1.08 by Tanel Poder (  )

 SID NAME           LADDR            KSLLWNAM  KSLLWLBL  HMODE        Held  Gets  Held %  Held ms Avg hold ms
---- -------------- ---------------- --------- --------- ---------- ------ ----- ------- -------- -----------
 159 library cache  00000003A93513E0 kglic     child     exclusive   10044  9314   10.04  301.320        .032
 159 library cache  00000003A9351340 kglic     child     exclusive    9895  8458    9.90  296.850        .035
 159 library cache  00000003A9351660 kglic     child     exclusive    9761  8440    9.76  292.830        .035
 159 library cache  00000003A9351700 kglic     child     exclusive    9737  8924    9.74  292.110        .033
 159 library cache  00000003A9351480 kglic     child     exclusive    8999  7765    9.00  269.970        .035
 159 library cache  00000003A93515C0 kglic     child     exclusive    8553  7832    8.55  256.590        .033
 159 library cache  00000003A9351520 kglic     child     exclusive    6852  6828    6.85  205.560        .030

7 rows selected.

kglic means Kernel Generic Library cache Iterate Chain (AFAIK), it’s the function which is executed when you access most X$KGL tables. And this is where Riyaj got the evidence that the latch contention comes from inefficient scanning of library cache, caused by a session running queries against V$SQL_PLAN views with a bad plan.

So, check out Riyaj’s articele, my LatchProfX script, then my Advanced Oracle Troubleshooting Seminar :)

Comments powered by Talkyard.