Oracle Latch Contention Troubleshooting

I wrote a latch contention troubleshooting article for IOUG Select journal last year (it was published earlier this year). I have uploaded this to tech.E2SN too, I recommend you to read it if you want to become systematic about latch contention troubleshooting:

http://tech.e2sn.com/oracle/troubleshooting

I’m working on getting the commenting & feedback work at tech.E2SN site too, but for now you can comment here at this blog entry…

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Cool stuff, Oracle and tagged , , , , , , , , , , . Bookmark the permalink.

12 Responses to Oracle Latch Contention Troubleshooting

  1. Edgar says:

    Tanel
    I was testing latchprfx.sql on 11R2 but seems not working ? any idea

    Regards

  2. Tanel Poder says:

    @Edgar
    download the latest version from tech.e2sn.com – i fixed an issue with 11gR2 new transformation which caused the join order to be wrong

  3. CJ says:

    Hi Tanel,

    I am confused about a trivial issue. I was reading the following excerpt from an article:

    A Concepual Knowledge on Oracle Blocks Read Algorithm is as under :

    1. User Issues a Query.

    2. Query is Parsed and Optimal Execution path is generated and stored in the Library Cache.

    3. Based on the Execution Path, required Index or Table block is searched in the the Cache.

    4. If the block is not found in the cache, a disk read request is made, and the block is read into the Cache.

    5. The block, from the Cache is then read into a private memory area (UGA) of the User.

    6. Once the Block is read into the private memory of the User, the required row is fetched.

    http://viveklsharma.wordpress.com/2010/03/04/consistent-gets-myth/

    I was a bit shocked by steps number 5 & 6. My own perception was that the Oracle server process will read only selected information from the buffer cache into the UGA.

    What is your take on this?

    Many thanks.

  4. Ashish says:

    Hi Tanel,

    We faced the issue of library cache latch contentions due to which CPU was 0% idle continuously. There was one index rebuild session which was holding the library cache lock. Others were waiting with “library cache: Mutex X”. Some times I was able to see other session coming and going with “Library cache lock”. But index rebuild was there.

    I was not able to check which session was holding lock. Whenever I used to run the query to check the holder of library cache lock, it was displaying “library cache: Mutex X” and never used come back.

    So I decided to kill the index rebuild. But it’s Journal table did not clean up after killing the index rebuild and still I was able to see that index rebuild session which was holding the Library cache lock. Manual clean up of the journal table was also stuck.

    Here my question is what can we do in this case? This database was critical. I tried for around 25 mins and then decided to go for failover of the database to standby.

    Thanks,
    Ashish

  5. CJ says:

    @CJ
    Hi Tanel,

    It’s been a while but this question has started bothering me again, so I thought I might ask again.

    Suppose a block is in the buffer cache and server process needs to fetch a row from that block, after taking the latch etc etc. Will it:

    A. Copy the block from the share memory into the process private memory (UGA/CGA/PGA) and then read the row needed?

    or

    B. Read the requisite row from the block while it resides in the shared memory itself?

    or

    C. Something totally different?

    Now the difficult part – how to prove that the server process is indeed doing A or B or C? My hunch is option B, but I can’t think of how to prove it one way or another.

    Many thanks.

  6. Tanel Poder says:

    @CJ
    CJ,

    The buffer is pinned (in share mode for reading) and then only the needed rows are extracted from the buffer. So no extra copy is happening. There’s an optimization (the “consistent gets – examination” thing), where the buffer isn’t even pinned but read under the protection of the CBC latch only. This is done for index root/branch blocs in some cases (when Oracle knows in advance that the block access doesn’t take long time), like single row lookup from an unique index…

  7. CJ says:

    Brilliant, thanks a lot Tanel.

    If you were asked to *prove* that a copy was not happening, how would you do that?

    Thanks.

  8. Tanel Poder says:

    @CJ
    Use a dev instance, suspend all other processes so they couldn’t do LIOs and then run various queries on various index types (with equality predicate and range predicate) and then measure the “consistent gets”, “consistent gets – examination” and correlate this to the number of CBC latch gets from v$latch. Snapper allows to show all (you can run it with gather=a option to get latch gets too)

    That’s the easiest way and assumes that Oracle’s v$sesstat + v$latch instrumentation works properly. There are more hard-core ways too ;-)

  9. CJ says:

    @Tanel Poder
    Ok, I can try it, I have a local instance where the only user session is my own.

    I expect that I will see the increase in gets/immediate gets in v$latch to match the number of consistent gets in v$sesstat.

    But how will it prove that the block wasn’t copied to the server memory CGA/UGA (transparently without increasing any counters)??

    I think hard-core ways would be required :-)

  10. Tanel Poder says:

    @CJ
    Or regarding the copy, there’s no counter for that.

    You can use a debugger for that, create a block with two rows with values ADBCDEF and GHIJKLM in them.

    Then run a query which selects only one of them and suspend the process when row is about to be returned (network call).

    And then scan through the process private memory with debugger to see whether the row you selected is there in process private mem and whether the other row in that block is not.

  11. Tanel Poder says:

    @Tanel Poder

    Make sure that your test is clean – start up a completely new process for your final test etc

  12. CJ says:

    Or maybe I will just take your word for it ;)

    Seriously, I will give it a try. Thanks a lot.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>