<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tanel Poder's blog: Core IT for Geeks and Pros &#187; Search Results  &#187;  latchprof++latchprofx</title>
	<atom:link href="http://blog.tanelpoder.com/search/latchprof++latchprofx/feed/rss2/" rel="self" type="application/rss+xml" />
	<link>http://blog.tanelpoder.com</link>
	<description>Oracle troubleshooting, internals and performance tuning</description>
	<lastBuildDate>Fri, 03 Sep 2010 08:46:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>New versions of LatchProf and LatchProfX for latch contention troubleshooting and tuning</title>
		<link>http://blog.tanelpoder.com/2010/02/15/new-versions-of-latchprof-and-latchprofx-for-latch-contention-troubleshooting-and-tuning/</link>
		<comments>http://blog.tanelpoder.com/2010/02/15/new-versions-of-latchprof-and-latchprofx-for-latch-contention-troubleshooting-and-tuning/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 03:01:51 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Administration]]></category>
		<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/?p=620</guid>
		<description><![CDATA[The LatchProf and LatchProfX scripts allow you to be more systematic with latch contention troubleshooting and tuning. No more guesswork is needed as these scripts give you exact session IDs and in this version also SQLIDs of the troublemaking applications. You can download the new versions here: LatchProf (reads V$ views) LatchProfX (reads X$ tables, [...]]]></description>
			<content:encoded><![CDATA[<p>The LatchProf and LatchProfX scripts allow you to be more systematic with latch contention troubleshooting and tuning. No more guesswork is needed as these scripts give you exact session IDs and in this version also SQLIDs of the troublemaking applications.</p>
<p>You can download the new versions here:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">LatchProf</a> (reads V$ views)</li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs" target=\"_blank\">LatchProfX</a> (reads X$ tables, but gives better info, run as SYS)</li>
</ul>
<p>Example output (with SQLID info) is below:</p>
<pre class="brush: plain;">SQL&gt; @latchprof name,sid,sqlid % % 100000

-- LatchProf 1.21 by Tanel Poder ( http://www.tanelpoder.com )

NAME                                       SID SQLID               Held       Gets  Held %     Held ms Avg hold ms
----------------------------------- ---------- ------------- ---------- ---------- ------- ----------- -----------
cache buffers chains                       133 3jbwa65aqmkvm        349        348     .35      14.169        .041
simulator lru latch                        133 3jbwa65aqmkvm         51         51     .05       2.071        .041
row cache objects                          133 3jbwa65aqmkvm          5          5     .01        .203        .041
cache buffers chains                        24                        5          5     .01        .203        .041
cache buffers chains                       149 3jbwa65aqmkvm          3          3     .00        .122        .041
resmgr group change latch                   33 147a57cxq3w5y          2          2     .00        .081        .041
cache buffers chains                         9 5raw2bzx227wp          2          1     .00        .081        .081
In memory undo latch                       149 f3y38zthh270n          2          1     .00        .081        .081
active checkpoint queue latch                5                        2          1     .00        .081        .081
cache buffers chains                       149 75621g9y3xmvd          2          2     .00        .081        .041
cache buffers chains                         9 gvgdv2v90wfa7          2          2     .00        .081        .041
cache buffers chains                        33 75621g9y3xmvd          2          2     .00        .081        .041
checkpoint queue latch                       5                        1          1     .00        .041        .041
ksuosstats global area                       8                        1          1     .00        .041        .041
cache buffers lru chain                    133 3jbwa65aqmkvm          1          1     .00        .041        .041
multiblock read objects                    155 75ju2gn3s8009          1          1     .00        .041        .041
resmgr group change latch                    9 0w2qpuc6u2zsp          1          1     .00        .041        .041
resmgr group change latch                   33 apgb2g9q2zjh1          1          1     .00        .041        .041
resmgr group change latch                  133 apgb2g9q2zjh1          1          1     .00        .041        .041
space background task latch                 17                        1          1     .00        .041        .041
cache buffers chains                       149 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 05s4vdwsf5802          1          1     .00        .041        .041
cache buffers chains                        31 0yas01u2p9ch4          1          1     .00        .041        .041
cache buffers chains                         9 41zu158rqf4kf          1          1     .00        .041        .041
In memory undo latch                        33 0bzhqhhj9mpaa          1          1     .00        .041        .041
In memory undo latch                        31 gvgdv2v90wfa7          1          1     .00        .041        .041
In memory undo latch                         9 gvgdv2v90wfa7          1          1     .00        .041        .041
simulator lru latch                        149 3jbwa65aqmkvm          1          1     .00        .041        .041
row cache objects                          133 5yq51dtyc6qf2          1          1     .00        .041        .041
SQL memory manager workarea list la        133 3jbwa65aqmkvm          1          1     .00        .041        .041
enqueues                                   141                        1          1     .00        .041        .041
row cache objects                          132                        1          1     .00        .041        .041

33 rows selected.
</pre>
<p>LatchProf scripts allow you to easily identify which session and SQLID (or sqlhash in 9i) cause the latch(es) to be held the most.</p>
<p>Let&#8217;s check what&#8217;s the most &#8220;latch-holding&#8221; SQL reported by LatchProf:</p>
<pre class="brush: sql;">SQL&gt; @sqlid 3jbwa65aqmkvm

HASH_VALUE  CH# SQL_TEXT
---------- ---- ------------------------------------------------------------------------------------------------------------------------------------------------------
1432996723    0 SELECT O.ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY, ORDER_MODE, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID, C.CUSTOMER_ID,
 CUST_FIRST_NAME, CUST_LAST_NAME, CREDIT_LIMIT, CUST_EMAIL, ORDER_DATE FROM ORDERS O , ORDER_ITEMS OI, CUSTOMERS C WHERE O.ORDER_ID = OI.ORDER_ID AND
 O.CUSTOMER_ID = C.CUSTOMER_ID AND O.ORDER_STATUS &lt;= 4
</pre>
<p>If you want to read more about the capabilities of LatchProf and LatchProfX, go here:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv">http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv">http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/</a></li>
</ul>
<p>Note that the latest version (v1.21) also fixes a problem with Oracle 11.2 where the script execution plan order was wrong, causing the sampling to not happen in correct order. I added a NO_TRANSFORM_DISTINCT_AGG hint to disable a new transformation happening in 11.2 to make the scripts behave correctly&#8230;</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2010%2F02%2F15%2Fnew-versions-of-latchprof-and-latchprofx-for-latch-contention-troubleshooting-and-tuning%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=620" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2010/02/15/new-versions-of-latchprof-and-latchprofx-for-latch-contention-troubleshooting-and-tuning/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>KGL simulator, shared pool simulator and buffer cache simulator &#8211; what are these?</title>
		<link>http://blog.tanelpoder.com/2009/09/13/kgl-simulator-shared-pool-simulator-and-buffer-cache-simulator-what-are-these/</link>
		<comments>http://blog.tanelpoder.com/2009/09/13/kgl-simulator-shared-pool-simulator-and-buffer-cache-simulator-what-are-these/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 00:13:39 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Administration]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/?p=444</guid>
		<description><![CDATA[Note: I accidentially published this note while I was writing it, so you may have seen a partial version of it. It&#8217;s complete now. If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here&#8217;s an [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: I accidentially published this note while I was writing it, so you may have seen a partial version of it. It&#8217;s complete now.</em></p>
<p>If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here&#8217;s an example:</p>
<pre>SQL&gt; select * from v$sgastat where lower(name) like '%<strong>sim</strong>%' order by name;

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  kglsim alloc latch area          1700
shared pool  kglsim alloc latches               68
shared pool  kglsim count of pinned he        9248
shared pool  kglsim free heap list             204
shared pool  kglsim free obj list              204
shared pool  kglsim hash table                4104
shared pool  kglsim hash table bkts        2097152
shared pool  kglsim heap                    635536
shared pool  kglsim latch area                1700
shared pool  kglsim latches                     68
shared pool  kglsim main lru count           87040
shared pool  kglsim main lru size           174080
shared pool  kglsim object batch            909440
shared pool  kglsim pin list arr               816
shared pool  kglsim recovery area             2112
shared pool  kglsim sga                      22092
shared pool  kglsim size of pinned mem       18496
shared pool  ksim client list                   84
shared pool  log_simultaneous_copies           480
shared pool  sim cache nbufs                   640
shared pool  sim cache sizes                   640
shared pool  sim kghx free lists                 4
shared pool  sim lru segments                  640
shared pool  sim segment hits                 1280
shared pool  sim segment num bufs              640
shared pool  sim state object                   48
shared pool  sim trace buf                    5140
shared pool  sim trace buf context             120
shared pool  sim_knlasg                       1200
shared pool  simulator hash buckets          16512
shared pool  simulator latch/bucket st        4608

31 rows selected.</pre>
<p>See, a bunch of &#8220;kgl sim&#8221; and then just &#8220;sim&#8221; allocations.</p>
<p>&#8230; or sometimes you can see latch contention on following latches:</p>
<pre>SQL&gt; select name from v$latch where name like '%sim%';

NAME
-------------------------------------------------------
ksim membership request latch
simulator lru latch
simulator hash latch
sim partition latch
shared pool simulator
shared pool sim alloc

6 rows selected.</pre>
<p>Again, there seems to be some &#8220;simulation&#8221; work going on in Oracle instance.</p>
<p>So what are these simulators about?</p>
<p><span id="more-444"></span></p>
<p>These simulators help Oracle to determine how much time and effort would have been saved if buffer cache or shared pool size was bigger than current. And this info is summarized and externalized in various advisor views such V$SHARED_POOL_ADVICE, V$MTTR_TARGET_ADVICE, V$DB_CACHE_ADVICE and so on. These simulation engines were introduced in Oracle 9i along the mentioned V$ views.</p>
<p>Also this stuff is used by MMAN for making SGA resizing decisions if SGA_TARGET (10g+) or MEMORY_TARGET (11g+) is enabled.</p>
<p>I decided to write an article about simulators as quite a few people have been asking me about it lately. So, I first googled around to see if someone has already written about it already and found one <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3N0YWNrb3ZlcmZsb3cuY29tL3F1ZXN0aW9ucy8xODM5NjMvd2hhdC1pcy10aGUtb3JhY2xlLWtnbC1zaW11bGF0b3I=">here</a>. It turned out it was written by me, I just didn&#8217;t know about it anymore :)</p>
<p>So, I will write an edited version of my original explanation here. Note that I don&#8217;t know the exact algorithm of the simulator, just explaining the fundamentals here.</p>
<p>KGL simulator is the shared pool memory allocation tracking and simulation engine.</p>
<p>KGL=Kernel General Library cache manager, as the name says it deals with library objects such cursors, cached stored object definitions (PL/SQL stored procs, table definitions etc).</p>
<p>KGL simulator is used for estimating the benefit of caching if the cache was larger than currently. The general idea is that when flushing out a library cache object, it&#8217;s hash value (and few other bits of info) are still kept in the KGL simulator hash table. This stores a history of objects <em><strong>which were in memory in past but got flushed out</strong></em>. That&#8217;s the key part: we keep a history of objects flushed out (this is done by selective sampling though, not at every object flush as it would have too much overhead).</p>
<p>When loading a library cache object into library cache (which means that no existing such object is in there already), Oracle goes and checks the KGL simulator hash table to see whether an object with matching hash value is in there. If a matching object is found, that means that the required object had been in cache in past, but flushed out due space pressure. That&#8217;s the second important part: we know when we would have been able to (probably) just reuse an object if the shared pool had been bigger.</p>
<p>Using that information of how many library cache object (re)loads could have been been avoided if cache had been bigger (thanks to KGL simulator history) and also thanks to knowing how much time the (current) object reloads took, Oracle can predict how much response time would have been saved instancewide if shared pool was bigger. So, we don&#8217;t need to increase shared pool itself to measure the benefit, we <em><strong>simulate</strong></em> a bigger shared pool by just keeping a history of aged out library cache objects hash values in a special array &#8211; and that&#8217;s why there are various kglsim and simulator allocations seen in V$SGASTAT.</p>
<p>The simulation results are seen from v$shared_pool_advice and are used by other advisors such ADDM.</p>
<p>Apparently Oracle keeps track also about how much memory is pinned in the shared pool (thus can&#8217;t be flushed out) over time. This info is probably used also for estimating shared pool resize requirements as pinned memory chunks are not usable for anything else (as long as they are pinned) in the shared pool. These measurements are also done on sampling basis.</p>
<p>I did a little test with my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xvdHNocGFyc2VzLnNxbA==" target=\"_blank\">lotshparses.sql</a> script (which just runs a lot of hard parses inside a PL/SQL loop) and ran latchprofx to report which latches and why was my session using. Note the &#8220;sim&#8221; instead of usual % in the latchprofx syntax. I am interested in only these latches which have &#8220;sim&#8221; in their name in this experiment:</p>
<pre>SQL&gt; <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs" target=\"_blank\">@latchprofx</a> sid,name,func,hmode 146 <em><strong>sim</strong></em> 100000

-- LatchProfX 1.12 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                FUNC                                     HMODE              Held       Gets  Held %
---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- -------
 146 shared pool simulator               kglsim_unpin_simhp                       exclusive           514         49     .51
 146 shared pool simulator               kglsim_upd_newhp                         exclusive           206        198     .21
 146 shared pool simulator               kglsim_scan_lru: scan                    exclusive            95         93     .10
 146 shared pool simulator               kglsim_chg_simhp_free                    exclusive            32         32     .03
 146 shared pool simulator               kglsim_chg_simhp_noob                    exclusive            32         32     .03
 146 shared pool sim alloc               kglsim_chk_heaplist: alloc               exclusive             4          3     .00
 146 shared pool sim alloc               kglsim_chk_objlist: alloc                exclusive             3          2     .00
 146 shared pool simulator               kglsim_chk_objlist: extend               exclusive             1          1     .00
 146 shared pool simulator               kglsim_scan_lru: rebalance               exclusive             1          1     .00</pre>
<p>This simulation engine is pretty cool because its the first time someone has tried to measure the <strong><em>time</em></strong> what could be saved by increasing shared pool size. (Response) Time is the ultimate measure of performance, so this simulator measures the right thing. Do you remember what people used before that? Library cache hit ratio! A lot of people back then (and some still today) have been misleaded by measuring the wrong things like hit ratios.</p>
<p>Anyway, while this is a cool feature &#8211; it can cause trouble in big systems. This is because the extra CPU usage for simulation and also the potential concurrency issues like simulator latch contention (like the shared pool simulator latch above, these latches are used even on 11g where most library cache latches have been replaced with KGX mutexes for efficiency and avoiding false contention issues). Also, some people are sensitive about the potential large memory allocations the shared pool simulator can take from shared pool (this is controlled by <em><strong>_kglsim_max_percent</strong></em> parameter which defaults to max 5% of total shared pool size). As these allocations are dynamic, then when your application suddenly starts making lots of hard parses due newly released code, causing memory shortage, then the simulator can make things even worse as it can make relatively big allocations for itself too! And as far as I know the simulator arrays don&#8217;t shrink back.</p>
<p>Note that I&#8217;m talking about big systems here, in most other cases the simulation shouldn&#8217;t cause trouble (on recent db versions anyway).</p>
<p>In some databases I have disabled all of the advisors and simulators with setting <em><strong>statistics_level = basic</strong></em>. All features what you see from V$STATISTICS_LEVEL would be disabled when setting statistics_level to basic:</p>
<pre>SQL&gt; select statistics_name,activation_level from v$statistics_level;

STATISTICS_NAME                          ACTIVAT
---------------------------------------- -------
Buffer Cache Advice                      TYPICAL
MTTR Advice                              TYPICAL
Timed Statistics                         TYPICAL
Timed OS Statistics                      ALL
Segment Level Statistics                 TYPICAL
PGA Advice                               TYPICAL
Plan Execution Statistics                ALL
Shared Pool Advice                       TYPICAL
Modification Monitoring                  TYPICAL
Longops Statistics                       TYPICAL
Bind Data Capture                        TYPICAL
Ultrafast Latch Statistics               TYPICAL
Threshold-based Alerts                   TYPICAL
Global Cache Statistics                  TYPICAL
Active Session History                   TYPICAL
Undo Advisor, Alerts and Fast Ramp up    TYPICAL
Streams Pool Advice                      TYPICAL
Time Model Events                        TYPICAL
Plan Execution Sampling                  TYPICAL
Automated Maintenance Tasks              TYPICAL
SQL Monitoring                           TYPICAL
Adaptive Thresholds Enabled              TYPICAL
V$IOSTAT_* statistics                    TYPICAL
.
23 rows selected.</pre>
<p>Sometimes you don&#8217;t want to disable absolutely everything, but only the troublemakers. The good news is that there&#8217;s an (undocumented) parameter for disabling every feature separately. As always, make sure that you  know what you&#8217;re doing before changing these and get blessing from Oracle support in order to not get yourself in trouble.</p>
<p>So, for example the library cache memory simulation (shared pool advice) can be disabled by setting <strong><em>_library_cache_advice = false</em></strong> .</p>
<p>I covered shared pool and library cache above. Oracle has more cache advisors, the underlying principle in them is the same. They save some small metadata/info about flushed out objects in a simulator array and just check that array when someone is allocating memory for new object. If matching object&#8217;s metadata still happens to be in the simulator array, then Oracle knows that we could have avoided the time and effort loading (generating) that object back into memory if the corresponding cache had been bigger. Of course the algorithm must be more sophisticated than just checking for object&#8217;s existence, there are probably object&#8217;s flush timestamp and size stored in the simulator arrays as well.</p>
<p>While the library cache simulator array stores library cache objects hash values for looking up historical memory allocations metadata, then the buffer cache advisor stores just data block addresses (DBAs) of blocks previously in cache. When Oracle spots that a block was previously in cache while intiating physical IO for re-reading it back again, it will measure the time it took to complete the current physical IO and add this info into the simulator array. Again, that&#8217;s how Oracle can measure how much <strong><em>response time</em></strong> would have been saved if the block would still have been in the buffer cache.</p>
<p>The buffer cache advisor needs some special attention, as I have had most issues with that (on large systems again), mostly with buffer cache&#8217;s <em><strong>simulator lru latch</strong></em> contention in systems with lots of CPUs and physical IOs going on. The problem here is that even though you can have hundreds of thousands of cache buffers chains latches in a system, the number of simulator lru latches remains relatively low by default, causing contention.</p>
<p>Oracle provides a documented parameter for disabling the buffer cache simulation &#8211; you need to set <em><strong>db_cache_advice</strong></em> = off (or ready).</p>
<p>So, hopefully this sheds some light into how the advisors work and what the heck the simulators simulate in Oracle. Even though I provided a list of parameters which allow you to control this stuff &#8211; don&#8217;t change any of them unless you actually have a problem which they should fix (and you are sure that these parameters would fix <em>your</em> problem).</p>
<p><strong><br />
</strong></p>
<p><strong>Update:</strong></p>
<p>Jonathan Lewis has written a good article about buffer cache advisor details on his website &#8211; I recommend to read that one too if you&#8217;re interested in this stuff:</p>
<p><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy5qbGNvbXAuZGVtb24uY28udWsvQ2FjaGVfYWR2aWNlLmh0bWw=" target=\"_blank\">http://www.jlcomp.demon.co.uk/Cache_advice.html</a></p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2009%2F09%2F13%2Fkgl-simulator-shared-pool-simulator-and-buffer-cache-simulator-what-are-these%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=444" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2009/09/13/kgl-simulator-shared-pool-simulator-and-buffer-cache-simulator-what-are-these/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>latch: cache buffers chains latch contention &#8211; a better way for finding the hot block</title>
		<link>http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/</link>
		<comments>http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/#comments</comments>
		<pubDate>Thu, 27 Aug 2009 09:37:08 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/?p=417</guid>
		<description><![CDATA[Here&#8217;s a treat for Oracle performance professionals and geeks who are looking for more systematic ways for cache buffers chains (CBC) latch contention troubleshooting. Cache buffers chains latches are taken when a process wants to walk through a cache buffer hash chain, looking if the block with required DBA (data block address) is in buffer [...]]]></description>
			<content:encoded><![CDATA[<p><!--[if !mso]> <mce:style><!  v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} p\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} v\:textbox {display:none;} --> <!--[endif]--></p>
<div>
<p>Here&#8217;s a treat for Oracle performance professionals and geeks who are looking for more systematic ways for cache buffers chains (CBC) latch contention troubleshooting. Cache buffers chains latches are taken when a process wants to walk through a cache buffer hash chain, looking if the block with required DBA (data block address) is in buffer cache. If the block happens to be in cache, then in most cases it has to be pinned first before use and unpinned after use, to make sure no-one else can perform an incompatible operation on that block at the same time. The modification of pin structures (pinning/unpinning) is also protected by CBC latches.</p></div>
<div>
<p>CBC latch contention can happen for multiple reasons, but one reason is that there is some <em>really</em> hot block in a SMP system with high number of CPUs (or CMT system with high number of threads like Sun T-series servers). Sometimes there happen to be multiple moderately hot blocks &#8220;under&#8221; the same CBC latch, which can result in latch contention again.</div>
<div>
<p>Traditionally DBAs used to look up the <strong>child latch address</strong> from V$SESSION_WAIT, sql_trace output or ASH and then look up all buffers protected by that latch from X$BH, using HLADDR column (HLADDR stands for Hash Latch Address). I also have a script for that, <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2JobGEuc3Fs" target=\"_blank\">bhla.sql</a> (Buffer Headers by Latch Address), which reports me all blocks currently in buffer cache, &#8220;under&#8221; that particular latch and the corresponding data block addresses and object names:</div>
<div><span id="more-417"></span></div>
<pre>SQL&gt; <strong>@bhla </strong>27E5A780  <em>&lt;-- latch address reported by wait interface (Parameter1 in latch: cache buffers chains wait event)</em>

FLG_LRUFLG                 OBJ OBJECT_TYPE         object                                          TCH DBA     
------------------- ---------- ------------------- ---------------------------------------- ---------- ---------
80000:8                      2 TABLE               SYS.SUBCOLTYPE$                                   0 1 7873  
80000:8                      2 TABLE               SYS.ATTRCOL$                                      0 1 7873  
0:0                      50472 INDEX               SYSMAN.MGMT_JOB_EXEC_IDX04                        1 3 27699 
0:0                      50472 INDEX               SYSMAN.MGMT_JOB_EXEC_IDX04                        1 3 27699 
0:4                      50472 INDEX               SYSMAN.MGMT_JOB_EXEC_IDX04                        1 3 27699 
2202000:8                 3710 TABLE               SYS.WRI$_ADV_REC_ACTIONS                          1 3 972   
0:0                      50472 INDEX               SYSMAN.MGMT_JOB_EXEC_IDX04                        1 3 27699 
0:0                      50472 INDEX               SYSMAN.MGMT_JOB_EXEC_IDX04                        1 3 27699 
2202000:8                94227 TABLE PARTITION     SYS.WRH$_ROWCACHE_SUMMARY                         2 3 36142 
80000:8                  54880 INDEX               PERFSTAT.STATS$UNDOSTAT_PK                       <strong>14 </strong>4 118331
0:8                         37 INDEX               SYS.I_OBJ2                                       <strong>24 </strong>1 55591 
2000:8                    3680 TABLE               SYS.WRI$_ADV_TASKS                               <strong>26 </strong>3 739   
0:8                         75 TABLE               SYS.IDL_UB2$                                     <strong>35 </strong>1 11745 
0:8                         73 TABLE               SYS.IDL_UB1$                                     <strong>38 </strong>1 12211</pre>
<p>From above we see that this child latch is protecting many blocks in buffer cache. So which one is the troublemaker? This is where the touchcount (TCH column) has been traditionally used, the higher the TCH the hotter the buffer, right?</p>
<p>Actually this may not always be true. For example if there is a somewhat hot block which has been steadily in buffer cache for weeks, it may have very high touch count. Now if there is an extremely hot block used by every morning&#8217;s batch job (and causing the trouble) but not used later on in the day, this block will get aged out later in the day. Once a block is aged out, its metadata, including TCH is gone! So when the block is loaded back in next morning (causing trouble again), its touch count starts from zero again.</p>
<p>So, by just looking into touchcount numbers you can&#8217;t always reliably detect who&#8217;s the <em>current</em> troublemaker. One option here would be to query the touchcounts, then wait for a minute, query the touchcounts again and see who&#8217;s touchcount increased the most during the minute. This would give a bit better picture about who is causing trouble <em>now.</em></p>
<p>But still, it would not be always reliable for another reason &#8211; touchcounts are incremented only after 3 seconds have passed since last increment! This factor has been coded in to avoid situation such a short but crazy nested loop join hammering a single buffer hundreds of thousands of times in few seconds and then finishing. The buffer wouldn&#8217;t be hot anymore but the touchcount would be hundreds of thousands due a single SQL execution. So, unless 3 seconds (of SGA internal time) has passed since last TCH update, the touchcounts would not be increased during buffer access.</p>
<blockquote><p>This time is controlled by SGA variable <strong>kcbatt_</strong> by the way:</p>
<pre>SQL&gt; oradebug dumpvar sga kcbatt_
ub4 kcbatt_ [3C440F4, 3C440F8) = 0000000<strong>3

</strong></pre>
</blockquote>
<p>This 3-second delay leaves us in the following situation, let say there are 2 blocks protected by a CBC child latch:</p>
<p>One block has been accessed <strong>once every 3 seconds</strong> for 24 hours in a row. A block accessed once per 3 seconds is definitely not a hot block, but its touchcount would be around 28800 (86400 seconds per 24 hours / 3 = 28800).</p>
<p>And there is another block which is accessed crazily for 2 seconds in a row and this happens every 10 seconds. 2 seconds of consecutive access would increase the touchcount by 1. If such access pattern has been going on every 10 seconds over last 24 hours, then the touch count for that buffer would be 86400 / 10 = 8640.</p>
<p>In the first case we can have a very cold block with TCH = 28800 and in second case a very hot block with TCH = 8640 only and this can mislead DBAs to fixing the wrong problem.</p>
<p>Luckily this problem can be solved with LatchProfX :-)</p>
<p>If you don&#8217;t know what LatchProfX is, you may want to read these articles first &#8211; LatchProf gives you a better way for latch contention troubleshooting compared to old approaches:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv" target=\"_blank\">http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv">http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOS8wMy8yMC9hbm90aGVyLWxhdGNocHJvZngtdXNlLWNhc2Uv">http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/</a></li>
</ul>
<p>And now I introduce a new feature in LatchProfX.</p>
<p>In addition to being able to report the SID of the latch holder and the kernel function why this latch was taken by the session, the newest LatchProfX can also report the <strong>object</strong> what was protected by the latch under investigation. For cache buffers chains the object is <em>data block address</em> of the block being accessed.</p>
<p>Here&#8217;s an example, let say I have CBC latch contention problems and from monitoring tools or sql_trace I have identified the hot latch, I use my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3N3LnNxbA==" target=\"_blank\">sw.sql</a> script here, to report what a session is waiting for:</p>
<pre>SQL&gt; <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3N3LnNxbA==" target=\"_blank\"><strong>@sw</strong></a> 138
 SID STATE   EVENT                        SEC_IN_WAIT                 P1
------- ------- ---------------------------- ----------- ------------------
    138 WAITING latch: cache buffers chains            0 address=0x<strong>27E5A780</strong></pre>
<p>Sw reports that this session is waiting on cache buffer chains child latch with address 0x27E5a780.</p>
<p>Now we can run latchprofx and report sid, name, hold mode and protected object for that child latch only (instead of latch name we specified its address):</p>
<pre>SQL&gt; <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs">@latchprofx</a> sid,name,hmode,<strong>object</strong> % <strong>27E5A780</strong> 100000

 SID NAME                   HMODE         OBJECT   Held  Gets  Held %  Held ms
---- ---------------------- ------------ ------- ------ ----- ------- --------
 138 cache buffers chains   exclusive     <strong>40EB02</strong>   3928  3797    3.93   41.637
 151 cache buffers chains   exclusive     <strong>40EB02</strong>   3711  3660    3.71   39.337
 138 cache buffers chains   shared        <strong>40EB02</strong>    623   623     .62    6.604
 151 cache buffers chains   shared        <strong>40EB02</strong>    544   544     .54    5.766</pre>
<p>As you see, LatchProfX has reported a data block address in the object column. This is the datablock we are trying to access in buffer cache under protection of the given child latch.</p>
<p>Lets see what this DBA translates to:</p>
<pre>
<pre>SQL&gt; <strong><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2RiYS5zcWw=" target=\"_blank\">@dba</a> 40EB02</strong>

 RFILE#     BLOCK#
---------- ----------
         1      60162

Press enter to find the segment using V$BH (this may take CPU time), CTRL+C to cancel:

STATE      BLOCK_CLASS        OBJECT_TYPE         object                                          TCH  MODE_HELD
---------- ------------------ ------------------- ---------------------------------------- ---------- ----------
xcur       data block         INDEX               SYS.SYS_IOT_TOP_<strong>94276</strong>                           331          1

Press enter to query what segment resides there using DBA_EXTENTS (this can be IO intensive), CTRL+C to cancel:

OWNER                          SEGMENT_NAME         PARTITION_NAME  TABLESPACE_NAME
------------------------------ -------------------- --------------- -----------------
SYS                            SYS_IOT_TOP_94276                    SYSTEM</pre>
<p>So, the troublemaker is an IOT index segment block belonging to table with object_id <strong>94276</strong>. I can find the table name from dba_objects by object_id, or by using my script:</pre>
<pre>SQL&gt;<strong> <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL29pZC5zcWw=" target=\"_blank\">@oid</a></strong> 94276

owner                     object_name                    object_type        CREATED
------------------------- ------------------------------ ------------------ -----------------
SYS                       <strong>KILL_CPU</strong>                       TABLE              20090825 23:19:49</pre>
<p>Also, as we know the segment name and relative file number (RFILE#) and block number (BLOCK#)  from above output we can dump the datablock with an ALTER SYSTEM DUMP DATAFILE command.</p>
<p>As that command requires absolute file id (not relative) we have to do a little conversion first (in cases where there are multiple files with same relative file number in database):</p>
<pre>SQL&gt; select file_id from dba_extents where <strong>relative_fno</strong> = 1 and <strong>segment_name</strong> = 'SYS_IOT_TOP_94276';

 FILE_ID
----------
 1</pre>
<p>Of course in my little test case all relative file numbers match with absolute numbers (but if you have lots of datafiles, have directly migrated from Oracle 7 or use transportable tablespaces, you may hit this issue).</p>
<p>Finally I can run ALTER SYSTEM DUMP DATAFILE 1 BLOCK 60162 to find out more about the contents of that block. Another option would be to construct a rowid from the data_object_id, rfile and block and query the table using that but as that&#8217;s complicated for IOTs, I&#8217;ll leave this to a future post.</p>
<p>You can download latest version of LatchProfX here:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs" target=\"_blank\">http://www.tanelpoder.com/files/scripts/latchprofx.sql</a></li>
</ul>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2009%2F08%2F27%2Flatch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=417" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Done with my seminar in Netherlands &#8211; here&#8217;s some feedback and couple beta new versions of my scripts</title>
		<link>http://blog.tanelpoder.com/2009/04/05/done-with-my-seminar-in-netherlands-heres-some-feedback-and-couple-beta-new-versions-of-my-scripts/</link>
		<comments>http://blog.tanelpoder.com/2009/04/05/done-with-my-seminar-in-netherlands-heres-some-feedback-and-couple-beta-new-versions-of-my-scripts/#comments</comments>
		<pubDate>Sun, 05 Apr 2009 22:09:59 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Training]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/?p=275</guid>
		<description><![CDATA[The Miracle Masterclass I delivered last week in Netherlands was awesome, I had lots of attendees and lots of good questions and feedback. As usual, I ran out of time in the end though. I will make this up with some articles about the stuff I didn&#8217;t have time to explain :) I used Snapper [...]]]></description>
			<content:encoded><![CDATA[<p>The Miracle Masterclass I delivered last week in Netherlands was awesome, I had lots of attendees and lots of good questions and feedback. As usual, I ran out of time in the end though. I will make this up with some articles about the stuff I didn&#8217;t have time to explain :)</p>
<p>I used Snapper v2.0 beta during my demos there and also an improved version of LatchProfX which can give some additional context information about reasons why latches were taken. As they are beta, not final versions, I have uploaded them into a temporary location:</p>
<p><em>Update: These scripts are out of beta now so here are their permanent locations:</em></p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3NuYXBwZXIuc3Fs">http://www.tanelpoder.com/files/scripts/snapper.sql</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs">http://www.tanelpoder.com/files/scripts/latchprofx.sql</a></li>
</ul>
<p>Also, the Sesspack (session level statspack) v0.05 files are in my TPT script set, under tools/sesspack_v0.05 directory in the zip file:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9UUFRfcHVibGljLnppcA==">http://www.tanelpoder.com/files/TPT_public.zip</a></li>
</ul>
<p>Toine van Beckhoven attended <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vc2VtaW5hci8=" target=\"_blank\">the seminar</a> as well and wrote a review of it, if you&#8217;re still hesitating if you should attend as well, read <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3RvaW5ldmFuYmVja2hvdmVuLndvcmRwcmVzcy5jb20vMjAwOS8wNC8wMi90aGUtdHJhaW5zLWFyZS1hLW5pZ2h0bWFyZS1pdHMtYS1taXJhY2xlLWFuZC1ob3ctbGl0dGxlLWNhbi15b3Uta25vdy8=" target=\"_blank\">his review</a> first :)</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2009%2F04%2F05%2Fdone-with-my-seminar-in-netherlands-heres-some-feedback-and-couple-beta-new-versions-of-my-scripts%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=275" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2009/04/05/done-with-my-seminar-in-netherlands-heres-some-feedback-and-couple-beta-new-versions-of-my-scripts/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Another LatchProfX use case</title>
		<link>http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/</link>
		<comments>http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 19:35:41 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/</guid>
		<description><![CDATA[Riyaj Shamsudeen wrote an excellent article about systematic latch contention troubleshooting. Especially if the latch contention problem is ongoing, looking into system wide stats (like v$latch.sleep columns) is not the best idea in busy systems. This may sometimes lead you to fixing the wrong problem. This is because sometimes the latch contention is not caused [...]]]></description>
			<content:encoded><![CDATA[<p>Riyaj Shamsudeen wrote an excellent <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL29yYWludGVybmFscy53b3JkcHJlc3MuY29tLzIwMDkvMDMvMTMvZHluYW1pY19wbGFuX3RhYmxlLXhrcWxmeHBsLWFuZC1leHRyZW1lLWxpYnJhcnktY2FjaGUtbGF0Y2gtY29udGVudGlvbi8=">article</a> about <em>systematic</em> latch contention troubleshooting.</p>
<p>Especially if the latch contention problem is ongoing, looking into system wide stats (like v$latch.sleep columns) is not the best idea in busy systems. This may sometimes lead you to fixing the wrong problem.</p>
<p>This is because sometimes the latch contention is not caused by some system wide inefficiency but rather by one or few sessions.</p>
<p>The right approach would be to <em>measure</em> the following things:</p>
<ol>
<li>Which latch (and exact child latches) the problem session is waiting for (query v$session_wait, ASH, sql_trace)</li>
<li>Why is the problem session trying to get that latch so often (query v$sesstat counters for problem session and check execution plan if only single/few statements executed)</li>
<li>Why this latch is busy: who&#8217;s holding it (query v$latchholder, LatchProf)</li>
<li>Why is that someone holding the latch so much (query v$sesstat counters for that session or run LatchProfX)</li>
</ol>
<p>Riyaj used this approach and successfully found out the troublemaker causing heavy library cache latch contention. He used my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv">LatchProfX</a> tool for part of the diagnosis (and I&#8217;m very happy to see that my advanced oracle troubleshooting tools find real world use in hands of other troubleshooters/tuners too!).</p>
<p>Here&#8217;s an excerpt from latchprofx output (from a little test case I put together), showing which session is holding which library cache latch how much, hold mode (shared/exclusive) and also the actual reason (function name) why the latch is held:</p>
<pre><code>SQL&gt; @latchprofx sid,name,laddr,<strong>ksllwnam,ksllwlbl,hmode</strong> 159 % 100000

-- LatchProfX 1.08 by Tanel Poder ( http://www.tanelpoder.com )

 SID NAME           LADDR            KSLLWNAM  KSLLWLBL  HMODE        Held  Gets  Held %  Held ms Avg hold ms
---- -------------- ---------------- --------- --------- ---------- ------ ----- ------- -------- -----------
 159 library cache  00000003A93513E0 <strong>kglic</strong>     child     exclusive   10044  9314   10.04  301.320        .032
 159 library cache  00000003A9351340 kglic     child     exclusive    9895  8458    9.90  296.850        .035
 159 library cache  00000003A9351660 kglic     child     exclusive    9761  8440    9.76  292.830        .035
 159 library cache  00000003A9351700 kglic     child     exclusive    9737  8924    9.74  292.110        .033
 159 library cache  00000003A9351480 kglic     child     exclusive    8999  7765    9.00  269.970        .035
 159 library cache  00000003A93515C0 kglic     child     exclusive    8553  7832    8.55  256.590        .033
 159 library cache  00000003A9351520 kglic     child     exclusive    6852  6828    6.85  205.560        .030

7 rows selected.

</code></pre>
<p>kglic means Kernel Generic Library cache Iterate Chain (AFAIK), it&#8217;s the function which is executed when you access most X$KGL tables. And this is where Riyaj got the evidence that the latch contention comes from inefficient scanning of library cache, caused by a session running queries against V$SQL_PLAN views with a bad plan.</p>
<p>So, check out <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL29yYWludGVybmFscy53b3JkcHJlc3MuY29tLzIwMDkvMDMvMTMvZHluYW1pY19wbGFuX3RhYmxlLXhrcWxmeHBsLWFuZC1leHRyZW1lLWxpYnJhcnktY2FjaGUtbGF0Y2gtY29udGVudGlvbi8=">Riyaj&#8217;s articele</a>, my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv">LatchProfX script</a>, my Latch &amp; Mutex contention troubleshooting <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9MYXRjaF9Mb2NrX0FuZF9NdXRleF9Db250ZW50aW9uX1Ryb3VibGVzaG9vdGluZy5wZGY=">conference slides</a> and if you want more, then my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vc2VtaW5hci8=">Advanced Oracle Troubleshooting Seminar</a>. I recently added Dallas, Denver, Salt Lake City and Hong-Kong to the list of cities where you can attend it this year!</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2009%2F03%2F20%2Fanother-latchprofx-use-case%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=263" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2009/03/20/another-latchprofx-use-case/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Advanced Oracle Troubleshooting @OOW 2008 presentation slides and scripts</title>
		<link>http://blog.tanelpoder.com/2008/09/26/advanced-oracle-troubleshooting-oow-2008-presentation-slides-and-scripts/</link>
		<comments>http://blog.tanelpoder.com/2008/09/26/advanced-oracle-troubleshooting-oow-2008-presentation-slides-and-scripts/#comments</comments>
		<pubDate>Thu, 25 Sep 2008 22:41:23 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Oracle 11g]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.tanelpoder.com/2008/09/26/advanced-oracle-troubleshooting-oow-2008-presentation-slides-and-scripts/</guid>
		<description><![CDATA[My today morning&#8217;s Advanced Oracle Troubleshooting presentation was quite a success. The 400 people room was fully pre-booked and over 300 people showed up. This is a pretty good result considering that a) my presentation was an 9 am one on the last day of conference and b) there was a big party last night [...]]]></description>
			<content:encoded><![CDATA[<p>My today morning&#8217;s Advanced Oracle Troubleshooting presentation was quite a success. The 400 people room was fully pre-booked and over 300 people showed up. This is a pretty good result considering that a) my presentation was an 9 am one on the last day of conference and b) there was a big party last night :)</p>
<p>Thanks to everyone who attended!</p>
<p>I have uploaded the slides here: <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9BZHZhbmNlZF9PcmFjbGVfVHJvdWJsZXNob290aW5nLnBkZg==">http://www.tanelpoder.com/files/Advanced_Oracle_Troubleshooting.pdf</a></p>
<p>PerfSheet (the Excel based automatic data visualization tool) is uploaded here: <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9QZXJmU2hlZXQuemlw">http://www.tanelpoder.com/files/PerfSheet.zip</a></p>
<p>The TPT scripts (scripts like snapper, latchprofx, ostackprof etc I demoed and a lot more) are uploaded here: <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9UUFRfcHVibGljLnppcA==">http://www.tanelpoder.com/files/TPT_public.zip</a></p>
<p>Few links to my past blog entries about how to use Snapper, LatchProfX and os_explain:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv">Advanced Oracle Troubleshooting Guide, part 8: LatchProfX</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwNy8xMi8wNi9vcmFjbGUtc2Vzc2lvbi1zbmFwcGVyLXYxMDYtcmVsZWFzZWQv">Oracle Session Snapper v1.06 released</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNi8xNS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNi11bmRlcnN0YW5kaW5nLW9yYWNsZS1leGVjdXRpb24tcGxhbnMtd2l0aC1vc19leHBsYWluLw==">Advanced Oracle Troubleshooting Guide, part 6: os_explain</a></li>
</ul>
<p>I haven&#8217;t blogged about usage of OStackProf yet (as the tool&#8217;s very fresh), I will do so in coming week.</p>
<p>Feel free to send any further questions to tanel@tanelpoder.com</p>
<p>Also, if you liked my presentation, you will sure like my 2-day seminar on the same topic. I will go even deeper in there ;-)</p>
<p>Check out this page: <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vc2VtaW5hci8=">http://blog.tanelpoder.com/seminar/</a>. I&#8217;ve arranged my seminar in 18 cities already for 2008 and early 2009, if your city is not in the list, let me know and I&#8217;ll see what I can do :)</p>
<p>Thanks once more for everyone attending my OOW presentation and also my blog readers for continued support!</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2008%2F09%2F26%2Fadvanced-oracle-troubleshooting-oow-2008-presentation-slides-and-scripts%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=141" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2008/09/26/advanced-oracle-troubleshooting-oow-2008-presentation-slides-and-scripts/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Oracle hidden costs revealed, Part2 &#8211; Using DTrace to find why writes in SYSTEM tablespace are slower than in others</title>
		<link>http://blog.tanelpoder.com/2008/09/02/oracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others/</link>
		<comments>http://blog.tanelpoder.com/2008/09/02/oracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others/#comments</comments>
		<pubDate>Mon, 01 Sep 2008 17:16:50 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Administration]]></category>
		<category><![CDATA[DTrace]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://tanelpoder.wordpress.com/2008/09/02/oracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others/</guid>
		<description><![CDATA[I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script. Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces? Here&#8217;s an example, I&#8217;ll create two similar tables, one in USERS tablespace, [...]]]></description>
			<content:encoded><![CDATA[<p><em>I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.<br />
</em></p>
<p>Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?</p>
<p>Here&#8217;s an example, I&#8217;ll create two similar tables, one in USERS tablespace, other in SYSTEM, and inset into the first one (Oracle 10.2.0.3 on Solaris x64):</p>
<pre><code>SQL&gt; create table <strong>t1</strong>(a int) tablespace <strong>USERS</strong>;

Table created.

SQL&gt; create table <strong>t2</strong>(a int) tablespace <strong>SYSTEM</strong>;

Table created.

SQL&gt; exec for i in 1..100000 loop insert into <strong>t1</strong> values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:<strong>03.09</strong>

</code></pre>
<p>Insert into table in USERS tablespace took 3 seconds.</p>
<p>Ok, let&#8217;s commit and flush dirty buffers that they wouldn&#8217;t get on the way of next insert.</p>
<pre><code>SQL&gt; commit;

Commit complete.

Elapsed: 00:00:00.02

SQL&gt; alter system checkpoint; <em>-- checkpointing to flush dirty buffers from previous inserts</em>

System altered.

Elapsed: 00:00:01.34
SQL&gt;

</code></pre>
<p>And now to the insert into the SYSTEM tablespace table:</p>
<pre><code>SQL&gt; exec for i in 1..100000 loop insert into <strong>t2</strong> values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:<strong>08.98</strong>
SQL&gt;

</code></pre>
<p>What?! The same insert took 3 times longer, almost 9 seconds?</p>
<p>Fine! Let&#8217;s troubleshoot it!</p>
<p><span id="more-132"></span></p>
<p><!-- more --></p>
<p>First (as usual) I start from my Snapper for getting a performance snapshot of the session. I ran both inserts again and used Snapper from another session to monitor the insert activity (for brevity I didn&#8217;t monitor all stat counters, but used the sinclude filter to leave only various &#8220;gets&#8221; in from V$SESSTAT).</p>
<p>This is the insert to USERS tablespace table:</p>
<pre><code>SQL&gt; @snapper out,gather=tsw,<strong>sinclude=gets</strong> 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     148, 20080901 17:20:53,       10, STAT, db block gets                           ,        103448,      10345,    103.45k,     10.34k
DATA,     148, 20080901 17:20:53,       10, STAT, db block gets from cache                ,        103448,      10345,    103.45k,     10.34k
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets                         ,           495,         49,        495,       49.5
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets from cache              ,           495,         49,        495,       49.5
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets - examination           ,           219,         22,        219,       21.9
DATA,     148, 20080901 17:20:53,       10, STAT, no work - consistent read gets          ,            89,          9,         89,        8.9
DATA,     148, 20080901 17:20:53,       10, STAT, rollbacks only - consistent read gets   ,            18,          2,         18,        1.8
DATA,     148, 20080901 17:20:53,       10, STAT, cluster key scan block gets             ,            65,          6,         65,        6.5
DATA,     148, 20080901 17:20:53,       10, TIME, hard parse elapsed time                 ,          4090,        409,     4.09ms,      409us
DATA,     148, 20080901 17:20:53,       10, TIME, PL/SQL compilation elapsed time         ,          3139,        314,     3.14ms,    313.9us
DATA,     148, 20080901 17:20:53,       10, TIME, parse time elapsed                      ,          5131,        513,     5.13ms,    513.1us
DATA,     148, 20080901 17:20:53,       10, TIME, PL/SQL execution elapsed time           ,        174288,      17429,   174.29ms,    17.43ms
<strong>DATA,     148, 20080901 17:20:53,       10, TIME, DB CPU                                  ,       3108026,     310803,      3.11s,    310.8ms
</strong>DATA,     148, 20080901 17:20:53,       10, TIME, sql execute elapsed time                ,       3104607,     310461,       3.1s,   310.46ms
DATA,     148, 20080901 17:20:53,       10, TIME, hard parse (sharing criteria) elapsed ti,          4110,        411,     4.11ms,      411us
<strong>DATA,     148, 20080901 17:20:53,       10, TIME, DB time                                 ,       3108026,     310803,      3.11s,    310.8ms
</strong>DATA,     148, 20080901 17:20:53,       10, WAIT, log file switch completion              ,         71104,       7110,     71.1ms,     7.11ms
DATA,     148, 20080901 17:20:53,       10, WAIT, SQL*Net message to client               ,             3,          0,        3us,       .3us
DATA,     148, 20080901 17:20:53,       10, WAIT, SQL*Net message from client             ,       6684030,     668403,      6.68s,    668.4ms
--  End of snap 1

PL/SQL procedure successfully completed.

</code></pre>
<p>You see, the DB CPU equals to DB time (in other words all the time spent servicing my request in database was spent on CPU, thus there were no waits, blocks involved). And the insert still took roughly 3 seconds.</p>
<p>This is the insert to SYSTEM tablespace table:</p>
<pre><code>SQL&gt; @snapper out,gather=tsw,sinclude=gets 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     148, 20080901 17:21:23,       11, STAT, db block gets                           ,        <strong>102616</strong>,       9329,    102.62k,      9.33k
DATA,     148, 20080901 17:21:23,       11, STAT, db block gets from cache                ,        102616,       9329,    102.62k,      9.33k
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets                         ,           <strong>384</strong>,         35,        384,      34.91
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets from cache              ,           384,         35,        384,      34.91
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets - examination           ,           218,         20,        218,      19.82
DATA,     148, 20080901 17:21:23,       11, STAT, no work - consistent read gets          ,            94,          9,         94,       8.55
DATA,     148, 20080901 17:21:23,       11, STAT, rollbacks only - consistent read gets   ,            17,          2,         17,       1.55
DATA,     148, 20080901 17:21:23,       11, STAT, cluster key scan block gets             ,            65,          6,         65,       5.91
DATA,     148, 20080901 17:21:23,       11, TIME, hard parse elapsed time                 ,          2350,        214,     2.35ms,   213.64us
DATA,     148, 20080901 17:21:23,       11, TIME, PL/SQL compilation elapsed time         ,          1751,        159,     1.75ms,   159.18us
DATA,     148, 20080901 17:21:23,       11, TIME, parse time elapsed                      ,          3359,        305,     3.36ms,   305.36us
DATA,     148, 20080901 17:21:23,       11, TIME, PL/SQL execution elapsed time           ,        172153,      15650,   172.15ms,    15.65ms
<strong>DATA,     148, 20080901 17:21:23,       11, TIME, DB CPU                                  ,       8751163,     795560,      8.75s,   795.56ms
</strong><strong>DATA,     148, 20080901 17:21:23,       11, TIME, sql execute elapsed time                ,       8749139,     795376,      8.75s,   795.38ms
</strong>DATA,     148, 20080901 17:21:23,       11, TIME, hard parse (sharing criteria) elapsed ti,          2367,        215,     2.37ms,   215.18us
<strong>DATA,     148, 20080901 17:21:23,       11, TIME, DB time                                 ,       8751163,     795560,      8.75s,   795.56ms
</strong>DATA,     148, 20080901 17:21:23,       11, WAIT, SQL*Net message to client               ,             3,          0,        3us,      .27us
DATA,     148, 20080901 17:21:23,       11, WAIT, SQL*Net message from client             ,      -1442849,    -131168,     -1.44s,  -131.17ms
--  End of snap 1

PL/SQL procedure successfully completed.

</code></pre>
<p>You see the DB Time is still equal to DB CPU, thus all response was spent on CPU. Moreover, the time model stats tell me that all the elapsed time was spent executing SQL statements (which makes sense as I&#8217;m running INSERT SQL in a tight loop).</p>
<p>So, for some reason the INSERT SQL statements take almost 3 times longer when inserting to SYSTEM tablespace.</p>
<p>In cases where there is no waiting going on (things like Oracle Wait Interface / WaitProf would not help here), but only difference in CPU time, I normally look into V$SESSTAT counter differences first. But as seen from above, there is no major difference in counters such logical IOs (db block gets, consistent gets) to justify 3x increase in CPU usage (also there was no major difference in other counters which I have excluded for brevity).</p>
<p>So, the obvious next step is to sample the process stack during execution of both statements and see where the difference is. In my previous <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwNy8wOS8wNi9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtMy1tb3JlLWFkdmVudHVyZXMtaW4tcHJvY2Vzcy1zdGFjay8=" target=\"_blank\">posts</a> I&#8217;ve used pstack for getting a few stack samples out of a process and a pstack loop could be used here as well.</p>
<p>But for demo purposes (and convenience) I will use my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3VuaXgvZHN0YWNrcHJvZi5zaA==">DTrace stack sampling script</a> today.</p>
<p>It&#8217;s dead easy to sample user process and kernel stacks using DTrace in Solaris 10 (and MacOS X and FreeBSD too). However the DTrace stack aggregator also takes account of the current program counter offset inside functions when distinguishing between stacks (this means that CPU program counter location <em>main() + 0&#215;10</em> will be different from <em>main() + 0&#215;12</em>). But I don&#8217;t want such granularity, I want to see where the execution was at function level, not instruction level.</p>
<p>So I wrote my script to strip off the PC function offsets from DTrace output and order the stacks the way that most frequently sampled stack &#8220;branch&#8221; is displayed last, in bottom of the output. So, whatever stack and function you see outputted last, is probably the &#8220;troublemaker&#8221;. I still recommend scrolling up and reviewing few other stacks too.</p>
<p>So, I ran the insert to USERS tablespace table again and ran my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3VuaXgvZHN0YWNrcHJvZi5zaA==">dstackprof</a> script on the server process with SPID 859.</p>
<p>This is from the &#8220;well behaving&#8221; process, so normally you should just compare this stack profile to the &#8220;bad behaving&#8221; process stack profile and see if you can spot any major differences in these. I will explain few of the functions on the go.</p>
<pre><code><strong># ./dstackprof.sh 859
</strong>
DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

147 samples with stack below
__________________
kcbchg1_main
kcbchg1                 &lt;- buffer change function in buffer cache layer
ktuchg                    &lt;- undo layer generate change
ktbchg2                  &lt;- block (header) change
kdtchg                    &lt;- data layer generate change
kdtwrp
kdtInsRow              &lt;- data layer row insert function
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe                   &lt;- OPI insert execution entry point
opiexe                   &lt;- OPI execute (executing my recursive INSERT INTO statement)
opipls
opiodr
rpidrus                  &lt;- all the RPI functions belong to recursive program interface (which allows making recursive calls in db)
skgmstack
rpidru                    &lt;- these RPI functions (and few OPI ones above) are what PL/SQL context switches physically are
rpiswu2
rpidrv
psddr0                  &lt;- most of the functions starting with "p" are PL/SQL execution ones
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt                    &lt;- PL/SQL execution engine entry point
kkxexe                  &lt;- cursor to PL/SQL execution interface
opiexe                   &lt;- OPI call execution dispatcher
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

193 samples with stack below
__________________
libc.so.1`times         &lt;-- during these 193 samples the execution was in opiexe (note that
opiexe                   -- it's a recursive OPI call execution (as there's the RPI layer below in the
opipls                    -- stack and there's another opiexe below. The lower opiexe is for executing
opiodr                   -- my PL/SQL anonymous block and upper is for the recursive INSERT statement
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

3147 Total samples captured

</code></pre>
<p>Anyway, the two stacks which I have copied here don&#8217;t show anything obvious. And also, see that there were total 3147 samples taken when the process was on CPU. But the execution point was in the &#8220;top&#8221; stack during only 193 samples, which is around 6% of all samples.</p>
<p>As the above stacks come from a well-behaving process, let&#8217;s leave their deeper analysis and just compare them to the bad-behaving process. So, I ran the SYSTEM tablespace insert again (you still remember, we were troubleshooting why inserts into system tablespace were slower, right ;)</p>
<p>The top stack in profile was following:</p>
<pre><code># ./dstackprof.sh <strong>859</strong>

DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

<em>[...some output snipped...]
</em>

917 samples with stack below
__________________
libc.so.1`qsort
kd4_entries_overlap
kdb4chk
kd4chk
kcbchk
<strong>kco_blkchk</strong>    <em>&lt;-- what the heck is this? the name suggests that it's block checking of some kind</em>
kcoapl
kcbapl
kcrfw_redo_gen             &lt;-- also note this function, looks like the block change function (kcbchg) has called
kcbchg1_main                   &lt;-- a function to generate redo on its behalf
kcbchg1
ktuchg
ktbchg2
kdtchg
kdtwrp
kdtInsRow
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe
opiexe
opipls
opiodr
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

6202 Total samples captured

</code></pre>
<p>As most of the top stacks from the bad behaving process were spending their time somewhere in kco_blkchk() function, it rang a bell &#8211; database block checking!</p>
<p>However it was weird as the <em>db_block_checking</em> parameter was set false in my instance:</p>
<pre><code>SQL&gt; show parameter db_block_checking

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_block_checking                    string      FALSE
SQL&gt;

</code></pre>
<p>After a little digging, I found this parameter:</p>
<pre><code>NAME                          VALUE     DESCRIPTION
----------------------------- --------- -------------------------------------------------------------
_db_always_check_system_ts    TRUE      Always perform block check and checksum for System tablespace

</code></pre>
<p>This parameter makes Oracle (since version 8.1.6) do block structure checking in SYSTEM tablespace even if db_block_checking itself is set to FALSE. Considering the criticality of data dictionary objects, it&#8217;s a good idea.</p>
<p>So, now you might say: <em>so what that there&#8217;s a performance issue, you shouldn&#8217;t keep your application tables in SYSTEM tablespace anyway!</em></p>
<p>Well, I have three words for you: <strong>AUD$, FGA_LOG$, DEF$_AQCALL</strong> :)</p>
<p>If you rely on auditing functionality heavily and especially if you use advanced replication, having the crucial tables (and associated LOB segments) in SYSTEM tablespace can mean quite a hit to your CPUs and system throughput. In which cases you probably would want to explore the option of moving these tables out of system tablespace.</p>
<p><strong>Warning</strong>: before moving anything, consult Metalink or Oracle Support as only few strictly defined ways for moving these object are supported. See notes 731908.1 for Audit and 1037317.6 for Advanced Replication.</p>
<p>So, the key point of this article is that I could have started guessing or setting various events or undocumented parameters for probing Oracle here and there, for figuring out what&#8217;s going on. But I used a simple systematic approach instead. When Oracle Wait Interface and V$SESSTAT counters didn&#8217;t help, looked into process stack next. And this gave me the vital clue (kdo_blkchk) that there&#8217;s block checking going on. And if I hadn&#8217;t figured out the cause myself, I would have had some solid evidence for support.</p>
<p>Note that you don&#8217;t have to use DTrace for this kind of stack profiling (even though it is very convenient). On non-Solaris 10 OS&#8217;es I would use whatever safe stack dump mechanism is available (pstack for example) and post-process the results. I probably wouldn&#8217;t take thousands of samples though, but probably 100 or even less.</p>
<p>Also, if you want to get a complete overview of the process execution profile, you would need to do much more complex aggregations, as you&#8217;d need to aggregate &#8220;child&#8221; branches of stack samples with their &#8220;parents&#8221; (currently the samples of a backtrace &#8220;a-&gt;b-&gt;c&#8221; are not accounted in &#8220;a-&gt;b&#8221; for example). But for my case (just checking if there&#8217;s any major difference between execution profiles) this approach was good enough.</p>
<p>By the way (if you haven&#8217;t given up yet :), check the differences in LatchProfX output. The top is from well-behaving process and bottom one is the bad one:</p>
<pre><code>SQL&gt; @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
<strong>       128 redo copy                           kcrfw_redo_gen: nowait                   latch                     49399      48937    4.94     674.790        .014
</strong>       128 cache buffers chains                kcbchg1: kslbegin: bufs not pinned                                  4574       2320     .46      62.481        .027
       128 cache buffers chains                kcbchg1: kslbegin: call CR func                                     2101       1445     .21      28.700        .020
       128 cache buffers chains                kcbgcur_2                                                           1323       1060     .13      18.072        .017
       128 cache buffers chains                kcbnew: new latch again                                              269        243     .03       3.675        .015
       128 simulator lru latch                 kcbs_simulate: simulate set              setid                       170        170     .02       2.322        .014
       128 redo allocation                     kcrfw_redo_gen: redo allocation 1        strand #                     68         13     .01        .929        .071
       128 cache buffers chains                kcbrls_2                                                              29         29     .00        .396        .014
       128 cache buffers chains                kcbgtcr: fast path                                                    25         25     .00        .342        .014
       128 cache buffers chains                kcbgcur: fast path (shr)                                               8          8     .00        .109        .014
       128 cache buffers lru chain             kcbzgws_1                                                              7          7     .00        .096        .014

SQL&gt; @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
<strong>       128 redo copy                           kcrfw_redo_gen: nowait                   latch                    356465      79839   35.65    6088.422        .076
</strong>       128 cache buffers chains                kcbchg1: kslbegin: bufs not pinned                                  4704       2233     .47      80.344        .036
       128 cache buffers chains                kcbchg1: kslbegin: call CR func                                     3542       1997     .35      60.497        .030
       128 cache buffers chains                kcbgcur_2                                                           1929       1448     .19      32.947        .023
       128 redo allocation                     kcrfw_redo_gen: redo allocation 1        strand #                   1018         34     .10      17.387        .511
       128 shared pool simulator               kglsim_unpin_simhp: fast path                                        713        713     .07      12.178        .017
       128 simulator lru latch                 kcbs_simulate: simulate set              setid                       266        266     .03       4.543        .017
       128 cache buffers chains                kcbnew: new latch again                                              182        171     .02       3.109        .018
       128 cache buffers chains                kcbgtcr: fast path (cr pin)                                           29         29     .00        .495        .017
       128 cache buffers chains                kcbgtcr: fast path                                                    15         15     .00        .256        .017

</code></pre>
<p>See, how the redo copy latches were held for much longer (both total time and average time of individual hold) when the block checking was enabled. This indicates that in addition to keeping the buffer pinned, also a redo copy latch is held during block checking (if not using an IMU transaction). If you use high-throughput advanced replication, think that CPU usage is too high for work done and can&#8217;t get rid of the redo copy latch contention, this may be a reason.</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2008%2F09%2F02%2Foracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=132" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2008/09/02/oracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Flexible sampling of any V$ or X$ view with sample.sql</title>
		<link>http://blog.tanelpoder.com/2008/08/26/flexible-sampling-of-any-v-or-x-view-with-samplesql/</link>
		<comments>http://blog.tanelpoder.com/2008/08/26/flexible-sampling-of-any-v-or-x-view-with-samplesql/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 10:35:00 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://tanelpoder.wordpress.com/?p=126</guid>
		<description><![CDATA[In recent past I&#8217;ve blogged few scripts which use specially crafted ordered nested loop for sampling contents of V$ and X$ views fast, with plain SQL. If you haven&#8217;t read them yet, here are the links: WaitProf LatchProf LatchProfX I wrote the above scripts having special purposes in mind (e.g. profile session waits or latching [...]]]></description>
			<content:encoded><![CDATA[<p>In recent past I&#8217;ve blogged few scripts which use specially crafted ordered nested loop for sampling contents of V$ and X$ views fast, with plain SQL.</p>
<p>If you haven&#8217;t read them yet, here are the links:</p>
<ul>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNi8wNi9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNS1zYW1wbGluZy12LXN0dWZmLXdpdGgtd2FpdHByb2YtcmVhbGx5LWZhc3QtdXNpbmctc3FsLw==" target=\"_blank\">WaitProf</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv" target=\"_blank\">LatchProf</a></li>
<li><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv" target=\"_blank\">LatchProfX</a></li>
</ul>
<p>I wrote the above scripts having special purposes in mind (e.g. profile session waits or latching activity).</p>
<p>Now I introduce a simple but powerful sqlplus script for ad-hoc sampling of any V$ view. It&#8217;s called&#8230;. (drumroll) &#8230;. <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3NhbXBsZS5zcWw=" target=\"_blank\">sample.sql</a> :)</p>
<p>When you look into it, the script is actually very simple. It&#8217;s just using power of sqlplus substitution variables, I can pass the sampled column and table names and sampling conditions in to the script dynamically.</p>
<p>The basic syntax is:</p>
<pre><code>@sample column_name table_name filter_condition num_samples

</code></pre>
<p>For example, let say I have a session which executes lots of SQL statements in a loop and I want to have a quick overview of what&#8217;s the TOP SQL statement for a session <em>right now</em>.</p>
<p><span id="more-126"></span></p>
<pre><code>SQL&gt; <strong>@sample  sql_id  v$session  sid=85  100000</strong>

SQL_ID             COUNT
------------- ----------
<strong>axwuqacjaxvfv</strong>      54658
4y1y43113gv8f      13850
8a1pvy4cy8hgv      13590
82hxvr8kxuzjq      13067
0000000000000       1924
0ybwd63u2any5       1276
cvhk2j2gymzhd        410
                     292
g1xapjmt4vm5c        207
330q95smuwnv9        194
cydgw456rs7z0        192
45bv71pwwj176         76
b5cr4hhndmbuf         76
dy0yxpyu2wcbv         36
gdzdu6ukq8vhs         34
8usapcs6k1b7f         20
20f6a85kwud5c         19
27xdav1rg5g14         18
c7g6u4yj5vsq8         16
2c5p3qrtz23rp         15
3kywng531fcxu         10
1yrpz7tw6d9vg          6
dmgp9whyw67x9          6
gsfnqdfcvy33q          4
5rxbazwmcdfaz          3
dwf9sc3v53hv5          1

26 rows selected.

SQL&gt; select sql_text from v$sql where sql_id = '<strong>axwuqacjaxvfv</strong>';

SQL_TEXT
---------------------------------------------------------------------
select /*+ no_parallel_index(t,"STATS$SYSTEM_EVENT_PK") dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0)
no_monitoring no_expand index(t,"STATS$SYSTEM_EVENT_PK") */ count(*)
as nrw,count(distinct sys_op_lbid(53892,'L',t.rowid)) as nlb,null as
ndk,sys_op_countchg(substrb(t.rowid,1,15),1) as clf from
"PERFSTAT"."STATS$SYSTEM_EVENT" t where "SNAP_ID" is not null or
"DBID" is not null or "INSTANCE_NUMBER" is not null or "EVENT" is not
null

</code></pre>
<p>During the sampling period most of the time the session 85 was executing SQL with id <strong>axwuqacjaxvfv</strong>. Using sql_id I can easily look up the statement text from V$SQL as I did above (I&#8217;m running dbms_stats.gather_database_stats in that demo session). Easy!</p>
<p>But <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3RhbmVscG9kZXIub3RlcGFkLmNvbS9zY3JpcHQ6c2FtcGxlLnNxbA==" target=\"_blank\">sample.sql</a> is much more flexible than that!</p>
<p>Let say I want to sample multiple columns and have multiple filter conditions. It&#8217;s doable, just make sure that you put multiple parameters between doublequotes that sqlplus wouldn&#8217;t misunderstand the parameter positions and meanings:</p>
<pre><code>SQL&gt; @sample sql_id,event v$session "sid=85 and state='WAITING'" 100000

SQL_ID        EVENT                               COUNT
------------- ------------------------------ ----------
0ybwd63u2any5 db file sequential read             10717
82hxvr8kxuzjq db file sequential read              8431
8xcd951zh9dtm db file scattered read               6467
31t5pdt54fxa7 db file scattered read               5775
82hxvr8kxuzjq row cache lock                       1420
82hxvr8kxuzjq DFS lock handle                       185
31t5pdt54fxa7 row cache lock                        169
82hxvr8kxuzjq library cache pin                     147
8xcd951zh9dtm db file sequential read                76
31t5pdt54fxa7 db file sequential read                73
82hxvr8kxuzjq rdbms ipc reply                        53
4y1y43113gv8f db file sequential read                39
8xcd951zh9dtm row cache lock                         15

13 rows selected.

</code></pre>
<p>The above command just reported what SQL ID&#8217;s <em>waited</em> on which events the most for SID 85. Note that the top waiter was waiting during 10717 samples of total 100000 samples (specified in 4th parameter to sample.sql), which is about 10.7% of total response time during that sampling period.</p>
<p>Starting from 10g we have some very nice new columns in V$SESSION, PLSQL_OBJECT_ID and PLSQL_SUBPROGRAM_ID which show in which PL/SQL object and subobject the execution <em>currently</em> is (of course, if that session is currently executing PL/SQL). Also there are two fields called PLSQL_ENTRY_OBJECT_ID and PLSQL_ENTRY_SUBPROGRAM_ID which show you the top level PL/SQL stored proc which was called (and potentially has called other PL/SQL subroutines&#8230;)</p>
<p>So, I could profile in which PL/SQL stored procs the execution currently is:</p>
<pre><code>SQL&gt; @sample plsql_object_id,plsql_subprogram_id,sql_id v$session sid=85 100000

PLSQL_OBJECT_ID PLSQL_SUBPROGRAM_ID SQL_ID             COUNT
--------------- ------------------- ------------- ----------
                                    72kfm3yxd2d3h      50037
                                    ghzub02f1jmsr      30601
                                    acayttu2pgaz1       5693
           4345                  59 82hxvr8kxuzjq       3609
                                    0000000000000       2305
                                    0ybwd63u2any5       1794
                                    85b0rtskxucxj       1665
                                    5qtt0rzhzfkj9        564
           4345                 103 82hxvr8kxuzjq        543
           4345                 101 82hxvr8kxuzjq        448
           4345                  59 85b0rtskxucxj        433
           4365                  95 acayttu2pgaz1        299
           4365                  95 ghzub02f1jmsr        272
           4345                 109 82hxvr8kxuzjq        238
                                    7ng34ruy5awxq        198
           4258                   4 82hxvr8kxuzjq        146
           5357                  86 5qtt0rzhzfkj9        108
                                                          94
           5357                 134 82hxvr8kxuzjq         63

</code></pre>
<p>The rows where we see only SQL_ID but no PL/SQL object ID&#8217;s mean that Oracle is currently executing/fetching (recursive) SQL.<br />
The rows where we see PL/SQL object IDs <em>and</em> SQL_IDs, tell me that the execution/fetch call of a cursor with that SQL_ID has completed and Oralce&#8217;s running PL/SQL code (and hasn&#8217;t cleared the SQL_ID yet).</p>
<p>You can go to DBA_OBJECTS or DBA_PROCEDURES to map these object IDs back to stored procedure names:</p>
<pre><code>SQL&gt; select owner, object_name, procedure_name
  2  from dba_procedures
  3  where object_id = 4345
  4  and   subprogram_id = 59;

OWNER      OBJECT_NAME                    PROCEDURE_NAME
---------- ------------------------------ ------------------------------
SYS        DBMS_STATS                     GATHER_DATABASE_STATS

SQL&gt;

</code></pre>
<p>Note that historically Oracle has had quite many issues with setting/resetting the sql hash value / sql_id properly so I wouldn&#8217;t bet any of my &#8220;scientific&#8221; analysis of this data. Nevertheless this approach can be useful for getting an overview of which SQLs and PL/SQL subroutines are executed in a session as of now, via SQL.</p>
<p>When you look into <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3NhbXBsZS5zcWw=">the script</a> you will also see that I temporarily set two parameters to false, <em>_optimizer_sortmerge_join_enabled</em> and <em>hash_join_enabled</em>. The reason is that as sample.sql can take any table/view name as a parameter, then I can&#8217;t hardcode the underlying X$ table name for USE_NL global hint as I&#8217;ve done for WaitProf and LatchProf. Thus with views like V$SESSION I didn&#8217;t get the nested loop happening in right order. I wanted it to happen against X$KSUSE, but instead a sort merge join was done, which doesn&#8217;t do sampling but scans through inner rowsource only once. So, that&#8217;s why I temporarily simply disable hash join and sort merge join in the session and all Oracle is left with is nested loop mechanism.</p>
<p><strong>NB!</strong> I must warn you about sampling any view! While sampling views like V$SESSION, V$PROCESS and V$SESSION_WAIT is safe, there are some which you should <strong>not</strong> sample, well at least in production. X$KSMSP is first which comes into mind as even a single query against it can grab your shared pool latches for way long time, potentially halting all database activity. Another ones are V$SQL and V$OPEN_CURSOR and V$DB_OBJECT_CACHE and like as traversing through their underlying memory structures (library cache) requires quite a lot of library cache latching so normal user SQL execution could be heavily impacted.</p>
<p>However there&#8217;s an easy way to check whether an access to V$ view uses latches or not. You just need to sample some V$ view in your test environment and use instructions from my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8yMy9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtOC1ldmVuLW1vcmUtZGV0YWlsZWQtbGF0Y2gtdHJvdWJsZXNob290aW5nLXVzaW5nLWxhdGNocHJvZngv" target=\"_blank\">LatchProfX</a> post for checking the latch usage from another session. Note that LatchProfX doesn&#8217;t show KGX mutex activity, so be aware that in Oracle 11g the V$SQL / X$KGLOB traversal is done under exclusive protection of Library Cache mutex, thus sampling those tables frequently is still not a good idea (and you don&#8217;t really get much out of sampling them thousands of times per second anyway)</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2008%2F08%2F26%2Fflexible-sampling-of-any-v-or-x-view-with-samplesql%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=126" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2008/08/26/flexible-sampling-of-any-v-or-x-view-with-samplesql/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Advanced Oracle Troubleshooting Guide, Part 8: Even more detailed latch troubleshooting using LatchProfX</title>
		<link>http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/</link>
		<comments>http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/#comments</comments>
		<pubDate>Tue, 22 Jul 2008 16:58:23 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://tanelpoder.wordpress.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/</guid>
		<description><![CDATA[In my last AOT post I published my LatchProf script which is able to sample detailed latchholder data from V$LATCHHOLDER. Latchprof allows you to drill down into your latching problems at session level (which V$LATCH, V$LATCH_PARENT and V$LATCH_CHILDREN can&#8217;t do). It allows you to get valuable details about individual sessions who are holding a latch [...]]]></description>
			<content:encoded><![CDATA[<p>In my last <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv" target=\"_blank\">AOT post</a> I published my <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">LatchProf</a> script which is able to sample detailed latchholder data from V$LATCHHOLDER.</p>
<p>Latchprof allows you to drill down into your latching problems at session level (which V$LATCH, V$LATCH_PARENT and V$LATCH_CHILDREN can&#8217;t do). It allows you to get valuable details about individual sessions who are holding a latch the most, therefore <em>likely</em> contributing to the latch contention problem the most.</p>
<p>However after you have discovered the troublemaking session, then what next? One way forward is looking into V$SESSTAT counters using <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwNy8xMi8wNi9vcmFjbGUtc2Vzc2lvbi1zbmFwcGVyLXYxMDYtcmVsZWFzZWQv" target=\"_blank\">Snapper</a> tool. Depending on what latch is the problematic one, you would look for different stats like various buffer get stats for cache buffers chains latches and parsing/executing stats when looking into library cache latches. However if those stats look &#8220;normal&#8221;, is there any other way do drill down further?</p>
<p>Yeah, there is and lets look into it!</p>
<p><span id="more-95"></span></p>
<p>A side comment: Nowadays I try to avoid using X$ tables in favor of V$ views (because in my experience in many client environments you don&#8217;t always have easy SYS access or X$ table &#8220;proxy views&#8221; created). Also, if you can get your job done using simple V$ views, then why bother with more complex X$ tables? (yeah, it&#8217;s much cooler, I know ;)</p>
<p>Anyway, here is an exception: <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs" target=\"_blank\">LatchProfX</a> script, which means Latch holder Profiler eXtended, because it provides some very interesting extended information about latch holders. <strong>It shows <em>why</em> the latch currently held was taken after all</strong>. It uses X$KSUPRLAT instead of V$LATCHHOLDER as the latter doesn&#8217;t externalize all required information. I&#8217;m talking about latch Where statistics.</p>
<p>By &#8220;Where statistics&#8221; I mean that Oracle kernel&#8217;s latching module coders have instrumented their code well enough &#8211; whenever they get a latch, they mark down the code location ID of the latch getter within the latch structure itself.</p>
<p>Let&#8217;s not keep you waiting anymore and show some examples. The output is quite wide so you may need to size your browser window accordingly.</p>
<p>Normally you would run LatchProfX with specific enough parameters to monitor only one or few latches which you know are the troublemakers. However the first example run I do across all instance latches (the third parameter is %) to illustrate the usefulness of LatchProfX result data better.</p>
<p>LatchProfX syntax is similar to the original <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv" target=\"_blank\">LatchProf</a> script, with the addition of <strong>ksllwnam</strong> and <strong>ksllwlbl</strong> field names you can sample. These are exactly the fields (from X$KSLLW) which give us the descriptions of &#8220;where in kernel the latch was aquired from&#8221; opcodes we get from X$KSUPRLAT (see the script itself for syntax and internals).</p>
<p>So (finally), lets run it:</p>
<pre><code>SQL&gt; @latchprofx name,<strong>ksllwnam,ksllwlbl</strong> % % 100000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

NAME                                <strong>KSLLWNAM</strong>                                 <strong>KSLLWLBL</strong>                   Held       Gets  Held %     Held ms Avg hold ms
----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
<strong>shared pool</strong>                         <strong>kghalo</strong>                                                              9705       5987    <strong>9.71</strong>     185.366        <strong>.031</strong>
ges resource hash list              kjuscl: lock close request                                          2429       2350    2.43      46.394        .020
kks stats                           kksAllocChildStat                                                   2001       1460    2.00      38.219        .026
ges resource hash list              kjrmas1: lookup master node                                         1780       1724    1.78      33.998        .020
<strong>shared pool</strong>                         <strong>kghupr1</strong>                                  <strong>Chunk Header</strong>               1618        891    1.62      30.904        .035
row cache objects                   kqrpre: find obj                                                     988        986     .99      18.871        .019
shared pool simulator               kglsim_unpin_simhp                                                   804         97     .80      15.356        .158
ges process parent latch            kjlalc: lock allocation from lmon                                    779        766     .78      14.879        .019
active service list                 kswsgetso: get service object                                        663        658     .66      12.663        .019
<strong>shared pool</strong>                         <strong>kghalp</strong>                                                               635        628     .64      12.129        .019
ges process parent latch            kjatioc                                                              518        518     .52       9.894        .019
shared pool simulator               kglsim_upd_newhp                                                     402        401     .40       7.678        .019
ges resource hash list              kjrref: find matched resource                                        374        371     .37       7.143        .019
row cache objects                   kqreqd                                                               360        360     .36       6.876        .019
ges process parent latch            kjlrem: detach lock from group                                       235        229     .24       4.489        .020
enqueues                            ksqdel                                                               162        162     .16       3.094        .019
enqueues                            ksqgel: create enqueue                   parent obj                  148        148     .15       2.827        .019
row cache objects                   kqreqd: reget                                                        108        105     .11       2.063        .020
enqueue hash chains                 ksqrcl                                   resource addr               101        101     .10       1.929        .019
<strong>shared pool</strong>                         <strong>kgh_heap_sizes</strong>                                                        99         99     .10       1.891        .019
enqueue hash chains                 ksqgtl3                                  acquiring session            93         93     .09       1.776        .019
ges enqueue table freelist          kjlalc: lock allocation                                               85         83     .09       1.624        .020
<strong>shared pool</strong>                         <strong>kghfre</strong>                                   <strong>Chunk Header</strong>                 74         65     .07       1.413        .022
shared pool simulator               kglsim_scan_lru: scan                                                 69         69     .07       1.318        .019
ges process parent latch            kjucll: closing lock                                                  68         61     .07       1.299        .021
shared pool simulator               kglsim_chg_simhp_free                                                 48         47     .05        .917        .020
ges domain table                    kjdmlad: add a lock to an xid-hashed loc                              42         41     .04        .802        .020
shared pool simulator               kglsim_chg_simhp_noob                                                 41         41     .04        .783        .019
shared pool simulator               kglsim_unpin_simhp: fast path                                         39         39     .04        .745        .019
ksuosstats global area              ksugetosstat                                                          29          1     .03        .554        .554
ges group table                     kjgdgll: move a lock from group lock lis                              13          6     .01        .248        .041
kokc descriptor allocation latch    kokcdlt: regular free                    latch                        11          4     .01        .210        .053
ges enqueue table freelist          kjlfr: remove lock from parent object                                 11         11     .01        .210        .019
kokc descriptor allocation latch    kokcdlt: allocation                      latch                         5          4     .01        .096        .024
cache buffers chains                kcbgtcr: fast path                                                     3          3     .00        .057        .019
ges caches resource lists           kjrchc: cached obj cleanup                                             2          2     .00        .038        .019
process allocation                  ksufap: active procs                                                   2          2     .00        .038        .019
ges group table                     kjgagll: move a lock from process lock l                               2          2     .00        .038        .019
ges domain table                    kjdmlrm: remove a lock from its xid-hash                               1          1     .00        .019        .019
ges caches resource lists           kjchc: clean resource cache                                            1          1     .00        .019        .019
ges resource hash list              kjruch: move resource from cache to free                               1          1     .00        .019        .019
name-service memory objects         kjxgsnep: get a value-block object                                     1          1     .00        .019        .019
active service list                 kswsite: service iterator                                              1          1     .00        .019        .019

43 rows selected.

Elapsed: 00:00:02.06

</code></pre>
<p>What do we see? While LatchProf would just have provided latch holder info about which (child) latch was held and by which SID, then LatchProfX will show you additional <em>why</em> this latch was gotten.</p>
<p>From above example you see that there were various reasons why the shared pool latch was gotten, but the reason which caused the SP latch to be held the most time, was a code location <strong>kghalo</strong>, which means something like Kernel Generic Heap Allocate. Which essentially is the function used fol searching and allocating free space from shared pool. This kind of makes sense as stuff like freeing a chunk and updating heap global stats are simple operations as Oracle already knows the memory location it needs to update, but searching for free chunks for allocation can require scanning through a lot of chunks if there are no suitable ones in freelists and a LRU scan is needed.</p>
<p>Next, there&#8217;s a <strong>kghupr1</strong> location which has caused the shared pool latches held for 1.62% of total sampling time. This location, in function kghupr (un-pin recreatable) needs to take shared pool latch for putting the unpinned chunk back to heap LRU list. An Oracle kernel coder has also specified a label (action description) as &#8220;Chunk Header&#8221; which is seen from KSLLWLBL column. Its in place to state that this kernel location gets shared pool latch as it needs to update a chunk header &#8211; putting a chunk back to LRU list means updating the chunk header (and its neighbor(s) headers).</p>
<p>There are also kgh_heap_sizes and kghfre locations, which update global heap statistics (the stuff you see from v$sgastat for example) and free a chunk in heap respectively.</p>
<p>Wherever you have estabilished which latch is the troublemaker (under too much contention) and you know the &#8220;why&#8221; reason for it, you probably also want to know the sessions contributing to the problem. And LatchProfX can show that:</p>
<pre><code>SQL&gt; @latchprofx sid,name,ksllwnam,ksllwlbl % <strong>"shared pool"</strong> 100000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
       113 shared pool                         kghalo                                                              5038       2787    5.04      92.699        .033
       112 shared pool                         kghalo                                                              4723       2820    4.72      86.903        .031
       113 shared pool                         kghupr1                                  Chunk Header                875        369     .88      16.100        .044
       113 shared pool simulator               kglsim_unpin_simhp                                                   728         48     .73      13.395        .279
       112 shared pool                         kghupr1                                  Chunk Header                623        438     .62      11.463        .026
       112 shared pool simulator               kglsim_unpin_simhp                                                   273         37     .27       5.023        .136
       112 shared pool                         kghalp                                                               232        232     .23       4.269        .018
       113 shared pool simulator               kglsim_upd_newhp                                                     230        230     .23       4.232        .018
       113 shared pool                         kghalp                                                               224        223     .22       4.122        .018
       112 shared pool simulator               kglsim_upd_newhp                                                     194        194     .19       3.570        .018
       112 shared pool                         kgh_heap_sizes                                                        49         49     .05        .902        .018
       113 shared pool                         kgh_heap_sizes                                                        49         49     .05        .902        .018
       112 shared pool                         kghfre                                   Chunk Header                 37         37     .04        .681        .018
       113 shared pool simulator               kglsim_scan_lru: scan                                                 32         32     .03        .589        .018
       112 shared pool simulator               kglsim_scan_lru: scan                                                 28         27     .03        .515        .019
       112 shared pool simulator               kglsim_chg_simhp_free                                                 27         27     .03        .497        .018
       113 shared pool simulator               kglsim_chg_simhp_free                                                 26         25     .03        .478        .019
       113 shared pool                         kghfre                                   Chunk Header                 24         23     .02        .442        .019
       112 shared pool simulator               kglsim_unpin_simhp: fast path                                         23         23     .02        .423        .018
       113 shared pool simulator               kglsim_chg_simhp_noob                                                 22         22     .02        .405        .018
       112 shared pool simulator               kglsim_chg_simhp_noob                                                 18         18     .02        .331        .018
       113 shared pool simulator               kglsim_unpin_simhp: fast path                                         18         18     .02        .331        .018
       112 shared pool sim alloc               kglsim_chk_objlist: alloc                                              1          1     .00        .018        .018
       122 shared pool                         kghfre                                   Chunk Header                  1          1     .00        .018        .018

24 rows selected.

Elapsed: 00:00:01.93
SQL&gt;

</code></pre>
<p>From above you see are 2 main sessions contributing to shared pool latch contention and both are doing lots of kghalo calls, thus allocating memory. When V$SESSTAT for those sessions shows high &#8220;parse count (hard)&#8221; statistic it would be obvious that the heavy shared pool activity is due hard parsing (which is kind of common knowledge as excessive hard parsing very often the reason for shared pool latch contention). However check out another sample with same parameters:</p>
<pre><code>SQL&gt; @latchprofx sid,name,ksllwnam,ksllwlbl % "shared pool" 100000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
         <strong>0</strong> shared pool                         <strong>kghalo</strong>                                                             10714          <strong>5</strong>   10.71     212.137      <strong>42.427</strong>
       122 shared pool                         kghalo                                                              7685         42    7.69     152.163       3.623
       112 shared pool                         kghalo                                                              3874       2305    3.87      76.705        .033
       113 shared pool                         kghalo                                                              3616       2318    3.62      71.597        .031
       112 shared pool                         kghupr1                                  Chunk Header                659        327     .66      13.048        .040
       113 shared pool                         kghupr1                                  Chunk Header                597        353     .60      11.821        .033
       112 shared pool simulator               kglsim_unpin_simhp                                                   416         29     .42       8.237        .284
<em>[...snip...]</em>
       122 shared pool simulator               kglsim_upd_newhp                                                       2          2     .00        .040        .020
       117 shared pool                         kghupr1                                  Chunk Header                  2          2     .00        .040        .020
       122 shared pool                         kghalp                                                                 2          2     .00        .040        .020
       117 shared pool                         kghalp                                                                 1          1     .00        .020        .020

38 rows selected.

Elapsed: 00:00:02.14

SQL&gt;

</code></pre>
<p>What&#8217;s this? Why is there a session with SID=0 which has called kghalo function and held the latch for much longer on average than any other sessions? LatchProfX has detected only 5 latch gets by that session from kghalo location which still have taken the most time in total. And average hold time for that latch is estimated over 42 milliseconds compared to most others which are in microseconds (even though the 3.6 milliseconds average hold time for a latch is quite long too).</p>
<p>One of the reason for such long hold times could be CPU preemption &#8211; a latch holder process was switched off the CPU and it took a while before it got back on it to complete its work and release the latch again. However on my 4 CPU test machine the was plenty of CPU time available so such preemption should have not been the problem. The SID 0 gives a clue. When a new session is started in Oracle, apparently it&#8217;s reported as SID 0 during the session initialization sequence in some places, which includes the memory allocation for things like the v$sesstat and v$session_event arrays and also the session parameter values and set events in some cases. Considering that total size of these structures is quite large in normal shared pool usage context (tens of kB), then if freelists are empty and LRU scan must happen, this inevitably takes longer time &#8211; and shared pool latch is held all that time. (phew, I must write shorter statements).</p>
<p>Anyway, I recommend playing around with this script in a test environment, run different workloads like hard parses, soft parses, physical IOs and buffer gets and you should get quite a good overview of some Oracle internal workings that way. This hopefully also answers how do I know some of the Oracle function names and what&#8217;s their purpose &#8211; the answer is that Oracle itself exposes quite a lot of this information in the X$ tables I&#8217;m using in my script. You just need to experiment, observe and try to figure out what are the real names behind function names like kghalo etc. Fun :)</p>
<p>Few more notes:</p>
<p>If you want to drill down to latch child level, then use LADDR as another field (among SID,NAME etc) in first parameter to LatchProfX (I&#8217;ve described this in <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNy8wOS9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNy1zYW1wbGluZy1sYXRjaC1ob2xkZXItc3RhdGlzdGljcy11c2luZy1sYXRjaHByb2Yv" target=\"_blank\">AOT guide part 7</a>).</p>
<p>Do not use <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">LatchProf</a>, <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZnguc3Fs" target=\"_blank\">LatchProfX</a> and <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL3dhaXRwcm9mLnNxbA==" target=\"_blank\">WaitProf</a> when your server&#8217;s CPUs are 100% utilized, for two reasons:</p>
<ol>
<li>All those scripts never sleep during their sampling but spin on CPU &#8211; thus they would make the CPU starvation even worse</li>
<li>More importantly, if all your CPUs are running at 100% utilization, the latch contention you see is probably just a symptom of the CPU starvation itself! The same goes to wait events as well &#8211; when your CPU runqueues are long, some of the CPU runqueue waiting time can be attributed to wait events such IO instead (for example when IO completes, but the process can&#8217;t get onto CPU to &#8220;end&#8221; the IO wait</li>
</ol>
<p>I have updated both LatchProf and LatchProfX do detect whether they&#8217;re running on Oracle 9i or 10g+ and dynamically run different code based on version (on 9i, there is no GETS column in V$LATCHHOLDER).</p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2008%2F07%2F23%2Fadvanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=95" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2008/07/23/advanced-oracle-troubleshooting-guide-part-8-even-more-detailed-latch-troubleshooting-using-latchprofx/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Advanced Oracle Troubleshooting Guide, Part 7: Sampling latch holder statistics using LatchProf</title>
		<link>http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/</link>
		<comments>http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/#comments</comments>
		<pubDate>Tue, 08 Jul 2008 11:54:10 +0000</pubDate>
		<dc:creator>Tanel Poder</dc:creator>
				<category><![CDATA[Cool stuff]]></category>
		<category><![CDATA[Internals]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Troubleshooting]]></category>

		<guid isPermaLink="false">http://tanelpoder.wordpress.com/2008/07/08/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/</guid>
		<description><![CDATA[I have been too busy since getting back from vacation, thus no posts for a while. But I hope the waiting was worthwhile as I present you LatchProf, a tool for digging in to latch contention problems &#8211; using plain SQL and sqlplus! As, I&#8217;m still busy, I make it short. LatchProf is a script [...]]]></description>
			<content:encoded><![CDATA[<p>I have been too busy since getting back from vacation, thus no posts for a while. But I hope the waiting was worthwhile as I present you <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">LatchProf</a>, a tool for digging in to latch contention problems &#8211; using plain SQL and sqlplus!</p>
<p>As, I&#8217;m still busy, I make it short.</p>
<p>LatchProf is a script similar to <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNi8wNi9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNS1zYW1wbGluZy12LXN0dWZmLXdpdGgtd2FpdHByb2YtcmVhbGx5LWZhc3QtdXNpbmctc3FsLw==" target=\"_blank\">WaitProf</a>, only it samples latch holder statistics from V$LATCHHOLDER. As V$LATCHHOLDER contains a SID column (with session ID of a latch holder) it becomes possible to find who is hitting a latch the most (a way to prove that crappy monitoring tools which constantly scan through V$SQL DO cause library cache latch contention themselves).</p>
<p><span id="more-90"></span></p>
<p>Some day I will write longer about this (and check the script itself), but here&#8217;s the syntax:</p>
<pre><code>@latchprof &lt;what&gt; &lt;sid|%&gt; &lt;latch|%|childaddr&gt; &lt;#samples&gt;

</code></pre>
<p>Here are the parameter definitions:</p>
<p>1) Columns to select actually map to V$LATCHHOLDER columns. So you can use SID,NAME,LADDR there for example.<br />
2) SID is either a sid as number or % to see latches held by all SIDs<br />
3) latch_name can be either % for all latches, a LIKE string, like %library% to see all latches with library in their name or the actual latch address! This allows us to monitor only specific child latches we want.<br />
4) Number of samples to take. I normally use 100 000 samples as starting point.</p>
<p>As latchprof (and waitprof for that matter) never sleep, they burn as much CPU resource as they can. This is good for sampling accuracy (well, at least on multiprocessor systems with enough spare CPU capacity), but will consume one of your CPUs. So, you don&#8217;t want to run this for too long at a time ( you could implement a short wait in there with a <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2pvbmF0aGFubGV3aXMud29yZHByZXNzLmNvbS8yMDA4LzA2LzE1L2V2ZW50LXNuYXBzaG90cy8=" target=\"_blank\">nice trick</a> by Jonathan Lewis ).</p>
<p>100k samples is usually done in around a second on my test kit (giving 100+ kHz sampling rate :) This will vary by your hardware and Oracle version and what&#8217;s your processes parameter (as V$LATCHHOLDER iterates through process state objects &#8211; the more processes you&#8217;ve got, the more time it takes).</p>
<p>Anyway, let say we have a performance issue and using wait interface we have determined that for a particular session, waits for <em>cache buffers chains</em> latches are unreasonably high.</p>
<p>Now, one way forward would be to identify the database blocks involved in this problem, by looking what blocks are protected by the latch (using X$BH.HLADDR = V$LATCH_CHILDREN.ADDR mapping) and trying to figure out the hottest block by X$BH.TCH touchcount. But where would that lead us? We would find out into which segment that block belongs, but it doesn&#8217;t really help us to understand who&#8217;s the troublemaker causing this issue.</p>
<p>Therefore we use another approach &#8211; sampling V$LATCHHOLDER to see which sessions hold the latches the most:</p>
<p>Just for demo purposes I start with listing general latchholder stats (not broken down by holding SID):</p>
<pre><code>SQL&gt; @<a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">latchprof</a> name % % 100000

NAME                                 Held       Gets  Held %     Held ms Avg hold ms
------------------------------ ---------- ---------- ------- ----------- -----------
cache buffers chains                21248      17418   21.25     267.725        .015
simulator lru latch                  7192       7169    7.19      90.619        .013
simulator hash latch                   66         66     .07        .832        .013
enqueue hash chains                     4          4     .00        .050        .013
ges resource hash list                  3          3     .00        .038        .013
process allocation                      2          2     .00        .025        .013
library cache                           2          2     .00        .025        .013
name-service namespace bucket           2          2     .00        .025        .013
name-service memory objects             1          1     .00        .013        .013
ges caches resource lists               1          1     .00        .013        .013
library cache pin                       1          1     .00        .013        .013

11 rows selected.

</code></pre>
<p>Yes it looks that cache buffers chains latches are held the most. The Held column here means that during how many samples a latch was held. The Gets column shows roughly how many times that latch was gotten during sampling time.<br />
As we took 100000 samples and during 21248 of them a cache buffers chains latch was held, we can conclude that this latch was busy (Held) 21.25% of the time. And as the sampling lasted about 1.3 seconds that time, we can extrapolate that of the 1.3 seconds, some cache buffers chains latches were in Held state for rougly 267 milliseconds. And as we know the number of gets, then we can compute the average latch hold time in milliseconds. This could help us detect very long latch holds like what can happen when someone is scanning through X$KSMSP or the server is CPU starved. The current estimate of 15 microseconds per latch hold is not that bad (note that with very few samples like 10000 or less, the avg hold time estimates become unreliable).</p>
<p>Anyway, let&#8217;s break down the individual latch holding sessions now:</p>
<pre><code>SQL&gt; @latchprof <strong>sid</strong>,name % % 100000

       SID NAME                                 Held       Gets  Held %     Held ms Avg hold ms
---------- ------------------------------ ---------- ---------- ------- ----------- -----------
        81 cache buffers chains                14916      14915   14.92     187.942        .013
        81 simulator lru latch                  6797       6745    6.80      85.642        .013
        85 cache buffers chains                 6195       2284    6.20      78.057        .034
        85 simulator lru latch                   524        506     .52       6.602        .013
        81 simulator hash latch                  121         85     .12       1.525        .018
       112 name-service namespace bucket           2          2     .00        .025        .013
       100 enqueue hash chains                     2          2     .00        .025        .013
        98 library cache                           2          2     .00        .025        .013
       112 process allocation                      2          2     .00        .025        .013
       112 ges caches resource lists               1          1     .00        .013        .013
       112 enqueues                                1          1     .00        .013        .013
       104 messages                                1          1     .00        .013        .013
       100 cache buffers chains                    1          1     .00        .013        .013
       100 active service list                     1          1     .00        .013        .013
       100 enqueues                                1          1     .00        .013        .013
        85 simulator hash latch                    1          1     .00        .013        .013
       100 library cache                           1          1     .00        .013        .013

17 rows selected.

</code></pre>
<p>We see that sessions 81 and 85 are hitting CBC latches the most. Let say (for demo purposes again) that the SID 85 was the one which experienced bad performance and led us to investigate this issue.</p>
<p>As there are tens of thousands (or even more) of CBC latches in a decent Oracle database, then of course looking only at aggregated latch name level stats is not enough. Do the sessions compete for a single child latch? Or do they just use lots of different CBC latches over time and not collide at all? This is the question where <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy50YW5lbHBvZGVyLmNvbS9maWxlcy9zY3JpcHRzL2xhdGNocHJvZi5zcWw=" target=\"_blank\">LatchProf</a> can help.</p>
<p>Let&#8217;s include latch address (child latch address) into picture and query only &#8220;cache&#8221; related latches. This returns lots of rows, so I normally press CTRL+C after first pageful of results as they are most important (the results are ordered by most busy ones first):</p>
<pre><code>SQL&gt; @latchprof sid,name,<strong>laddr</strong> % <strong>cache</strong> 100000

       SID NAME                           LADDR          Held       Gets  Held %     Held ms Avg hold ms
---------- ------------------------------ -------- ---------- ---------- ------- ----------- -----------
<strong>        81 cache buffers chains           41AACEEC      15061      15017   15.06     186.756        .012
</strong>        85 cache buffers chains           41B2C37C         34         34     .03        .422        .012
        85 cache buffers chains           41B85D64         31         31     .03        .384        .012
        85 cache buffers chains           41BCF7FC         30         30     .03        .372        .012
        85 cache buffers chains           41B415EC         28         28     .03        .347        .012
        85 cache buffers chains           41BCA658         25         25     .03        .310        .012
        85 cache buffers chains           41B4E738         25         25     .03        .310        .012
        85 cache buffers chains           41AE4694         25         25     .03        .310        .012
        85 cache buffers chains           41AB02E0         25          9     .03        .310        .034
        85 cache buffers chains           427FA1F8         24         24     .02        .298        .012
        85 cache buffers chains           427F6E04         22         22     .02        .273        .012
        85 cache buffers chains           41B78D94         22         22     .02        .273        .012
        85 cache buffers chains           41BA2220         21         21     .02        .260        .012
        85 cache buffers chains           41BEDA68         21         21     .02        .260        .012
        85 cache buffers chains           41BC8D1C         21         21     .02        .260        .012
        85 cache buffers chains           41B486BC         20         20     .02        .248        .012
        85 cache buffers chains           42B99698         19         19     .02        .236        .012
        85 cache buffers chains           41AE6EA8         19         14     .02        .236        .017
        85 cache buffers chains           41B1F230         19         19     .02        .236        .012
        85 cache buffers chains           41B09E44         19         17     .02        .236        .014
        85 cache buffers chains           41AE7A88         19         19     .02        .236        .012
        85 cache buffers chains           41B69E98         18         18     .02        .223        .012
[...snip...]
</code></pre>
<p>And voila, we see SID 81 is hitting one single child latch in maniac fashion while SID 85 is behaving more normally, just hitting lots of different CBC latches (you see the number of gets of a single child during 100k sampling doesn&#8217;t exceed 34). Interesting.<br />
So now that we know the child latch address (LADDR), we can refine our query to only monitor that child latch to see if any other sessions could be affected:</p>
<pre><code>SQL&gt; @latchprof sid,name,<strong>laddr</strong> % <strong>41AACEEC</strong> 100000

       SID NAME                           LADDR          Held       Gets  Held %     Held ms Avg hold ms
---------- ------------------------------ -------- ---------- ---------- ------- ----------- -----------
        81 cache buffers chains           41AACEEC      14058      14047   14.06     177.131        .013
        85 cache buffers chains           41AACEEC         18         18     .02        .227        .013

</code></pre>
<p>As of now it looks like noone else except the maniac SID 81 and victim SID 85 contend for the latch (otherwise one would expect to see at least few successful holds against that latch).</p>
<p>So, as SID 81 is so evidently the troublemaker here, let&#8217;s check what it&#8217;s doing:</p>
<pre><code>SQL&gt; select sql_text from v$sql where hash_value = (select sql_hash_value from v$session where sid = 81);

SQL_TEXT
---------------------------------------------------------------------------------------------------------
select count(*) X from kill_cpu connect by n &gt; prior n start with n = 1

1 row selected.

</code></pre>
<p>This is Jonathan Lewis&#8217;es <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy5vcmFmYXEuY29tL21haWxsaXN0L29yYWNsZS1sLzIwMDMvMDUvMTIvMTEyOC5odG0=" target=\"_blank\">kill_cpu script</a> in action, which hammered the same datablock in a tight loop using connect by ( you need to set _old_connect_by_enabled=true on 9i+ for this test to work :)</p>
<p>I will blog about some more use cases and an extended version of LatchProf script in (hopefully) near future.</p>
<p>Note once more that this script only detects latch holders. A latch held the most doesn&#8217;t necessarily mean it&#8217;s the latch waited for the most. You still need to start your diagnosis from wait interface, using <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwNy8xMi8wNi9vcmFjbGUtc2Vzc2lvbi1zbmFwcGVyLXYxMDYtcmVsZWFzZWQv" target=\"_blank\">Snapper</a> or <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOC8wNi8wNi9hZHZhbmNlZC1vcmFjbGUtdHJvdWJsZXNob290aW5nLWd1aWRlLXBhcnQtNS1zYW1wbGluZy12LXN0dWZmLXdpdGgtd2FpdHByb2YtcmVhbGx5LWZhc3QtdXNpbmctc3FsLw==" target=\"_blank\">WaitProf</a> for example. Only if these tools show significant latch waits, you should drill down into latches further. And as <a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL3d3dy5vcmFwZXJmLmNvbS8=" target=\"_blank\">Anjo Kolk</a> recently (very well) said, latch contention is a symptom, not the problem. The root cause is somewhere else (in the application likely), once you fix that, the latch contention will go away itself.</p>
<p><em><strong>Update:</strong> I have published a more detailed latch profiling tool (latchprofX) which uses X$ tables for getting the extended data and can give even more detailed info about the latch holders. Read this article plus links in it:</em></p>
<p><a href="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?url=aHR0cDovL2Jsb2cudGFuZWxwb2Rlci5jb20vMjAwOS8wOC8yNy9sYXRjaC1jYWNoZS1idWZmZXJzLWNoYWlucy1sYXRjaC1jb250ZW50aW9uLWEtYmV0dGVyLXdheS1mb3ItZmluZGluZy10aGUtaG90LWJsb2NrLw==" target=\"_blank\">http://blog.tanelpoder.com/2009/08/27/latch-cache-buffers-chains-latch-contention-a-better-way-for-finding-the-hot-block/</a></p>
<div class="facebook_like_button"><iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fblog.tanelpoder.com%2F2008%2F07%2F09%2Fadvanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof%2F&amp;layout=standard&amp;show-faces=true&amp;width=450&amp;action=like&amp;font=arial&amp;colorscheme=light" scrolling="no" frameborder="0" allowTransparency="true" style="padding: 0px 0px; border:none; overflow:hidden; width:450px; height:70px;"></iframe></div> <img src="http://blog.tanelpoder.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=90" width="1" height="1" style="display: none;" /><p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save"><img src="http://blog.tanelpoder.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://blog.tanelpoder.com/2008/07/09/advanced-oracle-troubleshooting-guide-part-7-sampling-latch-holder-statistics-using-latchprof/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
