Search Results for: latchprof latchprofx

Troubleshooting high CPU usage with poor-man’s stack profiler – in a one-liner!

Here’s an example of a quick’n'dirty way of profiling stack traces on your command line. This is an example from Solaris (but the script should work on Linux too plus other Unixes with minor modifications).

I created a problem case below, based on a case I once troubleshooted at a client site. Note that they had set optimizer_mode = FIRST_ROWS in their database and the optimized came up with a very inefficient execution plan for the select from DBA_LOCK_INTERNAL view below:

SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS;

Session altered.

SQL> SET TIMING ON

SQL> SELECT * FROM dba_lock_internal;
...
... the output data stripped ...
...

927 rows selected.

Elapsed: 00:23:27.14

It took over 23 minutes to return 927 rows from DBA_LOCK_INTERNAL!

I ran Snapper to see where the time is spent then:

SQL> @snapper4 all 5 1 222
Sampling SID 222 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.00 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    222, SYS       , STAT, non-idle wait count                                       ,             1,        .19,         ,             ,          ,           ,          ~ per execution
    222, SYS       , TIME, DB CPU                                                    ,       6028084,      1.17s,   117.2%, [@@@@@@@@@@],          ,           ,
    222, SYS       , TIME, sql execute elapsed time                                  ,       6032677,      1.17s,   117.3%, [##########],          ,           ,
    222, SYS       , TIME, DB time                                                   ,       6032677,      1.17s,   117.3%, [##########],          ,           ,          ~ unaccounted time
    222, SYS       , WAIT, library cache: mutex X                                    ,             6,     1.17us,      .0%, [          ],         1,        .19,        6us average wait

--  End of Stats snap 1, end=2013-02-14 21:30:45, seconds=5.1

---------------------------------------------------------------------------------------------
Active% | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------
   100% | c884zcqpv9y5h   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2013-02-14 21:30:45, seconds=5, samples_taken=50

The query is apparently 100% on CPU with no significant waits. Normally I would expect some other metrics to pop up in snapper output in such high-CPU cases, like thousands of session logical reads per second, sorts (rows) showing millions of rows sorted per second or parse count (hard) number in hundreds or thousands per second. These are the “usual suspects”.

But this time none of these additional metrics were incremented by the session. So it’s time to systematically drill down by other means.

[Read more...]

IOUG Select Journal Editor’s Choice Award 2011

In May I received the IOUG Select Journal Editor’s Choice Award for my Systematic Oracle Latch Contention Troubleshooting article where I introduced my LatchProfX tool for advanced drilldown into complex latch contention problems (thanks IOUG and John Kanagaraj!).

As the relevant IOUG webpage hasn’t been updated yet, I thought to delay this announcement until the update was done – but I just found an official enough announcement (press release) by accident from Reuters site:

Woo-hoo! :-)

The article itself is here:

Thanks to IOUG crew, John Kanagaraj and everyone else who has read, used my stuff and given feedback! :-)

An index of my TPT scripts

A lot of people have asked me whether there’s some sort of index or “table of contents” of my TPT scripts (there’s over 500 scripts in the tpt_public.zip file – http://tech.e2sn.com/oracle-scripts-and-tools )

I have planned to create such index for years, but never got to it. I probably never will :) So a good way to extract the descriptions of some scripts is this (run the command in the directory where you extracted my scripts to):

Note: the single and double-quotes may get messed up when the browser tries to be smart and replace them with nicer looking characters (which Unix doesn’t recognize then). When copying & pasting this command, make sure that the single & double-quotes are the regular ones Unix shell can accept):

$ grep -i Purpose: *.sql | awk -F: ‘{ printf(“%20s %-50s\n”, $1, $3) }’
            bhla.sql      Report which blocks are in buffer cache, protected by a cache
         bufprof.sql      Display buffer gets done by a session and their reason
            calc.sql      Basic calculator and dec/hex converter       
        channels.sql      Report KSR channel message counts by channel endpoints
        curheaps.sql      Show main cursor data block heap sizes and their contents
             dba.sql      Convert Data Block Address (a 6 byte hex number) to file#, block#
             ddl.sql      Extracts DDL statements for specified objects
              df.sql  Show Oracle tablespace free space in Unix df style
        diag_sid.sql      Display current Session Wait info            
        diag_sid.sql      An easy to use Oracle session-level performance snapshot utility
           disco.sql      Generates commands for disconnecting selected sessions
     getplusparm.sql      get sqlplus parameter value (such linesize, pagesize, sqlcode,
            grpn.sql      Quick group by query for aggregating Numeric columns
            hash.sql      Show the hash value, SQL_ID and child number of previously
             i2h.sql      Advanced Oracle Troubleshooting Seminar demo script
              im.sql      Display In-Memory Undo (IMU) buffer usage    
            init.sql Initializes sqlplus variables for 156 character terminal
       kglbroken.sql                                                   
       kglbroken.sql      Report broken kgl locks for an object this can be used for 
            kill.sql      Generates commands for killing selected sessions
              la.sql      Show which latch occupies a given memory address and its stats
     lastchanged.sql      Detect when a datablock in table was last changed
       latchprof.sql      Perform high-frequency sampling on V$LATCHHOLDER
   latchprof_old.sql      Perform high-frequency sampling on V$LATCHHOLDER
      latchprofx.sql      Perform high-frequency sampling on V$LATCHHOLDER
              lh.sql      Show latch holding SIDs and latch details from V$LATCHHOLDER
             lhp.sql      Perform high-frequency sampling on V$LATCHHOLDER
            lhpx.sql      Perform high-frequency sampling on V$LATCHHOLDER
     lotshparses.sql      Generate Lots of hard parses and shared pool activity 
    lotshparses2.sql      Generate Lots of hard parses and shared pool activity 
        lotslios.sql      Generate Lots of Logical IOs for testing purposes
        lotspios.sql      Generate Lots of Physical IOs for testing purposes
     lotssparses.sql      Generate Lots of soft parses and library cache/mutex activity 
    lotssparses2.sql      Generate Lots of soft parses and library cache/mutex activity 
       mutexprof.sql      Display KGX mutex sleep history from v$mutex_sleep_history
       nonshared.sql      Print reasons for non-shared child cursors from v$sql_shared_cursor
      nonshared2.sql      Show the reasons why more child cursors were created instead of
      ostackprof.sql      Take target process stack samples and show an execution profile
            pmem.sql      Show process memory usage breakdown – lookup by process SPID
     pmem_detail.sql      Show process memory usage breakdown details – lookup by process SPID
        prefetch.sql      Show KCB layer prefetch                      
          pvalid.sql      Show valid parameter values from V$PARAMETER_VALID_VALUES
        rowcache.sql      Show parent rowcache entries mathcing an object name
              rs.sql      Display available Redo Strands               
               s.sql      Display current Session Wait and SQL_ID info (10g+)
          sample.sql      Sample any V$ view or X$ table and display aggregated results
      sampleaddr.sql      High-frequency sampling of contents of a SGA memory address
             ses.sql      Display Session statistics for given sessions, filter by
            ses2.sql      Display Session statistics for given sessions, filter by
        sgastatx.sql      Show shared pool stats by sub-pool from X$KSMSS
            smem.sql      Show process memory usage breakdown – lookup by session ID
     smem_detail.sql      Show process memory usage breakdown details – lookup by session ID
         snapper.sql      An easy to use Oracle session-level performance snapshot utility
     snapper3.15.sql      An easy to use Oracle session-level performance snapshot utility
      snapper_v1.sql      An easy to use Oracle session-level performance snapshot utility
      snapper_v2.sql      An easy to use Oracle session-level performance snapshot utility
            stat.sql      Execute SQL statement in script argument and report basic
              sw.sql      Display current Session Wait info            
             sw2.sql      Display current Session Wait info            
             swg.sql      Display given Session Wait info grouped by state and event
             swo.sql      Display current Session Wait info            
          topsql.sql      Show TOP SQL ordered by user-provided criteria
            usql.sql      Show another session’s SQL directly from library cache
           usqlx.sql      Show another session’s SQL directly from library cache
        waitprof.sql      Sample V$SESSION_WAIT at high frequency and show resulting 
              xb.sql      Explain a SQL statements execution plan with execution 
            xde2.sql      Describe X$ tables, column offsets and report indexed fixed table
              xm.sql      Explain a SQL statements execution plan directly from library cache
             xma.sql      Explain a SQL statements execution plan directly from library cache
            xmai.sql      Explain a SQL statements execution plan with execution 
             xms.sql      Explain your last SQL statements execution plan with execution 
            xmsh.sql      Explain a SQL statements execution plan with execution 
            xmsi.sql      Explain a SQL statements execution plan with execution

 

New versions of LatchProf and LatchProfX for latch contention troubleshooting and tuning

The LatchProf and LatchProfX scripts allow you to be more systematic with latch contention troubleshooting and tuning. No more guesswork is needed as these scripts give you exact session IDs and in this version also SQLIDs of the troublemaking applications.

You can download the new versions here:

Example output (with SQLID info) is below:

SQL> @latchprof name,sid,sqlid % % 100000

-- LatchProf 1.21 by Tanel Poder ( http://www.tanelpoder.com )

NAME                                       SID SQLID               Held       Gets  Held %     Held ms Avg hold ms
----------------------------------- ---------- ------------- ---------- ---------- ------- ----------- -----------
cache buffers chains                       133 3jbwa65aqmkvm        349        348     .35      14.169        .041
simulator lru latch                        133 3jbwa65aqmkvm         51         51     .05       2.071        .041
row cache objects                          133 3jbwa65aqmkvm          5          5     .01        .203        .041
cache buffers chains                        24                        5          5     .01        .203        .041
cache buffers chains                       149 3jbwa65aqmkvm          3          3     .00        .122        .041
resmgr group change latch                   33 147a57cxq3w5y          2          2     .00        .081        .041
cache buffers chains                         9 5raw2bzx227wp          2          1     .00        .081        .081
In memory undo latch                       149 f3y38zthh270n          2          1     .00        .081        .081
active checkpoint queue latch                5                        2          1     .00        .081        .081
cache buffers chains                       149 75621g9y3xmvd          2          2     .00        .081        .041
cache buffers chains                         9 gvgdv2v90wfa7          2          2     .00        .081        .041
cache buffers chains                        33 75621g9y3xmvd          2          2     .00        .081        .041
checkpoint queue latch                       5                        1          1     .00        .041        .041
ksuosstats global area                       8                        1          1     .00        .041        .041
cache buffers lru chain                    133 3jbwa65aqmkvm          1          1     .00        .041        .041
multiblock read objects                    155 75ju2gn3s8009          1          1     .00        .041        .041
resmgr group change latch                    9 0w2qpuc6u2zsp          1          1     .00        .041        .041
resmgr group change latch                   33 apgb2g9q2zjh1          1          1     .00        .041        .041
resmgr group change latch                  133 apgb2g9q2zjh1          1          1     .00        .041        .041
space background task latch                 17                        1          1     .00        .041        .041
cache buffers chains                       149 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 05s4vdwsf5802          1          1     .00        .041        .041
cache buffers chains                        31 0yas01u2p9ch4          1          1     .00        .041        .041
cache buffers chains                         9 41zu158rqf4kf          1          1     .00        .041        .041
In memory undo latch                        33 0bzhqhhj9mpaa          1          1     .00        .041        .041
In memory undo latch                        31 gvgdv2v90wfa7          1          1     .00        .041        .041
In memory undo latch                         9 gvgdv2v90wfa7          1          1     .00        .041        .041
simulator lru latch                        149 3jbwa65aqmkvm          1          1     .00        .041        .041
row cache objects                          133 5yq51dtyc6qf2          1          1     .00        .041        .041
SQL memory manager workarea list la        133 3jbwa65aqmkvm          1          1     .00        .041        .041
enqueues                                   141                        1          1     .00        .041        .041
row cache objects                          132                        1          1     .00        .041        .041

33 rows selected.

LatchProf scripts allow you to easily identify which session and SQLID (or sqlhash in 9i) cause the latch(es) to be held the most.

Let’s check what’s the most “latch-holding” SQL reported by LatchProf:

SQL> @sqlid 3jbwa65aqmkvm

HASH_VALUE  CH# SQL_TEXT
---------- ---- ------------------------------------------------------------------------------------------------------------------------------------------------------
1432996723    0 SELECT O.ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY, ORDER_MODE, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID, C.CUSTOMER_ID,
 CUST_FIRST_NAME, CUST_LAST_NAME, CREDIT_LIMIT, CUST_EMAIL, ORDER_DATE FROM ORDERS O , ORDER_ITEMS OI, CUSTOMERS C WHERE O.ORDER_ID = OI.ORDER_ID AND
 O.CUSTOMER_ID = C.CUSTOMER_ID AND O.ORDER_STATUS <= 4

If you want to read more about the capabilities of LatchProf and LatchProfX, go here:

Note that the latest version (v1.21) also fixes a problem with Oracle 11.2 where the script execution plan order was wrong, causing the sampling to not happen in correct order. I added a NO_TRANSFORM_DISTINCT_AGG hint to disable a new transformation happening in 11.2 to make the scripts behave correctly…

KGL simulator, shared pool simulator and buffer cache simulator – what are these?

If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:

SQL> select * from v$sgastat where lower(name) like '%sim%' order by name;

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  kglsim alloc latch area          1700
shared pool  kglsim alloc latches               68
shared pool  kglsim count of pinned he        9248
shared pool  kglsim free heap list             204
shared pool  kglsim free obj list              204
shared pool  kglsim hash table                4104
shared pool  kglsim hash table bkts        2097152
shared pool  kglsim heap                    635536
shared pool  kglsim latch area                1700
shared pool  kglsim latches                     68
shared pool  kglsim main lru count           87040
shared pool  kglsim main lru size           174080
shared pool  kglsim object batch            909440
shared pool  kglsim pin list arr               816
shared pool  kglsim recovery area             2112
shared pool  kglsim sga                      22092
shared pool  kglsim size of pinned mem       18496
shared pool  ksim client list                   84
shared pool  log_simultaneous_copies           480
shared pool  sim cache nbufs                   640
shared pool  sim cache sizes                   640
shared pool  sim kghx free lists                 4
shared pool  sim lru segments                  640
shared pool  sim segment hits                 1280
shared pool  sim segment num bufs              640
shared pool  sim state object                   48
shared pool  sim trace buf                    5140
shared pool  sim trace buf context             120
shared pool  sim_knlasg                       1200
shared pool  simulator hash buckets          16512
shared pool  simulator latch/bucket st        4608

31 rows selected.

See, a bunch of “kgl sim” and then just “sim” allocations.

… or sometimes you can see latch contention on following latches:

SQL> select name from v$latch where name like '%sim%';

NAME
-------------------------------------------------------
ksim membership request latch
simulator lru latch
simulator hash latch
sim partition latch
shared pool simulator
shared pool sim alloc

6 rows selected.

Again, there seems to be some “simulation” work going on in Oracle instance.

So what are these simulators about?

[Read more...]

latch: cache buffers chains latch contention – a better way for finding the hot block

Here’s a treat for Oracle performance professionals and geeks who are looking for more systematic ways for cache buffers chains (CBC) latch contention troubleshooting. Cache buffers chains latches are taken when a process wants to walk through a cache buffer hash chain, looking if the block with required DBA (data block address) is in buffer cache. If the block happens to be in cache, then in most cases it has to be pinned first before use and unpinned after use, to make sure no-one else can perform an incompatible operation on that block at the same time. The modification of pin structures (pinning/unpinning) is also protected by CBC latches.

CBC latch contention can happen for multiple reasons, but one reason is that there is some really hot block in a SMP system with high number of CPUs (or CMT system with high number of threads like Sun T-series servers). Sometimes there happen to be multiple moderately hot blocks “under” the same CBC latch, which can result in latch contention again.

Traditionally DBAs used to look up the child latch address from V$SESSION_WAIT, sql_trace output or ASH and then look up all buffers protected by that latch from X$BH, using HLADDR column (HLADDR stands for Hash Latch Address). I also have a script for that, bhla.sql (Buffer Headers by Latch Address), which reports me all blocks currently in buffer cache, “under” that particular latch and the corresponding data block addresses and object names:

Another LatchProfX use case

Riyaj Shamsudeen wrote an excellent article about systematic latch contention troubleshooting.

Especially if the latch contention problem is ongoing, looking into system wide stats (like v$latch.sleep columns) is not the best idea in busy systems. This may sometimes lead you to fixing the wrong problem.

This is because sometimes the latch contention is not caused by some system wide inefficiency but rather by one or few sessions.

The right approach would be to measure the following things:

  1. Which latch (and exact child latches) the problem session is waiting for (query v$session_wait, ASH, sql_trace)
  2. Why is the problem session trying to get that latch so often (query v$sesstat counters for problem session and check execution plan if only single/few statements executed)
  3. Why this latch is busy: who’s holding it (query v$latchholder, LatchProf)
  4. Why is that someone holding the latch so much (query v$sesstat counters for that session or run LatchProfX)

Riyaj used this approach and successfully found out the troublemaker causing heavy library cache latch contention. He used my LatchProfX tool for part of the diagnosis (and I’m very happy to see that my advanced oracle troubleshooting tools find real world use in hands of other troubleshooters/tuners too!).

Here’s an excerpt from latchprofx output (from a little test case I put together), showing which session is holding which library cache latch how much, hold mode (shared/exclusive) and also the actual reason (function name) why the latch is held:

SQL> @latchprofx sid,name,laddr,ksllwnam,ksllwlbl,hmode 159 % 100000

-- LatchProfX 1.08 by Tanel Poder ( http://www.tanelpoder.com )

 SID NAME           LADDR            KSLLWNAM  KSLLWLBL  HMODE        Held  Gets  Held %  Held ms Avg hold ms
---- -------------- ---------------- --------- --------- ---------- ------ ----- ------- -------- -----------
 159 library cache  00000003A93513E0 kglic     child     exclusive   10044  9314   10.04  301.320        .032
 159 library cache  00000003A9351340 kglic     child     exclusive    9895  8458    9.90  296.850        .035
 159 library cache  00000003A9351660 kglic     child     exclusive    9761  8440    9.76  292.830        .035
 159 library cache  00000003A9351700 kglic     child     exclusive    9737  8924    9.74  292.110        .033
 159 library cache  00000003A9351480 kglic     child     exclusive    8999  7765    9.00  269.970        .035
 159 library cache  00000003A93515C0 kglic     child     exclusive    8553  7832    8.55  256.590        .033
 159 library cache  00000003A9351520 kglic     child     exclusive    6852  6828    6.85  205.560        .030

7 rows selected.

kglic means Kernel Generic Library cache Iterate Chain (AFAIK), it’s the function which is executed when you access most X$KGL tables. And this is where Riyaj got the evidence that the latch contention comes from inefficient scanning of library cache, caused by a session running queries against V$SQL_PLAN views with a bad plan.

So, check out Riyaj’s articele, my LatchProfX script, then my Advanced Oracle Troubleshooting Seminar :)

Oracle hidden costs revealed, Part2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others

I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.

Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?

Here’s an example, I’ll create two similar tables, one in USERS tablespace, other in SYSTEM, and inset into the first one (Oracle 10.2.0.3 on Solaris x64):

SQL> create table t1(a int) tablespace USERS;

Table created.

SQL> create table t2(a int) tablespace SYSTEM;

Table created.

SQL> exec for i in 1..100000 loop insert into t1 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.09

Insert into table in USERS tablespace took 3 seconds.

Ok, let’s commit and flush dirty buffers that they wouldn’t get on the way of next insert.

SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

SQL> alter system checkpoint; -- checkpointing to flush dirty buffers from previous inserts

System altered.

Elapsed: 00:00:01.34
SQL>

And now to the insert into the SYSTEM tablespace table:

SQL> exec for i in 1..100000 loop insert into t2 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.98
SQL>

What?! The same insert took 3 times longer, almost 9 seconds?

Fine! Let’s troubleshoot it!

[Read more...]

Flexible sampling of any V$ or X$ view with sample.sql

In recent past I’ve blogged few scripts which use specially crafted ordered nested loop for sampling contents of V$ and X$ views fast, with plain SQL.

If you haven’t read them yet, here are the links:

I wrote the above scripts having special purposes in mind (e.g. profile session waits or latching activity).

Now I introduce a simple but powerful sqlplus script for ad-hoc sampling of any V$ view. It’s called…. (drumroll) …. sample.sql :)

When you look into it, the script is actually very simple. It’s just using power of sqlplus substitution variables, I can pass the sampled column and table names and sampling conditions in to the script dynamically.

The basic syntax is:

@sample column_name table_name filter_condition num_samples

For example, let say I have a session which executes lots of SQL statements in a loop and I want to have a quick overview of what’s the TOP SQL statement for a session right now.

[Read more...]

Advanced Oracle Troubleshooting Guide, Part 8: Even more detailed latch troubleshooting using LatchProfX

In my last AOT post I published my LatchProf script which is able to sample detailed latchholder data from V$LATCHHOLDER.

Latchprof allows you to drill down into your latching problems at session level (which V$LATCH, V$LATCH_PARENT and V$LATCH_CHILDREN can’t do). It allows you to get valuable details about individual sessions who are holding a latch the most, therefore likely contributing to the latch contention problem the most.

However after you have discovered the troublemaking session, then what next? One way forward is looking into V$SESSTAT counters using Snapper tool. Depending on what latch is the problematic one, you would look for different stats like various buffer get stats for cache buffers chains latches and parsing/executing stats when looking into library cache latches. However if those stats look “normal”, is there any other way do drill down further?

Yeah, there is and lets look into it!

[Read more...]