Advanced Oracle Troubleshooting Guide, Part 6: Understanding Oracle execution plans with os_explain

Get ready for some more adventures in Oracle process stack!

Before proceeding though, please read this post about safety of different stack sampling approaches.

I have had few non-trivial Oracle troubleshooting cases, related to query hangs and bad performance, where I’ve wanted to know where exactly in execution plan the current execution is.
Remember, Oracle is just another program executing instructions clustered in functions on your server, so stack sampling can help out here as well.

So, I was looking into the following stack trace taken from an Oracle 10.1 database on Solaris SPARC, running a SQL with this execution plan.

$ cat pstack.txt
 ------------------------------------------------------------------------
 0000000101eca6e4 kdirfrs (ffffffff78eca5d8, 1, 0, 2, 86521ee00, 0) + 24
 0000000102502adc qerixFetchFastFullScan (ffffffff78eca3e0, 10248ebc0, 0, 1, 86521ebf8, fffd) + 33c
 0000000102558720 qergiFetch (ffffffff78ecaca8, 10248ebc0, ffffffff7fff5ee8, 10449dba0, 1, 86bb6bb38) + 80
 000000010248edd8 qerjoFetch (45, 10248ebc0, ffffffff7fff5ee8, ffffffff78ecad60, 86bb6bc08, 114) + 118
 000000010248eeb8 qerjoFetch (7ffe, 10248ebc0, ffffffff7fff5fe8, ffffffff78ecb5e8, 7f399b6f8, 7ffe) + 1f8
 00000001025171b8 rwsfcd (8177747a8, 10248e1e0, ffffffff7fff6168, 7fff, f0, 10449dba0) + 78
 000000010248e4dc qeruaFetch (745c0ab40, 10248e1e0, ffffffff7fff6168, ffffffff78ecb620, 10449dba8, 8177747a8) + 11c
 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6238, 7fff, 7cfc87c50, 10449d000) + a0
 00000001025171b8 rwsfcd (7876e8d18, 10248e1e0, ffffffff7fff63b8, 7fff, c0, 10449dba0) + 78
 000000010248e4dc qeruaFetch (817d1e438, 10248e1e0, ffffffff7fff63b8, ffffffff78e7d318, 10449dba8, 7876e8d18) + 11c
 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6488, 7fff, 82355a348, 10449d000) + a0
 00000001025171b8 rwsfcd (7454b0408, 10249e4e0, ffffffff7fff6608, 7fff, c0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (86b1d64d8, 0, 0, 1, ffffffff78e7df08, 0) + 300
 000000010248f99c qerjotFetch (78d8ea1d8, 0, 0, 1, ffffffff78e7e8d0, 10449dba0) + dc
 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6838, ffffffff78e7f2b0, 7d53bb730, 1) + 1f8
 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6938, ffffffff7b9b2f78, 74b306ba0, 7fff) + 1f8
 000000010248d840 qervwFetch (5, 5, ffffffff7fff6a08, 7fff, 80b0faa18, 10449d000) + a0
 00000001025171b8 rwsfcd (865e988f8, 10249e4e0, ffffffff7fff6b88, 7fff, c0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (7a57ae350, 10249e4e0, ffffffff7fff6d88, 1, ffffffff7bb89f00, 0) + 300
 00000001025171b8 rwsfcd (823642298, 10249e4e0, ffffffff7fff6d88, 7fff, 6f0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (751158588, 102517980, 7511587f0, 1, ffffffff7b9b5090, 0) + 300
 000000010251d1f0 qersoFetch (94, 10506adf8, 10449d000, ffffffff7b9b53b8, 799854d60, 7511587f0) + 350
 0000000101aa6f24 opifch2 (7, f, 150, 1, 104400, 1050685e8) + a64
 0000000101aa6384 opifch (5, 2, ffffffff7fff79f8, 105000, 0, 10434c0a8) + 44
 0000000101ad81ec opipls (104000, 10434c, 1, ffffffff7bba3e6a, 0, 140010) + f4c
 00000001002d0058 opiodr (6, 10506ae10, 10434cfb0, 10506a, 105000, 104000) + 598
 00000001002d4ec0 rpidrus (ffffffff7fff8820, 105067f18, 105068860, ffffffff7bb5f560, 4a8c, 200000) + a0
 0000000102f615e4 skgmstack (ffffffff7fff8a48, ffffffff7f87cf8f, ffffffff7fff898f, 1002d4e20, ffffffff7fff8a70, acc01800) + a4
 00000001002d504c rpidru (ffffffff7fff9140, 10422b000, 10422a918, 104229598, 410, 82) + ac
 00000001002d4808 rpiswu2 (0, 104556000, ffffffff7fff8b88, 2, 104556418, ffffffff7fff92c0) + 1a8
 00000001002d61cc rpidrv (a, ffffffff7fff9044, 9, ffffffff7fff90c0, ffffffff7fff9140, 1002d5180) + 62c
 0000000102797f90 psddr0 (104400, 86e2fc628, ffffffff7fff92c0, 9, 1050769d8, 1050769d8) + 1f0
 0000000102798e04 psdnal (ffffffff7fffa0b8, ffffffff7fffa258, 105000, ffffffff7bc5eb00, 7f7aa86d0, 40) + 184
 000000010376d268 pevm_BFTCHC (0, 7f7aa86d0, 50, ffffffff7bb5f560, ffffffff7bc5eb00, 0) + 188
 000000010373dff4 pfrinstr_FTCHC (10, 15d0000000000000, ffffffff7bb5f5c8, ffffffff7bb5f560, 4a8c, ffffffff7bb66330) + b4
 00000001037362c8 pfrrun_no_tool (ffffffff7bb5f560, 779953684, ffffffff7bb5f5c8, 10457c9d8, 2001, 2001) + 48
 00000001037372d0 pfrrun (ffffffff7bb5f5c8, 200000, 0, 200000, ffffffff7bb5f560, 779abb110) + 2f0
 0000000103783374 plsql_run (ffffffff7bb55408, 1, 0, ffffdfff, ffffffff7fffa0b8, 0) + 274
 0000000103722554 peicnt (ffffffff7fffa0b8, 105068860, 6, ffffffff7fff9f28, 41d8, 1050685e8) + d4
 000000010327b784 kkxexe (105000, 104000, 105068, 104296000, 1050685e8, ffffffff7bb5f560) + 284
 0000000101ad0228 opiexe (4, ffffffff7bc68ee8, ffffffff7fffab00, 0, 0, ffffffff7bc70480) + 33c8
 0000000101a4c0a8 kpoal8 (40008, 1, ffffffff7fffd890, 0, 0, 4) + 648
 00000001002d0058 opiodr (14, 10506ae10, 10434ce70, 10506a, 105000, 104000) + 598
 0000000102cded94 ttcpip (105071450, 18, ffffffff7fffd890, ffffffff7fffcb88, 104229c98, ffffffff7fffcb84) + 694
 00000001002cd3e8 opitsk (1002cf000, 1, 0, ffffffff7fffd9e8, 105071450, 105071458) + 428
 0000000101aaf564 opiino (105070000, 1050683c0, 0, 0, f5, 105070290) + 404
 00000001002d0058 opiodr (4, 10506ae10, 10434c920, 10000, 105071, 105000) + 598
 00000001002cc174 opidrv (0, 4, 10506a, 105071450, 0, 3c) + 354
 00000001002c9828 sou2o (ffffffff7fffe6b8, 3c, 4, ffffffff7fffe698, 104aa6000, 104aa6) + 48
 00000001002a7b34 main (2, ffffffff7fffe798, ffffffff7fffe7b0, 0, 0, 100000000) + 94
 00000001002a7a7c _start (0, 0, 0, 0, 0, 0) + 17c

Not too encouraging, huh?

So, let’s run this stack trace through my os_explain script:

$ ./os_explain pstack.txt
   SELECT FETCH:
    SORT: Fetch
     HASH JOIN: Fetch
    * HASH JOIN: Fetch
     * VIEW: Fetch
        NESTED LOOP OUTER: Fetch
         NESTED LOOP OUTER: Fetch
          NESTED LOOP JOIN: Fetch
           HASH JOIN: Fetch
          * VIEW: Fetch
             UNION-ALL: Fetch
            * VIEW: Fetch
               UNION-ALL: Fetch
              * NESTED LOOP OUTER: Fetch
                 NESTED LOOP OUTER: Fetch
                  GRANULE ITERATOR: Fetch
                   INDEX: FetchFastFullScan
                    kdirfrs

Now this is much more understandable!

All my script did was:

  • remove the bottom part of the stack not relevant to plan execution
  • reverse the order of stack trace lines for better human readability
  • translating Query Execution Rowsource (qer) function prefixes to their corresponding names using info provided in Metalink note 175982.1

Easy :)

So, how to read this?

First, by now it should be obvious that in Oracle, each rowsource operator (the different lines you see in SQL execution plans) is actually just a function inside Oracle kernel. These are the row source functions, starting with “qer”. QER stands for Query Execution Rowsource as far as I know.

Whenever an Oracle process fetches data from a cursor, it calls opifch2() which in turn calls the root rowsource function in execution plan. In my case that function was qersoFetch and my os_explain script just substituted the “qerso” part with SORT (as per the Metalink note I mentioned above). The first child function of qersoFetch was qerhjFetch, which is a hash join rowsource, and so on. Note that os_explain prefixes some lines with an asterisk (*), this indicates that the output of given function is in turn filtered by a filter operation (the same filter ops what you normally see in the bottom of DBMS_XPLAN explained plans).

So, logically you can imagine an execution plan as a tree of Oracle functions:

  • For a SELECT query the OPI fetch function (opifch2) would be the root.
  • Various join and union functions like qerhjFetch (HASH JOIN) and qerjotFetch (NESTED LOOPS JOIN) would be branches.
  • The leaves would always be some sort of access path functions like qertbFetch (TABLE ACCESS FULL) or qerixFetch ( INDEX UNIQUE / FULL / RANGE SCAN ).

But physically, an execution plan is just a memory structure in subheap 6 of a child cursor (x$kglcursor.kglobhd6), which has a bunch of rowsource function opcodes in it.
During plan execution Oracle process “just” traverses through those opcodes, looks up the corresponding rowsource function starting address using a lookup table and calls it. That function does its task (probably calls other rowsource functions recursively) and returns to caller.

Note that many rowsource functions are designed to be cascading, being able to do only the work needed for returning a small subset of rows and return only few rows at a time, as opposed to the whole resultset.
This is a very good thing as rows can be cascaded, or pipelined back to parent functions as rows become available. For example a table fetch only fetches a handful of rows (and not the whole table) at a time and returns these “up” for further processing. Also, a nested loop join is able to pass matching rows “up” from the moment first matches are found, again there’s no need to perform the join on full dataset first before returning first rows.
This also means that there is no need to store the whole intermediate resultset somewhere in memory before passing it up to parent function; instead we just revisit that branch of execution plan tree whenever we need more rows from it. And the os_explain script shows you exactly in which execution branch the execution currently is.

Addition: I will elaborate on how to match the execution plan with stack trace in an upcoming post – it’s too much material for an introductory post.

So, cascading rowsources allow us to “incrementally” execute plans involving large datasets, without need to keep the intermediate resultsets in memory. On the other hand, few rowsource operators in your execution plan, like SORT, can not return any rows up before all its children’s rows are processed. With SORT (and aggregate operations which also use SORT) you just have to process all the source rows before returning any meaningful result back. You can’t just go through only half of the source data, order it and start returning rows in hope that the rest of the rows should have not been returned as first in the order. This is where the SQL cursor workareas come into play for such operations.

SORT, HASH and BITMAP rowsources can allocate SQL workareas for them, while others can’t. This can easily be identified from execution plan statistics of following sample query:

SQL> select /*+ gather_plan_statistics */ owner, count(*) from dba_source group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
WKSYS                                8988
HR                                     34
[...some output snipped...]
SYS                                129299
WMSYS                                 704

25 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'MEMSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dcp37kxt02m9f, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, count(*) from dba_source
group by owner

Plan hash value: 114136443

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   1 |  HASH GROUP BY                |                 |      1 |    593K|     25 |00:00:18.24 |   821K|   821K| 5213K (0)|
|   2 |   VIEW                        | DBA_SOURCE      |      1 |    593K|    595K|00:00:16.84 |       |       |          |
|   3 |    UNION-ALL                  |                 |      1 |        |    595K|00:00:15.06 |       |       |          |
|*  4 |     FILTER                    |                 |      1 |        |    595K|00:00:10.88 |       |       |          |
|*  5 |      HASH JOIN                |                 |      1 |    595K|    595K|00:00:08.50 |   884K|   884K| 1316K (0)|
|*  6 |       HASH JOIN               |                 |      1 |   6527 |   6292 |00:00:00.12 |   870K|   870K| 1179K (0)|
|   7 |        TABLE ACCESS FULL      | USER$           |      1 |     98 |     98 |00:00:00.01 |       |       |          |
|*  8 |        HASH JOIN              |                 |      1 |   6527 |   6292 |00:00:00.09 |   909K|   909K| 1181K (0)|
|   9 |         INDEX FULL SCAN       | I_USER2         |      1 |     98 |     98 |00:00:00.01 |       |       |          |
|* 10 |         INDEX FAST FULL SCAN  | I_OBJ2          |      1 |   6527 |   6292 |00:00:00.04 |       |       |          |
|  11 |       INDEX FAST FULL SCAN    | I_SOURCE1       |      1 |    595K|    595K|00:00:01.56 |       |       |          |
|  12 |      NESTED LOOPS             |                 |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 13 |       INDEX FULL SCAN         | I_USER2         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 14 |       INDEX RANGE SCAN        | I_OBJ4          |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|  15 |     NESTED LOOPS              |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|  16 |      NESTED LOOPS             |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|  17 |       NESTED LOOPS            |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|* 18 |        INDEX FAST FULL SCAN   | I_OBJ2          |      1 |      6 |      0 |00:00:00.01 |       |       |          |
|* 19 |        FIXED TABLE FIXED INDEX| X$JOXFS (ind:1) |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 20 |       INDEX RANGE SCAN        | I_USER2         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|  21 |      TABLE ACCESS CLUSTER     | USER$           |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 22 |       INDEX UNIQUE SCAN       | I_USER#         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

You see how the HASH operations do have their last Mem columns populated, therefore those rowsource functions did allocate a SQL workarea for them. Others like NESTED LOOPS joins and data access rowsources did not have any SQL workarea memory allocated as they are completely cascading.

Note that os_explain can also read the stack from STDIN as seen in example below. Also the -a option will tell os_explain to show all functions in the stack and not only the execution plan ones and their children.

Command:

SQL> alter session set statistics_level=typical;

Session altered.

SQL> select avg(length(text)) from dba_source where owner = 'SYS';

AVG(LENGTH(TEXT))
-----------------
       125.032127

Plan:

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE                     |                 |      1 |       |       |          |
|   2 |   VIEW                              | DBA_SOURCE      |  17972 |       |       |          |
|   3 |    UNION-ALL                        |                 |        |       |       |          |
|*  4 |     FILTER                          |                 |        |       |       |          |
|   5 |      NESTED LOOPS                   |                 |        |       |       |          |
|   6 |       NESTED LOOPS                  |                 |  18056 |       |       |          |
|*  7 |        HASH JOIN                    |                 |    198 |   909K|   909K| 1193K (0)|
|   8 |         INDEX FULL SCAN             | I_USER2         |     98 |       |       |          |
|   9 |         NESTED LOOPS                |                 |    198 |       |       |          |
|  10 |          TABLE ACCESS BY INDEX ROWID| USER$           |      1 |       |       |          |
|* 11 |           INDEX UNIQUE SCAN         | I_USER1         |      1 |       |       |          |
|* 12 |          INDEX RANGE SCAN           | I_OBJ5          |    198 |       |       |          |
|* 13 |        INDEX RANGE SCAN             | I_SOURCE1       |     93 |       |       |          |
|  14 |       TABLE ACCESS BY INDEX ROWID   | SOURCE$         |     91 |       |       |          |
|  15 |      NESTED LOOPS                   |                 |      1 |       |       |          |
|* 16 |       INDEX FULL SCAN               | I_USER2         |      1 |       |       |          |
|* 17 |       INDEX RANGE SCAN              | I_OBJ4          |      1 |       |       |          |
|  18 |     NESTED LOOPS                    |                 |      1 |       |       |          |
|  19 |      NESTED LOOPS                   |                 |      1 |       |       |          |
|  20 |       NESTED LOOPS                  |                 |      1 |       |       |          |
|  21 |        TABLE ACCESS BY INDEX ROWID  | USER$           |      1 |       |       |          |
|* 22 |         INDEX UNIQUE SCAN           | I_USER1         |      1 |       |       |          |
|* 23 |        INDEX RANGE SCAN             | I_OBJ5          |      1 |       |       |          |
|* 24 |       FIXED TABLE FIXED INDEX       | X$JOXFS (ind:1) |      1 |       |       |          |
|* 25 |      INDEX RANGE SCAN               | I_USER2         |      1 |       |       |          |
---------------------------------------------------------------------------------------------------

Stack:

$ pstack 23740 | ./os_explain -a
   main
    ssthrdmain
     opimai_real
      sou2o
       opidrv
        opiodr
         opiino
          opitsk
           ttcpip
            opiodr
             kpoal8
              SELECT FETCH:
               GROUP BY SORT: Fetch
                VIEW: Fetch
                 UNION-ALL: Fetch
                * FILTER DEFINITION: FetchOutside
                   UNION-ALL: RowProcedure
                    VIEW: RowProcedure
                     qesaAggNonDistSS.
                      evaopn2
                       evalen
                        lxsCntChar

The FILTER rowsources (not talking about filter predicates on normal rowsources here) can make things more complicated though as they can introduce their own logic and loops into the execution plan (for running correlated subqueries etc).

By the way, see what happens when I run exactly the same query with rowsource level statistics collection enabled:

SQL> alter session set statistics_level=all;

Session altered.

SQL> select avg(length(text)) from dba_source where owner = 'SYS';

AVG(LENGTH(TEXT))
-----------------
       125.032127

$ pstack 23740 | ./os_explain -a
   main
    ssthrdmain
     opimai_real
      sou2o
       opidrv
        opiodr
         opiino
          opitsk
           ttcpip
            opiodr
             kpoal8
              SELECT FETCH:
               QUERY EXECUTION STATISTICS: Fetch
                GROUP BY SORT: Fetch
                 QUERY EXECUTION STATISTICS: Fetch
                  VIEW: Fetch
                   QUERY EXECUTION STATISTICS: Fetch
                    UNION-ALL: Fetch
                     QUERY EXECUTION STATISTICS: Fetch
                    * QUERY EXECUTION STATISTICS: Fetch
                       FILTER DEFINITION: FetchOutside
                        QUERY EXECUTION STATISTICS: Fetch
                         NESTED LOOP JOIN: Fetch
                          QUERY EXECUTION STATISTICS: Fetch
                           QUERY EXECUTION STATISTICS: SnapStats
                            sltrgftime64
                             gettimeofday
                              __kernel_vsyscall

Every rowsource is wrapped into a QUERY EXECUTION STATISTICS wrapper, which’ task is just to count the number of rows sent “up” to parents in the tree, logical IOs and also rowsource timing info whenever an internal counter (set by _rowsource_statistics_sampfreq parameter) wraps.

This is just an intro to Oracle execution plan internals and troubleshooting. Hopefully you don’t need this technique too often, however it has helped me to successfully pinpoint the root cause of a problem in few non-trivial database problems.

Note that in Oracle 11g there’s an excellent new feature called Real-time SQL Monitoring. It allows you to monitor the progress of currently running SQL statements. For serially running statements the monitoring kicks in for a statement after it’s used total 5 seconds of CPU or IO time (this time is controlled by _sqlmon_threshold parameter, but for PX the monitoring is always enabled). After that you can query V$SQL_MONITOR and V$SQL_PLAN_MONITOR for seeing how much time/rows/LIOs that session has spent executing a statement. You can see these details even at the SQL execution plan line level. Alternatively you can use DBMS_SQLTUNE. REPORT_SQL_MONITOR function to get this info nicely formatted. Greg Rahn has written a good blog entry about it.

However it’s important to note that both DBMS_SQLTUNE and V$SQL_MONITOR/V$SQL_PLAN_MONITOR use requires you to have Oracle Tuning Pack license which in turn requires Oracle Diagnostic Pack license. Details are in Oracle 11g Licensing Guide.

So another, low level approach for real-time monitoring will still be handy even after 11g becomes mainstream. In a future post I will be showing how to measure the progress and get execution profile of your plan by aggregating multiple stack traces and also some cool opportunities with DTrace.

This entry was posted in Cool stuff, Oracle and tagged , , , , , . Bookmark the permalink.

24 Responses to Advanced Oracle Troubleshooting Guide, Part 6: Understanding Oracle execution plans with os_explain

  1. Richa says:

    Thanks for this posting.

    Very interesting idea – getting the execution plan from an OS trace.

    I would love to see the source…

    Re DTrace – is this available for Linux? or something equivalent?

    Have you thought about extending this (getting the exec plan from OS process) with timing for each part from the OS?

    I would expect using DTrace with this tool would allow that?

  2. tanelp says:

    Hi Rich,

    Well, the source code is available here: http://blog.tanelpoder.com/files/scripts/tools/unix/os_explain

    os_explain is just a shell script using sed & awk. It is doing just some string manipulation, search & replace etc.

    DTrace allows accounting timing information for each rowsource function. On other platforms this could be doable with platform profilers (like OProfile in Linux).

    However the script fills its original purpose as it is right now – I wanted to see where in execution plan the execution was mostly looping and taking a few os_explain samples did show it to me. I yet have to write a post about how to map os_explain and v$sql_plan execution plan lines together…

  3. Jeff says:

    Hi Tanel, great blog. Thanks for the information.

    I’ve been using dbms_xplan.display_cursor quite a bit lately. I love the information it provides but I’m having trouble finding any detailed documentation on it. I’m specifically looking for more information on what the Omem and 1mem columns represent. Can you shed any light on the subject or point me in the right direction?

    Thanks!

    BTW – Any update on training dates in the US for 2008?

  4. Karl Arao says:

    Hi Tanel,

    I was just wondering if there’s a way to find out which part of the explain plan is currently running or executing.

    In your example, Plan hash value: 114136443, it would be nice if the

    I know that the part of

    * 8 | HASH JOIN | | 1 | 6527 | 6292 |00:00:00.09 | 909K| 909K| 1181K (0)|

    is currently running and..
    has a predicate of some value..

    Are there any v$ or x$ views available?

    Or any tracing facilities I should use?

    Thanks man :) see you in Singapore 04/2009..

  5. Tanel Poder says:

    Hi Karl,

    Well os_explain gives some indication. If you happen to have any physical IOs, you can map these (using v$session_wait.p1/p2) back to the segment name accessed. And if you know the segment name, you’ll know the execution plan line(s) currently executed as access path rowsources show the table/index name in exec plan as well.

    More options would be to trace consistent reads (with event 10200) and map the data block addresses back to segment names, but this it not a widely used technique (not for production).

    I would start from pstack as its possible to translate this back to execution plan.

    If you still can reproduce this, send me couple of process stacks when this query is executing and the execution plan, I can reply you with answer and explanation how I worked it out..

  6. Fred Coulson says:

    Hi Tanel,

    I thought I’d report a small glitch I encountered when working with os_explain. It appears that on RHEL 4 the version of ksh that comes installed by default is restrained by the old 1024 array size, which means that having

    MAX_ARRAY=4095

    inside the f_lifo procedure causes the script to fail with a “subscript out of range” error.

    There are a couple of workarounds…

    1. (easiest) Change the shell invocation at the top of the script to bash– your script runs fine in bash, and bash does not have such a tiny array ubound:

    #!/bin/bash

    2. Upgrade to a modern version of ksh

    Also, I noticed that the version of ksh installed with RHEL 5 is newer and does not suffer from this limitation.

    This is not a problem with your script per se, but I thought I’d offer this in case anyone else tries using your script on Redhat Linux and is baffled like I was.

    BTW, thanks for bringing your training seminar to the NYOUG earlier this month! It really took my understanding to a new level.

  7. Tanel Poder says:

    Hi Fred,

    Thanks for this info. I tested this on RHEL5 and Solaris only, thus didn’t hit this error.

    I don’t have RHEL4 handy right now, what if you set the MAX_ARRAY to 1024 (or 1023), does it work in the RHEL4’s ksh?

  8. Fred Coulson says:

    Tanel,

    Yes, setting MAX_ARRAY to 1023 (giving an array size of 1024) is another workaround.

    The RHEL4 ksh is actually “pdksh”. The man page for pdksh says, “Array indicies are currently limited to the range 0 through 1023, inclusive.”

    The RHEL5 ksh is called “ksh-20060214″, which has array bounds of “0 through 1,048,575″. RHEL4 users can get it from the CentOS repository:

    http://isoredirect.centos.org/centos/5/

    It works fine in RHEL4, but installing it involves upgrading various packages like gcc, glibc, libgcc, etc to their RHEL5 versions. I’d hesitate to do that in a production environment, needless to say. Easier just to use bash.

  9. Tanel Poder says:

    Thanks Fred for the info!

    I knew Linux distros had PDKSH instead of KSH, but didn’t know that RHEL5 one has the “real” KSH. It may be related with Sun open sourcing Solaris and related software?

  10. Tanel, awesome stuff, thanks for posting!

    I am curious – I see from your examples and from my own tests that pstack is only capturing current stack operations. You mention that you have a loop to cycle through while the query is running. This begs a couple questions:
    1) A loop is going to, effectively, sample the process stack. Correct?
    2) What would it take to stitch together stack traces for os_explain? Not sure if just deleting the “header” lines would do it.

    Thanks again for your intelligent and comprehensive contributions.

  11. Great article.

    In this particular script is there an advantage in using DTRACE over using truss
    to trace @ OS Level ?

  12. Tanel Poder says:

    Thanks Ravi,

    In this script (extracting stack traces) truss can’t be used at all, as far as I know truss can’t extract snapshots of stack traces like pstack does.

    But if comparing truss’es a.out:func* notation with DTrace, I would always use DTrace if its available and usable. DTrace is safe by design (although you *can* cause trouble with it even in non-destructive mode) but truss had problems in past IIRC, such target processes missing signals sent to them and failing to exit.

  13. Hi Tanel,

    Thanks for taking time to reply.

    Here is a what I tried with truss instead of Dtrace. Just save the pain of reversing the pstack output. Let me know your thoughts

    Also DTrace could have system wide impact where as truss would just impact the process , correct me here if I am wrong.

    Run truss on
    SQL> alter system flush shared_pool;

    System altered.

    SQL> select a from t2;

    A
    ———-
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

    10 rows selected.

    SQL> exit

    This is the output I get from truss
    cat 1truss.out | grep ‘oracle|qer’ |cut -d2-20 (output edited)

    ^I -> oracle:qerstStart(0x407b48a20, 0x1, 0x0, 0x0)$
    ^I -> oracle:qerstInitialize(0x0, 0x0, 0x0, 0x1)$
    ^I oracle:qertbStart(0x40e484da8, 0x1, 0x16, 0x30)$
    ^I -> oracle:qerstStart(0x40e484f20, 0x1, 0x12, 0x6)$
    ^I -> oracle:qerstInitialize(0x0, 0x0, 0x0, 0x1)$
    ^I oracle:qerixStart(0x413faafd0, 0x1, 0x16, 0x30)$
    ^I -> oracle:qerixGetKey(0x413faafd0, 0x40e484f90, 0x6, 0xffffffffffffffff)$
    ^I oracle:qerixGetKey(0x413faafd0, 0x413faab30, 0x6, 0x1)$
    ^I <- oracle:qerixGetKey() = 1$
    ^I <- oracle:qerixStart() = 0x413faafd0$
    ^I <- oracle:qertbStart() = 0x40e484f20$
    ^I oracle:qertbRelease(0x40e484da8, 0x0, 0x0, 0x0)$
    ^I -> oracle:qerixRelease(0x413faafd0, 0x0, 0x0, 0x0)$
    ^I <- oracle:qerixRelease() = 0$
    ^I oracle:qerstFetch(0x407b48a20, 0x1025145c0, 0xffffffff7ffe7520, 0x1)$
    ^I -> oracle:qertbFetchByRowID(0x40e484da8, 0x1009af1e0, 0xffffffff7ffe6e20, 0x1)$
    ^I -> oracle:qerstFetch(0x40e484f20, 0x0, 0xffffffff7ffe6d54, 0x1)$
    ^I -> oracle:qerixtFetch(0x413faafd0, 0x0, 0xffffffff7ffe65ec, 0x1)$
    ^I -> oracle:qerixGetRowid(0x4dd394de0, 0x12, 0xffffffff7c056758, 0x4dd394dde)$
    ^I <- oracle:qerixGetRowid() = 46$
    ^I <- oracle:qerixtFetch() = 0$
    ^I oracle:qertbGetPartitionNumber(0xffffffff7ffe6d14, 0x40e484ea0, 0xffffffff7c057190,
    0xffffffff7c05bde0)$

    I need to run sed to get these converted to oracle calls.

    Sorry If I am taking up too much space, I can share the whole script if you think this does the job.

    Thanks

    Ravi

  14. Haris says:

    Hi Tanel

    Excellent script.

    But the script does not work for HP-UX 11i v2

    The errors
    $ pstack 10020 | ./os_explain
    ./os_explain[138]: ==: unknown test operator
    f_lifo[10]: array: subscript out of range

    Is there a plan to deply in HP-UX?

    Regards
    Haris

  15. Tanel Poder says:

    Ravi,

    Truss does a different job, with pstack I took the sampling approach. I sample 10-100 times and then profile which stacks I see the most, to see where most time is spent.

    With truss or DTrace approach also some post-processing would need to be done if you want to know what plan operation takes most time. I have done this in DTrace..

    Haris,

    Please download the script again, or just change the MAX_ARRAY=4095 to MAX_ARRAY=1023 in it, I suspect your shell can’t handle larger arrays (also on RHEL 4 and before there seems to be such problem)

  16. Tanel Poder says:

    Haris, if the output doesn’t make sense on HP-UX then please send a sample pstack output, maybe the columns are ordered there differently than on linux and solaris.

  17. Ivan says:

    @Tanel Poder
    But pdksh is more “compatible” than the real “ksh”. pdksh emulates the old ksh88 while ksh in RHEL5 is the newest ksh93. See:
    https://bugzilla.redhat.com/show_bug.cgi?id=431448

  18. KG says:

    Hi Tanel
    None of the scripts links on the blog work

  19. Tanel Poder says:

    @KG
    Yeah I’ll fix this soon….

  20. Tanel Poder says:

    @Tanel Poder
    Most of my scripts are present in the tpt_public.zip …

  21. sriram Sanka says:

    Hi tanel !
    This was nice & neat.But the scripts pages “http://www.tanelpoder.com/files/scripts/os_explain” and the other pages are giving 404 errror.
    So could you please fix it !

    Regards
    Sriram

  22. sriram Sanka says:

    Sorry I have n`t read this.!
    “Most of my scripts are present in the tpt_public.zip …”

    Thanks for the update.

    Sriram

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>