Advanced Oracle Troubleshooting Guide, Part 6: Understanding Oracle execution plans with os_explain

Tanel Poder

2008-06-15

Get ready for some more adventures in Oracle process stack!

Before proceeding though, please read this post about safety of different stack sampling approaches.

I have had few non-trivial Oracle troubleshooting cases, related to query hangs and bad performance, where I’ve wanted to know where exactly in execution plan the current execution is.

Remember, Oracle is just another program executing instructions clustered in functions on your server, so stack sampling can help out here as well.

So, I was looking into the following stack trace taken from an Oracle 10.1 database on Solaris SPARC, running a SQL with this execution plan.

$ cat pstack.txt
 ------------------------------------------------------------------------
 0000000101eca6e4 kdirfrs (ffffffff78eca5d8, 1, 0, 2, 86521ee00, 0) + 24
 0000000102502adc qerixFetchFastFullScan (ffffffff78eca3e0, 10248ebc0, 0, 1, 86521ebf8, fffd) + 33c
 0000000102558720 qergiFetch (ffffffff78ecaca8, 10248ebc0, ffffffff7fff5ee8, 10449dba0, 1, 86bb6bb38) + 80
 000000010248edd8 qerjoFetch (45, 10248ebc0, ffffffff7fff5ee8, ffffffff78ecad60, 86bb6bc08, 114) + 118
 000000010248eeb8 qerjoFetch (7ffe, 10248ebc0, ffffffff7fff5fe8, ffffffff78ecb5e8, 7f399b6f8, 7ffe) + 1f8
 00000001025171b8 rwsfcd (8177747a8, 10248e1e0, ffffffff7fff6168, 7fff, f0, 10449dba0) + 78
 000000010248e4dc qeruaFetch (745c0ab40, 10248e1e0, ffffffff7fff6168, ffffffff78ecb620, 10449dba8, 8177747a8) + 11c
 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6238, 7fff, 7cfc87c50, 10449d000) + a0
 00000001025171b8 rwsfcd (7876e8d18, 10248e1e0, ffffffff7fff63b8, 7fff, c0, 10449dba0) + 78
 000000010248e4dc qeruaFetch (817d1e438, 10248e1e0, ffffffff7fff63b8, ffffffff78e7d318, 10449dba8, 7876e8d18) + 11c
 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6488, 7fff, 82355a348, 10449d000) + a0
 00000001025171b8 rwsfcd (7454b0408, 10249e4e0, ffffffff7fff6608, 7fff, c0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (86b1d64d8, 0, 0, 1, ffffffff78e7df08, 0) + 300
 000000010248f99c qerjotFetch (78d8ea1d8, 0, 0, 1, ffffffff78e7e8d0, 10449dba0) + dc
 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6838, ffffffff78e7f2b0, 7d53bb730, 1) + 1f8
 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6938, ffffffff7b9b2f78, 74b306ba0, 7fff) + 1f8
 000000010248d840 qervwFetch (5, 5, ffffffff7fff6a08, 7fff, 80b0faa18, 10449d000) + a0
 00000001025171b8 rwsfcd (865e988f8, 10249e4e0, ffffffff7fff6b88, 7fff, c0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (7a57ae350, 10249e4e0, ffffffff7fff6d88, 1, ffffffff7bb89f00, 0) + 300
 00000001025171b8 rwsfcd (823642298, 10249e4e0, ffffffff7fff6d88, 7fff, 6f0, 10449dba0) + 78
 00000001024a4620 qerhjFetch (751158588, 102517980, 7511587f0, 1, ffffffff7b9b5090, 0) + 300
 000000010251d1f0 qersoFetch (94, 10506adf8, 10449d000, ffffffff7b9b53b8, 799854d60, 7511587f0) + 350
 0000000101aa6f24 opifch2 (7, f, 150, 1, 104400, 1050685e8) + a64
 0000000101aa6384 opifch (5, 2, ffffffff7fff79f8, 105000, 0, 10434c0a8) + 44
 0000000101ad81ec opipls (104000, 10434c, 1, ffffffff7bba3e6a, 0, 140010) + f4c
 00000001002d0058 opiodr (6, 10506ae10, 10434cfb0, 10506a, 105000, 104000) + 598
 00000001002d4ec0 rpidrus (ffffffff7fff8820, 105067f18, 105068860, ffffffff7bb5f560, 4a8c, 200000) + a0
 0000000102f615e4 skgmstack (ffffffff7fff8a48, ffffffff7f87cf8f, ffffffff7fff898f, 1002d4e20, ffffffff7fff8a70, acc01800) + a4
 00000001002d504c rpidru (ffffffff7fff9140, 10422b000, 10422a918, 104229598, 410, 82) + ac
 00000001002d4808 rpiswu2 (0, 104556000, ffffffff7fff8b88, 2, 104556418, ffffffff7fff92c0) + 1a8
 00000001002d61cc rpidrv (a, ffffffff7fff9044, 9, ffffffff7fff90c0, ffffffff7fff9140, 1002d5180) + 62c
 0000000102797f90 psddr0 (104400, 86e2fc628, ffffffff7fff92c0, 9, 1050769d8, 1050769d8) + 1f0
 0000000102798e04 psdnal (ffffffff7fffa0b8, ffffffff7fffa258, 105000, ffffffff7bc5eb00, 7f7aa86d0, 40) + 184
 000000010376d268 pevm_BFTCHC (0, 7f7aa86d0, 50, ffffffff7bb5f560, ffffffff7bc5eb00, 0) + 188
 000000010373dff4 pfrinstr_FTCHC (10, 15d0000000000000, ffffffff7bb5f5c8, ffffffff7bb5f560, 4a8c, ffffffff7bb66330) + b4
 00000001037362c8 pfrrun_no_tool (ffffffff7bb5f560, 779953684, ffffffff7bb5f5c8, 10457c9d8, 2001, 2001) + 48
 00000001037372d0 pfrrun (ffffffff7bb5f5c8, 200000, 0, 200000, ffffffff7bb5f560, 779abb110) + 2f0
 0000000103783374 plsql_run (ffffffff7bb55408, 1, 0, ffffdfff, ffffffff7fffa0b8, 0) + 274
 0000000103722554 peicnt (ffffffff7fffa0b8, 105068860, 6, ffffffff7fff9f28, 41d8, 1050685e8) + d4
 000000010327b784 kkxexe (105000, 104000, 105068, 104296000, 1050685e8, ffffffff7bb5f560) + 284
 0000000101ad0228 opiexe (4, ffffffff7bc68ee8, ffffffff7fffab00, 0, 0, ffffffff7bc70480) + 33c8
 0000000101a4c0a8 kpoal8 (40008, 1, ffffffff7fffd890, 0, 0, 4) + 648
 00000001002d0058 opiodr (14, 10506ae10, 10434ce70, 10506a, 105000, 104000) + 598
 0000000102cded94 ttcpip (105071450, 18, ffffffff7fffd890, ffffffff7fffcb88, 104229c98, ffffffff7fffcb84) + 694
 00000001002cd3e8 opitsk (1002cf000, 1, 0, ffffffff7fffd9e8, 105071450, 105071458) + 428
 0000000101aaf564 opiino (105070000, 1050683c0, 0, 0, f5, 105070290) + 404
 00000001002d0058 opiodr (4, 10506ae10, 10434c920, 10000, 105071, 105000) + 598
 00000001002cc174 opidrv (0, 4, 10506a, 105071450, 0, 3c) + 354
 00000001002c9828 sou2o (ffffffff7fffe6b8, 3c, 4, ffffffff7fffe698, 104aa6000, 104aa6) + 48
 00000001002a7b34 main (2, ffffffff7fffe798, ffffffff7fffe7b0, 0, 0, 100000000) + 94
 00000001002a7a7c _start (0, 0, 0, 0, 0, 0) + 17c

Not too encouraging, huh?

So, let’s run this stack trace through my os_explain script:

$ ./os_explain pstack.txt
   SELECT FETCH:
    SORT: Fetch
     HASH JOIN: Fetch
    * HASH JOIN: Fetch
     * VIEW: Fetch
        NESTED LOOP OUTER: Fetch
         NESTED LOOP OUTER: Fetch
          NESTED LOOP JOIN: Fetch
           HASH JOIN: Fetch
          * VIEW: Fetch
             UNION-ALL: Fetch
            * VIEW: Fetch
               UNION-ALL: Fetch
              * NESTED LOOP OUTER: Fetch
                 NESTED LOOP OUTER: Fetch
                  GRANULE ITERATOR: Fetch
                   INDEX: FetchFastFullScan
                    kdirfrs

Now this is much more understandable!

All my script did was:

Easy :)

So, how to read this?

First, by now it should be obvious that in Oracle, each rowsource operator (the different lines you see in SQL execution plans) is actually just a function inside Oracle kernel. These are the row source functions, starting with “qer”. QER stands for Query Execution Rowsource as far as I know.

Whenever an Oracle process fetches data from a cursor, it calls opifch2() which in turn calls the root rowsource function in execution plan. In my case that function was qersoFetch and my os_explain script just substituted the “qerso” part with SORT (as per the Metalink note I mentioned above). The first child function of qersoFetch was qerhjFetch, which is a hash join rowsource, and so on. Note that os_explain prefixes some lines with an asterisk (*), this indicates that the output of given function is in turn filtered by a filter operation (the same filter ops what you normally see in the bottom of DBMS_XPLAN explained plans).

So, logically you can imagine an execution plan as a tree of Oracle functions:

But physically, an execution plan is just a memory structure in subheap 6 of a child cursor (x$kglcursor.kglobhd6), which has a bunch of rowsource function opcodes in it.

During plan execution Oracle process “just” traverses through those opcodes, looks up the corresponding rowsource function starting address using a lookup table and calls it. That function does its task (probably calls other rowsource functions recursively) and returns to caller.

Note that many rowsource functions are designed to be cascading, being able to do only the work needed for returning a small subset of rows and return only few rows at a time, as opposed to the whole resultset.

This is a very good thing as rows can be cascaded, or pipelined back to parent functions as rows become available. For example a table fetch only fetches a handful of rows (and not the whole table) at a time and returns these “up” for further processing. Also, a nested loop join is able to pass matching rows “up” from the moment first matches are found, again there’s no need to perform the join on full dataset first before returning first rows.

This also means that there is no need to store the whole intermediate resultset somewhere in memory before passing it up to parent function; instead we just revisit that branch of execution plan tree whenever we need more rows from it. And the os_explain script shows you exactly in which execution branch the execution currently is.

Addition: I will elaborate on how to match the execution plan with stack trace in an upcoming post – it’s too much material for an introductory post.

So, cascading rowsources allow us to “incrementally” execute plans involving large datasets, without need to keep the intermediate resultsets in memory. On the other hand, few rowsource operators in your execution plan, like SORT, can not return any rows up before all its children’s rows are processed. With SORT (and aggregate operations which also use SORT) you just have to process all the source rows before returning any meaningful result back. You can’t just go through only half of the source data, order it and start returning rows in hope that the rest of the rows should have not been returned as first in the order. This is where the SQL cursor workareas come into play for such operations.

SORT, HASH and BITMAP rowsources can allocate SQL workareas for them, while others can’t. This can easily be identified from execution plan statistics of following sample query:

SQL> select /*+ gather_plan_statistics */ owner, count(*) from dba_source group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
WKSYS                                8988
HR                                     34
[...some output snipped...]
SYS                                129299
WMSYS                                 704

25 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'MEMSTATS LAST'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dcp37kxt02m9f, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, count(*) from dba_source
group by owner

Plan hash value: 114136443

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   1 |  HASH GROUP BY                |                 |      1 |    593K|     25 |00:00:18.24 |   821K|   821K| 5213K (0)|
|   2 |   VIEW                        | DBA_SOURCE      |      1 |    593K|    595K|00:00:16.84 |       |       |          |
|   3 |    UNION-ALL                  |                 |      1 |        |    595K|00:00:15.06 |       |       |          |
|*  4 |     FILTER                    |                 |      1 |        |    595K|00:00:10.88 |       |       |          |
|*  5 |      HASH JOIN                |                 |      1 |    595K|    595K|00:00:08.50 |   884K|   884K| 1316K (0)|
|*  6 |       HASH JOIN               |                 |      1 |   6527 |   6292 |00:00:00.12 |   870K|   870K| 1179K (0)|
|   7 |        TABLE ACCESS FULL      | USER$           |      1 |     98 |     98 |00:00:00.01 |       |       |          |
|*  8 |        HASH JOIN              |                 |      1 |   6527 |   6292 |00:00:00.09 |   909K|   909K| 1181K (0)|
|   9 |         INDEX FULL SCAN       | I_USER2         |      1 |     98 |     98 |00:00:00.01 |       |       |          |
|* 10 |         INDEX FAST FULL SCAN  | I_OBJ2          |      1 |   6527 |   6292 |00:00:00.04 |       |       |          |
|  11 |       INDEX FAST FULL SCAN    | I_SOURCE1       |      1 |    595K|    595K|00:00:01.56 |       |       |          |
|  12 |      NESTED LOOPS             |                 |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 13 |       INDEX FULL SCAN         | I_USER2         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 14 |       INDEX RANGE SCAN        | I_OBJ4          |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|  15 |     NESTED LOOPS              |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|  16 |      NESTED LOOPS             |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|  17 |       NESTED LOOPS            |                 |      1 |      1 |      0 |00:00:00.01 |       |       |          |
|* 18 |        INDEX FAST FULL SCAN   | I_OBJ2          |      1 |      6 |      0 |00:00:00.01 |       |       |          |
|* 19 |        FIXED TABLE FIXED INDEX| X$JOXFS (ind:1) |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 20 |       INDEX RANGE SCAN        | I_USER2         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|  21 |      TABLE ACCESS CLUSTER     | USER$           |      0 |      1 |      0 |00:00:00.01 |       |       |          |
|* 22 |       INDEX UNIQUE SCAN       | I_USER#         |      0 |      1 |      0 |00:00:00.01 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

You see how the HASH operations do have their last Mem columns populated, therefore those rowsource functions did allocate a SQL workarea for them. Others like NESTED LOOPS joins and data access rowsources did not have any SQL workarea memory allocated as they are completely cascading.

Note that os_explain can also read the stack from STDIN as seen in example below. Also the -a option will tell os_explain to show all functions in the stack and not only the execution plan ones and their children.

Command:

SQL> alter session set statistics_level=typical;

Session altered.

SQL> select avg(length(text)) from dba_source where owner = 'SYS';

AVG(LENGTH(TEXT))
-----------------
       125.032127

Plan:

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE                     |                 |      1 |       |       |          |
|   2 |   VIEW                              | DBA_SOURCE      |  17972 |       |       |          |
|   3 |    UNION-ALL                        |                 |        |       |       |          |
|*  4 |     FILTER                          |                 |        |       |       |          |
|   5 |      NESTED LOOPS                   |                 |        |       |       |          |
|   6 |       NESTED LOOPS                  |                 |  18056 |       |       |          |
|*  7 |        HASH JOIN                    |                 |    198 |   909K|   909K| 1193K (0)|
|   8 |         INDEX FULL SCAN             | I_USER2         |     98 |       |       |          |
|   9 |         NESTED LOOPS                |                 |    198 |       |       |          |
|  10 |          TABLE ACCESS BY INDEX ROWID| USER$           |      1 |       |       |          |
|* 11 |           INDEX UNIQUE SCAN         | I_USER1         |      1 |       |       |          |
|* 12 |          INDEX RANGE SCAN           | I_OBJ5          |    198 |       |       |          |
|* 13 |        INDEX RANGE SCAN             | I_SOURCE1       |     93 |       |       |          |
|  14 |       TABLE ACCESS BY INDEX ROWID   | SOURCE$         |     91 |       |       |          |
|  15 |      NESTED LOOPS                   |                 |      1 |       |       |          |
|* 16 |       INDEX FULL SCAN               | I_USER2         |      1 |       |       |          |
|* 17 |       INDEX RANGE SCAN              | I_OBJ4          |      1 |       |       |          |
|  18 |     NESTED LOOPS                    |                 |      1 |       |       |          |
|  19 |      NESTED LOOPS                   |                 |      1 |       |       |          |
|  20 |       NESTED LOOPS                  |                 |      1 |       |       |          |
|  21 |        TABLE ACCESS BY INDEX ROWID  | USER$           |      1 |       |       |          |
|* 22 |         INDEX UNIQUE SCAN           | I_USER1         |      1 |       |       |          |
|* 23 |        INDEX RANGE SCAN             | I_OBJ5          |      1 |       |       |          |
|* 24 |       FIXED TABLE FIXED INDEX       | X$JOXFS (ind:1) |      1 |       |       |          |
|* 25 |      INDEX RANGE SCAN               | I_USER2         |      1 |       |       |          |
---------------------------------------------------------------------------------------------------

Stack:

$ pstack 23740 | ./os_explain -a
   main
    ssthrdmain
     opimai_real
      sou2o
       opidrv
        opiodr
         opiino
          opitsk
           ttcpip
            opiodr
             kpoal8
              SELECT FETCH:
               GROUP BY SORT: Fetch
                VIEW: Fetch
                 UNION-ALL: Fetch
                * FILTER DEFINITION: FetchOutside
                   UNION-ALL: RowProcedure
                    VIEW: RowProcedure
                     qesaAggNonDistSS.
                      evaopn2
                       evalen
                        lxsCntChar

The FILTER rowsources (not talking about filter predicates on normal rowsources here) can make things more complicated though as they can introduce their own logic and loops into the execution plan (for running correlated subqueries etc).

By the way, see what happens when I run exactly the same query with rowsource level statistics collection enabled:

SQL> alter session set statistics_level=all;

Session altered.

SQL> select avg(length(text)) from dba_source where owner = 'SYS';

AVG(LENGTH(TEXT))
-----------------
       125.032127

$ pstack 23740 | ./os_explain -a
   main
    ssthrdmain
     opimai_real
      sou2o
       opidrv
        opiodr
         opiino
          opitsk
           ttcpip
            opiodr
             kpoal8
              SELECT FETCH:
               QUERY EXECUTION STATISTICS: Fetch
                GROUP BY SORT: Fetch
                 QUERY EXECUTION STATISTICS: Fetch
                  VIEW: Fetch
                   QUERY EXECUTION STATISTICS: Fetch
                    UNION-ALL: Fetch
                     QUERY EXECUTION STATISTICS: Fetch
                    * QUERY EXECUTION STATISTICS: Fetch
                       FILTER DEFINITION: FetchOutside
                        QUERY EXECUTION STATISTICS: Fetch
                         NESTED LOOP JOIN: Fetch
                          QUERY EXECUTION STATISTICS: Fetch
                           QUERY EXECUTION STATISTICS: SnapStats
                            sltrgftime64
                             gettimeofday
                              __kernel_vsyscall

Every rowsource is wrapped into a QUERY EXECUTION STATISTICS wrapper, which’ task is just to count the number of rows sent “up” to parents in the tree, logical IOs and also rowsource timing info whenever an internal counter (set by _rowsource_statistics_sampfreq parameter) wraps.

This is just an intro to Oracle execution plan internals and troubleshooting. Hopefully you don’t need this technique too often, however it has helped me to successfully pinpoint the root cause of a problem in few non-trivial database problems.

Note that in Oracle 11g there’s an excellent new feature called Real-time SQL Monitoring. It allows you to monitor the progress of currently running SQL statements. For serially running statements the monitoring kicks in for a statement after it’s used total 5 seconds of CPU or IO time (this time is controlled by _sqlmon_threshold parameter, but for PX the monitoring is always enabled). After that you can query V$SQL_MONITOR and V$SQL_PLAN_MONITOR for seeing how much time/rows/LIOs that session has spent executing a statement. You can see these details even at the SQL execution plan line level. Alternatively you can use DBMS_SQLTUNE. REPORT_SQL_MONITOR function to get this info nicely formatted. Greg Rahn has written a good blog entry about it.

However it’s important to note that both DBMS_SQLTUNE and V$SQL_MONITOR/V$SQL_PLAN_MONITOR use requires you to have Oracle Tuning Pack license which in turn requires Oracle Diagnostic Pack license. Details are in Oracle 11g Licensing Guide.

So another, low level approach for real-time monitoring will still be handy even after 11g becomes mainstream. In a future post I will be showing how to measure the progress and get execution profile of your plan by aggregating multiple stack traces and also some cool opportunities with DTrace.


  1. I am finally close to launching the completely rebuilt 2024 versions of my Linux & AOT classes in my Learning Platform! (Updates to SQL Tuning class in H2 2024):
    Advanced Oracle SQL Tuning training. Advanced Oracle Troubleshooting training, Linux Performance & Troubleshooting training. Check them out!
  2. Get randomly timed updates by email or follow Social/RSS