Dump Current Oracle SQL Execution Plan Line ID Using Oradebug

2019/04/08

This post is written mostly for fun, not for practical everyday usability (enjoy!).

Introduction

There was a question on Twitter about alternative options for retrieving the SQL execution plan line ID that Oracle is currently executing. Normally Oracle’s Active Session History shows this info in its PLAN_LINE* columns as I have explained in my asqlmon.sql post.

But it looks like a way without ASH (or Diagnostics/Tuning pack licenses) is needed. Here’s a totally hipster way (read: advanced) for dumping the current plan line info from process private memory. For some background reading you may want to check out my old blog entries, scripts and videos:

This background material says that when a parent plan operator (plan line) calls its child operator, it will pass a pointer to the operator metadata inside the SQL child cursor into the function call. That metadata contains the plan line ID of the child operator and can be read from memory. After all, every plan line is just a qerNNFetch() function call, where NN is the execution plan operator type. You can see all execution plan operators using my @xplto.sql script.

So here’s an example. My application’s SQL query seems to be stuck or very slow and I want to dump the execution plan line ID it is currently executing. I’ll first list the execution plan for my application SID 849 (output edited for brevity):

SQL> @xsid 849

-------------------------------------------------------------------------------
| Id  | Operation                                | Name                       |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6689_1564080 |
|   3 |    UNION-ALL                             |                            |
|   4 |     HASH GROUP BY                        |                            |
|*  5 |      HASH JOIN                           |                            |
|   6 |       TABLE ACCESS FULL                  | DATE_DIM                   |
|*  7 |       HASH JOIN                          |                            |
|   8 |        TABLE ACCESS FULL                 | CUSTOMER                   |
|   9 |        PARTITION RANGE ALL               |                            |
|  10 |         TABLE ACCESS FULL                | STORE_SALES                |
|  11 |     HASH GROUP BY                        |                            |
|* 12 |      HASH JOIN                           |                            |
|  13 |       TABLE ACCESS FULL                  | DATE_DIM                   |
|* 14 |       HASH JOIN                          |                            |
|  15 |        TABLE ACCESS FULL                 | CUSTOMER                   |
|  16 |        PARTITION RANGE ALL               |                            |
|  17 |         TABLE ACCESS FULL                | WEB_SALES                  |
|  18 |   SORT ORDER BY                          |                            |
|* 19 |    HASH JOIN                             |                            |
|* 20 |     HASH JOIN                            |                            |
|* 21 |      HASH JOIN                           |                            |
|* 22 |       VIEW                               |                            |
|  23 |        TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6689_1564080 |
|* 24 |       VIEW                               |                            |
|  25 |        TABLE ACCESS FULL                 | SYS_TEMP_0FD9D6689_1564080 |
|* 26 |      VIEW                                |                            |
|  27 |       TABLE ACCESS FULL                  | SYS_TEMP_0FD9D6689_1564080 |
|* 28 |     VIEW                                 |                            |
|  29 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6689_1564080 |
-------------------------------------------------------------------------------

Dumping call stacks

The above script uses v$session and v$sql_plan to show which SQL statement and plan my session is executing, but it doesn’t know on which exact plan line the execution currently is. So I connect to that stuck session with ORADEBUG and make it dump its call stack:

SQL> @odsid 849
Oracle pid: 135, Unix process pid: 31432, image: oracle@linux01.localdomain

SQL> ORADEBUG DUMP CALLSTACK 2
Statement processed.

The Oracle’s CALLSTACK dump dumps less data than ERRORSTACK. In fact, errorstack uses callstack for dumping the kernel stack under the hood. I’m using the Oracle’s stack dumping mechanism here as Oracle uses its built-in stack unwind code that tries to extract function call arguments (if available in the stack) too.

Here’s the lengthy output. I pasted it for completeness, but will separately paste sections of interest in my analysis below. The top functions (everything above __sighandler() are the debug functions that were doing the actual stack unwinding and dumping so we can ignore them. And since we are interested in execution plan operators, you can scroll down straight to the qer* functions below and then keep reading:

*** 2019-04-08T11:31:21.079018-04:00
Received ORADEBUG command (#27) 'DUMP CALLSTACK 2' from process '31462'

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst1()+110        call     kgdsdst()            7FFCDF1C5D10 000000003
                                                   7FFCDF1C0050 ? 7FFCDF1C0168 ?
                                                   7FFCDF1C5830 ? 000000083 ?
ksdxdmp()+1439       call     ksedst1()            000000001 000000001
                                                   7FFCDF1C0050 ? 7FFCDF1C0168 ?
                                                   7FFCDF1C5830 ? 000000083 ?
ksdxfdmp()+149       call     ksdxdmp()            0141BF760 7FFCDF1C62D8
                                                   000000002 000000000
                                                   7FFCDF1C5830 ? 000000083 ?
ksdxcb()+896         call     ksdxfdmp()           7FFCDF1C6B90 000000010
                                                   000000003 7FFCDF1C6A48
                                                   7FFCDF1C6AE8 000000083 ?
sspuser()+217        call     ksdxcb()             000000001 000000010 ?
                                                   000000003 ? 7FFCDF1C6A48 ?
                                                   7FFCDF1C6AE8 ? 000000083 ?
__sighandler()       call     sspuser()            000000001 ? 000000010 ?
                                                   000000003 ? 7FFCDF1C6A48 ?
                                                   7FFCDF1C6AE8 ? 000000083 ?
skgghash3()+192      signal   __sighandler()       7F261F019080 ? 00000000A ?  <-- received the SIGUSR2
                                                   0A7C45E2F ?                     signal that Oradebug uses
                                                   3D0FD3EA42E2AFDC ?
                                                   0A173AC3C ?
                                                   9A62D9A34660CFE8 ?
qesnhLoadRowsetsFor  call     skgghash3()          7F261F019080 ? 00000000A ?
GBY()+1239                                         0A7C45E2F ?
                                                   3D0FD3EA42E2AFDC ?
                                                   0A173AC3C ?
                                                   9A62D9A34660CFE8 ?
qerghRowPRowsetsFas  call     qesnhLoadRowsetsFor  7F2600000030 7F2625CF5548
tAggsCacheAware()+3           GBY()                0A7C45E2F ?
1622                                               3D0FD3EA42E2AFDC ?
                                                   0A173AC3C ?
                                                   9A62D9A34660CFE8 ?
qerhnProbeRowsetFin  call     qerghRowPRowsetsFas  7FFCDF1C9C88 000007FFF
ishInnerKu()+375              tAggsCacheAware()    7F2625CF5548
                                                   3D0FD3EA42E2AFDC ? 000000040
                                                   9A62D9A34660CFE8 ?
qerhnProbeRowsetInn  call     qerhnProbeRowsetFin  7FFCDF1C98F0 000007FFF
erEncoding()+1551             ishInnerKu()         7F2625CF5548 ?
                                                   3D0FD3EA42E2AFDC ?
                                                   000000040 ?
                                                   9A62D9A34660CFE8 ?
qerhnProbeRowsetFin  call     qerhnProbeRowsetInn  7F26259947C8 7F2625994FC8
ishInnerKu()+375              erEncoding()         7F2625CF5548 ?
                                                   3D0FD3EA42E2AFDC ?
                                                   000000040 ?
                                                   9A62D9A34660CFE8 ?
qerhnProbeRowsetInn  call     qerhnProbeRowsetFin  7FFCDF1C9700 000007FFF
erEncoding()+1551             ishInnerKu()         7F2625CF5548 ?
                                                   3D0FD3EA42E2AFDC ?
                                                   000000040 ?
                                                   9A62D9A34660CFE8 ?
kdstf00001010000100  call     qerhnProbeRowsetInn  7F2625998800 7F2625999000
0km()+1511                    erEncoding()         7F2625CF5548 ?
                                                   3D0FD3EA42E2AFDC ?
                                                   000000040 ?
                                                   9A62D9A34660CFE8 ?
kdsttgr()+1683       call     kdstf00001010000100  00000002C ? 7F2625999000 ?
                              0km()                0040701C0 7FFCDF1C9700
                                                   000007FFF 7FFCDF1C8970
qertbFetch()+1375    call     kdsttgr()            7FFCDF1C8870 ? 00000000A
                                                   0040701C0 ? 7FFCDF1C9700 ?
                                                   000007FFF ? 7FFCDF1C8970 ?
qergiFetch()+569     call     qertbFetch()         7FFCDF1C8870 ? 00000000A ?  <-- qerNNFetch function called
                                                   0040701C0 7FFCDF1C9700 ?
                                                   000007FFF 7FFCDF1C8970 ?
rwsfcd()+140         call     qergiFetch()         3FBA2CE80 12BED7140 0040701C0 
                                                   7FFCDF1C9700 000007FFF                  
                                                   7FFCDF1C8970 ?
qerhnFetch()+1372    call     rwsfcd()             3FBA2C898 ? 7F262BED72E0 ?
                                                   0040701C0 ? 7FFCDF1C9700 ?
                                                   000007FFF ? 7FFCDF1C8970 ?
rwsfcd()+140         call     qerhnFetch()         3FBA2C7A8 7F2625CF49F0
                                                   3FBA2C7A8 7F2625CF49F0
                                                   0040701C0 7FFCDF1C8970 ?
qerhnFetch()+6529    call     rwsfcd()             3FBA2C668 ? 7F2625CF4F40 ?
                                                   0040701C0 ? 7F2625CF49F0 ?
                                                   0040701C0 ? 7FFCDF1C8970 ?
qerghFetch()+664     call     qerhnFetch()         304000005 7F2600000001
                                                   3FBA2C470 7F2625CF5018
                                                   004154370 7FFCDF1C8970 ?
rwsfcd()+140         call     qerghFetch()         3FBA2C1E8 ? 7F2625CF9B08 ?
                                                   0040422A0 7FFCDF1C9E68
                                                   000007FFF 7FFCDF1C8970 ?
qeruaFetch()+376     call     rwsfcd()             3FBA2BE60 ? 7F2625CF9F28 ?
                                                   0040422A0 ? 7FFCDF1C9E68 ?
                                                   000007FFF ? 7FFCDF1C8970 ?
rwsfcd()+140         call     qeruaFetch()         3FBA28B18 3FBA28B18
                                                   7F2625C78348 0122ACA00
                                                   7FFCDF1CA0A0 7FFCDF1C8970 ?
qerltFetch()+547     call     rwsfcd()             3FBA28668 ? 7F2625C784A0 ?
                                                   7F262C036980 ? 0122ACA00 ?
                                                   7FFCDF1CA0A0 ? 7FFCDF1C8970 ?
insdlexe()+367       call     qerltFetch()         3EB5FB498 7F2625C785C0
                                                   00361BA70 7FFCDF1CA1A0
                                                   000000001 7FFCDF1C8970 ?
insExecStmtExecIniE  call     insdlexe()           3F9EBA018 ? 3E97EB0A8 ?
ngine()+76                                         00361BA70 ? 7FFCDF1CA1A0 ?
                                                   000000001 ? 7FFCDF1C8970 ?
insexe()+2269        call     insExecStmtExecIniE  3F9EBA018 ? 3E97EB0A8 ?
                              ngine()              00361BA70 ? 7FFCDF1CA1A0 ?
                                                   000000001 ? 7FFCDF1C8970 ?
qes3tExecSQL()+859   call     insexe()             3F9EBA018 ? 3E97EB0A8 ?
                                                   00361BA70 ? 7FFCDF1CA1A0 ?
                                                   000000001 ? 7FFCDF1C8970 ?
qerleStart()+770     call     qes3tExecSQL()       3E8F50FE8 000000001 000000000
                                                   3EA5AD5E0 000000001 ?
                                                   7FFCDF1C8970 ?
selexe0()+990        call     qerleStart()         3EA5AD5E0 7F262516EEE0
                                                   000000001 3EA5AD5E0 ?
                                                   000000001 ? 7FFCDF1C8970 ?
opiexe()+6556        call     selexe0()            3F9EBA018 ? 7FFCDF1CBE50
                                                   000000001 ? 000000001
                                                   7F262BFEF9A0 7FFCDF1C8970 ?
kpoal8()+2419        call     opiexe()             000000049 7FFCDF1CBE50 ?
                                                   7FFCDF1CC5B0 000000001 ?
                                                   7F262BFEF9A0 ? 7FFCDF1C8970 ?
opiodr()+1244        call     kpoal8()             00000005E 000000026
                                                   7FFCDF1D0040 000000001 ?
                                                   7F262BFEF9A0 ? 7FFCDF1C8970 ?
ttcpip()+1239        call     opiodr()             00000005E 000000026
                                                   7FFCDF1D0040 ? 000000000
                                                   7F262BFEF9A0 ? 7FFCDF1C8970 ?
opitsk()+1943        call     ttcpip()             7F262C010370 ? 0000005E0 ?
                                                   7FFCDF1D0040 000000000 ?
                                                   7FFCDF1CFA90 7FFCDF1D029C
opiino()+957         call     opitsk()             000000000 000000000
                                                   7FFCDF1D0040 ? 000000000 ?
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
opiodr()+1244        call     opiino()             00000003C 000000004
                                                   7FFCDF1D1C08 000000000 ?
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
opidrv()+1091        call     opiodr()             00000003C 000000004
                                                   7FFCDF1D1C08 ? 000000000
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
sou2o()+191          call     opidrv()             00000003C 000000004
                                                   7FFCDF1D1C08 000000000 ?
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
opimai_real()+455    call     sou2o()              7FFCDF1D1B31 00000003C
                                                   000000004 7FFCDF1D1C08
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
ssthrdmain()+423     call     opimai_real()        000000000 7FFCDF1D2308
                                                   000000004 ? 7FFCDF1D1C08 ?
                                                   7FFCDF1CFA90 ? 7FFCDF1D029C ?
main()+275           call     ssthrdmain()         000000000 000000002
                                                   7FFCDF1D2308 000000001
                                                   000000000 7FFCDF1D029C ?
__libc_start_main()  call     main()               000000002 7FFCDF1D2558
+245                                               7FFCDF1D2308 ? 000000001 ?
                                                   000000000 ? 7FFCDF1D029C ?
_start()+41          call     __libc_start_main()  000D6B3C0 000000002
                                                   7FFCDF1D2558 7F2627A2E495 ?
                                                   000000000 ? 7FFCDF1D029C ?

Ok, I’ll just paste the first (topmost) qer* function calls of interest here, so that there’d be less scrolling around:

qergiFetch()+569     call     qertbFetch()         7FFCDF1C8870 ? 00000000A ?
                                                   0040701C0 7FFCDF1C9700 ?
                                                   000007FFF 7FFCDF1C8970 ?
rwsfcd()+140         call     qergiFetch()         3FBA2CE80 12BED7140 0040701C0 
                                                   7FFCDF1C9700 000007FFF                  
                                                   7FFCDF1C8970 ?
qerhnFetch()+1372    call     rwsfcd()             3FBA2C898 ? 7F262BED72E0 ?
                                                   0040701C0 ? 7FFCDF1C9700 ?
                                                   000007FFF ? 7FFCDF1C8970 ?
rwsfcd()+140         call     qerhnFetch()         3FBA2C7A8 7F2625CF49F0
                                                   3FBA2C7A8 7F2625CF49F0
                                                   0040701C0 7FFCDF1C8970 ?

The “tb” in qertbFetch means “table”. “gi” means partition granule iterator. “hn” is a newer hash join function. Since the plan has many table scans and hash joins, how would we know which exact one this call stack was executing? This is where the qerNNFetch function call arguments come in. The first argument (highlighted above) is a pointer to the execution plan operator in the child cursor. A plan operator’s memory structure contains the human-readable execution plan line ID too.

When we look into the qertbFetch() argument 7FFCDF1C8870 ?, it has a question mark after it. This indicates a “dubious value” so might not be trustworthy.

Whenever Oracle’s stack unwinder isn’t sure that it extracted the correct argument info, it will report the value as dubious. Depending on the CPU architecture and compiler function calling conventions, not all arguments are saved in the stack, but passed via CPU registers only. Thus it’s not possible to read the function call arguments after the call was made.

Indeed when looking into the memory address 7FFCDF1C8870 - this is process private memory, not a pointer to a child cursor in SGA. So, let’s just ignore that dubious argument and see if we have more luck with the next function call qergiFetch() and its argument 3FBA2CE80.

Examining the child cursor

This address seems to be within the SGA as per @ksminfo.sql or when using my @fcha.sql to Find Chunk Address. Note that @fcha.sql uses X$KSMSP under the hood, so should likely not be used in busy production systems.

SQL> @fcha 3FBA2CE80
Find in which heap (UGA, PGA or Shared Pool) the memory address 3FBA2CE80 resides...

WARNING!!! This script will query X$KSMSP, which will cause heavy shared pool latch contention
in systems under load and with large shared pool. This may even completely hang
your instance until the query has finished! You probably do not want to run this in production!

Press ENTER to continue, CTRL+C to cancel...


LOC KSMCHPTR           KSMCHIDX   KSMCHDUR KSMCHCOM           KSMCHSIZ KSMCHCLS   KSMCHTYP KSMCHPAR
--- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ----------------
SGA 00000003FBA2C628          1          4 SQLA^208866ee          4096 freeabl           0 00000003F4F37AE0

Indeed we found a SQLA - SQL Area type chunk from that address (SQL Area in this context is the child cursor’s execution plan).

Since 11g, Oracle keeps the KGL object name’s hash value in chunk comment, so we can see what cursor the hash value 208866ee corresponds to:

SQL> SELECT sql_text FROM v$sql WHERE hash_value = TO_NUMBER('208866ee', 'xxxxxxxx');

SQL_TEXT
------------------------------------------------------------------------------------------
WITH year_total AS ( SELECT c_customer_id customer_id ,c_first_name customer_first_name ...
WITH year_total AS ( SELECT c_customer_id customer_id ,c_first_name customer_first_name ...

Ok, yep, looks like the pointer 3FBA2CE80 points to my SQL statement’s child cursor. Now let’s read what’s behind that pointer:

SQL> @peek 3FBA2CE80
[3FBA2CE80, 3FBA2CEC0) = 00260008 01000009 FBA2C898 00000003 00000000 00000000 FBA2D730 00000003 FBA2CE38 00000003 000001E0 00000A18 1F340009 00000045 ...

The highlighted bytes show the human readable plan line ID that this operator belongs to. Let’s look into the execution plan again, to see what ID 9 is:

-------------------------------------------------------------------------------
| Id  | Operation                                | Name                       |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6689_1564080 |
|   3 |    UNION-ALL                             |                            |
|   4 |     HASH GROUP BY                        |                            |
|*  5 |      HASH JOIN                           |                            |
|   6 |       TABLE ACCESS FULL                  | DATE_DIM                   |
|*  7 |       HASH JOIN                          |                            |
|   8 |        TABLE ACCESS FULL                 | CUSTOMER                   |
|   9 |        PARTITION RANGE ALL               |                            |
|  10 |         TABLE ACCESS FULL                | STORE_SALES                |
...

Indeed, ID 9 is the PARTITION RANGE ALL operator, that is implemented by the qergiFetch() function that we saw in the call stack earlier. I will ignore the rwsfcd() function above as it is not a stand-alone execution plan line operator - it is used for implementing filter predicates anywhere in the plan. I’ll look into the qerhnFetch() call with the argument 3FBA2C7A8:

SQL> @peek 3FBA2C7A8
[3FBA2C7A8, 3FBA2C7E8) = 002E0008 01000007 FBA2C668 00000003 00000000 00000000 FBA2D968 00000003 FBA2C6E0 00000003 000001F0 00000000 04000007 00000086 ...

When you look into the execution plan excerpt above, it sure looks like ID 7 means HASH JOIN (qerhnFetch is a hash join function).

Summary

Hopefully this post and the background reading explain some internal mechanics of Oracle’s SQL Plan execution. I wrote it mainly for some internals hacking fun. If you ever need to use this in practice, you would probably want to run multiple CALLSTACK commands in a row to get a better sample set of where the execution is stuck or looping. With just one stack sample you may get unlucky and end up troubleshooting the wrong thing. Also, I’d stay away from production with ORADEBUG & undocumented dumps if simpler & safer tools (like ASH, SQL Monitoring or GATHER_PLAN_STATISTICS) are available to you.

[Comments]
Click on Tweet to comment or ask a question! Keep the "via @tanelpoder" in the tweet text to notify me.
NB! Check out my 2019 online training classes here! Practical Linux Performance & Application Troubleshooting training (new), Advanced Oracle SQL Tuning training, Advanced Oracle Troubleshooting training. In addition to the online classes, all attendees will receive personal downloadable video recordings too!