Get ready for some more adventures in Oracle process stack!
Before proceeding though, please read this post about safety of different stack sampling approaches.
I have had few non-trivial Oracle troubleshooting cases, related to query hangs and bad performance, where I’ve wanted to know where exactly in execution plan the current execution is.
Remember, Oracle is just another program executing instructions clustered in functions on your server, so stack sampling can help out here as well.
So, I was looking into the following stack trace taken from an Oracle 10.1 database on Solaris SPARC, running a SQL with this execution plan.
$ cat pstack.txt ------------------------------------------------------------------------ 0000000101eca6e4 kdirfrs (ffffffff78eca5d8, 1, 0, 2, 86521ee00, 0) + 24 0000000102502adc qerixFetchFastFullScan (ffffffff78eca3e0, 10248ebc0, 0, 1, 86521ebf8, fffd) + 33c 0000000102558720 qergiFetch (ffffffff78ecaca8, 10248ebc0, ffffffff7fff5ee8, 10449dba0, 1, 86bb6bb38) + 80 000000010248edd8 qerjoFetch (45, 10248ebc0, ffffffff7fff5ee8, ffffffff78ecad60, 86bb6bc08, 114) + 118 000000010248eeb8 qerjoFetch (7ffe, 10248ebc0, ffffffff7fff5fe8, ffffffff78ecb5e8, 7f399b6f8, 7ffe) + 1f8 00000001025171b8 rwsfcd (8177747a8, 10248e1e0, ffffffff7fff6168, 7fff, f0, 10449dba0) + 78 000000010248e4dc qeruaFetch (745c0ab40, 10248e1e0, ffffffff7fff6168, ffffffff78ecb620, 10449dba8, 8177747a8) + 11c 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6238, 7fff, 7cfc87c50, 10449d000) + a0 00000001025171b8 rwsfcd (7876e8d18, 10248e1e0, ffffffff7fff63b8, 7fff, c0, 10449dba0) + 78 000000010248e4dc qeruaFetch (817d1e438, 10248e1e0, ffffffff7fff63b8, ffffffff78e7d318, 10449dba8, 7876e8d18) + 11c 000000010248d840 qervwFetch (1d, 1d, ffffffff7fff6488, 7fff, 82355a348, 10449d000) + a0 00000001025171b8 rwsfcd (7454b0408, 10249e4e0, ffffffff7fff6608, 7fff, c0, 10449dba0) + 78 00000001024a4620 qerhjFetch (86b1d64d8, 0, 0, 1, ffffffff78e7df08, 0) + 300 000000010248f99c qerjotFetch (78d8ea1d8, 0, 0, 1, ffffffff78e7e8d0, 10449dba0) + dc 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6838, ffffffff78e7f2b0, 7d53bb730, 1) + 1f8 000000010248eeb8 qerjoFetch (1, 10248ebc0, ffffffff7fff6938, ffffffff7b9b2f78, 74b306ba0, 7fff) + 1f8 000000010248d840 qervwFetch (5, 5, ffffffff7fff6a08, 7fff, 80b0faa18, 10449d000) + a0 00000001025171b8 rwsfcd (865e988f8, 10249e4e0, ffffffff7fff6b88, 7fff, c0, 10449dba0) + 78 00000001024a4620 qerhjFetch (7a57ae350, 10249e4e0, ffffffff7fff6d88, 1, ffffffff7bb89f00, 0) + 300 00000001025171b8 rwsfcd (823642298, 10249e4e0, ffffffff7fff6d88, 7fff, 6f0, 10449dba0) + 78 00000001024a4620 qerhjFetch (751158588, 102517980, 7511587f0, 1, ffffffff7b9b5090, 0) + 300 000000010251d1f0 qersoFetch (94, 10506adf8, 10449d000, ffffffff7b9b53b8, 799854d60, 7511587f0) + 350 0000000101aa6f24 opifch2 (7, f, 150, 1, 104400, 1050685e8) + a64 0000000101aa6384 opifch (5, 2, ffffffff7fff79f8, 105000, 0, 10434c0a8) + 44 0000000101ad81ec opipls (104000, 10434c, 1, ffffffff7bba3e6a, 0, 140010) + f4c 00000001002d0058 opiodr (6, 10506ae10, 10434cfb0, 10506a, 105000, 104000) + 598 00000001002d4ec0 rpidrus (ffffffff7fff8820, 105067f18, 105068860, ffffffff7bb5f560, 4a8c, 200000) + a0 0000000102f615e4 skgmstack (ffffffff7fff8a48, ffffffff7f87cf8f, ffffffff7fff898f, 1002d4e20, ffffffff7fff8a70, acc01800) + a4 00000001002d504c rpidru (ffffffff7fff9140, 10422b000, 10422a918, 104229598, 410, 82) + ac 00000001002d4808 rpiswu2 (0, 104556000, ffffffff7fff8b88, 2, 104556418, ffffffff7fff92c0) + 1a8 00000001002d61cc rpidrv (a, ffffffff7fff9044, 9, ffffffff7fff90c0, ffffffff7fff9140, 1002d5180) + 62c 0000000102797f90 psddr0 (104400, 86e2fc628, ffffffff7fff92c0, 9, 1050769d8, 1050769d8) + 1f0 0000000102798e04 psdnal (ffffffff7fffa0b8, ffffffff7fffa258, 105000, ffffffff7bc5eb00, 7f7aa86d0, 40) + 184 000000010376d268 pevm_BFTCHC (0, 7f7aa86d0, 50, ffffffff7bb5f560, ffffffff7bc5eb00, 0) + 188 000000010373dff4 pfrinstr_FTCHC (10, 15d0000000000000, ffffffff7bb5f5c8, ffffffff7bb5f560, 4a8c, ffffffff7bb66330) + b4 00000001037362c8 pfrrun_no_tool (ffffffff7bb5f560, 779953684, ffffffff7bb5f5c8, 10457c9d8, 2001, 2001) + 48 00000001037372d0 pfrrun (ffffffff7bb5f5c8, 200000, 0, 200000, ffffffff7bb5f560, 779abb110) + 2f0 0000000103783374 plsql_run (ffffffff7bb55408, 1, 0, ffffdfff, ffffffff7fffa0b8, 0) + 274 0000000103722554 peicnt (ffffffff7fffa0b8, 105068860, 6, ffffffff7fff9f28, 41d8, 1050685e8) + d4 000000010327b784 kkxexe (105000, 104000, 105068, 104296000, 1050685e8, ffffffff7bb5f560) + 284 0000000101ad0228 opiexe (4, ffffffff7bc68ee8, ffffffff7fffab00, 0, 0, ffffffff7bc70480) + 33c8 0000000101a4c0a8 kpoal8 (40008, 1, ffffffff7fffd890, 0, 0, 4) + 648 00000001002d0058 opiodr (14, 10506ae10, 10434ce70, 10506a, 105000, 104000) + 598 0000000102cded94 ttcpip (105071450, 18, ffffffff7fffd890, ffffffff7fffcb88, 104229c98, ffffffff7fffcb84) + 694 00000001002cd3e8 opitsk (1002cf000, 1, 0, ffffffff7fffd9e8, 105071450, 105071458) + 428 0000000101aaf564 opiino (105070000, 1050683c0, 0, 0, f5, 105070290) + 404 00000001002d0058 opiodr (4, 10506ae10, 10434c920, 10000, 105071, 105000) + 598 00000001002cc174 opidrv (0, 4, 10506a, 105071450, 0, 3c) + 354 00000001002c9828 sou2o (ffffffff7fffe6b8, 3c, 4, ffffffff7fffe698, 104aa6000, 104aa6) + 48 00000001002a7b34 main (2, ffffffff7fffe798, ffffffff7fffe7b0, 0, 0, 100000000) + 94 00000001002a7a7c _start (0, 0, 0, 0, 0, 0) + 17c
Not too encouraging, huh?
So, let’s run this stack trace through my os_explain script:
$ ./os_explain pstack.txt
SELECT FETCH:
SORT: Fetch
HASH JOIN: Fetch
* HASH JOIN: Fetch
* VIEW: Fetch
NESTED LOOP OUTER: Fetch
NESTED LOOP OUTER: Fetch
NESTED LOOP JOIN: Fetch
HASH JOIN: Fetch
* VIEW: Fetch
UNION-ALL: Fetch
* VIEW: Fetch
UNION-ALL: Fetch
* NESTED LOOP OUTER: Fetch
NESTED LOOP OUTER: Fetch
GRANULE ITERATOR: Fetch
INDEX: FetchFastFullScan
kdirfrs
Now this is much more understandable!
All my script did was:
- remove the bottom part of the stack not relevant to plan execution
- reverse the order of stack trace lines for better human readability
- translating Query Execution Rowsource (qer) function prefixes to their corresponding names using info provided in Metalink note 175982.1
Easy :)
So, how to read this?
First, by now it should be obvious that in Oracle, each rowsource operator (the different lines you see in SQL execution plans) is actually just a function inside Oracle kernel. These are the row source functions, starting with “qer”. QER stands for Query Execution Rowsource as far as I know.
Whenever an Oracle process fetches data from a cursor, it calls opifch2() which in turn calls the root rowsource function in execution plan. In my case that function was qersoFetch and my os_explain script just substituted the “qerso” part with SORT (as per the Metalink note I mentioned above). The first child function of qersoFetch was qerhjFetch, which is a hash join rowsource, and so on. Note that os_explain prefixes some lines with an asterisk (*), this indicates that the output of given function is in turn filtered by a filter operation (the same filter ops what you normally see in the bottom of DBMS_XPLAN explained plans).
So, logically you can imagine an execution plan as a tree of Oracle functions:
- For a SELECT query the OPI fetch function (opifch2) would be the root.
- Various join and union functions like qerhjFetch (HASH JOIN) and qerjotFetch (NESTED LOOPS JOIN) would be branches.
- The leaves would always be some sort of access path functions like qertbFetch (TABLE ACCESS FULL) or qerixFetch ( INDEX UNIQUE / FULL / RANGE SCAN ).
But physically, an execution plan is just a memory structure in subheap 6 of a child cursor (x$kglcursor.kglobhd6), which has a bunch of rowsource function opcodes in it.
During plan execution Oracle process “just” traverses through those opcodes, looks up the corresponding rowsource function starting address using a lookup table and calls it. That function does its task (probably calls other rowsource functions recursively) and returns to caller.
Note that many rowsource functions are designed to be cascading, being able to do only the work needed for returning a small subset of rows and return only few rows at a time, as opposed to the whole resultset.
This is a very good thing as rows can be cascaded, or pipelined back to parent functions as rows become available. For example a table fetch only fetches a handful of rows (and not the whole table) at a time and returns these “up” for further processing. Also, a nested loop join is able to pass matching rows “up” from the moment first matches are found, again there’s no need to perform the join on full dataset first before returning first rows.
This also means that there is no need to store the whole intermediate resultset somewhere in memory before passing it up to parent function; instead we just revisit that branch of execution plan tree whenever we need more rows from it. And the os_explain script shows you exactly in which execution branch the execution currently is.
Addition: I will elaborate on how to match the execution plan with stack trace in an upcoming post – it’s too much material for an introductory post.
So, cascading rowsources allow us to “incrementally” execute plans involving large datasets, without need to keep the intermediate resultsets in memory. On the other hand, few rowsource operators in your execution plan, like SORT, can not return any rows up before all its children’s rows are processed. With SORT (and aggregate operations which also use SORT) you just have to process all the source rows before returning any meaningful result back. You can’t just go through only half of the source data, order it and start returning rows in hope that the rest of the rows should have not been returned as first in the order. This is where the SQL cursor workareas come into play for such operations.
SORT, HASH and BITMAP rowsources can allocate SQL workareas for them, while others can’t. This can easily be identified from execution plan statistics of following sample query:
SQL> select /*+ gather_plan_statistics */ owner, count(*) from dba_source group by owner; OWNER COUNT(*) ------------------------------ ---------- WKSYS 8988 HR 34 [...some output snipped...] SYS 129299 WMSYS 704 25 rows selected. SQL> select * from table(dbms_xplan.display_cursor(null,null,'MEMSTATS LAST')); PLAN_TABLE_OUTPUT ---------------------------------------------------------------------------------------------------------------------------------- SQL_ID dcp37kxt02m9f, child number 0 ------------------------------------- select /*+ gather_plan_statistics */ owner, count(*) from dba_source group by owner Plan hash value: 114136443 ---------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OMem | 1Mem | Used-Mem | ---------------------------------------------------------------------------------------------------------------------------- | 1 | HASH GROUP BY | | 1 | 593K| 25 |00:00:18.24 | 821K| 821K| 5213K (0)| | 2 | VIEW | DBA_SOURCE | 1 | 593K| 595K|00:00:16.84 | | | | | 3 | UNION-ALL | | 1 | | 595K|00:00:15.06 | | | | |* 4 | FILTER | | 1 | | 595K|00:00:10.88 | | | | |* 5 | HASH JOIN | | 1 | 595K| 595K|00:00:08.50 | 884K| 884K| 1316K (0)| |* 6 | HASH JOIN | | 1 | 6527 | 6292 |00:00:00.12 | 870K| 870K| 1179K (0)| | 7 | TABLE ACCESS FULL | USER$ | 1 | 98 | 98 |00:00:00.01 | | | | |* 8 | HASH JOIN | | 1 | 6527 | 6292 |00:00:00.09 | 909K| 909K| 1181K (0)| | 9 | INDEX FULL SCAN | I_USER2 | 1 | 98 | 98 |00:00:00.01 | | | | |* 10 | INDEX FAST FULL SCAN | I_OBJ2 | 1 | 6527 | 6292 |00:00:00.04 | | | | | 11 | INDEX FAST FULL SCAN | I_SOURCE1 | 1 | 595K| 595K|00:00:01.56 | | | | | 12 | NESTED LOOPS | | 0 | 1 | 0 |00:00:00.01 | | | | |* 13 | INDEX FULL SCAN | I_USER2 | 0 | 1 | 0 |00:00:00.01 | | | | |* 14 | INDEX RANGE SCAN | I_OBJ4 | 0 | 1 | 0 |00:00:00.01 | | | | | 15 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | | | | | 16 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | | | | | 17 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | | | | |* 18 | INDEX FAST FULL SCAN | I_OBJ2 | 1 | 6 | 0 |00:00:00.01 | | | | |* 19 | FIXED TABLE FIXED INDEX| X$JOXFS (ind:1) | 0 | 1 | 0 |00:00:00.01 | | | | |* 20 | INDEX RANGE SCAN | I_USER2 | 0 | 1 | 0 |00:00:00.01 | | | | | 21 | TABLE ACCESS CLUSTER | USER$ | 0 | 1 | 0 |00:00:00.01 | | | | |* 22 | INDEX UNIQUE SCAN | I_USER# | 0 | 1 | 0 |00:00:00.01 | | | | ----------------------------------------------------------------------------------------------------------------------------
You see how the HASH operations do have their last Mem columns populated, therefore those rowsource functions did allocate a SQL workarea for them. Others like NESTED LOOPS joins and data access rowsources did not have any SQL workarea memory allocated as they are completely cascading.
Note that os_explain can also read the stack from STDIN as seen in example below. Also the -a option will tell os_explain to show all functions in the stack and not only the execution plan ones and their children.
Command:
SQL> alter session set statistics_level=typical;
Session altered.
SQL> select avg(length(text)) from dba_source where owner = 'SYS';
AVG(LENGTH(TEXT))
-----------------
125.032127
Plan:
--------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem | --------------------------------------------------------------------------------------------------- | 1 | SORT AGGREGATE | | 1 | | | | | 2 | VIEW | DBA_SOURCE | 17972 | | | | | 3 | UNION-ALL | | | | | | |* 4 | FILTER | | | | | | | 5 | NESTED LOOPS | | | | | | | 6 | NESTED LOOPS | | 18056 | | | | |* 7 | HASH JOIN | | 198 | 909K| 909K| 1193K (0)| | 8 | INDEX FULL SCAN | I_USER2 | 98 | | | | | 9 | NESTED LOOPS | | 198 | | | | | 10 | TABLE ACCESS BY INDEX ROWID| USER$ | 1 | | | | |* 11 | INDEX UNIQUE SCAN | I_USER1 | 1 | | | | |* 12 | INDEX RANGE SCAN | I_OBJ5 | 198 | | | | |* 13 | INDEX RANGE SCAN | I_SOURCE1 | 93 | | | | | 14 | TABLE ACCESS BY INDEX ROWID | SOURCE$ | 91 | | | | | 15 | NESTED LOOPS | | 1 | | | | |* 16 | INDEX FULL SCAN | I_USER2 | 1 | | | | |* 17 | INDEX RANGE SCAN | I_OBJ4 | 1 | | | | | 18 | NESTED LOOPS | | 1 | | | | | 19 | NESTED LOOPS | | 1 | | | | | 20 | NESTED LOOPS | | 1 | | | | | 21 | TABLE ACCESS BY INDEX ROWID | USER$ | 1 | | | | |* 22 | INDEX UNIQUE SCAN | I_USER1 | 1 | | | | |* 23 | INDEX RANGE SCAN | I_OBJ5 | 1 | | | | |* 24 | FIXED TABLE FIXED INDEX | X$JOXFS (ind:1) | 1 | | | | |* 25 | INDEX RANGE SCAN | I_USER2 | 1 | | | | ---------------------------------------------------------------------------------------------------
Stack:
$ pstack 23740 | ./os_explain -a
main
ssthrdmain
opimai_real
sou2o
opidrv
opiodr
opiino
opitsk
ttcpip
opiodr
kpoal8
SELECT FETCH:
GROUP BY SORT: Fetch
VIEW: Fetch
UNION-ALL: Fetch
* FILTER DEFINITION: FetchOutside
UNION-ALL: RowProcedure
VIEW: RowProcedure
qesaAggNonDistSS.
evaopn2
evalen
lxsCntChar
The FILTER rowsources (not talking about filter predicates on normal rowsources here) can make things more complicated though as they can introduce their own logic and loops into the execution plan (for running correlated subqueries etc).
By the way, see what happens when I run exactly the same query with rowsource level statistics collection enabled:
SQL> alter session set statistics_level=all;
Session altered.
SQL> select avg(length(text)) from dba_source where owner = 'SYS';
AVG(LENGTH(TEXT))
-----------------
125.032127
$ pstack 23740 | ./os_explain -a
main
ssthrdmain
opimai_real
sou2o
opidrv
opiodr
opiino
opitsk
ttcpip
opiodr
kpoal8
SELECT FETCH:
QUERY EXECUTION STATISTICS: Fetch
GROUP BY SORT: Fetch
QUERY EXECUTION STATISTICS: Fetch
VIEW: Fetch
QUERY EXECUTION STATISTICS: Fetch
UNION-ALL: Fetch
QUERY EXECUTION STATISTICS: Fetch
* QUERY EXECUTION STATISTICS: Fetch
FILTER DEFINITION: FetchOutside
QUERY EXECUTION STATISTICS: Fetch
NESTED LOOP JOIN: Fetch
QUERY EXECUTION STATISTICS: Fetch
QUERY EXECUTION STATISTICS: SnapStats
sltrgftime64
gettimeofday
__kernel_vsyscall
Every rowsource is wrapped into a QUERY EXECUTION STATISTICS wrapper, which’ task is just to count the number of rows sent “up” to parents in the tree, logical IOs and also rowsource timing info whenever an internal counter (set by _rowsource_statistics_sampfreq parameter) wraps.
This is just an intro to Oracle execution plan internals and troubleshooting. Hopefully you don’t need this technique too often, however it has helped me to successfully pinpoint the root cause of a problem in few non-trivial database problems.
Note that in Oracle 11g there’s an excellent new feature called Real-time SQL Monitoring. It allows you to monitor the progress of currently running SQL statements. For serially running statements the monitoring kicks in for a statement after it’s used total 5 seconds of CPU or IO time (this time is controlled by _sqlmon_threshold parameter, but for PX the monitoring is always enabled). After that you can query V$SQL_MONITOR and V$SQL_PLAN_MONITOR for seeing how much time/rows/LIOs that session has spent executing a statement. You can see these details even at the SQL execution plan line level. Alternatively you can use DBMS_SQLTUNE. REPORT_SQL_MONITOR function to get this info nicely formatted. Greg Rahn has written a good blog entry about it.
However it’s important to note that both DBMS_SQLTUNE and V$SQL_MONITOR/V$SQL_PLAN_MONITOR use requires you to have Oracle Tuning Pack license which in turn requires Oracle Diagnostic Pack license. Details are in Oracle 11g Licensing Guide.
So another, low level approach for real-time monitoring will still be handy even after 11g becomes mainstream. In a future post I will be showing how to measure the progress and get execution profile of your plan by aggregating multiple stack traces and also some cool opportunities with DTrace.




Thanks for this posting.
Very interesting idea – getting the execution plan from an OS trace.
I would love to see the source…
Re DTrace – is this available for Linux? or something equivalent?
Have you thought about extending this (getting the exec plan from OS process) with timing for each part from the OS?
I would expect using DTrace with this tool would allow that?
Hi Rich,
Well, the source code is available here: http://blog.tanelpoder.com/files/scripts/tools/unix/os_explain
os_explain is just a shell script using sed & awk. It is doing just some string manipulation, search & replace etc.
DTrace allows accounting timing information for each rowsource function. On other platforms this could be doable with platform profilers (like OProfile in Linux).
However the script fills its original purpose as it is right now – I wanted to see where in execution plan the execution was mostly looping and taking a few os_explain samples did show it to me. I yet have to write a post about how to map os_explain and v$sql_plan execution plan lines together…
Hi Tanel, great blog. Thanks for the information.
I’ve been using dbms_xplan.display_cursor quite a bit lately. I love the information it provides but I’m having trouble finding any detailed documentation on it. I’m specifically looking for more information on what the Omem and 1mem columns represent. Can you shed any light on the subject or point me in the right direction?
Thanks!
BTW – Any update on training dates in the US for 2008?
Hi Tanel,
I was just wondering if there’s a way to find out which part of the explain plan is currently running or executing.
In your example, Plan hash value: 114136443, it would be nice if the
I know that the part of
* 8 | HASH JOIN | | 1 | 6527 | 6292 |00:00:00.09 | 909K| 909K| 1181K (0)|
is currently running and..
has a predicate of some value..
Are there any v$ or x$ views available?
Or any tracing facilities I should use?
Thanks man :) see you in Singapore 04/2009..
Hi Karl,
Well os_explain gives some indication. If you happen to have any physical IOs, you can map these (using v$session_wait.p1/p2) back to the segment name accessed. And if you know the segment name, you’ll know the execution plan line(s) currently executed as access path rowsources show the table/index name in exec plan as well.
More options would be to trace consistent reads (with event 10200) and map the data block addresses back to segment names, but this it not a widely used technique (not for production).
I would start from pstack as its possible to translate this back to execution plan.
If you still can reproduce this, send me couple of process stacks when this query is executing and the execution plan, I can reply you with answer and explanation how I worked it out..
Hi Tanel,
I thought I’d report a small glitch I encountered when working with os_explain. It appears that on RHEL 4 the version of ksh that comes installed by default is restrained by the old 1024 array size, which means that having
MAX_ARRAY=4095
inside the f_lifo procedure causes the script to fail with a “subscript out of range” error.
There are a couple of workarounds…
1. (easiest) Change the shell invocation at the top of the script to bash– your script runs fine in bash, and bash does not have such a tiny array ubound:
#!/bin/bash
2. Upgrade to a modern version of ksh
Also, I noticed that the version of ksh installed with RHEL 5 is newer and does not suffer from this limitation.
This is not a problem with your script per se, but I thought I’d offer this in case anyone else tries using your script on Redhat Linux and is baffled like I was.
BTW, thanks for bringing your training seminar to the NYOUG earlier this month! It really took my understanding to a new level.
Hi Fred,
Thanks for this info. I tested this on RHEL5 and Solaris only, thus didn’t hit this error.
I don’t have RHEL4 handy right now, what if you set the MAX_ARRAY to 1024 (or 1023), does it work in the RHEL4′s ksh?
Tanel,
Yes, setting MAX_ARRAY to 1023 (giving an array size of 1024) is another workaround.
The RHEL4 ksh is actually “pdksh”. The man page for pdksh says, “Array indicies are currently limited to the range 0 through 1023, inclusive.”
The RHEL5 ksh is called “ksh-20060214″, which has array bounds of “0 through 1,048,575″. RHEL4 users can get it from the CentOS repository:
http://isoredirect.centos.org/centos/5/
It works fine in RHEL4, but installing it involves upgrading various packages like gcc, glibc, libgcc, etc to their RHEL5 versions. I’d hesitate to do that in a production environment, needless to say. Easier just to use bash.
Thanks Fred for the info!
I knew Linux distros had PDKSH instead of KSH, but didn’t know that RHEL5 one has the “real” KSH. It may be related with Sun open sourcing Solaris and related software?
Tanel, awesome stuff, thanks for posting!
I am curious – I see from your examples and from my own tests that pstack is only capturing current stack operations. You mention that you have a loop to cycle through while the query is running. This begs a couple questions:
1) A loop is going to, effectively, sample the process stack. Correct?
2) What would it take to stitch together stack traces for os_explain? Not sure if just deleting the “header” lines would do it.
Thanks again for your intelligent and comprehensive contributions.
Great article.
In this particular script is there an advantage in using DTRACE over using truss
to trace @ OS Level ?
Thanks Ravi,
In this script (extracting stack traces) truss can’t be used at all, as far as I know truss can’t extract snapshots of stack traces like pstack does.
But if comparing truss’es a.out:func* notation with DTrace, I would always use DTrace if its available and usable. DTrace is safe by design (although you *can* cause trouble with it even in non-destructive mode) but truss had problems in past IIRC, such target processes missing signals sent to them and failing to exit.
Hi Tanel,
Thanks for taking time to reply.
Here is a what I tried with truss instead of Dtrace. Just save the pain of reversing the pstack output. Let me know your thoughts
Also DTrace could have system wide impact where as truss would just impact the process , correct me here if I am wrong.
Run truss on
SQL> alter system flush shared_pool;
System altered.
SQL> select a from t2;
A
———-
1
2
3
4
5
6
7
8
9
10
10 rows selected.
SQL> exit
This is the output I get from truss
cat 1truss.out | grep ‘oracle|qer’ |cut -d2-20 (output edited)
^I -> oracle:qerstStart(0x407b48a20, 0×1, 0×0, 0×0)$
^I -> oracle:qerstInitialize(0×0, 0×0, 0×0, 0×1)$
^I oracle:qertbStart(0x40e484da8, 0×1, 0×16, 0×30)$
^I -> oracle:qerstStart(0x40e484f20, 0×1, 0×12, 0×6)$
^I -> oracle:qerstInitialize(0×0, 0×0, 0×0, 0×1)$
^I oracle:qerixStart(0x413faafd0, 0×1, 0×16, 0×30)$
^I -> oracle:qerixGetKey(0x413faafd0, 0x40e484f90, 0×6, 0xffffffffffffffff)$
^I oracle:qerixGetKey(0x413faafd0, 0x413faab30, 0×6, 0×1)$
^I <- oracle:qerixGetKey() = 1$
^I <- oracle:qerixStart() = 0x413faafd0$
^I <- oracle:qertbStart() = 0x40e484f20$
^I oracle:qertbRelease(0x40e484da8, 0×0, 0×0, 0×0)$
^I -> oracle:qerixRelease(0x413faafd0, 0×0, 0×0, 0×0)$
^I <- oracle:qerixRelease() = 0$
^I oracle:qerstFetch(0x407b48a20, 0x1025145c0, 0xffffffff7ffe7520, 0×1)$
^I -> oracle:qertbFetchByRowID(0x40e484da8, 0x1009af1e0, 0xffffffff7ffe6e20, 0×1)$
^I -> oracle:qerstFetch(0x40e484f20, 0×0, 0xffffffff7ffe6d54, 0×1)$
^I -> oracle:qerixtFetch(0x413faafd0, 0×0, 0xffffffff7ffe65ec, 0×1)$
^I -> oracle:qerixGetRowid(0x4dd394de0, 0×12, 0xffffffff7c056758, 0x4dd394dde)$
^I <- oracle:qerixGetRowid() = 46$
^I <- oracle:qerixtFetch() = 0$
^I oracle:qertbGetPartitionNumber(0xffffffff7ffe6d14, 0x40e484ea0, 0xffffffff7c057190,
0xffffffff7c05bde0)$
I need to run sed to get these converted to oracle calls.
Sorry If I am taking up too much space, I can share the whole script if you think this does the job.
Thanks
Ravi
Hi Tanel
Excellent script.
But the script does not work for HP-UX 11i v2
The errors
$ pstack 10020 | ./os_explain
./os_explain[138]: ==: unknown test operator
f_lifo[10]: array: subscript out of range
Is there a plan to deply in HP-UX?
Regards
Haris
Ravi,
Truss does a different job, with pstack I took the sampling approach. I sample 10-100 times and then profile which stacks I see the most, to see where most time is spent.
With truss or DTrace approach also some post-processing would need to be done if you want to know what plan operation takes most time. I have done this in DTrace..
Haris,
Please download the script again, or just change the MAX_ARRAY=4095 to MAX_ARRAY=1023 in it, I suspect your shell can’t handle larger arrays (also on RHEL 4 and before there seems to be such problem)
Haris, if the output doesn’t make sense on HP-UX then please send a sample pstack output, maybe the columns are ordered there differently than on linux and solaris.
@Tanel Poder
But pdksh is more “compatible” than the real “ksh”. pdksh emulates the old ksh88 while ksh in RHEL5 is the newest ksh93. See:
https://bugzilla.redhat.com/show_bug.cgi?id=431448
Hi Tanel
None of the scripts links on the blog work
@KG
Yeah I’ll fix this soon….
@Tanel Poder
Most of my scripts are present in the tpt_public.zip …
Hi tanel !
This was nice & neat.But the scripts pages “http://www.tanelpoder.com/files/scripts/os_explain” and the other pages are giving 404 errror.
So could you please fix it !
Regards
Sriram
Sorry I have n`t read this.!
“Most of my scripts are present in the tpt_public.zip …”
Thanks for the update.
Sriram
Following scripts/documents are missing
1) http://www.tanelpoder.com/files/samples/os_explain_plan.txt
2) http://www.tanelpoder.com/files/scripts/os_explain
@Deba
Get them from the zip file for now:
http://blog.tanelpoder.com/2011/04/15/new-scripts-tools-and-broken-links/