Advanced Oracle Troubleshooting Guide, Part 2: No magic is needed, systematic approach will do

There are two ways to diagnose problems:

  1. Checking for usual suspects and hoping to find a matching one
  2. Following a systematic approach

Checking for usual suspects and hoping to find a matching one

The first approach relies on previous experience (both in the particular subject area/technology and with the context/environment in which the problem occurs). For example, if a patient comes to a doctor complaining about chest pain, then it would definitely be beneficial for the doctor (and also for the patient) to know more relevant info about the patient – the context. If the patient has just fallen off a 10-foot ladder, then it'd be more suitable to look for broken ribs. On the other hand, if the patient has been a long-time smoker and was watching TV on a couch when the pain started, then perhaps it'd be more suitable to start with an EKG (note that I'm not an expert on how the human body works, so should anybody complain about any pain in their chest to you, send them to a real doctor immediately!)

Anyway, if you've been administering a database full time for the last 5 years, you will probably know where to look immediately when a specific problem occurs. Note that I wrote "where to look immediately" here, not "what to change immediately". Using previous experience to identify root causes of problems is obviously a perfectly valid approach which may get you to the solution very fast (again and again) – but it can work well only if you do have lots of previous experience in solving problems for that technology and that particular environment (in other words, you know the context). The big risk here is that if a "new" problem expresses itself with similar symptoms to the "old-and-well-known problem", we could easily end up looking for and fixing the wrong issue. And if that doesn't work, we continue trying out a solution which helped with another problem last year. And if that doesn't help either, the cycle continues: we dig up even more unlikely fixes which have been useful once in the past and apply them. And then we resort to googling and trying out whatever solutions anyone has suggested for problems others have experienced.

We have ended up in the Desperate Switch Flipping state. Usually this leads to flipping even larger switches, starting from Oracle session/instance parameters and random SQL hints, up to restarting servers and upgrading databases, operating systems and hardware – usually without any luck. This wastes time, doesn't solve our problem and may cause even more trouble. This is not good. Checking out the usual suspects may help solve common recurring problems (hey, shouldn't good specialists avoid recurring problems rather than fix them again and again?). However it is very important to draw the line between checking for usual suspects and falling into DSF state, as it looks like happened to someone at Oracle-L.

Quoting:

“I ran Statspack reports at the highest level of detail until I was blue in the face. I ran traces. I set events. But I also am by nature intuitive and tend often to use intuition to solve a problem with facts to back up my intuitive conclusion. So after providing all of this stuff to Oracle Support, they were at a loss, well, they were very eager to look at corruption as a cause, because they didn’t have another solution.”

(Btw, I think the DSF state would be a suitable addition to the pathological DBA problems list which Gaja once started with CTD.)

The other way to diagnose problems is to follow a systematic approach – an appropriate methodology based on knowledge of how computers work.

Following a systematic approach

I personally allow myself to check for the usual suspects based on my (sic!) experience in diagnosing problems on that system (or similar systems) and often do a quick Metalink search, but if this doesn't reveal the root cause immediately, I will get systematic and start measuring the dynamics of the task I'm interested in. I will not start randomly shooting in the dark or doing DSF based on random internet posts showing up in search engines.

It's easy to be systematic in the Oracle world, mainly for the following reasons:

  1. Computers and computer systems are deterministic – they always generate exactly the same set of output conditions for the same set of input conditions. One could argue that human interaction, distributed systems and possible lower-level bugs/corruptions make large computer systems non-deterministic. However, when troubleshooting a specific problematic task it is enough to identify only the immediate points of system interaction which our task is experiencing. This is where instrumentation comes into play.
  2. Oracle is well-instrumented – it is easy to narrow down problems with your task, as with the various tracing facilities, the Oracle Wait Interface and statistic counters you get a good understanding of what Oracle is doing when the problem occurs (see the sketch right after this list).
  3. Computer software is not a black box – and Oracle is no exception. Any piece of software is just a bunch of computer instructions, physically clustered into functions, which the CPU then executes in a deterministic way (I avoided saying very deterministic here because you can't have a more or less deterministic system; it's either deterministic or not – and computer CPUs are definitely deterministic in their machine code execution). So it is fairly easy to follow where in the kernel code path Oracle is, even if the session statistics and wait interface are not enough (as I started blogging about in my previous post in this series).
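For example, the instrumentation mentioned in point 2 is easy to switch on for a single session. A minimal sketch, assuming the 10g DBMS_MONITOR package is available (the SID 197 and serial# 1234 here are just placeholders):

SQL> -- enable extended SQL trace (waits and binds) for one session
SQL> exec dbms_monitor.session_trace_enable(session_id => 197, serial_num => 1234, waits => TRUE, binds => TRUE);
SQL> -- ...let the problem reproduce, then switch tracing off again
SQL> exec dbms_monitor.session_trace_disable(session_id => 197, serial_num => 1234);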

I normally start measuring the problematic task I'm diagnosing from a high level. Based on the results I decide where to drill down and measure the next set of data, more focused on a specific area. I then look at the data, measure again where appropriate, and usually get closer to the root cause step by step.
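As a hedged example of such a high-level first measurement, one could simply ask where a session has spent its time so far, and only then drill into the biggest contributor (:sid is a placeholder for the session of interest):

SQL> select event, total_waits, time_waited from v$session_event where sid = :sid order by time_waited desc;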

So, how does all this look in practice? I think the best way is to provide some examples from real life. There are various types of problems you could experience with Oracle, but one of the most complex (and interesting) types is a performance problem. And this is where I start. I will start with a simple performance problem though – a single identifiable Oracle session experiencing performance issues.

Case study 1: Session hung – no wait no CPU

I often have to deal with issues where a session has apparently hung and hasn't responded for a while. Usually these cases are something as simple as waiting on a lock, or a query plan gone wild doing nested loop (cartesian) joins over millions of rows. Such cases are easy to solve; the following case, however, was a pretty interesting one (Oracle 10.1 on Solaris 8 SPARC 64-bit):

1) Check V$SESSION_WAIT for the session

I queried v$session_wait for that session using my little script sw.sql (sw = Session Wait). As my script formats some data for readability, I added the output of a plain "raw" query against v$session_wait as well:

SQL> select state, seq#, event, seconds_in_wait from v$session_wait where sid = 197;

STATE                     SEQ# EVENT                         SECONDS_IN_WAIT
------------------- ---------- ----------------------------- ---------------
WAITED KNOWN TIME           63 SQL*Net message from client              1505

SQL> @sw 197

       SID STATE   EVENT                   SEQ# SECONDS_IN_WAIT         P1         P2         P3
---------- ------- -------------------- ------- --------------- ---------- ---------- ----------
       197 WORKING On CPU / runqueue         63            1514 1413697536          1          0

From the output of both queries we see that Oracle thinks the session is NOT waiting on anything (as the state is WAITED rather than WAITING), thus being on CPU, apparently for the last 1500 seconds. In such cases I always check what the OS has to say, as OS tools will have a more accurate picture of the process's state (the reason being that OS tools query the OS process tables directly, while Oracle's V$SESSION_WAIT is populated by Oracle itself).
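By the way, since the sw.sql download link tends to go stale, here is a rough stand-in that produces the same columns as the formatted output above – just an approximation, not the real script (it also decodes the STATE column the way I read it: WAITING means waiting, anything else means not waiting, i.e. on CPU as far as Oracle knows):

SQL> select sid, decode(state, 'WAITING', 'WAITING', 'WORKING') state, event, seq#, seconds_in_wait, p1, p2, p3
  2  from v$session_wait where sid = &1;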

2) Confirm resource utilization from OS level

SQL> select spid from v$process where addr = (select paddr from v$session where sid = 197);

SPID
------------
9597

$ prstat -p 9597
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
9597 oracle 1145M 1070M sleep 59 0 0:00.00 0.0% oracle/1

$ ps -flp 9597
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
c S oracle 9597 1 0 40 20 ? 146564 ? 11:09:44 ? 0:00 oracleXXXXXX01 (LOCAL=NO)

Both prstat and ps output show that this process is sleeping and hasn't used any noticeable CPU time at all, even though it has been active for over 1500 seconds.
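If in doubt whether that 0.0% is just a momentary sample, one can also watch the process over a longer interval (a quick sketch: 5-second samples, five times) and see whether the TIME and CPU columns move at all:

$ prstat -p 9597 5 5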

3) Check where in the Oracle kernel code path the process is sleeping

The two-step diagnosis above has already proven that this session has been waiting for something for the vast majority of its lifetime. And as Oracle's wait interface has failed to record this wait, there's no point in looking further into v$session_event and the like. We go directly to the source and see what's going on.

By going to the source I don't mean looking into source code, as I don't have it (and never had it). I rather mean that instead of making hypotheses and wild guesses about where the process could be stuck, we will just go and see where it is stuck.

Any debugger can provide us this information; on Solaris and Linux we also have a handy pstack utility to use.
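If pstack isn't available, a plain debugger backtrace gives roughly the same information – a hedged sketch, assuming gdb is installed (note that attaching a debugger is more intrusive than Solaris pstack, which just reads the stack via /proc):

$ gdb -p 9597 -batch -ex "bt"

Here is the Solaris pstack output for our stuck process: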

$ pstack 9597
9597: oracleXXXXXX01 (LOCAL=NO)
ffffffff7c9a7854 sigsuspend (ffffffff7fff6d90)
ffffffff7c956d2c _libc_sleep (0, 5, 0, 0, 0, ffffffff7fff6f21) + f8
0000000100cc0188 qm_init_uga_helper (380028000, 380028, 3ba048b68, 380000, 380028000, 380028000) + 6c8
00000001002d4808 rpiswu2 (0, 104556000, ffffffff7fff70d8, 2, 104556418, ffffffff7fff7a80) + 1a8
0000000100cc5af0 qm_run_as_ksys (ffffffff7fff761c, 100cbfac0, ffffffff7fff7541, 1042dbd80, 1042db000, 1042db) + b0
0000000100cc0314 qm_init_uga (3ee8, 1042969f0, 104296000, 104296, 100cbf000, 100cbf) + 14
0000000100d2a5f4 qmeGetResRelByHdl (105000, 0, 105140578, 21, 0, 0) + 54
0000000100cac7e0 qmtkLookupSchemaOIDPath (e, 1428, ffffffff7fff7fb0, 105069b28, 10506ae00, 104400) + 180
0000000100cac5cc qmtkLookupSchemaOID (105068700, 6, d, ffffffff7fff87d8, ffffffff7fff7ce4, 0) + ac
00000001002ba9b0 qmtLookupSchemaElement (104556, ffffffff7b3410d8, d, ffffffff7b340090, 7, ffffffff7fff87d8) + 30
000000010309a834 qmu_preparse_xml2 (42400, 3c0bfcf49, d, 0, e, ffffffff7fff7fb0) + 454
00000001030bccac qmxtgGetUrlAndElNameCS (c, ffffffff7b9701f8, ffffffff7fff87e8, 10501dba8, 10501dba8, ffffffff7fff8808) + 14c
00000001030bd0e0 qmxtgCreateFromLob (105068700, ffffffff7bb61e88, ffffffff7fff87e8, 0, 1a, ffffffff7b9701f8) + 180
0000000102649ca4 qmxtkCreateFromLob (ffffffff7fff9fe8, ffffffff7bb61e88, 0, ffffffff7fff8938, ffffffffffffffff, 0) + 164
0000000102649d30 qmxtkCreateFromClob2 (0, 0, 0, ffffffff7bb69960, ffffffffffffffff, 0) + 50
000000010264ae2c qmxtkConsFromClob (ffffffff7bb69980, 0, ffffffff7bb69960, 0, 0, 0) + 4c
0000000103a21b3c spefcpfa (0, 10264ade0, ffffffff7bb69828, ffffffff7fff9088, ffffffff7bb69960, 103a216a4) + 4dc
00000001039d8a70 spefmccallstd (ffffffff7fffa058, ffffffff7fff9b90, ffffffff7fff9c60, ffffffff7fff9bd0, 2c, ffffffff7fff9bd0) + 1f0
00000001038f42c8 peftrusted (ffffffff7fff9bd0, 104556000, ffffffff7fffac78, ffffffff7fffa058, 105068700, ffffffff7fff9c60) + 88
0000000103a9c678 psdexsp (380007, 104556, ffffffff7fff9278, 10506b000, 10506b, 105068860) + b8
00000001002d4808 rpiswu2 (0, 104556000, ffffffff7fff9438, 2, 104556418, ffffffff7fff98c8) + 1a8
0000000103a9c3c0 psdextp (10506a000, 103a9c000, 0, 0, a, 9) + 180
00000001038f3eac pefccal (ffffffff7fffac78, ffffffff7fffa058, ffffffff7fff9c60, ffffffff7ba5e050, 105068700, 100b0aa80) + 12c
00000001038f3c14 pefcal (ffffffff7fffa058, 1, 105068860, 4004, 4326, 4000) + 94
000000010377407c pevm_FCAL (ffffffff7ba5e050, 3ba9eed48, ffffffff7ba5e0b8, 3ba9f0e18, 1, ffffffff7fff9fe8) + 7c
00000001037400cc pfrinstr_FCAL (ffffffff7ba5e050, 3ba9f0bde, ffffffff7ba5e0b8, 10457d, 104400, 3ba9f0be4) + 4c
00000001037362c8 pfrrun_no_tool (ffffffff7ba5e050, 3ba9f0bdc, ffffffff7ba5e0b8, 10457c9d8, 2001, 2001) + 48
00000001037372d0 pfrrun (ffffffff7ba5e0b8, 200000, 0, 200000, ffffffff7ba5e050, 3ba97022c) + 2f0
0000000103783374 plsql_run (ffffffff7ba6e4a0, 1, 0, ffffdfff, ffffffff7fffac78, 0) + 274
0000000103722554 peicnt (ffffffff7fffac78, 105068860, 6, ffffffff7fffaae8, 41d8, 1050685e8) + d4
000000010327b784 kkxexe (105000, 104000, 105068, 104296000, 1050685e8, ffffffff7ba5e050) + 284
0000000101ad0228 opiexe (4, ffffffff7ba3a6a8, ffffffff7fffb6c0, 0, 0, ffffffff7bb70420) + 33c8
0000000101a4c0a8 kpoal8 (40008, 1, ffffffff7fffe450, 0, 0, 3) + 648
00000001002d0058 opiodr (14, 10506ae10, 10434ce70, 10506a, 105000, 104000) + 598
0000000102cded94 ttcpip (105071450, 18, ffffffff7fffe450, ffffffff7fffd748, 104229c98, ffffffff7fffd744) + 694
00000001002cd3e8 opitsk (1002cf000, 1, 0, ffffffff7fffe5a8, 105071450, 105071458) + 428
0000000101aaf564 opiino (105070000, 105000, 3c0a60c98, 105000, e4, 105070290) + 404
00000001002d0058 opiodr (4, 10506ae10, 10434c920, 10000, 105071, 105000) + 598
00000001002cc174 opidrv (0, 4, 10506a, 105071450, 0, 3c) + 354
00000001002c9828 sou2o (ffffffff7ffff278, 3c, 4, ffffffff7ffff258, 104aa6000, 104aa6) + 48
00000001002a7b34 main (2, ffffffff7ffff358, ffffffff7ffff370, 0, 0, 100000000) + 94
00000001002a7a7c _start (0, 0, 0, 0, 0, 0) + 17c

This is the full stack trace of the current state of server process 9597.

Even though at first glance it may look unreadable, it's actually easy! Just start reading from the bottom up:

  • From the lowest line we see that the _start() function has been invoked (normally by fork() + exec() syscalls in Unix) and the execution in that function has got to 0x17c bytes from the function start address, where a call to the next function up in the list has been made. Note that the address in the 1st column of the output, immediately before the function name, is the current instruction address within that function (i.e. the function's start address plus the offset shown after the "+" sign).
  • In our case main() was the next function called (sounds familiar? ;) and the program execution got to 0x94 bytes from its start when it called sou2o().
  • sou2o() called opidrv() after reaching the 0x48th byte from its start.
  • opidrv (Oracle Program Interface DRiVer) called opiodr() after reaching the 0x354th byte from its start. Note that the numbers in brackets are the arguments passed to the called function. These may not always be complete and entirely accurate though, as they are read from the thread stack, and some parameters may be passed in CPU registers, thus never written to the stack at all.
  • opiodr() called opiino() after reaching the 0x598th byte from its start. Note that this doesn't mean that only 0x598 bytes' worth of instructions have been executed – a lot of other function calls may have been made which returned after doing their work, and there may also have been lots of jumping back and forth inside the function itself (looping, for example).
  • opiino() called opitsk(). This is where the main low-level task dispatching is done. When there is no work to do, opitsk() normally waits in opikndf2(), which in turn waits in the n* functions which hold Oracle's network-handling code. Once the network handling functions return, they usually have some data sent from the client, for which opitsk() then calls an appropriate handling function. This is usually ttcpip(), as also seen in the stack trace above.
  • Despite its name, ttcpip() doesn't directly have anything to do with TCP/IP – it actually means Two-Task Common PIPe read/write (TTC pipe). As the name suggests, this is the separating layer and gatekeeper between the "inner side" of the server code and the "outer side" exposed to the world. I think of it like a system call interface into the Oracle kernel. Many sanity checks and conversions are done in the TTC layer.
  • Skipping a few rows, we see opiexe(), which is the OPI function for execute requests (these translate to EXEC lines in SQL trace; there are also corresponding opipar and opifch functions for PARSE and FETCH).
  • Continuing on, we see peicnt, plsql_run and pfrrun, which are PL/SQL related functions, so obviously this process must be running some PL/SQL.
  • Continuing on, we see the rpiswu2 function – RPI stands for Recursive Program Interface and is used for setting up recursive calls like recursive SQL, autonomous transactions, executing SQL from PL/SQL and vice versa, etc. Basically, RPI calls set up a separate context, in the form of variables, state objects and so on, for calls which need to be executed before the current call can complete.
  • Continuing, we see a peftrusted() call. This is an interesting one: it allows external libraries to be called in trusted mode, meaning they already are in (or are loaded into) the Oracle server process's address space (as opposed to calling the library function through extproc). Oracle's XMLType and the oracle_loader & oracle_datapump ODCI cartridges use this feature.
  • Continuing, we see a bunch of qm* calls; these handle XML datatypes and XML schemas.
  • Continuing towards the top of the stack we see some UGA-initialization related functions and a function whose name indicates that some code is going to be executed as ksys (I don't know what ksys means, but as that qm_run_as_ksys function calls rpiswu2 again, this is a good example of how Oracle sets up a separate context to execute code in elevated privilege mode).
  • Now, 0x6c8 bytes into the qm_init_uga_helper function we see something different happening – this function calls the next function, _libc_sleep(), which in turn has issued the sigsuspend() system call. But from the first column we see that these functions reside in a rather different location in that process's address space (at 0xffffffff7c956d2c as opposed to 0x000000010xxxxxxx…). So what's going on?
    Such a difference in addresses can be explained by examining the process's address space:

$ pmap 9597 | grep -i ^000000010
0000000100000000 81016K read/exec /apps/oracle/product/10.1.0.3/bin/oracle
000000010501C000 864K read/write/exec /apps/oracle/product/10.1.0.3/bin/oracle
00000001050F4000 432K read/write/exec [ heap ]

$ pmap 9597 | grep -i ^ffffffff7c9
FFFFFFFF7C900000 728K read/exec /usr/lib/sparcv9/libc.so.1

From the above output we see that the Oracle binary itself has been mapped to address 0x100000000 in the process address space.

  • libc.so.1 has been mapped to ffffffff7c900000 and its total size is 728kB, which means that the _libc_sleep function is somewhere in the middle of the library (at address ffffffff7c956d2c – 56d2c is roughly 355k into it). The function's starting address relative to the starting address of the library image can be verified using the nm utility:

$ nm /usr/lib/sparcv9/libc.so.1 | egrep "Size|_libc_sleep"
[Index] Value Size Type Bind Other Shndx Name
[228] | 355380| 396|FUNC |LOCL |0 |9 |_libc_sleep
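A quick arithmetic check ties these numbers together (nm reports the offset in decimal, the addresses above are in hex):

  call site ffffffff7c956d2c - library base ffffffff7c900000 = offset 0x56d2c = 355628 bytes into libc
  nm says _libc_sleep starts at offset 355380 and is 396 bytes long, so it spans offsets 355380 to 355776
  355628 - 355380 = 248 = 0xf8, which matches the "+ f8" reported for _libc_sleep in the pstack output above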

So, after doing this stack reading we have conclusive evidence of where our Oracle process is in its execution. It's stuck – in a sleep system call made by some XML code – and that sleep is not instrumented by the Oracle wait interface. This is definitely abnormal and not just some performance or user code issue. Reading the stack trace came in very handy.

Armed with the knowledge that this wasn't just a user process waiting uninstrumented on IO or spinning on CPU, and after seeing the rpiswu2 / init_uga / _libc_sleep calls in the stack, I started wondering whether this could be something related to memory allocation or system library calls (yes, this is where previous experience kicked in again).

I looked into the Unix environment variables used for starting the Oracle instance and there it was – the $ORACLE_HOME/lib32 directory coming before $OH/lib in LD_LIBRARY_PATH. This caused some XMLDB functions to fail (maybe trying to load a 32-bit library into a 64-bit address space and failing had messed something up, perhaps in that process's PLT section, where the addresses of shared object-library functions are loaded).
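Just for illustration (the exact value is my reconstruction, based only on the $ORACLE_HOME path visible in the pmap output above), the offending setting in the environment used to start the instance would have looked something like this – the 32-bit lib32 directory listed before the 64-bit lib directory:

$ echo $LD_LIBRARY_PATH
/apps/oracle/product/10.1.0.3/lib32:/apps/oracle/product/10.1.0.3/lib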

While checking the stack trace in this example did not point out the environment variable problem to me directly, it helped to immediately eliminate many other causes for such hangs, like uninstrumented IO, or missing an IPC post from another process and thus never being woken up, and so on. Also, the Oracle kernel functions reported at the top of the stack gave some indication of the nature of the work being done (RPI setting up another call environment, UGA initialization – which can mean memory allocation, etc).

So, stack tracing can be very helpful where Oracle instrumentation doesn't deliver. The key thing about stack traces is that they show you the ultimate computer reality – they are the most accurate place to check what an Oracle server process is doing. No statistic or wait event can beat the accuracy and reliability of stack traces, as this info comes directly from "the source" – the thread stack memory structure organized per the hardware ABI (Application Binary Interface) standard.

Note that there are a few issues (as always) with getting and interpreting stack traces, and there may be cases where even stack traces don't give you an accurate picture (stack corruptions, for example). However, I've just realized, to my surprise, that it's 1am already, so I'll blog about it some other time ;-)

Conclusion

So, what am I getting at?

Well, I've just got started – I have a couple of examples from my experience of diagnosing ASSM performance bugs using stack tracing. When I have time to blog about these, it should be evident how crucial a tool stack tracing is for advanced and accurate troubleshooting. Nevertheless, stack tracing may not be required at all if Oracle/OS instrumentation can show you the accurate picture. Or even such tools may not always be required – if you have some serious previous experience on the system you're troubleshooting (anyone firefought bad query plans due to bind variable peeking with comment on table T is ‘blah’ every Monday morning? ;-)

So, what I'm getting at is stated below – and I intend to write more posts about the second point:

  • Checking for the usual suspects may get you there, very fast.
  • Following a systematic approach will get you there.

References

Metalink Note 175982.1: ORA-600 Lookup Error Categories (many Oracle kernel function prefixes are documented there)

Metalink Note 453521.1: ORA-04031 "KSFQ Buffers" ksmlgpalloc (some common Oracle kernel functions are documented there)

Note that this year's only Advanced Oracle Troubleshooting class takes place at the end of April/May 2014, so sign up now if you plan to attend this year!


19 Responses to Advanced Oracle Troubleshooting Guide, Part 2: No magic is needed, systematic approach will do

  1. tanelp says:

    Thanks Howard, I plan to blog about it more in my post about getting stack dumps. On Windows you could use the Oradebug dump errorstack, however one needs to be careful with that, as oradebug is very intrusive compared to Solaris pstack. Linux’s pstack is somewhere in between as it really uses just gdb’s backtrace command (Solaris pstack just reads the stack from /proc/SPID/mem in a very non-intrusive fashion)

    As soon as anyone is able to extract the Oracle symbol data from the %ORACLE_HOME%/rdbms/admin/*.SYM files on Windows, we could use any Windows debugger or Sysinternals Process Explorer for getting stack traces the non-intrusive way.

    I haven’t been able to extract the symbols from there yet (although the Oracle’s built-in stack unwind code seems to access them), so if anyone manages to do it, I’d be very interested to hear about it!

  2. Howard Rogers says:

    Nice post… but it would be nice to know how you’d go about doing all of that on a Windows box!

  3. Riyaj Shamsudeen says:

    Excellent posting, Tanel.

    Riyaj Shamsudeen
    ERP Financials DBA
    Works-for-cingular[new AT&T]-and-uses-only-scientific-methods ;-)

  4. John Kanagaraj says:

    Excellent, incisive, deeply technical and finally a very clear explanation (as usual!)

    Btw, from V$SESSION_WAIT, you could also use this segment of code:

    decode(wait_time,0,’(Waiting)’,’(on CPU)’)

    This comes directly from looking at the view definition of V$ACTIVE_SESSION_HISTORY!

    John Kanagaraj

  5. Jeroen says:

    Interesting to repeat this exercise for background sessions waiting on "rdbms ipc message"

  6. Jeroen says:

    I think I pushed submit a bit too soon, as I meant to add that the wait_time column shows zero for a background process (DBWR) waiting on a semaphore yet the OS indicates it is actually sleeping (prstat). So it looks like this is another example of where Oracle’s description might not totally match reality.

  7. tanelp says:

    Thanks to all for the comments.
    John, yep I’m already using v$session_wait.state which in turn is also decoded from x$ksusecst’s time_waited equivalent column (and from x$kslwt in 11g).
    I do prefer to use the STATE column though, it’s easier to remember: waiting = waiting, anything else = NOT waiting :)

    Jeroen, the seconds_in_wait column is increased for long waits in v$session_wait. I don't use the time_waited column directly at all when diagnosing performance issues, as seconds_in_wait shows the time spent in the current wait, and if I'm interested in wait history then I either query v$session_wait_history for the last 10 events or use v$session_event or Session Snapper…

  8. amit poddar says:

    Hi,

    Do you know how AIX's procstack works?
    Is it very intrusive?

  9. tanelp says:

    Hi Amit,

    Which version of AIX?

    In 5.2+ it should be pretty safe as it uses /proc filesystem for reading process stack (earlier versions may use /proc too)

    The procstack command briefly suspends the probed process (just like scheduler does when timeslice has expired) and reads stack through /proc filesystem.

    This is fast and non-intrusive.

    On linux however the pstack is just a wrapper around gdb command. So gdb starts up, attaches to the process, suspends it, loads a bunch of libraries and reads the process stack using slow ptrace() system calls. Also I’ve seen the pstack crashing the probed process in some circumstances (like when running a strace on the pstack process itself and attaching to target)

  10. amit poddar says:

    Linux also has a /proc file system.

    Do you know why they chose an intrusive method instead of reading it off the /proc file system?

  11. tanelp says:

    Yep, the Linux /proc filesystem allows you to write a pstack utility reading directly from /proc/<pid>/mem too, but it looks like it's not written yet :)

  12. Pingback: Advanced Oracle Troubleshooting Guide, Part 3: More adventures in process stack « Tanel Poder’s blog: Core IT for geeks and pros

  13. qihua says:

    How about HP-UX? What kind of tool does HP-UX provide that could do the same investigation?

  14. tanelp says:

    Newer versions of HP-UX should have the pstack utility.

    Otherwise you should be able to do the same with a debugger like dbx or gdb (whichever is present on your machine).

    On which HP-UX version are you?

  15. Khalid Azmi says:

    Your blog is just WOW, why didn't I come across this earlier? I think I read somewhere in your blog that you were planning to write a book but then dropped it :(

  16. Patryk says:

    Hi Tanel,

    I've never seen such a deep investigation.
    Keep going man.

  17. cam says:

    Very interesting.. but unless I am armed with your knowledge of Oracle internals I can’t possibly decode that stacktrace and thus arrive at the solution…

  18. Tanel Poder says:

    @cam

    I started from somewhere too. That’s why I put the 2 metalink note references there, they help understanding what these function name prefixes are about, they help to point you in the right direction.

  19. Roman says:

    Please update the link to sw.sql, it is broken on all pages
