I have mentioned ORADEBUG SHORT_STACK command in my blog posts before – it’s an easy way to get and see target processes stack backtrace directly in sqlplus. No need to log on to the Unix/Windows server and use OS tools for extracting the stack.
I have also written few tools which allow you to post-process stack traces taken using OS tools (like pstack) for better readability or performance profiling. For example os_explain and DStackProf – the DTrace stack profiler and function call aggregator.
Now I will introduce OStackProf which combines the ORADEBUG SHORT_STACK with a client side post-processing script for easy stack profiling directly from SQLPLUS – no need to log on to the server host at all!
I have demonstrated this script at conferences for couple of months now with a promise to blog about it “soon”, but it’s only now that I finally have a chance to write a blog entry about it (I have couple of hours left on a flight to help a client in London). Sorry to keep you waiting ;-)
So, this is what you see when you run oradebug short_stack on a process:
SQL> oradebug setospid 32200 Oracle pid: 2, Unix process pid: 32200, image: oracle@linux03 (PMON) SQL> SQL> oradebug short_stack <-ksedsts()+275<-ksdxfstk()+22<-ksdxcb()+1599<-sspuser()+102<-__kernel_vsyscall()+2<-ntevpque()+89<-ntevqone()+34<-nsevwait()+10098<-ksnwait()+72<-ksliwat()+7249<-kslwaitctx()+135<-ksuclnwt()+249<-ksucln()+509<-ksbrdp()+1258<-opirip()+548<-opidrv()+500<-sou2o()+71<-opimai_real()+238<-ssthrdmain()+142<-main()+116<-__libc_start_main()+220<-_start()+33 SQL>
This is a stack trace, the leftmost function is where the execution currently was at the moment of taking the stack backtrace, the one to the right from it is the caller of the left function and so on. All the way to the right you see the “bottom” functions of a stack, like C main() function and also _start() which is Linux OS process loader helper function.
Note the two bold functions. Oradebug short_stack works the way that whenever it needs to let target process know about a debugger request, it sends a SIGUSR2 signal to it. sspuser() is the signal handler for SIGUSR2 and its task is to see what debug function to call. So, the sspuser() function and anything to the left from it is actually the codepath for processing the oradebug request, so we can ignore that for troubleshooting purposes (this also evidences that oradebug short_stack and dump errorstack as matter of fact do stray Oracle from its normal execution path – OS tools don’t do that, they just suspend the process and read what’s needed from process stack frames and mapped executable symbol sections).
So, the real “business” function the target was in, was __kernel_vsyscall() which in Linux means the process was doing some sort of syscall. I’m not going further in explaining the interpretation of function names here as I’ve done it in my previous AOT posts (and will continue so in the future).
I will continue on stack profiling topics. Taking only single stack backtrace can be helpful in cases when the target process is completely stuck, but if you want to diagnose the cause for just bad performance, then you need to take multiple stack backtraces, aggregate them and see into which execution “branch” of the codepath do the most stack samples fall in (as DStackProf does for example). Of course you would do this only after you’ve exhausted the step 1 and 2 in normal Oracle troubleshooting process (1. Look into wait interface data for the session, 2. Look into v$sesstat counters for the session).
Manually aggregating the stack traces is time consuming, error prone and did I mention boring, so now I finally introduce OStackProf v1.00!