Archive

Archive for October, 2008

Advanced Oracle Troubleshooting Guide, Part 9 – Process stack profiling from sqlplus using OStackProf

October 31st, 2008

I have mentioned ORADEBUG SHORT_STACK command in my blog posts before – it’s an easy way to get and see target processes stack backtrace directly in sqlplus. No need to log on to the Unix/Windows server and use OS tools for extracting the stack.

I have also written few tools which allow you to post-process stack traces taken using OS tools (like pstack) for better readability or performance profiling. For example os_explain and DStackProf – the DTrace stack profiler and function call aggregator.

Now I will introduce OStackProf which combines the ORADEBUG SHORT_STACK with a client side post-processing script for easy stack profiling directly from SQLPLUS – no need to log on to the server host at all!

I have demonstrated this script at conferences for couple of months now with a promise to blog about it “soon”, but it’s only now that I finally have a chance to write a blog entry about it (I have couple of hours left on a flight to help a client in London). Sorry to keep you waiting ;-)

So, this is what you see when you run oradebug short_stack on a process:

SQL> oradebug setospid 32200
Oracle pid: 2, Unix process pid: 32200, image: oracle@linux03 (PMON)
SQL>
SQL> oradebug short_stack
<-ksedsts()+275<-ksdxfstk()+22<-ksdxcb()+1599<-sspuser()+102<-__kernel_vsyscall()+2<-ntevpque()+89<-ntevqone()+34<-nsevwait()+10098<-ksnwait()+72<-ksliwat()+7249<-kslwaitctx()+135<-ksuclnwt()+249<-ksucln()+509<-ksbrdp()+1258<-opirip()+548<-opidrv()+500<-sou2o()+71<-opimai_real()+238<-ssthrdmain()+142<-main()+116<-__libc_start_main()+220<-_start()+33
SQL>

This is a stack trace, the leftmost function is where the execution currently was at the moment of taking the stack backtrace, the one to the right from it is the caller of the left function and so on. All the way to the right you see the “bottom” functions of a stack, like C main() function and also _start() which is Linux OS process loader helper function.

Note the two bold functions. Oradebug short_stack works the way that whenever it needs to let target process know about a debugger request, it sends a SIGUSR2 signal to it. sspuser() is the signal handler for SIGUSR2 and its task is to see what debug function to call. So, the sspuser() function and anything to the left from it is actually the codepath for processing the oradebug request, so we can ignore that for troubleshooting purposes (this also evidences that oradebug short_stack and dump errorstack as matter of fact do stray Oracle from its normal execution path – OS tools don’t do that, they just suspend the process and read what’s needed from process stack frames and mapped executable symbol sections).

So, the real “business” function the target was in, was __kernel_vsyscall() which in Linux means the process was doing some sort of syscall. I’m not going further in explaining the interpretation of function names here as I’ve done it in my previous AOT posts (and will continue so in the future).

I will continue on stack profiling topics. Taking only single stack backtrace can be helpful in cases when the target process is completely stuck, but if you want to diagnose the cause for just bad performance, then you need to take multiple stack backtraces, aggregate them and see into which execution “branch” of the codepath do the most stack samples fall in (as DStackProf does for example). Of course you would do this only after you’ve exhausted the step 1 and 2 in normal Oracle troubleshooting process (1. Look into wait interface data for the session, 2. Look into v$sesstat counters for the session).

Manually aggregating the stack traces is time consuming, error prone and did I mention boring, so now I finally introduce OStackProf v1.00!

Read more…

  • Share/Bookmark

Tanel Poder Cool stuff, Internals, Oracle, Performance, Troubleshooting

Transportable tablespaces and ROWID uniqueness

October 21st, 2008

I recently saw a fellow OakTable member mentioning a section in Oracle documentation where it’s said that:
“When a database contains tablespaces that have been plugged in (from other databases), the ROWIDs in that database are no longer unique. A ROWID is guaranteed unique only within a table.”

It’s a well known fact that the old Oracle7 style restricted rowids (which contained only File#, block# and row#) may not be unique in Oracle8+ databases which can have 1022 datafiles per tablespace not per database as previously. That’s why the 10-byte extended rowids were introduced, which also included the data object ID of a segment inside the rowid.

So if you have a global (partitioned) index on a partitioned table, the new extended rowids are used in it. With help of data dictionary cache this allows Oracle to quickly figure out in which tablespace the referenced segment resides and then use old fashioned relative-file#, block# and row# lookup on it.

Note that with local indexes and non-partitioned tables the rowids stored in indexes are old 6-byte restricted rowids. They have 4 bytes for data block address consisting of 10bits for file# and 22bits for block#. The other 2 bytes specify the row# in block.

The 10-byte extended rowids used to be unique within a database, until the transportable tablespaces came into play. Why is that – it’s because how transportable tablespaces work.

The idea behind TTS’es is that one should be able to copy (large) tablespaces around at file level, very fast, without much pre- or post-processing. So when we copy an 1TB TTS around, we don’t want to start scanning through it after plug-in to make ROWIDs in indexes and chained row forward pointers to somehow “make them right”. Thus, when plugging in a TTS, we need to keep all low-level row addressing structures untouched. This means that file#, block# and row# parts in index leaves and chained row pointers need to stay as they are. This means that DBAs (data block addresses in ASSM blocks need to stay as they are).

Also, as every segment data block holds a data object ID in it (which is also part of extended rowid as I mentioned), we can’t change this either, as otherwise we would need to scan through all tables/indexes in that tablespace again.

Thus, if all components of extended ROWID of an imported TTS need to stay as they are, there is theoretical and practical chance of ROWID collision within a database (note that I said within a database, not table).

A test case for demonstrating this is quite simple:

Read more…

  • Share/Bookmark

Tanel Poder Administration, Internals, Oracle

Austrian Oracle User Group event on 15th Oct in Vienna

October 14th, 2008

I will be speaking at Austrian Oracle User Group event in Vienna this Wednesday (15. Oct) with Christian Antognini

I will deliver an 1 hour version of my Advanced Oracle Troubleshooting presentation there, so if you’re in Austria and want to see what’s coming in my full length seminar I deliver on 3.-4. Nov also in Vienna, then you can see a preview on this Wednesday :)

  • Share/Bookmark

Tanel Poder Uncategorized