Finding Oracle Homes which Oracle instances are using on Linux

I had a question about how to quickly identify which Oracle process runs out of which ORACLE_HOME on Linux.

I have uploaded a little script for that – it’s basically looking up all PMON process IDs and then using /proc/PID/exe link to find out where is the oracle binary of a running process located.

You may have to run this as root (as on some Linux versions I get “ls: cannot read symbolic link: Permission denied” error even when running this command as the owner of all Oracle homes (it seems to happen when your users UID and primary GID are different than thet setuid/setgid bits on the oracle binary):

oracle@linux03:~$ sudo ./findhomes.sh
   PID NAME                 ORACLE_HOME
  4421 asm_pmon_+ASM        /u01/app/oracle/product/11.2.0/db_1/
  4545 ora_pmon_demo112     /u01/app/oracle/product/11.2.0/dbhome_1/
  4547 ora_pmon_test112     /u01/app/oracle/product/11.2.0/dbhome_1/

You can use a similar approach on other Unixes too where the executable location or current working directory (CWD) is externalized in the /proc filesystem – or just use pmap to get this info instead.

Asynch descriptor resize wait event in Oracle

A lot of people have started seeing “asynch descriptor resize” wait event in Oracle 11gR2. Here’s my understanding of what it is. Note that I didn’t spend too much time researching it, so some details may be not completely accurate, but my explanation will at least give you an idea of why the heck you suddenly see this event in your database.

FYI, there’s a short, but incomplete explanation of this wait event also documented in MOS Note 1081977.1

Update: There’s a bug and a patch related to this wait event too.

The “direct path loader” (KCBL) module is used for performing direct path IO in Oracle, such as direct path segment scans and reading/writing spilled over workareas in temporary tablespace. Direct path IO is used whenever you see “direct path read/write*” wait events reported in your session. This means that IOs aren’t done from/to buffer cache, but from/to PGA directly, bypassing the buffer cache.

 

This KCBL module tries to dynamically scale up the number of asynch IO descriptors (AIO descriptors are the OS kernel structures, which keep track of asynch IO requests) to match the number of direct path IO slots a process uses. In other words, if the PGA workarea and/or spilled-over hash area in temp tablespace gets larger, Oracle also scales up the number of direct IO slots. Direct IO slots are PGA memory structures helping to do direct IO between files and PGA.

 

In order to be able to perform this direct IO asynchronously, Oracle also dynamically scales up the number of OS asynch IO descriptors, one for each slot (up to 4096 descriptors per process). When Oracle doesn’t need the direct IO slots anymore (when the direct path table scan has ended or a workarea/tempseg gets cancelled) then it scales down the number of direct IO slots and asynch IO descriptors. Scaling asynch IO descriptors up/down requires issuing syscalls to OS (as the AIO descriptors are OS kernel structures).

 

I guess this is supposed to be an optimization, to avoid running out of OS AIO descriptors, by releasing them when not they’re not needed, but as that Metalink note mentioned, the resize apparently sucks on Linux. Perhaps that’s why other ports also suffer and have seen the same wait event.

 

The “asynch descriptor resize” event itself is really an IO wait event (recorded in the wait class Other though), waiting for reaping outstanding IOs. Once this wait is over, then the OS call to change the amount of asynch IO descriptors (allocated to that process) is made. There’s no wait event recorded for the actual “resize” OS call as it shouldn’t block.

 

So, the more direct IO you do, especially when sorting/hashing to temp with frequent workarea closing/opening, the more of this event you’ll see (and it’s probably the same for regular tablespace direct path IO too).

 

This problem wouldn’t be noticeable if Oracle kept async io descriptors cached and wouldn’t constantly allocated/free them. Of course then you may end up running out of aio descriptors in the whole server easier. Also I don’t know whether there would be some OS issues with reusing cached aio descriptors, perhaps there is a good reason why such caching isn’t done.

 

Nevertheless, what’s causing this wait event is too frequent aio descriptor resize due to changes in direct IO slot count (due to changes in PGA workarea/temp segment and perhaps when doing frequent direct path scans through lots of tables/partitions too).

 

So, the obvious question here is what to do about this wait event? Well, first you should check how big part of your total response time this event takes at all?

 

  1. If it’s someting like 1% of your response time, then this is not your problem anyway and troubleshooting this further would be not practical – it’s just how Oracle works :)
  2. If it’s something like 20% or more of your response time, then it’s clearly a problem and you’d need to talk to Oracle Support to sort out the bug
  3. If it’s anything in between, make sure you don’t have an IO problem first, before telling that this is a bug. In one recent example I saw direct path reads take over a second on average when this problem popped up. The asynch descriptor resize wait event may well disappear from the radar once you fix the root cause – slow IO (or SQL doing too much IO). Remember, the asynch descriptor resize wait event, at least on Linux, is actually an IO wait event, the process is waiting for outstanding IO completion before the descriptor count increase/decrease can take place.

cursor: pin S waits, sporadic CPU spikes and systematic troubleshooting

I recently consulted one big telecom and helped to solve their sporadic performance problem which had troubled them for some months. It was an interesting case as it happened in the Oracle / OS touchpoint and it was a product of multiple “root causes”, not just one, an early Oracle mutex design bug and a Unix scheduling issue – that’s why it had been hard to resolve earlier despite multiple SRs opened etc.

Martin Meyer, their lead DBA, posted some info about the problem and technical details, so before going on, you should read his blog entry and read my comments below after this:

Problem:

So, the problem was, that occasionally the critical application transactions which should have taken very short time in the database (<1s), took 10-15 seconds or even longer and timed out.

Symptoms:

  1. When the problem happened, the CPU usage also jumped up to 100% for the problem duration (from few tens of seconds up to few minutes).
  2. In AWR snapshots (taken every 20 minutes), the “cursor: pin S” popped into top TOP5 waits (around 5-10% of total instance wait time) and sometimes also “cursor: pin S wait on X” which is a different thing, also “latch: library cache” and interestingly “log file sync”. These waits had then much higher average wait times per wait occurrence than normal (tens or hundreds of milliseconds per wait, on average).
  3. The V$EVENT_HISTOGRAM view showed lots of cursor: pin S waits taking very long time (over a second, some even 30+ seconds) and this certainly isn’t normal (Martin has these numbers in his blog entry)

AWR and OS CPU usage measurement tools are system-wide tools (as opposed to session-wide tools).

Troubleshooting:

I can’t give you exact numbers or AWR data here, but will explain the flow of troubleshooting and reasoning.

  • As the symptoms involved CPU usage spikes, I first checked whether there were perhaps logon storms going on due a bad application server configuration, where the app server suddenly decides to fire up hundreds of more connections at the same time (that happens quite often, so it’s a usual suspect when troubleshooting such issues). A logon storm can consume lots of CPU as all these new processes need to be started up in OS, they attach to SGA (syscalls, memory pagetable set-up operations) and eventually they need to find & allocate memory from shared pool and initialize session structures. This all takes CPU.However the logons cumulative statistic in AWR didn’t go up almost at all during the 20 minute snapshot, so that ruled out a logon storm. As the number of sessions in the end of AWR snapshot (compared to the beginning of it) did not go down, this ruled out a logoff storm too (which also consumes CPU as now the exiting processes need to release their resources etc).
  • It’s worth mentioning that log file sync waits also went up by over an order of magnitude (IIRC from 1-2ms to 20-60 ms per wait) during the CPU spikes. However as log file parallel write times didn’t go up so radically, this indicated that the log file sync wait time was wasted somewhere else too – which is very likely going to be CPU scheduling latency (waiting on the CPU runqueue) when CPUs are busy.
  • As one of the waits which popped up during the problem was cursor: pin S, then I chcecked V$MUTEX_SLEEP_HISTORY and it did not show any specific cursor as a significant contention point (all contention recorded in that sleep history buffer was evenly spread across many different cursors), so that indicated to me that the problem was likely not related to a single cursor related issue (a bug or just too heavy usage of that cursor). Note that this view was not queried during the worst problem time, so there was a chance that some symptoms were not in there anymore (V$MUTEX_SLEEP_HISTORY is a circular buffer of few hundred last mutex sleeps).
  • So, we had CPU starvation and very long cursor: pin S waits popping up at the same time. cursor: pin S operation should happen really fast as it’s a very simple operation (few tens of instructions only) of marking the cursor’s mutex in-flux so its reference count could be bumped up for a shared mutex get.
  • Whenever you see CPU starvation (CPUs 100% busy and runqueues are long) and latch or mutex contention, then the CPU starvation should be resolved first, as the contention may just be a symptom of the CPU starvation. The problem is that if you get unlucky and a latch or mutex holder process is preempted and taken off CPU by the scheduler, the latch/mutex holder can’t release the latch before it gets back onto CPU to complete its operation! But OS doesn’t have a clue about this, as latches/mutexes are just Oracle’s memory structures in SGA. So the latch/mutex holder is off CPU and everyone else who gets onto CPU may want to take the same latch/mutex. They can’t get it and spin shortly in hope that the holder releases it in next few microseconds, which isn’t gonna happen in this case, as the latch/mutex holder is still off CPU!
  • And now comes a big difference between latches and mutexes in Oracle 10.2: When a latch getter can’t get the latch after spinning, it will go to sleep to release the CPU. Even if there are many latch getters in the CPU runqueue before the latch holder, they all spin quickly and end up sleeping again. But when a mutex getter doesn’t get the mutex after spinning, it will not go to sleep!!! It will yield() the CPU instead, which means that it will go to the end of runqueue and try to get back onto CPU as soon as possible. So, mutex getters in 10.2 are much less graceful, they can burn a lot of CPU when the mutex they want is held by someone else for long time.
  • But so what, if a mutex holder is preempted and taken off CPU by OS scheduler – it should get back onto CPU pretty fast, once it works its way through the CPU runqueue?
  • Well, yes IF all the processes in the system have the same priority.
  • This is where a second problem comes into play – Unix process priority decay. When a process eats a lot of CPU (and does little IO / voluntary sleeping) then the OS lowers that processes CPU scheduling priority so that other, less CPU hungry processes would still get their fair share of CPU (especially when coming back from an IO wait for example etc).
  • When a mutex holder has a lower priority than most other processes and is now taken off CPU, a thing called priority inversion happens. Even though other processes do have higher priority, they can not proceed, as the critical lock or resource they need, is already held by the other process with a lower priority who can’t complete its work as the “high priority” processes keep the CPUs busy.
  • In case of latches, the problem is not that bad as the latch getters go to sleep until they are posted when the latch is released by the holder process (I’ve written about it here). But the priority inversion takes a crazy turn in case of mutexes – as their getters don’t sleep (not even for a short time) by default, but yield the CPU and try to get back to it immediately and so on until they get the mutex. That can lead to huge CPU runqueue spikes, unresponsive systems and even hangs.
  • This is why starting from Oracle 11g the mutex getters do sleep instead of just yielding the CPU and also Oracle has backported the fix into Oracle 10.2.0.4, where a patch must be applied and where the _first_spare_parameter will specify the sleep duration in centiseconds.
  • So, knowing how mutexes worked in 10.2, all the symptoms led me to suspect this priority inversion problem, greatly amplified by how the mutex getters do never sleep by default. And we checked the effective priorities of all Oracle processes in the server, and we hit the jackpot – there was a number of processes with significantly lower priorities than all other processes had. And it takes only one process with low priority to cause all this trouble, just wait until it starts modifying a mutex and is preempted while doing this.
  • So, in order to fix both of the problems which amplified each other, we had to enable HPUX_SCHED_NOAGE Oracle parameter, to prevent priority decay of the processes and set the _first_spare_parameter to 10, which meant that default mutex sleep time will be 10 centiseconds (which is pretty long time in mutex/latching world, but better than crazily retrying without any sleeping at all). That way no process (the mutex holder) is pushed back and kept away from CPU for long periods of time.

This was not a trivial problem, as it happened in Oracle / OS touchpoint and happened not because a single reason, but as a product of multiple separate reasons, amplifying each other.

There are few interesting, non-technical points here:

  1. When troubleshooting, don’t let performance tools like AWR (or any other tool!) tell you what your problem is! Your business, your users should tell you what the problem is and the tools should only be used for symptom drilldown (This is what Cary Millsap has been constantly telling us). Note how I mentioned the problem and symptoms separately in the beginning of my post – and the problem was that some business transactions (systemwide) timed out because the database response time was 5-15 seconds!
  2. The detail and scope of your performance data must have at least as good detail and scope of your performance problem!
    In other words, if your problem is measured in few seconds, then your performance data should also be sampled at least every few seconds in order to be fully systematic.The classic issue in this case was that the 20 minute AWR reports still showed IO wait times as main DB time consumers, but that was averaged over 20 minutes. But our problem happened severely and shortly within few seconds in that 20 minutes, so the averaging and aggregation over long period of time did hide the extreme performance issue that happened in a very short time.

Next time when it seems to be impossible to diagnose a problem and if the troubleshooting effort ends up going in circles, then you should ask, “what’s the real problem and who and how is experiencing it” and see if your performance data’s detail and scope matches that problem!

Oh, this is a good point to mention that in addition to my Advanced Oracle Troubleshooting/SQL Tuning seminars I also actually perform advanced Oracle troubleshooting consulting too! I eat mutexes for breakfast ;-)

Update: Srivenu Kadiyala has experienced the same problem and has written about it here.

 

Sometimes things are easy (Part 1): How to fix wrapped execution plan text?

What you see below is a common problem. Someone sends you (or posts to a forum) a wide execution plan, which is unreadable because of wrapped lines. For example, this one below:

--------------------------------------------------------------------------------
-------------------

| Id  | Operation                   | Name                    | E-Rows |  OMem |
 1Mem | Used-Mem |

--------------------------------------------------------------------------------
-------------------

|   0 | SELECT STATEMENT            |                         |        |       |
 |          |

|   1 |  SORT AGGREGATE             |                         |      1 |       |
 |          |

|*  2 |   HASH JOIN                 |                         |     13 |  1102K|
 1102K|  355K (0)|

|*  3 |    HASH JOIN                |                         |     13 |   988K|
 988K|  367K (0)|

|*  4 |     HASH JOIN               |                         |     13 |   921K|
 921K|  621K (0)|

|*  5 |      HASH JOIN OUTER        |                         |     13 |   836K|
 836K| 1224K (0)|

|*  6 |       HASH JOIN             |                         |     13 |   821K|
 821K|  501K (0)|

|*  7 |        HASH JOIN            |                         |     13 |  1102K|
 1102K|  501K (0)|

|   8 |         MERGE JOIN CARTESIAN|                         |      1 |       |
 |          |

|*  9 |          TABLE ACCESS FULL  | PROFILE$                |      1 |       |
 |          |

|  10 |          BUFFER SORT        |                         |      1 | 73728 |
 73728 |          |

|* 11 |           TABLE ACCESS FULL | PROFILE$                |      1 |       |
 |          |

|* 12 |         TABLE ACCESS FULL   | USER$                   |     36 |       |
 |          |

|  13 |        TABLE ACCESS FULL    | PROFNAME$               |      1 |       |
 |          |

|* 14 |       TABLE ACCESS FULL     | RESOURCE_GROUP_MAPPING$ |      1 |       |
 |          |

|  15 |      TABLE ACCESS FULL      | TS$                     |      7 |       |
 |          |

|  16 |     TABLE ACCESS FULL       | TS$                     |      7 |       |
 |          |

|  17 |    TABLE ACCESS FULL        | USER_ASTATUS_MAP        |      9 |       |
 |          |

--------------------------------------------------------------------------------
-------------------

So now you either try to manually edit and fix the execution plan text so you could read it or ask the developer to send the execution plan again. Both approaches take time.

Well, sometimes things are easy – in this particular case I saved the above into a file called /tmp/x and ran the following command:

$ cat /tmp/x | awk '{ printf "%s", $0 ; if (NR % 3 == 0) print } END { print }'
---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                    | E-Rows |  OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                         |        |       | |          |
|   1 |  SORT AGGREGATE             |                         |      1 |       | |          |
|*  2 |   HASH JOIN                 |                         |     13 |  1102K| 1102K|  355K (0)|
|*  3 |    HASH JOIN                |                         |     13 |   988K| 988K|  367K (0)|
|*  4 |     HASH JOIN               |                         |     13 |   921K| 921K|  621K (0)|
|*  5 |      HASH JOIN OUTER        |                         |     13 |   836K| 836K| 1224K (0)|
|*  6 |       HASH JOIN             |                         |     13 |   821K| 821K|  501K (0)|
|*  7 |        HASH JOIN            |                         |     13 |  1102K| 1102K|  501K (0)|
|   8 |         MERGE JOIN CARTESIAN|                         |      1 |       | |          |
|*  9 |          TABLE ACCESS FULL  | PROFILE$                |      1 |       | |          |
|  10 |          BUFFER SORT        |                         |      1 | 73728 | 73728 |          |
|* 11 |           TABLE ACCESS FULL | PROFILE$                |      1 |       | |          |
|* 12 |         TABLE ACCESS FULL   | USER$                   |     36 |       | |          |
|  13 |        TABLE ACCESS FULL    | PROFNAME$               |      1 |       | |          |
|* 14 |       TABLE ACCESS FULL     | RESOURCE_GROUP_MAPPING$ |      1 |       | |          |
|  15 |      TABLE ACCESS FULL      | TS$                     |      7 |       | |          |
|  16 |     TABLE ACCESS FULL       | TS$                     |      7 |       | |          |
|  17 |    TABLE ACCESS FULL        | USER_ASTATUS_MAP        |      9 |       | |          |
---------------------------------------------------------------------------------------------------

All I did here was that I stripped out line feeds from all lines except every 3rd line (which is the real end of the original line).

Note that if your linesize is very wide (and trimspool/trimout settings are ON) then this script would need some adjustment…

I’m sure this trivial approach doesn’t work in all situations, but with this article I wanted to illustrate that sometimes things which seem hard can be made much easier with a little scripting knowledge. If you are thinking which technology you should learn next – then better check out a Perl, Python or some shell+AWK book :)

By the way, if you want real flexibility displaying your execution plans (from library cache), then check this out:

http://blog.tanelpoder.com/2009/05/26/scripts-for-showing-execution-plans-via-plain-sql-and-also-in-oracle-9i/

Identifying shared memory segment users using lsof

Lsof (list open files) is a really useful tool for troubleshooting open file decriptors which prevent a deleted file from being released or a shared memory segment from being removed.

Here’s a little situation on Linux where an Oracle shared memory segment was not released as someone was still using it.

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 393216     oracle    640        289406976  1          dest
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  22

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

The bold line should have disappeared after instance shutdown, but it didn’t. From “natcch” (number of attached processes) column I see there is still some process using the shared memory segment. Thus the segment was not released and even ipcrm command did not remove it (just like with normal files if someone has them open).

So, I needed to identify which process was still using the memory segment. If that had been a normal existing file, I’d could have used /sbin/fuser command to see which process still holds it open, but this only works for existing files with existing directory entries.

However for deleted files, sockets and shared memory segments, you can use lsof command (it’s normally installed by default on Linux, but for Unixes you need to separately download and install).

The SHM ID of that segment was 393216 as ipcs -ma showed, so I simply run lsof to show all open file descriptors and grep for that SHM ID:

$ lsof | egrep "393216|COMMAND"
COMMAND     PID      USER   FD      TYPE     DEVICE       SIZE       NODE NAME
python    18811    oracle  DEL       REG        0,8                393216 /SYSVbfb94e30

See how the NODE column corresponds to SHM ID in ipcs output.

So I kill the PID 18811 which is still attached to the SHM segment:

$ kill 18811

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  25

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages


Now the shared memory segment is gone and its memory released.

Note that the lsof command is very useful for many other tasks as well. For example it allows you to list open sockets by network protocol, IP, port etc. For example you can determine to which client some server process is talking to, from OS level:

$ lsof -i:1521
COMMAND   PID   USER   FD   TYPE DEVICE SIZE NODE NAME
tnslsnr  6212 oracle   11u  IPv4  49486       TCP *:1521 (LISTEN)
tnslsnr  6212 oracle   13u  IPv4 276708       TCP linux03:1521->linux03:37277 (ESTABLISHED)
tnslsnr  6212 oracle   14u  IPv4 264894       TCP linux03:1521->linux03:41122 (ESTABLISHED)
oracle  22687 oracle   20u  IPv4 264893       TCP linux03:41122->linux03:1521 (ESTABLISHED)
oracle  25250 oracle   15u  IPv4 276707       TCP linux03:37277->linux03:1521 (ESTABLISHED)
oracle  25530 oracle   15u  IPv4 279910       TCP linux03:1521->192.168.247.1:nimsh (ESTABLISHED)

Unfortunately lsof is not installed by default in classic Unixes, but in some shops the sysadmins have chosen to install it. But even then, it may not work for regular users as lsof requires access to kernel memory structures through /dev/kmem or similar. If you can’t get access to lsof then there may be other tools available which can do some tricks lsof can do. For example on Solaris, there’s an useful command pfiles which can show open files of a process and since Solaris 9 ( I think ) it can also report the TCP connection endpoints of network sockets…

Reliable latch waits and a new blog

Here’s a link to Alex Fatkulin’s blog if you haven’t seen it already: http://afatkulin.blogspot.com/

He has some good Oracle internals information in there, I also like his research style.

Alex just blogged about a finding (on Oracle 11g on Linux) that when Oracle process doesn’t get a latch after spinning, it goes to sleep using semop() system call, which never wakes up unless this semaphore is posted by another process. From past versions we remember that Oracle processes go to sleep for a short period of time, wake up, try to get the latch and sleep again for a longer period of time if unsuccessful (up to _max_exponential_sleep centiseconds). This kind of sleeping with timeout is done using semtimedop() syscall on Linux.

Also, for some latches the latch waiter posting was available. If a process failed to get a latch, it put a pointer to its state object into the waiter list for that latch and went to sleep for some centiseconds. If the latch holder released this latch, it scanned through a waiter list for that latch and posted the waiters, so that they would not have to sleep until the end of this x centisecond sleep. This used to be controllable using _latch_wait_posting parameter, but since 9i this parameter has been removed and most latches do have wait posting enabled by default.

With semaphores and posting, on most Unixes there have been problems in past with missed posts/wakeups, sometimes due bugs, sometimes just due the implementation of signals and semaphore operations in Unix kernel. So that’s why the past Oracle versions always have some kind of timeout for latch sleeps (and most enqueue sleeps and buffer busy waits as well, as a matter of fact). If a process manages to miss the wakeup call, it will wake up after some timeout anyway. Performance suffers, but at least the process won’t hang infinitely. And this was achieved using semtimedop() systemcall (on Linux).

So, how come Alex saw just semop() calls in his test?

The answer is that apparently the minimum required Linux kernel used for Oracle 10.2+ does support reliable posting using semaphores, so Oracle is taking use of this.

There is a new parameter called _enable_reliable_latch_waits in 10.2+ and (at least) on Linux it is true. When it’s true, Oracle trusts that all wakeup calls (through semaphores) are received by the sleeping processes, thus there’s no need to periodically wake up to check whether the latch has become “available”.

Here’s a little test case:
[Read more...]

Oracle memory troubleshooting, Part 1: Heapdump Analyzer

When troubleshooting Oracle process memory issues like ORA-4030′s or just excessive memory usage, you may want to get a detailed breakdown of PGA, UGA and Call heaps to see which component in there is the largest one.

The same goes for shared pool memory issues and ORA-4031′s – sometimes you need to dump the shared pool heap metadata for understanding what kind of allocations take most of space in there.

The heap dumping can be done using a HEAPDUMP event, see http://www.juliandyke.com/Diagnostics/Dumps/Dumps.html for syntax.

NB! Note that when dumping SGA heaps (like shared, large, java and streams pools), your process holds shared pool latches for the entire dump duration so this should be used only as a last resort in busy production instances. Dumping a big shared pool could hang your instance for quite some time. Dumping private process heaps is safer as that way only the target process is affected.

The heapdump output file structure is actually very simple, all you need to look at is the HEAP DUMP header to see in which heap the following chunks of memory belong (as there may be multiple heaps dumped into a single tracefile).

HEAP DUMP heap name="sga heap(1,1)"  desc=04EA22D0
 extent sz=0xfc4 alt=108 het=32767 rec=9 flg=-125 opc=0
 parent=00000000 owner=00000000 nex=00000000 xsz=0x400000
EXTENT 0 addr=20800000
  Chunk 20800038 sz=   374904    free      "               "
  Chunk 2085b8b0 sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bacc sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bce8 sz=     1036    freeable  "parameter table"
  Chunk 2085c0f4 sz=     1036    freeable  "parameter table"
  Chunk 2085c500 sz=     1036    freeable  "parameter table"
  Chunk 2085c90c sz=     1036    freeable  "parameter table"
  Chunk 2085cd18 sz=     1036    freeable  "parameter table"
  Chunk 2085d124 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d208 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d2ec sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d3d0 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d4b4 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d598 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d7b4 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d9d0 sz=      228    recreate  "KGL handles    "  latch=00000000
...

The first list of chunks after HEAP DUMP (the list above) is the list of all chunks in the heap. There are more lists such as freelists and LRU lists in a regular heap, but lets ignore those for now, I’ll write more about heaps in an upcoming post.

After identifying heap name from HEAP DUMP line, you can see all individual chunks from the “Chunk” lines. The second column after Chunk shows the start address of a chunk, sz= means chunk size, the next column shows the type of a chunk (free, freeable, recreate, perm, R-free, R-freeable).

The next column is important one for troublehsooting, it shows the reason why a chunk was allocated (such KGL handles for library cache handles, KGR PO for dictionary cache parent objects etc). Every chunk in a heap has a fixed 16 byte area in the chunk header which stores the allocation reason (comment) of a chunk. Whenever a client layer (calling a kghal* chunk allocation function) allocates heap memory, it needs to pass in a comment up to 16 bytes and it’s stored in the newly allocated chunk header.

This is a trivial technique for troubleshooting memory leaks and other memory allocation problems. When having memory issues you can just dump all the heap’s chunks sizes and aggregate these by allocation reason/comment. That would show you the biggest heap occupier and give further hints where to look next.

As there can be lots of chunks in large heaps, aggregating the data manually would be time consuming (and boring). Here’s a little shell script which can summarize Oracle heapdump output tracefile contents for you:

[Read more...]

Advanced Oracle Troubleshooting Guide, Part 6: Understanding Oracle execution plans with os_explain

Get ready for some more adventures in Oracle process stack!

Before proceeding though, please read this post about safety of different stack sampling approaches.

I have had few non-trivial Oracle troubleshooting cases, related to query hangs and bad performance, where I’ve wanted to know where exactly in execution plan the current execution is.
Remember, Oracle is just another program executing instructions clustered in functions on your server, so stack sampling can help out here as well.

So, I was looking into the following stack trace taken from an Oracle 10.1 database on Solaris SPARC, running a SQL with this execution plan.

[Read more...]

Oracle hidden costs revealed, part 1 – Does a batch job run faster when executed locally?

This series is about revealing some Oracle’s internal execution costs and inefficiencies. I will analyze few situations and special cases where you can experience a performance hit where you normally wouldn’t expect to.

The first topic is about a question I saw in a recent Oracle Forum thread.

The question goes like this: “Is there any benefit if I run long sql queries from the server (by using telnet,etc) or from the remote by sql client.”

In order to leave out the network transfer cost of resultset for simplicity, I will rephrase the question like that: “Do I get better performance when I execute my server-side batch jobs (which don’t return any data to client) locally from the database server versus a remote application server or workstation?”

The obvious answer would be “NO, it does not matter where from you execute your batch job, as Oracle is a client server database system. All execution is done locally regardless of the client’s location, thus the performance is the same”.

While this sounds plausible in theory, there is (at least) one practical issue which can affect Oracle server performance depending on the clients platform and client libaries version.

It is caused by regular in-band break checking in client server communication channel where out of band break signalling is not available. A test case is below:

[Read more...]

Systematic application troubleshooting in Unix

How many times have you seen a following case, where a user or developer complains that their Oracle session is stuck or running very slowly and the person who starts investigating the issue does following:

  1. Checks the database for locks
  2. Checks free disk space
  3. Checks alert log
  4. Goes back to the client saying “we did a healthcheck and everything looks ok” and closes the case or asks the user/developer to contact application support team or tune their SQL

The point here is that what the heck do the database locks, alert log or disk space have to do with first round session troubleshooting, when Oracle provides just about everything you need in one simple view?

Yes, I am talking about sampling V$SESSION_WAIT here. Database locks, free space and potential errors in alert log may have something to do with your users problems, but not necessarily. As there are many more causes, like network issues etc which could affect your user (and the whole database), it doesn’t make sense to go through all those random “healthchecks” every time you receive a user phone call. Moreover, even if you identify that there is shortage of disk space or there are many database locks – so what? They may not have anything to do with the users problem.

The issue here is that still many people do not know about V$SESSION_WAIT which in most cases shows your problem immediately or at least points you to right direction (e.g. there’s no need to check for locks if your session is waiting on “log file switch (archiving needed)” wait – and vice versa). Even if “these people” have heard of V$SESSION_WAIT and may be able to drop this in during their job interview, they may not know how to use it in systematic troubleshooting context. Many hours of service downtime and user frustration would be saved if all DBAs knew this extremely simple concept of looking at V$SESSION_WAIT.

This blog entry is not about Oracle though, so I will leave this rant for a future blog post.

This post is about a similar problem in Unix world. Having been involved with resolving some serious production issues lately I have been surprised quite many times by the corporate Unix support people who seem to do behave in similar manner. For example, there is a user calling in saying that their scheduled Unix job, which normally takes 5 minutes, has been running for hours now. The “senior unix support analyst” will do following:

[Read more...]