Oracle memory troubleshooting, Part 1: Heapdump Analyzer

When troubleshooting Oracle process memory issues such as ORA-4030 errors or just excessive memory usage, you may want a detailed breakdown of the PGA, UGA and call heaps to see which component in there is the largest.

The same goes for shared pool memory issues and ORA-4031 errors: sometimes you need to dump the shared pool heap metadata to understand what kind of allocations take up most of the space in there.

The heap dumping can be done using a HEAPDUMP event, see http://www.juliandyke.com/Diagnostics/Dumps/Dumps.html for syntax.
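
The level passed to the HEAPDUMP event works as a bitmask that selects which heaps get dumped; the examples later in this post use level 2 (shared pool) and level 29 (PGA + UGA + call heaps). Here is a minimal Python sketch of that idea. The bit values in the table are an assumption based on my reading of the reference linked above, and the helper names are made up for illustration, so check the values against that page before relying on them.

```python
# Assumed bit values for the HEAPDUMP level; verify against the reference
# linked above. The names here are illustrative, not Oracle identifiers.
HEAPDUMP_LEVEL_BITS = {
    1: "PGA",
    2: "SGA",
    4: "UGA",
    8: "current call",
    16: "user call",
    32: "large pool",
}


def decode_level(level):
    """Return the heap names selected by a HEAPDUMP level bitmask."""
    return [name for bit, name in sorted(HEAPDUMP_LEVEL_BITS.items())
            if level & bit]


def heapdump_statement(level):
    """Build the ALTER SESSION statement in the form used in this post."""
    return ("alter session set events "
            f"'immediate trace name heapdump level {level}';")


if __name__ == "__main__":
    print(decode_level(29))
    print(heapdump_statement(2))
```

Under these assumed values, level 29 is 1 + 4 + 8 + 16, which matches the "PGA + UGA + call heaps" description used further down.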

NB! When dumping SGA heaps (such as the shared, large, java and streams pools), your process holds shared pool latches for the entire duration of the dump, so on busy production instances this should be used only as a last resort. Dumping a big shared pool could hang your instance for quite some time. Dumping private process heaps is safer, as only the target process is affected.

The heapdump output file structure is actually very simple. All you need to look at is the HEAP DUMP header, which tells you which heap the following chunks of memory belong to (there may be multiple heaps dumped into a single tracefile).

HEAP DUMP heap name="sga heap(1,1)"  desc=04EA22D0
 extent sz=0xfc4 alt=108 het=32767 rec=9 flg=-125 opc=0
 parent=00000000 owner=00000000 nex=00000000 xsz=0x400000
EXTENT 0 addr=20800000
  Chunk 20800038 sz=   374904    free      "               "
  Chunk 2085b8b0 sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bacc sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bce8 sz=     1036    freeable  "parameter table"
  Chunk 2085c0f4 sz=     1036    freeable  "parameter table"
  Chunk 2085c500 sz=     1036    freeable  "parameter table"
  Chunk 2085c90c sz=     1036    freeable  "parameter table"
  Chunk 2085cd18 sz=     1036    freeable  "parameter table"
  Chunk 2085d124 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d208 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d2ec sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d3d0 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d4b4 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d598 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d7b4 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d9d0 sz=      228    recreate  "KGL handles    "  latch=00000000
...

The first list of chunks after HEAP DUMP (the list above) is the list of all chunks in the heap. A regular heap has more lists, such as freelists and LRU lists, but let's ignore those for now. I'll write more about heaps in an upcoming post.

After identifying the heap name from the HEAP DUMP line, you can see all individual chunks from the “Chunk” lines. The value right after Chunk is the start address of the chunk, sz= is the chunk size, and the next column shows the type of the chunk (free, freeable, recreate, perm, R-free, R-freeable).
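
To make the column layout concrete, here is a small Python sketch that pulls the address, size, type and comment out of one Chunk line. The regular expression is my own and is based only on the sample lines shown above; other Oracle versions or heap types may print extra fields, so treat it as a starting point rather than a full parser.

```python
import re

# Matches lines such as:
#   Chunk 2085d598 sz=      540    recreate  "KQR PO         "  latch=2734AA00
# The pattern is derived from the sample dump above (an assumption about
# the general format, not a specification).
CHUNK_RE = re.compile(
    r'^\s*Chunk\s+(?P<addr>[0-9a-fA-F]+)\s+sz=\s*(?P<size>\d+)\s+'
    r'(?P<type>[\w-]+)\s+"(?P<comment>[^"]*)"'
)


def parse_chunk(line):
    """Return a dict describing one Chunk line, or None for other lines."""
    m = CHUNK_RE.match(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),        # start address (hex in the dump)
        "size": int(m.group("size")),            # chunk size in bytes
        "type": m.group("type"),                 # free, freeable, recreate, ...
        "comment": m.group("comment").strip(),   # allocation reason, padding removed
    }


if __name__ == "__main__":
    sample = '  Chunk 2085d598 sz=      540    recreate  "KQR PO         "  latch=2734AA00'
    print(parse_chunk(sample))
```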

The next column is the important one for troubleshooting: it shows the reason why a chunk was allocated (such as KGL handles for library cache handles, KQR PO for dictionary cache parent objects, etc.). Every chunk in a heap has a fixed 16-byte area in the chunk header which stores the allocation reason (comment) of the chunk. Whenever a client layer allocates heap memory (by calling a kghal* chunk allocation function), it needs to pass in a comment of up to 16 bytes, which is stored in the newly allocated chunk's header.

This gives a trivially simple technique for troubleshooting memory leaks and other memory allocation problems. When you have memory issues, you can just dump all of the heap's chunk sizes and aggregate them by allocation reason/comment. That shows you the biggest heap occupier and gives further hints about where to look next.
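
The aggregation described above can be sketched in a few lines of Python. This is not the script introduced below, just an illustration of the idea: remember the heap name from the latest HEAP DUMP header, then sum chunk sizes per (heap, chunk type, comment). The regular expressions are assumptions based on the sample dump shown earlier. The sketch counts every Chunk line it sees, so for a dump that also prints freelists or LRU lists you would need to skip those sections to avoid double counting.

```python
import re
from collections import defaultdict

# Both patterns are derived from the sample dump above (assumed format).
HEAP_RE = re.compile(r'^HEAP DUMP heap name="(?P<heap>[^"]*)"')
CHUNK_RE = re.compile(
    r'^\s*Chunk\s+[0-9a-fA-F]+\s+sz=\s*(?P<size>\d+)\s+'
    r'(?P<type>[\w-]+)\s+"(?P<comment>[^"]*)"'
)


def summarize_by_reason(lines):
    """Sum chunk sizes per (heap, chunk type, comment), largest first."""
    totals = defaultdict(lambda: [0, 0])   # key -> [total bytes, chunk count]
    heap = None
    for line in lines:
        h = HEAP_RE.match(line)
        if h:
            heap = h.group("heap")         # following chunks belong to this heap
            continue
        c = CHUNK_RE.match(line)
        if c:
            key = (heap, c.group("type"), c.group("comment").strip())
            totals[key][0] += int(c.group("size"))
            totals[key][1] += 1
    rows = [(key, nbytes, count) for key, (nbytes, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as trace:
        for (heap, ctype, reason), nbytes, count in summarize_by_reason(trace):
            print(f"{nbytes:12d} {count:8d}  {heap}, {ctype}, {reason}")
```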

As there can be lots of chunks in large heaps, aggregating the data manually would be time consuming (and boring). Here’s a little shell script which can summarize Oracle heapdump output tracefile contents for you:


http://blog.tanelpoder.com/files/scripts/tools/unix/heapdump_analyzer

After taking a heapdump, you just run the script against the tracefile to get a heap summary: total allocation sizes grouped by parent heap, chunk comment and chunk size.

heapdump_analyzer tracefile.trc

Here’s an example of a shared pool dump analysis (heapdump at level 2):

SQL> alter session set events 'immediate trace name heapdump level 2';

Session altered.

SQL> exit
...

$ heapdump_analyzer lin10g_ora_7145.trc

  -- Heapdump Analyzer v1.00 by Tanel Poder ( http://www.tanelpoder.com )

  Total_size #Chunks  Chunk_size,        From_heap,       Chunk_type,  Alloc_reason
  ---------- ------- ------------ ----------------- ----------------- -----------------
    11943936       3    3981312 ,    sga heap(1,3),             free,
     3981244       1    3981244 ,    sga heap(1,0),             perm,  perm
     3980656       1    3980656 ,    sga heap(1,0),             perm,  perm
     3980116       1    3980116 ,    sga heap(1,0),             perm,  perm
     3978136       1    3978136 ,    sga heap(1,0),             perm,  perm
     3977156       1    3977156 ,    sga heap(1,1),         recreate,  KSFD SGA I/O b
     3800712       1    3800712 ,    sga heap(1,0),             perm,  perm
     3680560       1    3680560 ,    sga heap(1,0),             perm,  perm
     3518780       1    3518780 ,    sga heap(1,0),             perm,  perm
     3409016       1    3409016 ,    sga heap(1,0),             perm,  perm
     3394124       1    3394124 ,    sga heap(1,0),             perm,  perm
     2475420       1    2475420 ,    sga heap(1,1),             free,
     2319892       1    2319892 ,    sga heap(1,3),             free,
     2084864     509       4096 ,    sga heap(1,3),         freeable,  sql area
...

It shows that the biggest component in the shared pool is 11943936 bytes. It consists of 3 free chunks of 3981312 bytes each, which reside in shared pool subpool 1, sub-sub-pool 3 (see the sga heap(1,3) entry).

Note that my script is very simple as of now: it reports different-sized chunks on different lines, so you may still need to do some manual aggregation if no obvious troublemaker shows up at the top of the list.
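
The manual aggregation mentioned above can itself be scripted. The following Python sketch reads the analyzer's report lines and re-groups them by heap, chunk type and allocation reason, ignoring the chunk size column. The row pattern is an assumption taken from the sample output in this post, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches report rows such as:
#     2084864     509       4096 ,    sga heap(1,3),         freeable,  sql area
# Header, separator and banner lines do not match and are skipped.
ROW_RE = re.compile(r'^\s*(\d+)\s+(\d+)\s+(\d+)\s*,([^,]*),([^,]*),(.*)$')


def regroup(report_lines):
    """Merge rows that differ only in chunk size; largest total first."""
    totals = defaultdict(lambda: [0, 0])   # key -> [total bytes, chunk count]
    for line in report_lines:
        m = ROW_RE.match(line)
        if m is None:
            continue
        key = (m.group(4).strip(), m.group(5).strip(), m.group(6).strip())
        totals[key][0] += int(m.group(1))
        totals[key][1] += int(m.group(2))
    rows = [(key, nbytes, count) for key, (nbytes, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    import sys
    # Example: heapdump_analyzer tracefile.trc | python regroup.py
    for (heap, ctype, reason), nbytes, count in regroup(sys.stdin):
        print(f"{nbytes:12d} {count:8d}  {heap}, {ctype}, {reason}")
```

With this, several rows for the same heap and allocation reason collapse into a single line, so a component that is spread over many chunk sizes becomes visible near the top of the list.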

Here’s an example of a summarized heapdump level 29 ( PGA + UGA + call heaps ):

$ heapdump_analyzer lin10g_ora_7145_0002.trc

  -- Heapdump Analyzer v1.00 by Tanel Poder ( http://www.tanelpoder.com )

  Total_size #Chunks  Chunk_size,        From_heap,       Chunk_type,  Alloc_reason
  ---------- ------- ------------ ----------------- ----------------- -----------------
     7595216     116      65476 ,     top uga heap,         freeable,  session heap
     6779640     105      64568 ,     session heap,         freeable,  kxs-heap-w
     2035808       8     254476 ,         callheap,         freeable,  kllcqas:kllsltb
     1017984       4     254496 ,    top call heap,         freeable,  callheap
      987712       8     123464 ,     top uga heap,         freeable,  session heap
      987552       8     123444 ,     session heap,         freeable,  kxs-heap-w
      196260       3      65420 ,     session heap,         freeable,  kxs-heap-w
      159000       5      31800 ,     session heap,         freeable,  kxs-heap-w
      112320      52       2160 ,         callheap,             free,
       93240     105        888 ,     session heap,             free,
       82200       5      16440 ,     session heap,         freeable,  kxs-heap-w
       65476       1      65476 ,     top uga heap,         recreate,  session heap
       65244       1      65244 ,    top call heap,             free,
       56680      26       2180 ,    top call heap,         freeable,  callheap
       55936       1      55936 ,     session heap,         freeable,  kxs-heap-w
...

You can also use the -t option to show total heap sizes in the output (this total is not computed by my script; I just take the “Total” lines from the heapdump tracefile):

$ heapdump_analyzer -t lin10g_ora_7145_0002.trc | grep Total
  Total_size #Chunks  Chunk_size,        From_heap,       Chunk_type,  Alloc_reason
     8714788       1    8714788 ,     top uga heap,            TOTAL,  Total heap size
     8653464       1    8653464 ,     session heap,            TOTAL,  Total heap size
     2169328       2    1084664 ,         callheap,            TOTAL,  Total heap size
     1179576       1    1179576 ,    top call heap,            TOTAL,  Total heap size
      191892       1     191892 ,         pga heap,            TOTAL,  Total heap size


4 Responses to Oracle memory troubleshooting, Part 1: Heapdump Analyzer

  1. Kumar says:

    Hi Tanel:
    How do we dig deeper into the output once we run the heap dump? In your example, you have explained about KGL handles for library cache handles, KQR PO for dictionary cache parent objects etc.
    How do you go further from this? I think I have two questions:
    (a) How do you know KGL handles = library cache handles? If I get a different output, how do I interpret what that handle is being held for?
    (b) Once you get information on (a), how do you proceed further to get to the actual application or object causing the issue?

    Thank you
    Kumar

  2. Andy D says:

    This is great information. Thank you so much. How do you turn off the trace? Also, are there any commands to stop the error from occurring, for example flushing the shared pool?

  3. Darshan says:

    Tanel,

    Can you please correct url for the script heapdump_analyzer to download?

    Thanks
    Darshan

  4. Leonard cowan says:

    Can you please correct url for the script heapdump_analyzer to download?
