Multipart cursor subexecution and PRECOMPUTE_SUBQUERY hint

There was a question about PRECOMPUTE_SUBQUERY hint in an Oracle Forums thread.

Here I will post the answer I gave there and also elaborate it more as it explains a little known interesting fact about Oracle cursor management. Also it allows me to introduce few advanced Oracle troubleshooting scripts by example. This is a fairly long post, but if you are interested in some Oracle cursor management and SQL execution internals, keep on reading ;)

Consider the following test case with two tables, T1 and T2:

SQL> create table t1 as select rownum a from dual connect by level < 10;

Table created.

SQL> create table t2 as select rownum+10 b from dual connect by level < 10;

Table created.

SQL>
SQL> select * from t1;

         A
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9

9 rows selected.

SQL>
SQL> select * from t2;

         B
----------
        11
        12
        13
        14
        15
        16
        17
        18
        19

9 rows selected.

Now lets run a query with a simple subquery in it:

SQL> select a
  2  from   t1
  3  where  a in (select b from t2);

no rows selected

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  aucw6byq3d5q8, child number 0
-------------------------------------
select a from   t1 where  a in (select b from t2)

Plan hash value: 561629455

----------------------------------------------------------------------------
| Id  | Operation           | Name     | E-Rows |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |        |       |       |          |
|*  1 |  HASH JOIN SEMI     |          |      1 |  1066K|  1066K| 1056K (0)|
|   2 |   TABLE ACCESS FULL | T1       |      9 |       |       |          |
|   3 |   VIEW              | VW_NSO_1 |      9 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2       |      9 |       |       |          |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"="B")

Note that a hash semijoin was performed which semijoined two of its child rowsources with join condition “A=B”.

Now lets run exactly the same query with PRECOMPUTE_SUBQUERY hint in subquery block:

SQL> select a
  2  from   t1
  3  where  a in (select /*+ PRECOMPUTE_SUBQUERY */b from t2);

no rows selected

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID  fvnqhjkcjnybx, child number 0
-------------------------------------
select a from   t1 where  a in (select /*+ PRECOMPUTE_SUBQUERY */b from
t2)

Plan hash value: 3617692013

-------------------------------------------
| Id  | Operation         | Name | E-Rows |
-------------------------------------------
|   0 | SELECT STATEMENT  |      |        |
|*  1 |  TABLE ACCESS FULL| T1   |      5 |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("A"=11 OR "A"=12 OR "A"=13 OR "A"=14 OR "A"=15 OR "A"=16
              OR "A"=17 OR "A"=18 OR "A"=19))

See what happened! The join is gone and it looks like table T2 is not accessed at all (as there is only one TABLE ACCESS rowsource which reads from table T1 ).

However, there has appeared a filter condition which has all the values from T2 in it! How are these values retrieved?

A simple sql_trace reveals this:

[Read more...]

Identifying shared memory segment users using lsof

Lsof (list open files) is a really useful tool for troubleshooting open file decriptors which prevent a deleted file from being released or a shared memory segment from being removed.

Here’s a little situation on Linux where an Oracle shared memory segment was not released as someone was still using it.

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 393216     oracle    640        289406976  1          dest
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  22

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

The bold line should have disappeared after instance shutdown, but it didn’t. From “natcch” (number of attached processes) column I see there is still some process using the shared memory segment. Thus the segment was not released and even ipcrm command did not remove it (just like with normal files if someone has them open).

So, I needed to identify which process was still using the memory segment. If that had been a normal existing file, I’d could have used /sbin/fuser command to see which process still holds it open, but this only works for existing files with existing directory entries.

However for deleted files, sockets and shared memory segments, you can use lsof command (it’s normally installed by default on Linux, but for Unixes you need to separately download and install).

The SHM ID of that segment was 393216 as ipcs -ma showed, so I simply run lsof to show all open file descriptors and grep for that SHM ID:

$ lsof | egrep "393216|COMMAND"
COMMAND     PID      USER   FD      TYPE     DEVICE       SIZE       NODE NAME
python    18811    oracle  DEL       REG        0,8                393216 /SYSVbfb94e30

See how the NODE column corresponds to SHM ID in ipcs output.

So I kill the PID 18811 which is still attached to the SHM segment:

$ kill 18811

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  25

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

Now the shared memory segment is gone and its memory released.

Note that the lsof command is very useful for many other tasks as well. For example it allows you to list open sockets by network protocol, IP, port etc. For example you can determine to which client some server process is talking to, from OS level:

$ lsof -i:1521
COMMAND   PID   USER   FD   TYPE DEVICE SIZE NODE NAME
tnslsnr  6212 oracle   11u  IPv4  49486       TCP *:1521 (LISTEN)
tnslsnr  6212 oracle   13u  IPv4 276708       TCP linux03:1521->linux03:37277 (ESTABLISHED)
tnslsnr  6212 oracle   14u  IPv4 264894       TCP linux03:1521->linux03:41122 (ESTABLISHED)
oracle  22687 oracle   20u  IPv4 264893       TCP linux03:41122->linux03:1521 (ESTABLISHED)
oracle  25250 oracle   15u  IPv4 276707       TCP linux03:37277->linux03:1521 (ESTABLISHED)
oracle  25530 oracle   15u  IPv4 279910       TCP linux03:1521->192.168.247.1:nimsh (ESTABLISHED)

Unfortunately lsof is not installed by default in classic Unixes, but in some shops the sysadmins have chosen to install it. But even then, it may not work for regular users as lsof requires access to kernel memory structures through /dev/kmem or similar. If you can’t get access to lsof then there may be other tools available which can do some tricks lsof can do. For example on Solaris, there’s an useful command pfiles which can show open files of a process and since Solaris 9 ( I think ) it can also report the TCP connection endpoints of network sockets…

Reliable latch waits and a new blog

Here’s a link to Alex Fatkulin’s blog if you haven’t seen it already: http://afatkulin.blogspot.com/

He has some good Oracle internals information in there, I also like his research style.

Alex just blogged about a finding (on Oracle 11g on Linux) that when Oracle process doesn’t get a latch after spinning, it goes to sleep using semop() system call, which never wakes up unless this semaphore is posted by another process. From past versions we remember that Oracle processes go to sleep for a short period of time, wake up, try to get the latch and sleep again for a longer period of time if unsuccessful (up to _max_exponential_sleep centiseconds). This kind of sleeping with timeout is done using semtimedop() syscall on Linux.

Also, for some latches the latch waiter posting was available. If a process failed to get a latch, it put a pointer to its state object into the waiter list for that latch and went to sleep for some centiseconds. If the latch holder released this latch, it scanned through a waiter list for that latch and posted the waiters, so that they would not have to sleep until the end of this x centisecond sleep. This used to be controllable using _latch_wait_posting parameter, but since 9i this parameter has been removed and most latches do have wait posting enabled by default.

With semaphores and posting, on most Unixes there have been problems in past with missed posts/wakeups, sometimes due bugs, sometimes just due the implementation of signals and semaphore operations in Unix kernel. So that’s why the past Oracle versions always have some kind of timeout for latch sleeps (and most enqueue sleeps and buffer busy waits as well, as a matter of fact). If a process manages to miss the wakeup call, it will wake up after some timeout anyway. Performance suffers, but at least the process won’t hang infinitely. And this was achieved using semtimedop() systemcall (on Linux).

So, how come Alex saw just semop() calls in his test?

The answer is that apparently the minimum required Linux kernel used for Oracle 10.2+ does support reliable posting using semaphores, so Oracle is taking use of this.

There is a new parameter called _enable_reliable_latch_waits in 10.2+ and (at least) on Linux it is true. When it’s true, Oracle trusts that all wakeup calls (through semaphores) are received by the sleeping processes, thus there’s no need to periodically wake up to check whether the latch has become “available”.

Here’s a little test case:
[Read more...]

New year and some news…

First, Happy New Year to you all!

This year I promise to organize my blog a little better, to have some index of my articles, scripts etc :)

Here are the news…

I’ve moved my blog…

In December, when visiting Shanghai, I noticed that my blog was not accessible from there. Apparently the state firewall blocks all access to wordpress.com IPs.

So I have moved my blog to a virtual private server, off wordpress.com – my blog should be accessible from China now as well.

My Advanced Oracle Troubleshooting Seminars in year 2009

If you like the contents of my blog or conference presentations, you’ll sure like my 2-day seminar!

 

Here are the dates and links for more info:

3-4. February – NYOUG @ New York Cityhttp://www.nyoug.org/etc/training/htm/NYOUG_Training_Session.htm

NB! As this is my first public seminar in US, you’ll be able to attend it for a very very good price! And the early bird registration (which gives you even better price) is open until end of today! More seminars in other US cities are planned for 2nd half of 2009.

2-3. April – Miracle @ Netherlandshttp://www.miraclebv.nl/

Probably happens in Utrecht, I’ll keep you updated if it changes.

13-14. April – Oracle @ Singaporehttp://www.oracle.com/education/apac/sg_tanel_poder.html

16-17. April – Oracle @ Sydneyhttp://www.oracle.com/education/apac/au_tanel_poder.html

20-21. April – Oracle @ Melbournehttp://www.oracle.com/education/apac/au_tanel_poder.html

23-24. April Oracle @ Aucklandhttp://www.oracle.com/education/apac/nz_tanel_poder.html

27-28. April – PiSec Ltd @ Edinburghhttp://www.pisec.org/index.php?option=com_content&view=article&id=6&Itemid=12

11-12. May – Oracle @ Spainhttp://education.oracle.com/pls/web_prod-plq-dad/show_desc.redirect?dc=D70365_1060245&p_org_id=51&lang=E&source_call=

18-19. May – Miracle @ Denmarkhttp://www.miracleas.dk/index.php?option=com_content&view=article&id=100:advanced-oracle-troubleshooting&catid=19:info&Itemid=71

 

Note that the Advanced Oracle Troubleshooting seminar is meant mainly for experienced DBAs and performance engineers as I go very deep in Oracle internals and OS/hardware details.

See more details about my seminar here

Conferences in 2009

11-12. February – RMOUG Conference @ Denver, Colorado

I will be delivering two presentations about advanced Oracle troubleshooting, tuning and internals:

8-12. March – Hotsos Symposium @ Dallas, Texas

I will be delivering two presentations:

As the second title says, I don’t use any slides at that presentation, I will just demo some of the scripts and tools I use every day plus few case studies. It will be fun!

Also, I look forward attending the Training Day by Jonathan Lewis on Oracle troubleshooting, on last day of the Symposium.

Oracle memory troubleshooting, Part 1: Heapdump Analyzer

When troubleshooting Oracle process memory issues like ORA-4030′s or just excessive memory usage, you may want to get a detailed breakdown of PGA, UGA and Call heaps to see which component in there is the largest one.

The same goes for shared pool memory issues and ORA-4031′s – sometimes you need to dump the shared pool heap metadata for understanding what kind of allocations take most of space in there.

The heap dumping can be done using a HEAPDUMP event, see http://www.juliandyke.com/Diagnostics/Dumps/Dumps.html for syntax.

NB! Note that when dumping SGA heaps (like shared, large, java and streams pools), your process holds shared pool latches for the entire dump duration so this should be used only as a last resort in busy production instances. Dumping a big shared pool could hang your instance for quite some time. Dumping private process heaps is safer as that way only the target process is affected.

The heapdump output file structure is actually very simple, all you need to look at is the HEAP DUMP header to see in which heap the following chunks of memory belong (as there may be multiple heaps dumped into a single tracefile).

HEAP DUMP heap name="sga heap(1,1)"  desc=04EA22D0
 extent sz=0xfc4 alt=108 het=32767 rec=9 flg=-125 opc=0
 parent=00000000 owner=00000000 nex=00000000 xsz=0x400000
EXTENT 0 addr=20800000
  Chunk 20800038 sz=   374904    free      "               "
  Chunk 2085b8b0 sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bacc sz=      540    recreate  "KGL handles    "  latch=00000000
  Chunk 2085bce8 sz=     1036    freeable  "parameter table"
  Chunk 2085c0f4 sz=     1036    freeable  "parameter table"
  Chunk 2085c500 sz=     1036    freeable  "parameter table"
  Chunk 2085c90c sz=     1036    freeable  "parameter table"
  Chunk 2085cd18 sz=     1036    freeable  "parameter table"
  Chunk 2085d124 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d208 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d2ec sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d3d0 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d4b4 sz=      228    recreate  "KGL handles    "  latch=00000000
  Chunk 2085d598 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d7b4 sz=      540    recreate  "KQR PO         "  latch=2734AA00
  Chunk 2085d9d0 sz=      228    recreate  "KGL handles    "  latch=00000000
...

The first list of chunks after HEAP DUMP (the list above) is the list of all chunks in the heap. There are more lists such as freelists and LRU lists in a regular heap, but lets ignore those for now, I’ll write more about heaps in an upcoming post.

After identifying heap name from HEAP DUMP line, you can see all individual chunks from the “Chunk” lines. The second column after Chunk shows the start address of a chunk, sz= means chunk size, the next column shows the type of a chunk (free, freeable, recreate, perm, R-free, R-freeable).

The next column is important one for troublehsooting, it shows the reason why a chunk was allocated (such KGL handles for library cache handles, KGR PO for dictionary cache parent objects etc). Every chunk in a heap has a fixed 16 byte area in the chunk header which stores the allocation reason (comment) of a chunk. Whenever a client layer (calling a kghal* chunk allocation function) allocates heap memory, it needs to pass in a comment up to 16 bytes and it’s stored in the newly allocated chunk header.

This is a trivial technique for troubleshooting memory leaks and other memory allocation problems. When having memory issues you can just dump all the heap’s chunks sizes and aggregate these by allocation reason/comment. That would show you the biggest heap occupier and give further hints where to look next.

As there can be lots of chunks in large heaps, aggregating the data manually would be time consuming (and boring). Here’s a little shell script which can summarize Oracle heapdump output tracefile contents for you:

[Read more...]