Archive

Archive for February, 2010

How to CANCEL a query running in another session?

February 17th, 2010

Here’s a treat for Oracle geeks, hackers and maniacs out there…

Update: As the beginning says, this article was meant as something interesting about Oracle’s internals and CTRL+C / OCICancel() handling. There’s a more practical way for canceling session calls if you are running Oracle Enterprise Edition and are currently using resource manager:

You can set the consumer group for a session to CANCEL_SQL to cancel its current call:

DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS (
session_id IN NUMBER,
session_serial IN NUMBER,
consumer_group IN VARCHAR2);

Thanks to commenter “null” for this info. Note that I haven’t tested how/whether this feature works correctly so there’s homework for you ;-)

I recently received a question about how to cancel queries running in another Oracle session, so that the session would not be killed, but would remain alive.

Well, there’s no supported way I can tell you, but thanks to how Oracle handles out-of-band breaks on Unix platforms, you can cancel database calls using an OS tool – kill.

Before we go on, here’s how query cancellation (pressing CTRL+C in sqlplus for example) works in Oracle:

  1. The user presses CTRL+C or clicks some button in App which runs OCICancel()
  2. The client process sends an urgent TCP packet (which is a regular TCP packet with URG bit set) to the socket in the database server it is connected to
  3. The server OS TCP stack receives the urgent TCP packet and sends URGENT signal to the process which owns the socket (the Oracle server process)
  4. Unix interrupts whatever the current process was doing and calls the URGENT signal handler function (which Oracle has registered during process startup)
  5. The urgent signal handler blindly assumes that the urgent signal has been received because user wants to cancel the query, stops the execution and returns back with an error: ORA-01013: user requested cancel of current operation

So, if we can’t make our application send the break packet, OCICancel() then we can just send the SIGURG signal to the Oracle process just like the OS TCP stack would do when it receives the packet with urgent bit set.

Here’s an example:

In one session I’m running a DBMS_STATS call:

SQL> exec dbms_stats.gather_database_stats;

I identify the SPID of that session’s process and send an URG signal to that process:

kill -URG 4476

And the call gets cancelled in the other session:

SQL> exec dbms_stats.gather_database_stats;
<pre>BEGIN dbms_stats.gather_database_stats; END;

*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01013: user requested cancel of current operation
ORA-06512: at "SYS.DBMS_STATS", line 13336
ORA-06512: at "SYS.DBMS_STATS", line 13682
ORA-06512: at "SYS.DBMS_STATS", line 13826
ORA-06512: at "SYS.DBMS_STATS", line 13790
ORA-06512: at line 1

My session was not killed – I still can run queries in it:

SQL> select * from dual;

D
-
X

SQL>

This works only on Unix platforms. Also this does not work when your client application is Windows sqlplus! This is because Windows sqlplus does not set up the out-of-band break checking properly when connecting. Maybe this is because old Windows versions TCP stacks didn’t know anything about urgent TCP packets! :)

A word of warning – this stuff is not for your everyday production usage! While it works and we know how and why it works, it’s not a good idea to send “random” signals to Oracle processes at your will. So the better way is to make your application able to cancel its database calls when you want it, but well in real world its not always (or should I even say rarely) possible.

Another thing to consider is when you run Oracle with Direct NFS, there will be network connections to the NFS server used by your server process, in addition to the client-server communication. I haven’t tested what happens when you send URG packet to a process in the DNFS case.

So try this out at your own risk ;-)

If you want to know more about query cancelling and what the in-band and out-of-band break checking is then you can read one of my old blog posts about it:

  • Share/Bookmark

Tanel Poder Uncategorized

A Free afternoon seminar in Singapore (24th Feb)

February 17th, 2010

If you are in Singapore and have 24th Feb afternoon available then you can register and join a free Oracle performance troubleshooting seminar I’m doing in Singapore Management University’s (SMU) campus.

The seminar will be about:

  • Systematic Oracle Performance Troubleshooting
  • Identifying performance troublemakers
  • Understanding execution plans

The date is Wednesday, 24th Feb

The seminar time is from 15:30-19:00 (don’t be late)

Registration and more details are here:

  • Share/Bookmark

Tanel Poder Cool stuff, Oracle, Performance, Troubleshooting, Tuning

New versions of LatchProf and LatchProfX for latch contention troubleshooting and tuning

February 15th, 2010

The LatchProf and LatchProfX scripts allow you to be more systematic with latch contention troubleshooting and tuning. No more guesswork is needed as these scripts give you exact session IDs and in this version also SQLIDs of the troublemaking applications.

You can download the new versions here:

Example output (with SQLID info) is below:

SQL> @latchprof name,sid,sqlid % % 100000

-- LatchProf 1.21 by Tanel Poder ( http://www.tanelpoder.com )

NAME                                       SID SQLID               Held       Gets  Held %     Held ms Avg hold ms
----------------------------------- ---------- ------------- ---------- ---------- ------- ----------- -----------
cache buffers chains                       133 3jbwa65aqmkvm        349        348     .35      14.169        .041
simulator lru latch                        133 3jbwa65aqmkvm         51         51     .05       2.071        .041
row cache objects                          133 3jbwa65aqmkvm          5          5     .01        .203        .041
cache buffers chains                        24                        5          5     .01        .203        .041
cache buffers chains                       149 3jbwa65aqmkvm          3          3     .00        .122        .041
resmgr group change latch                   33 147a57cxq3w5y          2          2     .00        .081        .041
cache buffers chains                         9 5raw2bzx227wp          2          1     .00        .081        .081
In memory undo latch                       149 f3y38zthh270n          2          1     .00        .081        .081
active checkpoint queue latch                5                        2          1     .00        .081        .081
cache buffers chains                       149 75621g9y3xmvd          2          2     .00        .081        .041
cache buffers chains                         9 gvgdv2v90wfa7          2          2     .00        .081        .041
cache buffers chains                        33 75621g9y3xmvd          2          2     .00        .081        .041
checkpoint queue latch                       5                        1          1     .00        .041        .041
ksuosstats global area                       8                        1          1     .00        .041        .041
cache buffers lru chain                    133 3jbwa65aqmkvm          1          1     .00        .041        .041
multiblock read objects                    155 75ju2gn3s8009          1          1     .00        .041        .041
resmgr group change latch                    9 0w2qpuc6u2zsp          1          1     .00        .041        .041
resmgr group change latch                   33 apgb2g9q2zjh1          1          1     .00        .041        .041
resmgr group change latch                  133 apgb2g9q2zjh1          1          1     .00        .041        .041
space background task latch                 17                        1          1     .00        .041        .041
cache buffers chains                       149 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 5raw2bzx227wp          1          1     .00        .041        .041
cache buffers chains                        33 05s4vdwsf5802          1          1     .00        .041        .041
cache buffers chains                        31 0yas01u2p9ch4          1          1     .00        .041        .041
cache buffers chains                         9 41zu158rqf4kf          1          1     .00        .041        .041
In memory undo latch                        33 0bzhqhhj9mpaa          1          1     .00        .041        .041
In memory undo latch                        31 gvgdv2v90wfa7          1          1     .00        .041        .041
In memory undo latch                         9 gvgdv2v90wfa7          1          1     .00        .041        .041
simulator lru latch                        149 3jbwa65aqmkvm          1          1     .00        .041        .041
row cache objects                          133 5yq51dtyc6qf2          1          1     .00        .041        .041
SQL memory manager workarea list la        133 3jbwa65aqmkvm          1          1     .00        .041        .041
enqueues                                   141                        1          1     .00        .041        .041
row cache objects                          132                        1          1     .00        .041        .041

33 rows selected.

LatchProf scripts allow you to easily identify which session and SQLID (or sqlhash in 9i) cause the latch(es) to be held the most.

Let’s check what’s the most “latch-holding” SQL reported by LatchProf:

SQL> @sqlid 3jbwa65aqmkvm

HASH_VALUE  CH# SQL_TEXT
---------- ---- ------------------------------------------------------------------------------------------------------------------------------------------------------
1432996723    0 SELECT O.ORDER_ID, LINE_ITEM_ID, PRODUCT_ID, UNIT_PRICE, QUANTITY, ORDER_MODE, ORDER_STATUS, ORDER_TOTAL, SALES_REP_ID, PROMOTION_ID, C.CUSTOMER_ID,
 CUST_FIRST_NAME, CUST_LAST_NAME, CREDIT_LIMIT, CUST_EMAIL, ORDER_DATE FROM ORDERS O , ORDER_ITEMS OI, CUSTOMERS C WHERE O.ORDER_ID = OI.ORDER_ID AND
 O.CUSTOMER_ID = C.CUSTOMER_ID AND O.ORDER_STATUS <= 4

If you want to read more about the capabilities of LatchProf and LatchProfX, go here:

Note that the latest version (v1.21) also fixes a problem with Oracle 11.2 where the script execution plan order was wrong, causing the sampling to not happen in correct order. I added a NO_TRANSFORM_DISTINCT_AGG hint to disable a new transformation happening in 11.2 to make the scripts behave correctly…

  • Share/Bookmark

Tanel Poder Administration, Cool stuff, Oracle, Performance, Troubleshooting

Oracle Troubleshooting: How to read Oracle ERRORSTACK output?!

February 14th, 2010

I have written the first article to the troubleshooting section of my new website tech.E2SN.com:

It’s about a very valuable Oracle troubleshooting tool -> ERRORSTACK trace.

I cover 4 frequently asked questions there:

  1. Reading the current executing SQL statement text from errorstack trace
  2. Reading the current executing PL/SQL package and PL/SQL source code line number from errorstack trace
  3. Reading the current bind variable values from errostack trace
  4. Identifying how much private (UGA) memory a cursor is using

You can read it here:

By the way, if you like my new website, feel free to link to it !!! ;-)

  • Share/Bookmark

Tanel Poder Administration, Cool stuff, Internals, Oracle, Troubleshooting, Tuning

Future appearances, conferences and seminars

February 11th, 2010

Just to let you know where I’ll be in the near future :)

Seminars

Advanced Oracle Troubleshooting 2.0

I have rearranged and adjusted the material so it flows smoother, has even more practical tools and scripts and describes some internals even deeper ;-)

These dates are close, last chance to register ;-) Note that after these I won’t be doing an Advanced Oracle Troubleshooting class in US for a while…

Conferences

Hotsos Symposium, Dallas, TX, 8-11 March

I will deliver 2 presentations at the Hotsos Symposium 2010

  • Back to Basics: Choosing the Starting Point of Performance Tuning and Troubleshooting Wisely
  • Understanding LGWR, log file sync Waits and Commit Performance

Also, I will be the Training Day speaker after the conference (I feel quite honored about this btw ;)

  • Scripts and Tools for Oracle Troubleshooting and Advanced Performance Analysis

Note that I will show some very cool (and of course, useful) stuff at the Training Day! Some things are gonna be so cool that I almost want to sit in the audience myself! One of the things is called MOATS – the Mother Of All Tuning Scripts for example ;-)

Miracle OpenWorld 2010, Denmark, 14-16 April

I will deliver two sessions at MOW

  • Understanding LGWR, Log file sync Waits and Commit Performance
  • Oracle Wait Interface is Useless (sometimes) – together with  James Morle

It will be fun! :)

IOUG Collaborate, Las Vegas, NV, 19-22 April

Does “What happens in Vegas…” also apply to all the new stuff learnt at a conference? ;-)

EMEA Harmony 2010, Tallinn, Estonia, 20-21 May

This event will be a blast – awesome speakers (Steven Feuerstein, Tom Kyte, C.J. Date among others) and for the first time the Estonian,Latvian,Finnish and Russian OUG organize such event together (I’m from Estonia originally who don’t know that)

I will deliver a keynote there and one more session:

  • Back to Basics: Choosing the Starting Point of Performance Tuning and Troubleshooting Wisely (keynote)
  • Understanding LGWR, log file sync Waits and Commit Performance

Michigan OakTable Symposium, Ann Arbor, MI, 16-17 September

This will be an awesome event, lots of OakTable people there, including me!

  • Back to Basics: Choosing the Starting Point of Performance Tuning and Troubleshooting Wisely
  • Understanding LGWR, log file sync Waits and Commit Performance

I think we’ll even have a chance to see some football there…

  • Share/Bookmark

Tanel Poder Cool stuff, Oracle, Training

Using Process Memory Matrix script for calculating Oracle process memory usage on Solaris

February 11th, 2010

I just published a new script and article about calculating the real Oracle process memory usage on Solaris.

The problem with V$PROCESS* views (and the V$SESSTAT) is that they will tell you what Oracle thinks it’s using, not the real amount of memory used. There will be discrepancies due how memory is actually allocated in OS, libraries out of Oracle’s control, the static memory areas inside Oracle binary and of course bugs.

I was working on one of such problems and decided to make my life easier by writing the script. It’s not so much about calculating the exact figures (they will never be 100% correct), but more about presenting the memory usage data in a better and more convenient fashion.

The script is called procmm and stands for Process Memory Matrix as it shows the memory usage in a matrix grid.

Here’s an example output to show what I’m talking about:

oracle@solaris02:~/research/memory$ ./procmm.sh -t `pgrep -f ora_.*SOL102`

-- procmm.sh: Process Memory Matrix v1.01 by Tanel Poder ( http://tech.e2sn.com )
-- All numbers are shown in kilobytes

Total PIDs 17, working: .................

PID            SEGMENT_TYPE      VIRTUAL          RSS         ANON       LOCKED    SWAP_RSVD
------ -------------------- ------------ ------------ ------------ ------------ ------------
0                       lib       389844       388796        13180            0        17816
0                    oracle      1629064      1628908         3336            0        42012
0            ism_shmid=0x1d      6963336      6963336            0      6963336            0
0             hc_SOL102.dat           48           48            0            0            0
0                      anon        32936        15936        15452            0        32868
0                     stack         1660         1628         1592            0         1660
0                      heap        37004        18016        16844            0        37004
------ -------------------- ------------ ------------ ------------ ------------ ------------
0                 TOTAL(kB)      9053892      9016668        50404      6963336       131360

And here’s the full article and in there a link to the script:

Comments are welcome here, as I haven’t set up commenting on my other site yet…

  • Share/Bookmark

Tanel Poder Uncategorized

Oracle Wait Event reference

February 9th, 2010

Kyle Hailey has started putting together a much needed Oracle wait event reference.

You can access it here.

By the way, Oracle documentation also has a wait event reference section, it has more events, but it’s less detailed…

I have plans to go deep into some wait events and cover some less common ones in tech.E2SN too… in the future ;-)

  • Share/Bookmark

Tanel Poder Cool stuff, Internals, Oracle, Performance, Tuning

Oracle Peformance Visualization…

February 3rd, 2010

Coskan Gundogar and Karl Arao have written two interesting articles about Oracle performance analysis and visualization, check these out!

Coskan’s article:

Karl’s article:

Note that in March I will be releasing PerfSheet v3.0, which will have lots of improvements! ;-)

  • Share/Bookmark

Tanel Poder Cool stuff, Oracle, Performance

Bind Variable Peeking – execution plan inefficiency

February 2nd, 2010

In my Beyond Oracle Wait interface article I troubleshooted a test case where an execution plan somehow went “crazy” and started burning CPU, lots of logical IOs and the query never completed.

I have uploaded the test case I used to my new website, to a section where I will upload some of my demo scripts which I show at my seminars (and people can download & test these themselves too):

http://tech.e2sn.com/oracle-seminar-demo-scripts

Basically what I do is this:

  1. I run the query with bind variable values where only a handful of rows match the filter condition. Thus Oracle picks nested loop join (and indexed access path)
  2. Then I run the same query with different bind values, where a lot of rows match the filter condition. Oracle reuses existing execution plan (with nested loops!!!). Oracle ends up looping through a lot of blocks again and again (because nested loop visits the “right” side of the join once for every row coming from the “left” side of the join).

Using nested loops over lots of rows is a sure way to kill your performance.

And an interesting thing with my script is that the problem still happens in Oracle 11.1 and 11.2 too!

Oracle 11g has Adaptive Cursor Sharing, right? This should take care of such a problem, right? Well no, adaptive bind variable peeking is a reactive technique – it only kicks in after the problem has happened!

So feel free to download the script, review it and test it out!

  • Share/Bookmark

Tanel Poder Oracle, Performance, SQL, Troubleshooting