The full power of Oracle’s diagnostic events, part 1: Syntax for KSD debug event handling

There’s a recent thread in Oracle-L about deadlocks and a recommendation to dump various instance information when the deadlock happens. A deadlock trace dumps some useful things automatically, but sometimes you want more, especially in RAC environment.

So is it possible to make Oracle dump additional things when the deadlock event happens? Yes it is and it’s doable with Oracle diagnostic event handling infrastructure.

First I’ll take a step back and explain, what this command below means:

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

Of course you know what it does, it enables extended SQL trace. But why such cumbersome syntax?

This syntax actually reveals some of the power of KSD diagnostic event syntax (KSD=kernel service debug):

10046

The first word in event string (the 10046) specifies the when some action should be taken.
And everything after that (trace name context forever, level 12) is that action.

trace

The first word in action (trace) specifies what type of action Oracle needs to take if that event (10046) occurs. By “event occurs” I mean that let say an Oracle parsing function calls ksdpec() function with 10046 as parameter, which in turn may recursively call some action set for that event and then returns that event level (12) back to the caller. Its up to the caller to act on that returned value, in 10046 case some ksd* tracing function is called then.

In event syntax, “trace” is most generic action, which is used for tracing, behavior changing and Oracle dumps. In this post I will concentrate on the “trace” action as it’s most common one to use.

name

The “name” specifies that the name of what to dump/trace will follow. The “name” is always present when “trace” option is used (as far as I know).

context

Now the next keyword (context) is the one where you can define whether you want Oracle to dump something when the event is hit or just do context specific trace. If you replace the “context” with “errorstack” for example, you wouldn’t get SQL trace output, but rather an Oracle errorstack dump whenever event 10046 is hit.

You can use “oradebug dumplist” to find all the possible dump commands what you can set as actions for an event.

forever

The next keyword (forever) is actually an option to the action, not an action keyword itself. Forever means that keep invoking the action when the 10046 event is hit, forever (or until explicitly disabled). If we don’t specify forever, then the action would be invoked only once and the event will disable itself after that.

level 12

The “level 12″ is also just another option to the action, specifying that the value for that event handler in given session should be 12. This means that whenever some Oracle function is checking whether that event is set, they will be returned value 12, the calling function interprets the value and acts appropriately (traces both binds and waits in our case).

As both the “forever” and “level 12″ are just options for the same action, separated by commas, we can replace their order like that:

SQL> alter session set events '10046 trace name context level 12, forever';

Session altered.

The outcome is exactly the same, tracing is enabled at level 12, for forever.

Also we can leave both options out to make Oracle use default values ( invoke the action only once and tracing level is set to 1 ):

SQL> alter session set events '10046 trace name context';

Session altered.

There are other options available in addition to just “forever”. If we want the action to be invoked let say 10000 times only and then automatically disable that event, we can use the LIFETIME option:

SQL> alter session set events '10046 trace name context level 12, lifetime 10000';

Session altered.

And if we want to ignore that event for some occurrences and start acting only after that, we can also specify AFTER x OCCURRENCES option to tell Oracle not to act on first x occurrences of that event and only activate itself after that:

SQL> alter session set events '10046 trace name context level 12, lifetime 10000, after 5000 occurrences';

Session altered.

You see I have combined 3 options to the action, level, lifetime and after 5000 occurrences. They can be specified in any order and need to be separated by commas (only if there are multiple options to an action, the first option must not be preceded by comma).

So far I spoke about 10046 event which is used for tracing, but there are other uses for diagnostic events.

Every ORA- error code in Oracle checks for an event with the same number. In fact Oracle diagnostic events and ORA- errors share the same range of numbers from 0 to 65535 for their codes (you can see most of them in $ORACLE_HOME/rdbms/mesg/oraus.msg file)

So, for example ORA-60 which is “deadlock detected” error will also check for event number 60 and if it is set, it will invoke its action(s) before sending the deadlock exception up the application stack.

For example, I could invoke hanganalyze at level 4 when ORA-60 is hit in my session:

SQL> alter session set events '60 trace name hanganalyze level 4';

Session altered.

As mentioned previously, you can run “oradebug dumplist” to see all actions available in your Oracle version (of course be careful when setting any undocumented events as some can even corrupt your database).

Sometimes we want more than just one type of dump when an error occurs. So it is possible to invoke multiple trace actions for a specific event. You need to separate these actions with semicolons.

SQL> alter session set events '60 trace name hanganalyze_global ; name systemstate level 266';

Session altered.

In the example above I tell Oracle to invoke a RAC global hang analyze and a systemstate dump with short stacks of processes ( level 256 + 10 = 266 ) whenever an ORA-60 deadlock error is raised.

It is possible to specify different options such level, event lifetime etc at each action level, for example we might not want to dump systemstate multiple times if the deadlock keeps reoccurring:

SQL> alter session set events '60 trace name hanganalyze_global level 4, forever; -
>                              name systemstate level 266, lifetime 1; -
>                              name processstate level 2, forever'
  2  /

Session altered.

So far I’ve defined the events upon which to take action using their numbers. Event 60 (which corresponds to ORA-00060 error message) is the “deadlock detected while waiting for resource” error message.

10046 on the other hand is not an actual error, but just a condition which is checked in few locations in Oracle kernel code (such OPI parse, execute, fetch and wait interface functions). Whenever these functions run, they check whether an event number 10046 is enabled for current process or session and if yes, then take appropriate action (which normally means tracing).

Note that it is possible to see most ORA- error messages and tracing/debug event numbers in $ORACLE_HOME/rdbms/admin/oraus.msg file on Unix platforms. Never set an event without well understanding what this event is doing and what do you want to achieve with it (and ideally you should have blessing from Oracle support/Metalink for using it).

A few numeric events have also text aliases. For example, instead of using ORA-60 as deadlock event definition, you could run the following statement:

SQL> alter session set events 'deadlock trace name hanganalyze_global';

Session altered.

Few more useful text aliases are:

  • PARSE_SQL_STATEMENT – corresponds to event 10035 which allows to write parse failures to alert.log file
  • SORT_END – corresponds to event 10033, “sort statistics (SOR*)”
  • SORT_RUN – corresponds to event 10032, “sort run information (SRD*/SRS*)”

So, armed with the knowledge for setting events, we can define fairly complex conditions:

For example we can use the following statement to follow some memory allocation of a sort operation. We take level 29 heapdumps (PGA+UGA+Call heaps) every sort pass (sort_run event) and also in the end of a sort operation (sort_end event). And also we want to take a workareatab_dump when sort ends but only during the first sort pass (thus the lifetime 1 condition).

SQL> alter session set events ' -
>     sort_end trace name workareatab_dump level  3, forever;    -
>                    name heapdump         level 29, forever:    -
>     sort_run trace name workareatab_dump level  3, lifetime 1; -
>                    name heapdump         level 29, forever'
  2  /

Session altered.

Note that the lifetime and after x occurrences conditions are accounted at event level in session or process state, not statement execution level.
So if event’s lifetime expires during some statement execution, then that event will be disabled until manually initialized with another alter session command.

There is also a special named event – IMMEDIATE. This means that don’t wait for some event to happen in the future, but execute the required action immediately, right now.

SQL> alter session set events 'immediate trace name processstate level 10';

Session altered.

This would invoke a process state dump immediately. It’s the same as running ORADEBUG DUMP PROCESSSTATE 10 command.

Few more interesting bits:

Event syntax can understand numbers in hex as well if you prefix them with “0x”:

SQL> alter session set events '0x273E trace name context forever, level 0xC';

Session altered.

Also, in 11g you can run following command too:

SQL> alter session set events '10046 off';

Session altered.

The above clears any set actions for that event.

Before 11g you can either explicitly set all event handlers to off for an event or you can use the “all” qualifier:

SQL> alter session set events '60 trace name all off';

Session altered.

This would clear all tracing actions for that event in session. However crash and debug actions would still remain (more about these later).

That should be enough for part 1, the syntax of KSD debug events. In next part I will demo couple examples where the advanced event syntax is useful. Also, there’s a reason why I keep mentioning KSD diagnostic events, there’s another diagnostic event handling mechanism in Oracle as well and Oracle has introduced pretty big improvements in 11g, but more about this in the future ;-)

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Cool stuff, Oracle and tagged , . Bookmark the permalink.

31 Responses to The full power of Oracle’s diagnostic events, part 1: Syntax for KSD debug event handling

  1. Vlado says:

    Tanel, thank your for the additional information.

    oradebug dumplist returns quite a few options.
    1. Are you aware of any documentation on what they mean?
    2. To capture bind values in deadlock situations, is errorstack level 2 the best option?
    3. Is there a way to find the source of the sql involved in the deadlock, ie. pkg_a called pkg_b which executed the sql

  2. Tanel Poder says:

    Update: I fixed the systemstate dump level. I changed it to correct value 266 (256+10) from incorrect value 522 (512+10).

    Systemstate dump at level 266 will produce short stack dumps of processes, this is useful for seeing where in kernel code the processes are (hung).

  3. Tanel Poder says:

    Vlado,

    1. There’s not much documentation available. There are references, but not much material about how to use these dumps. If you’re unsure then don’t use them :)

    And if you search metalink you find some more common ones like heapdump.

    2. If errorstack level 2 actually displays the binds on your version, then yes. If you’re really desperate then one option would also be to enable SQL_TRACE at level 4, but this will make everything much slower because it switches on the “sqlstat_enabled” optimizer parameter for any newly executed cursors.

    3. Errorstack level 2 should also include PL/SQL stack backtrace so you should know exact PL/SQL functions & line numbers involved. Otherwise you can read the same info from returned exception output (especially if your PL/SQL code uses dbms_utility to get full stack backtrace)

  4. Vlado says:

    Thank you for taking the time to answer my questions.
    Hopefully I will have a chance someday to reciprocate :)

  5. Asif Momen says:

    Really nice information.

  6. Surachart says:

    Thank you for your stuff.

    I’d like to use oradebug events ’10046′, that’s good and easy to trace some problems.

  7. Stefan Knecht says:

    Hi Tanel!

    Very very cool stuff yet again :-)

    Just one thing that gets me curious … How did you find that syntax ?

    You weren’t digging around in the binary, were you :)

    Cheers from Switzerland

    Stefan

  8. Good stuff mate! I didn’t know you can trigger several actions for one event. Nice.

  9. Boris says:

    Excellent sum up.

  10. Joel Wittenmyer says:

    For more information on oradebug events check http://www.evdbt.com/Oradebug_Modrakovic.pdf.

  11. Pingback: Log Buffer #138: A Carnival of the Vanities for DBAs

  12. Tanel Poder says:

    Thanks all!

    Stefan, no digging in binary, just observing various metalink notes, bug descriptions over some time and and having the curiosity about why the heck was syntax for enabling an event so complex :)

  13. Miladin says:

    Good post.

    There are two more words in syntax:
    - trace is well known but there are also

    debugger – invokes system debugger
    and
    crash – cause Oracle crash for testing

    e.g.

    SQL> alter session set events ‘immediate debugger ‘;

    Session altered.

    SQL> oradebug event immediate crash

    For more info visit my blog..

    Miladin

  14. Tanel Poder says:

    Yes these are available options and there’s more. But I will cover these in an upcoming blog entry.

    I’ve about a use case for crash event last year:

    http://blog.tanelpoder.com/2008/06/19/killing-an-oracle-process-from-inside-oracle/

  15. Pingback: Diagnostic events wit debugger and crash.. « Miladin Modrakovic’s Blog: Oraclue

  16. Miladin says:

    Yes I saw that one.I have been using it for some time .Oracle actually published this in course Dumps, Crashes and Corruptions long time ago.

  17. Tanel Poder says:

    Yeah Oracle should know this stuff, they’re the ones who wrote it! :)

  18. Cool stuff….
    Ramesh J Menon
    OCM 10g

  19. Hi Tanel,

    what would be your approach in getting heap dumps for ORA-4031 without filling the disk in case dozens of ORA-4031 appear at once.

    Can I use “lifetime 10″ in “alter system set events”?

    Best regards,
    Martin

  20. Tanel Poder says:

    Hi Martin,

    Which db version are you on? From 9.2 onwards you can use _4031_dump_bitvec parameter to control level and frequency of heapdumps on 4031 errors.

    Read metalink note 396940.1 for details.

    One other option is to set _4031_dump_bitvec = 0 to disable the built in heapdumps and use an event ’4031 trace name heapdump level x, lifetime y’

    Btw in more recent Oracle versions there are more _4031 parameters, all described in that metalink note i mentioned.

    
    SQL> @pd 4031
    
    NAME                                          VALUE                          DESCRIPTION
    --------------------------------------------- ------------------------------ ----------------------------------------------------
    _4031_dump_bitvec                             67194879                       bitvec to specify dumps prior to 4031 error
    _4031_max_dumps                               100                            Maximum number of 4031 dumps for this process
    _4031_dump_interval                           300                            Dump 4031 error once for each n-second interval
    _4031_sga_dump_interval                       3600                           Dump 4031 SGA heapdump error once for each n-second
                                                                                 interval
    
    _4031_sga_max_dumps                           10                             Maximum number of SGA heapdumps
    
    
  21. Igor says:

    Hi Tanel,

    what can I use for tracing PL/SQL calls and input varibles values for each input/output parameter ? May be errorstack but I see only PL/SQL package/procure/function name and line number …

  22. Tanel Poder says:

    Check out dbms_trace (if you can enable tracing before running the pl/sql code) or event 10938 if you want to enable this for a running session (google for it first though and note that you need to create a logging table for that AND this is not really a safe option for production)

    I dont remember whether these options trace pl/sql procedure arguments though!

  23. Hi Tanel, is it possible to enable the trace based on time limit?
    Like i want to enabled 10046 for next minute.

  24. Tanel Poder says:

    Not that I know of. Well at least not with the pre-11g event syntax. Maybe with new 11g debug infrastructure it is possible, but I don’t have access to the internal document describing the syntax.

    The examples (lifetime and after x times) I showed, are more useful in cases where you want to apply some action to certain error like deadlock or ORA-4030 etc.

  25. Pingback: The full power of Oracle’s diagnostic events, part 2: ORADEBUG DOC and 11g improvements | Tanel Poder's blog: Core IT for Geeks and Pros

  26. Harry says:

    Hi Tanel,

    Excellent post. I have learnt a lot from your blogs.
    I was looking up the messages file and found it under
    $ORACLE_HOME/rdbms/mesg instead of $ORACLE_HOME/rdbms/admin

    Thank you.
    Harry

  27. Pingback: Book Review: Oracle Database 11g Performance Tuning Recipes (Part 2) « Charles Hooper's Oracle Notes

  28. Hi Tanel,

    As an addition:
    MOS note 1402625.1 – Diagnostic Events in 10g and 11g [VIDEO]

  29. Pingback: SQL Trace Without Storage Worries « ORAganism

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>