What the heck is the INTERNAL_FUNCTION in execution plan predicate section?

Sometimes you see something like this in an execution plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    22 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("B"=INTERNAL_FUNCTION("A"))

There’s surprisingly little information available about what INTERNAL_FUNCTION really is and why it shows up, hence this blog entry.

There’s actually no function called INTERNAL_FUNCTION in Oracle, and this is (partially) confirmed by querying the V$SQLFN_METADATA view:

SQL> @sqlfn %internal%

no rows selected

The common understanding, coming from the Oracle documentation, is that INTERNAL_FUNCTION is some sort of special function doing (implicit) datatype conversion. This is only partially true and not the whole truth – but let’s examine the datatype conversion case first and proceed to the fuller explanation and other examples later on.

Implicit datatype conversion

I’m creating a table which stores a date in VARCHAR2 datatype in column A and a date in DATE datatype in column B:

SQL> CREATE TABLE t(a VARCHAR2(20), b DATE);

Table created.

SQL> @desc t
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      A                                        VARCHAR2(20)
    2      B                                        DATE

SQL> INSERT INTO t VALUES( TO_CHAR(sysdate), sysdate) ;

1 row created.

Now let’s run a simple select query and see its execution plan:

SQL> SELECT * FROM t WHERE a = b;

A                    B
-------------------- -----------------
20130116 17:41:49    20130116 17:41:49

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    21 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("B"=INTERNAL_FUNCTION("A"))

What happens here is that Oracle is forced to (implicitly) add a datatype conversion function around column A, to be able to physically compare the two different datatypes. Internally Oracle is no longer running the comparison "WHERE a = b", but rather something like "WHERE TO_DATE(a) = b". This is one of the reasons why INTERNAL_FUNCTION shows up – the code generating the human-readable execution plan from the actual “binary” execution plan is not able to convert the internal opcode to a corresponding human-readable function name, so it shows a default “INTERNAL_FUNCTION” string there instead.
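One way to make the plan readable again (and to avoid relying on implicit conversion altogether) is to write the conversion explicitly yourself. A hedged sketch – the format mask below is just an assumption matching the NLS_DATE_FORMAT used in this demo:

```sql
-- Explicit conversion: Oracle no longer needs to inject an internal
-- conversion function around column A by itself.
SELECT * FROM t WHERE TO_DATE(a, 'YYYYMMDD HH24:MI:SS') = b;

-- The filter line should now show the real function, something like:
--    1 - filter("B"=TO_DATE("A",'YYYYMMDD HH24:MI:SS'))
```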

This is Oracle 10g+ behavior; in 9i and earlier, Oracle just printed nothing as the function name. Example output from 9i is below:

1 - filter(("DUAL"."DUMMY")=:TEST_VAR)

Do you see the seemingly unnecessary brackets around “DUAL”.”DUMMY” above? Why not just "DUAL"."DUMMY"=:TEST_VAR – is this ("DUAL"."DUMMY")=:TEST_VAR usage some typo? It’s actually not a typo. Before Oracle 10g you had to be careful to spot any such “unnecessary” brackets, as they really indicated that some function was called, something like F(“DUAL”.”DUMMY”) – only the “F” was never printed. From Oracle 10g onwards, a generic “INTERNAL_FUNCTION” is printed instead of nothing in such cases, so at least we know there’s some function applied to the column/variable. You still need to look into the code (and possibly session-level NLS_ settings) to figure out what function it could logically be (like a function casting a TIMESTAMP into a DATE when comparing these datatypes).
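For example (a sketch – the exact predicate text varies by version and by the datatypes involved), comparing a DATE column to a TIMESTAMP literal forces a cast around the DATE side:

```sql
-- Comparing DATE column B against a TIMESTAMP value: the DATE side has
-- to be promoted to TIMESTAMP before the comparison can happen.
SELECT * FROM t WHERE b = TIMESTAMP'2013-01-16 17:41:49';

-- 9i would print just the extra brackets, e.g.  filter(("B")=...)
-- 10g+ would print something like   filter(INTERNAL_FUNCTION("B")=...)
```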

Complex Expressions That Can’t Be Unparsed

Ok, the datatype conversion reason is already covered in Oracle docs and blogosphere. But there’s more.

Check out this example with an OR clause:

SQL> CREATE TABLE t AS SELECT * FROM dba_objects;

Table created.

SQL> SELECT COUNT(*) FROM t WHERE owner = 'SYS' OR owner = 'SYSTEM';

  COUNT(*)
----------
     32272

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 36652 |   608K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SYS' OR "OWNER"='SYSTEM'))

All is fine so far – the DBMS_XPLAN.DISPLAY_CURSOR function, which reads plans directly from the library cache, is able to explain the predicate correctly.

Now let’s make the predicate a little more complex – I will add another OR to it, but against a different column:

SQL> SELECT COUNT(*) FROM t WHERE owner = 'SYS' OR owner = 'SYSTEM' OR object_id = 123;
...
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 36652 |  1073K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter((INTERNAL_FUNCTION("OWNER") OR "OBJECT_ID"=123))

Now, suddenly the two conditions on the OWNER column are gone, replaced by an INTERNAL_FUNCTION?

Let’s try an IN operator instead of the OR, but wait, we are checking for values in two different columns (so we can’t put them all into the same IN clause):

SQL> SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT') AND object_type = 'TABLE';

  COUNT(*)
----------
      1178

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    28 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   374 | 10472 |   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter((INTERNAL_FUNCTION("OWNER") AND "OBJECT_TYPE"='TABLE'))

Still no luck.
Let’s try a logically simpler operation, just searching for 3 values in the same column:

SQL> SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT');

  COUNT(*)
----------
     32278

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 31960 |   530K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SCOTT' OR "OWNER"='SYS' OR "OWNER"='SYSTEM'))

Works! Oracle has transformed this IN predicate (or at least displays it that way) into a bunch of OR-ed conditions against the same column.

You might already see what’s going on in the earlier examples – DBMS_XPLAN.DISPLAY_CURSOR is not able to explain “complex” composite predicates applied in a single execution plan step, which involve multiple different columns AND where at least one of the columns has multiple values to check for (like an in-list or OR-ed predicates).

Where DISPLAY_CURSOR gets its data from – and an explanation

DBMS_XPLAN.DISPLAY_CURSOR gets its plan data from V$SQL_PLAN; the predicate section comes from the ACCESS_PREDICATES and FILTER_PREDICATES columns. But when I query V$SQL_PLAN directly, I still see the same problem:

SQL> SELECT id, filter_predicates FROM v$sql_plan WHERE sql_id = 'gcqgrmtna9g1u';

        ID FILTER_PREDICATES
---------- ------------------------------------------------------------
         0
         1
         2 (INTERNAL_FUNCTION("OWNER") AND "OBJECT_TYPE"='TABLE')

And you may have noticed that there are brackets () around the raw OR-ed conditions above too, which in 9i would have meant an “unexplained” internal function in the “binary” execution plan around the reported predicates. But in this case (as 10g+ supports the INTERNAL_FUNCTION naming), blank function names shouldn’t occur… I’m not really sure why this happens, but that’s too low-level a detail for this post.

The V$SQL_PLAN view itself accesses the actual “binary” child cursor in the library cache (after taking appropriate latches/pins/mutexes) and UNPARSES it. Why such a term? Well, parsing takes human-readable input and translates it into a computer-understandable “binary” format; unparsing is the opposite – V$SQL_PLAN accesses the cursor’s “binary” execution plan memory structure and translates it into human-readable execution plan output. There’s even a parameter controlling this V$SQL_PLAN behavior; if it’s set to false, the ACCESS_PREDICATES and FILTER_PREDICATES columns will be empty:

SQL> @pd unparse
Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                           VALUE   DESCRIPTION
------------------------------ ------- ------------------------------------------------
_cursor_plan_unparse_enabled   TRUE    enables/disables using unparse to build
                                       projection/predicates
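As a sketch of how you could verify this (it’s an underscore parameter, so play with it only in a sandbox): disable unparsing, hard parse a new statement and then check its V$SQL_PLAN predicate columns – they should come back empty for that child cursor.

```sql
-- Underscore parameter: test systems only!
ALTER SESSION SET "_cursor_plan_unparse_enabled" = FALSE;

-- Run some new (not yet cached) query, then look at its plan;
-- substitute the sql_id of your test query below:
SELECT id, access_predicates, filter_predicates
FROM   v$sql_plan
WHERE  sql_id = '&sql_id';
```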

By the way, why do I keep saying “binary” execution plan, in double quotes? It’s because I want to emphasize that the real execution plan Oracle executes is not in the text form we see on screen – the text is just generated for humans, for troubleshooting reasons. But the execution plan is not a real executable binary (as in oracle.exe) either; it’s not directly fed to the CPUs for execution. The physical execution plan in the library cache child cursor is a bunch of opcodes, object_ids and pointers defining the hierarchy and order of rowsource execution. It’s the SQL execution engine that then loops through these opcodes, decodes them and knows what to do (which rowsource function to call) next.

So, as we’ve seen above, some predicates with complex AND/OR conditions chained together are displayed as INTERNAL_FUNCTION() by DBMS_XPLAN.DISPLAY_CURSOR and V$SQL_PLAN, as they are unable to fully decode (unparse) the execution plan info.

Using the good old EXPLAIN PLAN

There’s some good news though! The good old EXPLAIN PLAN command can unparse (some of) these complex predicates properly! Because the EXPLAIN PLAN operation parses the given SQL again, in a special, more instrumented way, it apparently has more information at hand (and it uses more memory too). Or it could just be that whoever wrote V$SQL_PLAN didn’t write the piece of code for unparsing more complex predicates :)

Check this output, where I’m using the explain plan command instead of just running the SQL:

SQL> EXPLAIN PLAN FOR 
     SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT') AND object_type = 'TABLE';

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    28 |   293   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE    |      |     1 |    28 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   374 | 10472 |   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SCOTT' OR "OWNER"='SYS' OR "OWNER"='SYSTEM')
              AND "OBJECT_TYPE"='TABLE')

It’s a miracle! The INTERNAL_FUNCTION is gone and all the predicate values are shown correctly. The EXPLAIN PLAN command was very useful here.

So, while I usually don’t use the EXPLAIN PLAN command (as explain plan can lie to you), whenever I see an INTERNAL_FUNCTION in the DISPLAY_CURSOR/V$SQL_PLAN/SQL Monitor output, I run an EXPLAIN PLAN for the same query in the hope of quickly finding out what the predicates really are.

Ok, it’s 2:30am here yet again – off to sleep! :)


A tip for lazy Oracle users – type less with ANSI DATE and TIMESTAMP SQL syntax

I am lazy, therefore if I can type less, I will type less.

Often people are surprised to see that Oracle supports the ANSI DATE and TIMESTAMP literal syntax in SQL code, which allows me to shorten the lengthy TO_DATE( …. , ‘YYYY-MM-DD HH24:MI:SS’) syntax a bit. If you want to compare some column to a date (day precision), you can just type this:

SQL> SELECT COUNT(*) FROM dba_objects WHERE created > DATE'2012-12-01';

  COUNT(*)
----------
         0

SQL> SELECT COUNT(*) FROM dba_objects WHERE created > DATE'2012-01-01';

  COUNT(*)
----------
       804

This way you can always use the standard date format regardless of any NLS settings that may say otherwise.

Note that in the above example, the date 2012-01-01 really means 2012-01-01 00:00:00, so any objects created on that date from 00:00:01 onwards would be included in the results. If you want timestamp precision, you can use the TIMESTAMP keyword:

SQL> SELECT COUNT(*) FROM dba_objects WHERE created > TIMESTAMP'2012-01-01 08:12:34';

  COUNT(*)
----------
       805

SQL> SELECT COUNT(*) FROM dba_objects WHERE created > TIMESTAMP'2012-01-01 08:12:34.000000000';

  COUNT(*)
----------
       805

SQL> SELECT COUNT(*) FROM dba_objects WHERE created > TIMESTAMP'2012-01-01 08:12:34.000000000 -08:00';

  COUNT(*)
----------
       805

I think this syntax has worked since Oracle 9i, as that’s when ANSI SQL syntax was introduced into Oracle.

You can also use this syntax in some other situations, like the FLASHBACK TABLE … TO TIMESTAMP and SELECT … AS OF TIMESTAMP clauses:

SQL> SELECT COUNT(*) FROM t AS OF TIMESTAMP TIMESTAMP'2012-12-29 20:00:10 +03:00';

  COUNT(*)
----------
     25237

SQL> SELECT COUNT(*) FROM t AS OF TIMESTAMP TIMESTAMP'2012-12-29 20:01:00 +03:00';

  COUNT(*)
----------
         1

SQL> ALTER TABLE t ENABLE ROW MOVEMENT;

Table altered.

SQL> FLASHBACK TABLE t TO TIMESTAMP TIMESTAMP'2012-12-29 20:00:00 +03:00';

Flashback complete.

SQL> SELECT COUNT(*) FROM t;

  COUNT(*)
----------
     25237

Note the double “TIMESTAMP TIMESTAMP …” above. This is not a typo: the first TIMESTAMP says you want to flash back to a human-readable timestamp, as opposed to an Oracle transactional timestamp, in which case you would have used the SCN keyword. The second TIMESTAMP is the ANSI timestamp literal specifier (you could use the good old TO_DATE() syntax there instead).

Enjoy saving time every time you type a SQL command with date&time logic in it (those of you who didn’t know about this feature, you can buy me beer when we meet).

Happy new year! :)

Update: Sayan Malakshinov has added an interesting piece of information in the comments section – there’s also an undocumented TIME keyword available. Let’s do a quick test:


SQL> SELECT TIME'12:34:56.000000' a FROM dual;

A
---------------------------------------------------------------------------
12.34.56.000000000 PM

Apparently this returns only the time-of-day component, with no day, month or year info.

Let’s try to create a table to see what datatype we’ll end up with then:

SQL> CREATE TABLE t AS SELECT TIME'12:34:56.000000' a FROM dual;
CREATE TABLE t AS SELECT TIME'12:34:56.000000' a FROM dual
*
ERROR at line 1:
ORA-00902: invalid datatype

Error!
Sayan mentioned event 10407 in the comments, which allows the table to be created. Let’s check the oraus.msg file to see what it’s about:

10407, 00000, "enable datetime TIME datatype creation"
// *Cause:
// *Action:  set this event to enable datetime datatype creation
// *Comment: This event is set when the user wants to create
//           a datetime datatype column.  The compatible=8.1.0.0.0 parameter
//           must also be set.

Let’s use this event:

SQL> ALTER SESSION SET EVENTS '10407 trace name context forever, level 1';

Session altered.

SQL> CREATE TABLE t AS SELECT TIME'12:34:56.000000' a FROM dual;

Table created.

Ok it’s possible now (but undocumented + unsupported too).

So after disabling the event and dropping the table I tried a few more variations with casting the TIME datatype to something else:

SQL> CREATE TABLE t AS SELECT CAST(TIME'12:34:56.000000' AS DATE) a FROM dual;
CREATE TABLE t AS SELECT CAST(TIME'12:34:56.000000' AS DATE) a FROM dual
                              *
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected NUMBER got TIME

Ok, can’t convert it to DATE. Let’s try TIMESTAMP:

SQL> CREATE TABLE t AS SELECT CAST(TIME'12:34:56.000000' AS TIMESTAMP) a FROM dual;

Table created.

SQL> @desc t
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      A                                        TIMESTAMP(6)


SQL> SELECT * FROM t;

A
---------------------------------------------------------------------------
29-DEC-12 12.34.56.000000 PM

Converting to TIMESTAMP worked – it looks like the current date is chosen as the date component of the result.

Hmm, I wonder if we can somehow concatenate a date component and a time component together?


SQL> SELECT TRUNC(sysdate - 100) + TIME'15:00:00' FROM dual;
SELECT TRUNC(sysdate - 100) + TIME'15:00:00' FROM dual
                            *
ERROR at line 1:
ORA-30087: Adding two datetime values is not allowed

Well, apparently not.
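A workaround sketch: an ANSI INTERVAL literal can be added to a DATE, so you can get the same “date plus time-of-day” effect without the TIME datatype:

```sql
-- DATE + INTERVAL arithmetic is allowed, unlike DATE + TIME:
SELECT TRUNC(sysdate - 100) + INTERVAL '15:00:00' HOUR TO SECOND FROM dual;
```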

Anyway, my flight departs in 40 minutes, so got to get moving. Thanks Sayan for the TIME addition – it’s good to learn new stuff (every day ;-)


Snapper v3.61 released – and more work is in progress!

Here’s the latest version of my Oracle session level performance Snapper tool:

I’m going to add more stuff to Snapper in coming days, but thought to release something for testing already :)

  1. There are some cosmetic changes, like printing empty lines and header lines for better readability (still controllable by the pagesize parameter, look into the scripts).
  2. I’ve made some changes in the visual “graph” column just for easier readability when glancing at a snapper report:

    the @-character is used for CPU time (@ looks a bit like the letter C, doesn’t it :)
    the #-character is for all other Time model time
    the W-character is for wait time 

  3. Snapper now displays the SQL child cursor number by default in the “ASH” breakdown section 
  4. The main new feature is the addition of wait event count columns from V$SESSION_EVENT, which allows me to calculate session-level wait event duration averages too. For example, next time you see the log file sync wait event taking lots of time, you can immediately check how long these individual waits take on average, instead of having to derive it from other metrics like user commits and transaction rollbacks. In the example below I’ve identified one user session performing commits (SID 298) and also LGWR’s session (SID 492), and am snapping them both at the same time.

 

Scroll right in the section below to see the wait time average columns (if you want narrower output, look into the script’s “configurable stuff” section to hide individual columns):

SQL> @snapper all 5 1 298,492

Sampling SID 298,492 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.61 by Tanel Poder ( http://blog.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    298, SOE       , STAT, Requests to/from client                                   ,            36,        7.2,         ,             ,          ,           ,
    298, SOE       , STAT, opened cursors cumulative                                 ,           379,       75.8,         ,             ,          ,           ,
    298, SOE       , STAT, user commits                                              ,            54,       10.8,         ,             ,          ,           ,
    298, SOE       , STAT, user calls                                                ,            72,       14.4,         ,             ,          ,           ,
    298, SOE       , STAT, recursive calls                                           ,           433,       86.6,         ,             ,          ,           ,

... lots of output removed ...

    298, SOE       , STAT, execute count                                             ,           323,       64.6,         ,             ,          ,           ,
    298, SOE       , STAT, bytes sent via SQL*Net to client                          ,          2859,      571.8,         ,             ,          ,           ,
    298, SOE       , STAT, bytes received via SQL*Net from client                    ,          6972,      1.39k,         ,             ,          ,           ,
    298, SOE       , STAT, SQL*Net roundtrips to/from client                         ,            34,        6.8,         ,             ,          ,           ,
    298, SOE       , STAT, cell flash cache read hits                                ,            44,        8.8,         ,             ,          ,           ,
    298, SOE       , TIME, parse time elapsed                                        ,          1242,    248.4us,      .0%, [          ],          ,           ,
    298, SOE       , TIME, PL/SQL execution elapsed time                             ,         39365,     7.87ms,      .8%, [#         ],          ,           ,
    298, SOE       , TIME, DB CPU                                                    ,        137978,     27.6ms,     2.8%, [@         ],          ,           ,
    298, SOE       , TIME, sql execute elapsed time                                  ,        152021,     30.4ms,     3.0%, [#         ],          ,           ,
    298, SOE       , TIME, DB time                                                   ,        690485,    138.1ms,    13.8%, [##        ],          ,           ,
    298, SOE       , WAIT, latch: cache buffers chains                               ,            27,      5.4us,      .0%, [          ],         1,         .2,       27us
    298, SOE       , WAIT, log file sync                                             ,        530709,   106.14ms,    10.6%, [WW        ],        20,          4,    26.54ms
    298, SOE       , WAIT, SQL*Net message to client                                 ,            65,       13us,      .0%, [          ],        34,        6.8,     1.91us
    298, SOE       , WAIT, SQL*Net message from client                               ,        511002,    102.2ms,    10.2%, [W         ],        34,        6.8,    15.03ms
    298, SOE       , WAIT, PL/SQL lock timer                                         ,       3589765,   717.95ms,    71.8%, [WWWWWWWW  ],       227,       45.4,    15.81ms
    298, SOE       , WAIT, cell single block physical read                           ,         30350,     6.07ms,      .6%, [W         ],        47,        9.4,   645.74us
    298, SOE       , WAIT, events in waitclass Other                                 ,            50,       10us,      .0%, [          ],         1,         .2,       50us

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    492, (LGWR)    , STAT, non-idle wait time                                        ,           328,       65.6,         ,             ,          ,           ,
    492, (LGWR)    , STAT, non-idle wait count                                       ,           350,         70,         ,             ,          ,           ,
    492, (LGWR)    , STAT, in call idle wait time                                    ,           148,       29.6,         ,             ,          ,           ,
    492, (LGWR)    , STAT, messages received                                         ,           343,       68.6,         ,             ,          ,           ,
    492, (LGWR)    , STAT, background timeouts                                       ,             1,         .2,         ,             ,          ,           ,
    492, (LGWR)    , STAT, physical write total IO requests                          ,           384,       76.8,         ,             ,          ,           ,
    492, (LGWR)    , STAT, physical write total bytes                                ,       1330688,    266.14k,         ,             ,          ,           ,
    492, (LGWR)    , STAT, cell physical IO interconnect bytes                       ,       2661376,    532.28k,         ,             ,          ,           ,
    492, (LGWR)    , STAT, redo wastage                                              ,        105336,     21.07k,         ,             ,          ,           ,
    492, (LGWR)    , STAT, redo writes                                               ,           344,       68.8,         ,             ,          ,           ,
    492, (LGWR)    , STAT, redo blocks written                                       ,          2604,      520.8,         ,             ,          ,           ,
    492, (LGWR)    , STAT, redo write time                                           ,           332,       66.4,         ,             ,          ,           ,
    492, (LGWR)    , STAT, redo blocks checksummed by FG (exclusive)                 ,           200,         40,         ,             ,          ,           ,
    492, (LGWR)    , TIME, background cpu time                                       ,        117983,     23.6ms,     2.4%, [#         ],          ,           ,
    492, (LGWR)    , TIME, background elapsed time                                   ,       4282180,   856.44ms,    85.6%, [######### ],          ,           ,
    492, (LGWR)    , WAIT, rdbms ipc message                                         ,       1673152,   334.63ms,    33.5%, [WWWW      ],       261,       52.2,     6.41ms
    492, (LGWR)    , WAIT, log file parallel write                                   ,       3068565,   613.71ms,    61.4%, [WWWWWWW   ],       367,       73.4,     8.36ms
    492, (LGWR)    , WAIT, events in waitclass Other                                 ,            55,       11us,      .0%, [          ],         4,         .8,    13.75us

--  End of Stats snap 1, end=2012-12-18 20:23:18, seconds=5

---------------------------------------------------------------------------------------------
Active% | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------
    71% |                 | 0         | log file parallel write             | System I/O
     7% |                 |           | log file sync                       | Commit
     5% | c13sma6rkr27c   | 0         | ON CPU                              | ON CPU
     2% | 0y1prvxqc2ra9   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2012-12-18 20:23:18, seconds=5, samples_taken=42

PL/SQL procedure successfully completed.

Having a quick way to check session-level wait event average durations can speed up your troubleshooting – it’s an important difference whether your session is responding slowly due to 1000 waits of 1 ms each or a single 1000 ms wait.
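The averages above are just V$SESSION_EVENT arithmetic. If you don’t have Snapper at hand, here’s a rough sketch of the same idea (note that this computes cumulative since-logon averages, not the interval deltas Snapper shows):

```sql
-- Per-event average wait duration for one session (SID 298 is just the
-- example session from above):
SELECT event,
       total_waits,
       time_waited_micro / NULLIF(total_waits, 0) AS avg_wait_us
FROM   v$session_event
WHERE  sid = 298
ORDER  BY time_waited_micro DESC;
```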

Of course averages may hide some detail, but with Snapper these averages are not system-wide – they come from session-level sources and are usually calculated over a snapshot of a few seconds, not minutes or hours.

As a next step, I plan to add more helpful averages (the usual things I currently manually calculate when needed) after the TIME model and some STAT metrics in Snapper output too. This should help to save a few seconds when troubleshooting that super-critical issue next time ;-)

So please test it out and tell me what you think!

 

(P.S. I have just updated my Oracle performance online seminars page with 2013 info!)


Select statement generating redo – and lost write detection

Gwen Shapira has written a nice summary of a problem case where the classic wait-interface-based troubleshooting method is not enough for troubleshooting low-level issues.

The top SQL + top wait approach should usually be used as the starting point of troubleshooting a session or workload, but often the troubleshooting does not stop there. When the wait interface and related tools don’t explain the problem well enough, you either start guessing or dig deeper into the performance data. Gwen used the V$SESSTAT metrics (via my Snapper tool) to understand why a select statement was generating redo this time (there are multiple possible reasons for that – and V$SESSTAT tells you which one it is).

By the way, I’ve repeated this many times – here’s the data I normally use, and the order I use it in, when I want to understand what’s going on in a session or workload:

  1. TOP Waits / TOP SQL
  2. V$SESSTAT counters
  3. Process stack samples

Sometimes looking into #1 is enough; often I need to look into #1 and #2; and rarely, but regularly, I need to look into #3 too. But I always start from #1, then #2 and then, if needed, #3. I call this the “Advanced Oracle Troubleshooting” method ;-)

Read Gwen’s article here and another recent blog entry of mine about how V$SESSTAT helps you to measure, not guess.


Sqlplus is my second home: Part 6 – Colorizing sqlplus and terminal output

Jagjeet Singh mentioned that he has built a V2 of his cool SQL Dashboard sqlplus-based monitoring script, which now also shows a nice colored ASH graph output. Yes, right inside sqlplus!

This reminded me that I demoed color output in sqlplus when I delivered the Hotsos Symposium training day a few years ago, introducing the MOATS tool for the first time. I never got around to colorizing it as I had hoped to, though… MOATS was more of a fun project I worked on with Adrian Billington, just to prove that it’s possible to build a monitoring tool purely in SQL and PL/SQL, with a TOP-like self-refreshing screen output.

Anyway, Jagjeet has done this cool work now, but if you want to colorize some of your own scripts, then on XTerm-capable terminal emulators it’s simply done by outputting some escape codes before the text you want to color. Most (or even all) widely used XTerm clients support 8 colors for text and 8 for background – but some terminal emulators (like PuTTY, iTerm, Terminator) even support 256 colors!

Here’s a simple example script I demoed at the Hotsos training day too:

Just run it in sqlplus and see what happens! Note that you have to run sqlplus in an XTerm-capable terminal emulator (ANSI terminals might work too) – this won’t work in the Windows command prompt!
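For reference (this is not the original demo script), the mechanism boils down to wrapping text in SGR escape sequences – ESC[<codes>m turns attributes on, ESC[0m resets them. Here’s a minimal Python sketch of how the codes are composed:

```python
# Minimal sketch of XTerm/ANSI SGR color escape codes (not the original
# sqlplus demo script). ESC[<codes>m sets attributes, ESC[0m resets.
ESC = "\033"

def colorize(text, fg=None, bg=None, bold=False):
    """Wrap text in ANSI SGR escape sequences.
    fg/bg use the basic 8 colors: 30-37 foreground, 40-47 background."""
    codes = []
    if bold:
        codes.append("1")
    if fg is not None:
        codes.append(str(fg))
    if bg is not None:
        codes.append(str(bg))
    return f"{ESC}[{';'.join(codes)}m{text}{ESC}[0m"

if __name__ == "__main__":
    print(colorize("ERROR", fg=31, bold=True))   # bright red text
    print(colorize("OK", fg=32))                 # green text
    print(colorize("warning", fg=30, bg=43))     # black on yellow
```

In sqlplus you’d achieve the same effect by selecting or prompting strings that contain these escape sequences – the terminal interprets them regardless of which program printed them.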

Anyway, have fun and keep those cool & colorful sqlplus scripts coming! :-)

 

Posted in Cool stuff, Oracle | 5 Comments

Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

Hello all fellow Oracle geeks and technology enthusiasts! Long time no see ;-)

In the hacking session about Oracle full table scans and direct path reads I explained how the direct path read decision is not done by the optimizer, but instead during every execution, separately for every single segment (partition) scanned in the query. I also explained how the _small_table_threshold parameter and X$KCBOQH.NUM_BUF (which keeps track of how many buffers of a segment are currently cached) are used for determining whether to scan using direct path reads or not.

If you don’t know what the _small_table_threshold parameter is and how it relates to the direct path read decision, then read this post by Alex Fatkulin first.

In addition to the abovementioned values, Oracle needs to know how many blocks there are to scan (under the HWM) in a segment. This was traditionally done by reading in the segment header block first (using a regular buffered single block read – “db file sequential read”).

Starting from Oracle 11.2.0.2, things have changed a little. Instead of making the direct path read decision based on the actual segment block count extracted from the segment header, Oracle actually takes this number from TAB$.BLKCNT (dba_tables.blocks) or TABPART$.BLKCNT, IND$.LEAFCNT etc.

There’s a new parameter _direct_read_decision_statistics_driven which controls this:

SQL> @pd direct_read_decision
Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                                               VALUE    DESCRIPTION
-------------------------------------------------- -------- ----------------------------------------------------------
_direct_read_decision_statistics_driven            TRUE     enable direct read decision based on optimizer statistics

  • When this parameter is FALSE, the direct path read decision is based on the segment header’s block count (the actual block count).
  • When it’s TRUE (the default in 11.2.0.2+), the direct path read decision is based on the block count stored in the base tables (TAB$, IND$ etc.) – the optimizer statistics.

Note that even though the block counts are taken from the optimizer statistics in the data dictionary, it’s not the optimizer that makes the direct path read decision in the traditional sense (it’s not a cost-based decision).
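To make the moving parts concrete, here’s a simplified Python model of the runtime decision. This is purely illustrative – the real kernel logic has more inputs (dirty block counts, version-specific multipliers etc.) and the 50% cached-blocks cutoff below is an assumption for demonstration, not Oracle’s actual formula:

```python
# Simplified, illustrative model of the serial direct path read decision.
# The real kernel code has more inputs; this only mirrors the factors
# discussed in this post. The 50% cached-blocks cutoff is an assumption.

def choose_scan_method(seg_hdr_blocks, optimizer_stats_blocks,
                       small_table_threshold, cached_blocks,
                       stats_driven=True):
    """Return 'direct path read' or 'buffered read' for a full segment scan.

    stats_driven mimics _direct_read_decision_statistics_driven:
    when True (11.2.0.2+ default) the block count comes from the
    optimizer stats (TAB$/IND$ etc.), otherwise from the segment header.
    """
    nblocks = optimizer_stats_blocks if stats_driven else seg_hdr_blocks

    # Small segments are always read through the buffer cache.
    if nblocks <= small_table_threshold:
        return "buffered read"

    # If much of the segment is already cached, a buffered scan may
    # still be chosen (illustrative cutoff).
    if cached_blocks >= nblocks * 0.5:
        return "buffered read"

    return "direct path read"

# The scenario from this post: 10704 real blocks, stats later faked to 5.
print(choose_scan_method(10704, 10704, 168, 0))                 # direct path read
print(choose_scan_method(10704, 5, 168, 0, stats_driven=True))  # buffered read
```

Note how faking the stats block count to 5 flips the decision to a buffered read even though the segment physically has 10704 blocks – exactly what the demo below shows.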

Here’s an example from my test database:

SQL> CREATE TABLE t AS SELECT * FROM dba_source;

Table created.

SQL> @gts t
Gather Table Statistics for table t...

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT blocks FROM user_tables WHERE table_name = 'T';

BLOCKS
----------
10704

SQL>

The table uses 10704 blocks (up to the HWM).

SQL> SELECT name,block_size,buffers FROM v$buffer_pool;

NAME                 BLOCK_SIZE    BUFFERS
-------------------- ---------- ----------
DEFAULT                    8192       9424

The table (10704 blocks) is bigger than the entire buffer cache (9424 buffers). And this is way bigger than the _small_table_threshold value of 168 in my instance (watch the full table scans and direct path reads hacking session for more about this parameter).

So whenever I run a SELECT COUNT(*) FROM t, I see direct path reads show up:

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99
Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,         .2,
     99, SYS       , STAT, db block gets from cache                                  ,             1,         .2,
     99, SYS       , STAT, consistent gets                                           ,         11867,      2.37k,
     99, SYS       , STAT, consistent gets from cache                                ,             7,        1.4,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,             6,        1.2,
     99, SYS       , STAT, consistent gets - examination                             ,             1,         .2,
     99, SYS       , STAT, consistent gets direct                                    ,         11860,      2.37k,
     99, SYS       , STAT, no work - consistent read gets                            ,         11859,      2.37k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,         .2,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,         .2,
     99, SYS       , STAT, table scans (direct read)                                 ,             1,         .2,
     99, SYS       , STAT, table scan rows gotten                                    ,        739834,    147.97k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11860,      2.37k,
     99, SYS       , TIME, parse time elapsed                                        ,            46,      9.2us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,         79988,       16ms,     1.6%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        254990,       51ms,     5.1%, |@         |
     99, SYS       , TIME, DB time                                                   ,        255375,    51.08ms,     5.1%, |@         |
     99, SYS       , WAIT, enq: KO - fast object checkpoint                          ,        174947,    34.99ms,     3.5%, |@         |
     99, SYS       , WAIT, direct path read                                          ,          1280,      256us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             9,      1.8us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       4672912,   934.58ms,    93.5%, |@@@@@@@@@@|
     99, SYS       , WAIT, events in waitclass Other                                 ,             6,      1.2us,      .0%, |          |
--  End of Stats snap 1, end=2012-09-02 20:03:55, seconds=5

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 88r4qn9mwhcf5   | enq: KO - fast object checkpoint    | Application
     2% | 88r4qn9mwhcf5   | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2012-09-02 20:03:55, seconds=5, samples_taken=43

Let’s now fake the table stats so it looks like there are only 5 blocks in the table – way below the _small_table_threshold value:

SQL> EXEC DBMS_STATS.SET_TABLE_STATS(user,'T',numblks=>5);

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*) FROM t /* attempt 2 */;

  COUNT(*)
----------
      1000

The direct path reads are gone – we are doing regular buffered reads now!

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99

Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,        .17,
     99, SYS       , STAT, db block gets from cache                                  ,             1,        .17,
     99, SYS       , STAT, consistent gets                                           ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache                                ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,         11528,      1.92k,
     99, SYS       , STAT, consistent gets - examination                             ,             1,        .17,
     99, SYS       , STAT, no work - consistent read gets                            ,         11851,      1.98k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,        .17,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,        .17,
     99, SYS       , STAT, table scan rows gotten                                    ,        738834,    123.14k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11852,      1.98k,
     99, SYS       , TIME, parse time elapsed                                        ,            84,       14us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,        109983,    18.33ms,     1.8%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        116709,    19.45ms,     1.9%, |@         |
     99, SYS       , TIME, DB time                                                   ,        117102,    19.52ms,     2.0%, |@         |
     99, SYS       , WAIT, db file scattered read                                    ,         63956,    10.66ms,     1.1%, |@         |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             8,     1.33us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       5119722,   853.29ms,    85.3%, |@@@@@@@@@ |
--  End of Stats snap 1, end=2012-09-02 20:06:19, seconds=6

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 07sgczqj432mr   | db file scattered read              | User I/O

--  End of ASH snap 1, end=2012-09-02 20:06:19, seconds=5, samples_taken=46

Note that I deliberately forced a hard parse (with the “attempt 2” comment) to compile a new cursor. The _direct_read_decision_statistics_driven parameter is not part of the optimizer environment, so a new child cursor would not be automatically created after the parameter change (the same applies to the _small_table_threshold and _serial_direct_read parameters, by the way). But when I change the SQL text, then an entirely new (parent and child) cursor will be compiled anyway.

But wait a minute! Why do I need to compile a new cursor to get Oracle to read the new block count value from optimizer stats?!

I have said in the beginning of this post (and in many other places) that the direct path read decision is not done by the optimizer and is a runtime decision, done during every execution, every time any segment (including each individual partition) is scanned during query runtime. This is true for the older (up to 11.2.0.1) Oracle versions, where the direct path decision is based on the actual, current block count in the segment header, so the decision can suddenly change when a segment grows by a few blocks and crosses the _small_table_threshold-based calculation threshold. Perhaps for performance stability reasons, this seems to have changed.

My tests on 11.2.0.2 have so far shown that when using the new statistics-driven direct path read decisions, each segment’s block counts are stored somewhere in the compiled cursor and reused during subsequent executions of it, even if the block count of the segment changes in the optimizer stats later on! This might result in somewhat better stability as long as you don’t gather new stats – and your buffer cache size (and already-cached block counts) don’t change. However, if the amount of cached blocks of a segment does change (due to other, index-based accesses for example), then the direct path decision can still change during runtime. It’s just the block counts which are stored in the cursor; the other factors affecting the decision (buffer cache size, cached block counts) can still change.

This topic is especially relevant on Exadata, as the entire Smart Scanning functionality depends on whether the direct path read IO method gets picked for full segment scans. When experimenting with this, you’ve got to be pretty careful and thorough (so as not to come to wrong conclusions), as there are multiple moving parts and factors involved in the decisions:

  1. Are block counts taken from optimizer stats or segment header
  2. Do the segment header and/or optimizer stats block counts change
  3. Does the buffer cache size change (thus the _small_table_threshold too)
  4. Does the amount of cached blocks of a segment change
  5. Parallel vs Serial execution
  6. Are buffered parallel full scans allowed (the in-memory PX feature of 11.2)
  7. Did a new child cursor get created or the old one reused
  8. etc :)

Update: there’s an interesting comment in another page about how the incorrectly handled subpartitions statistics cause Oracle to not choose a direct path read for some subpartitions.

Posted in Exadata, Oracle | 19 Comments

Do you like to watch TV? Watch Enkitec.TV from now on! :-)

I’m happy to announce Enkitec TV which is a video channel/aggregator of some of the coolest stuff we do here at Enkitec. I have uploaded some of my videos there, including the previously unpublished Oracle parameters infrastructure hacking session and Kerry’s & Cary’s E4 Exadata interview is available there as well!

I’ll be posting more of my hacking sessions (hopefully shorter ones!!!) there in the future – and so will other fellow Oracle-Exadata-and-Big-Data geeks of Enkitec too!

I think video is so much better for communicating certain types of demos, use cases and examples (and interviews) – so instead of writing extremely long blog articles in the future, I will likely just record a video (and let others do the hard work of writing later on ;-)

So, sit back, grab a beer (or coffee) and enjoy Enkitec.TV!

 

Note that it’s possible to subscribe to the iTunes podcast too:

 

Posted in Cool stuff, Exadata, Oracle | 2 Comments

The limitations of CURSOR_SHARING = FORCE and FORCE_MATCHING_SIGNATURE for SQL plan stability

It’s a well known fact that the cursor_sharing parameter can be set to FORCE for making Oracle replace any literals with system-generated bind variable placeholders and then calculating the SQL hash value for looking up an existing cursor in library cache. This should reduce the amount of hard parsing and shared pool garbage.

Also, the same mechanism (of replacing literal values with bind variable placeholders before calculating the SQL hash value for library cache lookup) can be used for enforcing a SQL profile on SQL statements which differ only by the literal values in them. You just accept a SQL profile using DBMS_SQLTUNE.ACCEPT_SQL_PROFILE( …, FORCE_MATCH=>TRUE).

However this “force matching” has one limitation which can make it almost useless for achieving plan stability in databases where the applications use dynamically generated SQL.

Here’s an example – I’m just setting up my sqlplus environment and setting cursor_sharing = FORCE (which, by the way, is not needed for DBMS_SQLTUNE’s FORCE_MATCH – I’m just doing it for displaying the “new” SQL text used for hash value computation):


SQL> COL force_matching_signature FOR 99999999999999999999999999
SQL> COL sql_text FOR A100
SQL>
SQL> ALTER SESSION SET cursor_sharing = FORCE;

Session altered.

SQL>
SQL> SELECT * FROM dual WHERE rownum IN (1,2,3);

D
-
X

And now I’ll query V$SQL to see what the actual SQL text looks like (and its related hash values):


SQL> SELECT sql_id,hash_value,force_matching_signature,sql_text FROM v$sql WHERE sql_id = (SELECT prev_sql_id FROM v$session WHERE sid = USERENV('SID'));

SQL_ID        HASH_VALUE    FORCE_MATCHING_SIGNATURE SQL_TEXT
------------- ---------- --------------------------- ----------------------------------------------------------------------------------------------------
fqcq5k1wd4d4h 2027041936         1308158718700150644 SELECT * FROM dual WHERE rownum IN (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2")

Scroll right to see how the literal values 1,2,3 have been replaced with :SYS_B_x bind variables in SQL text. The SQL hash value (and SQL_ID and force_matching_signature) are calculated after this replacement was done.

So, now I’ll run a similar statement with just different literal values:


SQL> SELECT * FROM dual WHERE rownum IN (999,888,777);

no rows selected

SQL> SELECT sql_id,hash_value,force_matching_signature,sql_text FROM v$sql WHERE sql_id = (SELECT prev_sql_id FROM v$session WHERE sid = USERENV('SID'));

SQL_ID        HASH_VALUE    FORCE_MATCHING_SIGNATURE SQL_TEXT
------------- ---------- --------------------------- ----------------------------------------------------------------------------------------------------
fqcq5k1wd4d4h 2027041936         1308158718700150644 SELECT * FROM dual WHERE rownum IN (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2")

See, different literal values (999,888,777), but as they got replaced with exactly the same number of bind variables (in the same order), the resulting SQL text is exactly the same – therefore the SQL text hash values are also exactly the same as you see above.

So this is how these features work – the literal values are replaced in the SQL text string just before calculating the hash value for the library cache lookup.
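The replacement step is easy to simulate outside the database. Here’s a rough Python sketch (my approximation, not Oracle’s actual normalization code or hash function) which swaps numeric literals for :"SYS_B_n" placeholders and hashes the normalized text – the two 3-literal variants collapse to one signature, while a 4-literal variant does not:

```python
import hashlib
import re

# Rough approximation of what CURSOR_SHARING=FORCE does before hashing:
# replace literals with positional :"SYS_B_n" placeholders. Oracle's real
# normalization (and hash function) differ -- this is just a sketch.

def force_match_text(sql):
    counter = iter(range(10000))
    return re.sub(r"\b\d+\b",
                  lambda m: f':"SYS_B_{next(counter)}"', sql)

def signature(sql):
    # stand-in for Oracle's hash of the normalized text
    return hashlib.md5(force_match_text(sql).encode()).hexdigest()

q3a = "SELECT * FROM dual WHERE rownum IN (1,2,3)"
q3b = "SELECT * FROM dual WHERE rownum IN (999,888,777)"
q4  = "SELECT * FROM dual WHERE rownum IN (999,888,777,666)"

print(signature(q3a) == signature(q3b))  # True  - same shape, 3 binds
print(signature(q3a) == signature(q4))   # False - 4 binds, different text
```

The same-shape queries normalize to identical text, so any hash computed over that text (SQL_ID, hash value, force_matching_signature) matches; add one more literal and everything diverges – which is exactly the limitation demonstrated next.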

However, let’s run one more query, this time with 4 literal values instead of 3:


SQL> SELECT * FROM dual WHERE rownum IN (999,888,777,666);

no rows selected

SQL> SELECT sql_id,hash_value,force_matching_signature,sql_text FROM v$sql WHERE sql_id = (SELECT prev_sql_id FROM v$session WHERE sid = USERENV('SID'));

SQL_ID        HASH_VALUE    FORCE_MATCHING_SIGNATURE SQL_TEXT
------------- ---------- --------------------------- ----------------------------------------------------------------------------------------------------
8btxq8q6avt6b 2360206539        15602626316910109322 SELECT * FROM dual WHERE rownum IN (:"SYS_B_0",:"SYS_B_1",:"SYS_B_2",:"SYS_B_3")

Now, as we used 4 literals (999,888,777,666), they got replaced by 4 bind variables in the SQL text (as opposed to 3 in the earlier queries) and thanks to that the resulting SQL text is different! Therefore the hash values are also different, and anything that matches/looks up cursors based on the SQL text hash value won’t find the previous query plan anymore. Note that the force_matching_signature is also different from the previous one because of this. So, a new hard parse and library cache load would be done for this cursor – and also, any SQL profile accepted for the first two queries with FORCE_MATCH => TRUE would not be applied to this new query, simply because Oracle can’t find a profile matching the new force_matching_signature.

Now you might say who cares, I’ll just create a profile for this query with 4 literal values in the inlist and we are done. Yes, true, but now you’d only have profiles for the queries with 3 and 4 inlist values. Often in data warehouses, reporting environments and also in some OLTP systems, the developers use dynamically generated in-lists extensively. By now I have seen queries with tens of thousands of literal values in inlists (using OR concatenation to work around the “ORA-01795: maximum number of expressions in a list is 1000” error). There’s also a bug in MOS about someone’s query failing with 60 000 bind variables in it! :)

So if you have a dynamically generated query with multiple IN-lists used for filtering different tables and the number of items in each inlist fluctuates anywhere between 1 and tens of thousands, you may end up with a very large number of possible combinations – and each such combination of inlist item counts would have its own force_matching_signature. Thus, even if you think you have fixed your plans with a force-matching profile, some day you will hit yet another combination and will have to reactively fix that “new” query again.

So what can you do in such a case (other than making sure that your schema, code, stats and configuration are good enough for the CBO to come up with good plans for new queries by itself)? Well, don’t use such dynamically generated inlist strings!

Instead of dynamically generating an inlist string with 10000 literal values (or binds), you can just create an array of these values on the application side and bind that entire array to your query as a single bind variable. So, regardless of how many values you have to pass in to this in-list, they would all still be stored in a single bind variable, leaving the SQL hash values & force_matching_signature the same. PL/SQL supports it, OCI-based tools support it, Oracle’s JDBC supports it (and probably ODP.NET too). So there’s no excuse not to use it :)

I didn’t find my own demo code anymore, so I googled for a quick example in Java – http://blogs.itemis.de/kloss/2009/03/05/arrays-preparedstatements-jdbc-and-oracle/. If you do have your own code examples, feel free to post links to these into comments.

There are other working, but less clean, options – like passing in a comma-separated string as a single bind variable (either as a VARCHAR2 or even a CLOB) and then using a string tokenizer function in the inlist to break it into individual values, or even always inserting the inlist items into a GTT and then joining it to the main query block (this could be useful if you have millions of inlist values and you don’t want to use too much private memory for storing bind values and PL/SQL arrays).
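To illustrate the single-bind idea on the client side, here’s a hedged Python sketch – the table name, the :id_list bind name and the TABLE() collection syntax are hypothetical placeholders (real code needs a SQL collection type and an array-bind-capable driver). The point is simply that the statement text, and thus its hash and force_matching_signature, stays constant for any number of values:

```python
# Sketch of the "bind one array instead of N literals" idea. The table
# name, the :id_list bind and the TABLE() collection syntax below are
# hypothetical placeholders -- the point is only that the SQL text
# never changes, so neither do its hash value and signature.

STABLE_SQL = ("SELECT * FROM orders o WHERE o.id IN "
              "(SELECT column_value FROM TABLE(:id_list))")

def bind_id_list(values):
    """Return (sql_text, binds): the text is constant for any list size."""
    return STABLE_SQL, {"id_list": list(values)}

def tokenize_csv(bound_string):
    """The comma-separated-string fallback: one VARCHAR2/CLOB bind,
    split into individual values by a tokenizer."""
    return [int(v) for v in bound_string.split(",") if v]

sql1, binds1 = bind_id_list([1, 2, 3])
sql2, binds2 = bind_id_list(range(10000))
print(sql1 == sql2)              # same text => same hash & signature
print(len(binds2["id_list"]))    # 10000 values behind one placeholder
print(tokenize_csv("999,888,777"))
```

Whether you pass 3 values or 10000, the library cache sees the exact same statement – one hard parse, one force_matching_signature, one profile to maintain.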

There’s possibly one challenge though – the CBO’s cardinality estimates for such array-based inlists may be incorrect, so some adjustment might be needed, either via the cardinality hint, a SQL profile or a variety of other means :)

Update: Karen Morton has written about another limitation of force_matching_signature. IIRC this used to be a bug, but apparently now it’s documented, so it’s a new feature :)

Posted in Oracle | 26 Comments

Oradebug hanganalyze with a prelim connection and “ERROR: Can not perform hang analysis dump without a process state object and a session state object.”

Back in the (really) old days, systemstate dumps had to be used for diagnosing hangs and finding blockers of hung databases. Basically you just identified which resource your waiting sessions were waiting for and then scanned through the rest of the system state dump to see which session held that particular resource (note that over time the instrumentation and systemstate dumps have evolved to resolve and dump the blocker information right at the waiting session section in the dump).

Diagnosing complex hangs was a tedious manual process (or required some scripting), so hanganalyze was a really welcome addition to hang diagnosis. Hanganalyze basically walked through the blocker-waiter chains and found which session was the LEAF in the chain (the LEAF being the ultimate/final blocker in the wait chain), which everybody else either directly or indirectly was waiting for.

Additionally the hanganalyze output allows you to draw the topology of the hang too, which may be useful in complex hang scenarios. In fact Enterprise Manager’s hang topology page also just calls ORADEBUG HANGANALYZE internally and formats the output into a visual hang graph (this is why the hang page requires OS credentials or sysdba privileges).

Starting from 11g, there is a welcome change though – there’s a new V$WAIT_CHAINS view, which lists all the hangs Oracle has detected; it’s RAC-aware and is accessible via plain SQL. No need to run oradebug anymore, assuming that you are able to log on to query that V$ view :-)

And this is where we have a little problem – in rare circumstances an instance can get so hung that even a SYSDBA privilege holder cannot log on, so how would you run the diagnostic queries & commands? This is why Oracle has introduced the preliminary connection option in sqlplus for diagnosing hung databases. With a prelim connection you get a server process started for you and it attaches to the SGA shared memory segments, but it skips the last step of setting up the session/process-related structures in the SGA. These operations on shared SGA structures have to be protected by latches/mutexes and would potentially get blocked if the related latches/mutexes are already held by someone else as part of that hang.

I have already written about How to log on to Oracle when even sysdba cannot do so, but here’s an example:

$ sqlplus -prelim "/as sysdba"

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jan 29 12:48:03 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SYS:dbm1> ORADEBUG HANGANALYZE 3

Statement processed.

The “problem” though is that prelim connections only allow you to run ORADEBUG commands, as running regular SQL does require that all the session/process structures are set up properly, plus the parsing, library cache access etc. would take latches/mutexes, which could again block.

But we just wanted to dump hang analysis and ORADEBUG HANGANALYZE above seems to have succeeded. However when looking into the resulting tracefile, we see this:

*** 2012-01-29 12:48:11.041
Processing Oradebug command 'HANGANALYZE 3'
===============================================================================
HANG ANALYSIS:

ERROR: Can not perform hang analysis dump without a process
       state object and a session state object.
  ( process=(nil), sess=(nil) )

===============================================================================

It looks like hanganalyze isn’t able to work anymore without a fully logged-on session where the process & session state objects (visible in v$process and v$session respectively) have been set up. This is a bit weird, as one of the benefits of hanganalyze (and systemstate dumps) was that it did not require allocating any SGA structures nor taking any latches & locks.

This behavior change seems to have appeared since Oracle 11.2.0.2 and it’s documented in MOS note 452358.1, but that note doesn’t offer a good alternative for diagnosing hangs in such situations.

The alternative is actually simple – you should attach to an existing, fully-initialized & logged-on process with oradebug and let that process do the hanganalyze for you. Whichever process you attach to with oradebug is the process which will perform the work for you.

So if you ever see the error above when diagnosing extreme hangs, then just identify some least critical process logged in to the database (ideally not a background process, and definitely not a critical bg process such as DBWR & LGWR) and attach to that with oradebug (as opposed to “oradebug setmypid”) and the hang analysis should work ok.

However, the catch is that the plain ORADEBUG HANGANALYZE command above is a shortcut which always tries to run hanganalyze in your (preliminary) connection, by calling the hanganalyze function directly in your own process. So you have to use ORADEBUG DUMP HANGANALYZE 3 instead (3 is the detail level I usually start from), as this syntax sends the “dump hanganalyze” command to the target (fully-initialized) process you’ve attached to.

Update: I added this to clarify which commands should you use when you hit this problem:

  1. sqlplus -prelim "/as sysdba"
  2. ORADEBUG SETOSPID OS_PID
  3. ORADEBUG DUMP HANGANALYZE 3

Enjoy :)

Posted in Oracle | 2 Comments