Explain Plan For command may show you the wrong execution plan – Part 1

In Oracle-L mailing list a question was asked about under which conditions can the explain plan report a wrong execution plan (not the one which was actually used when a problem happened).

I replied this, but thought to show an example test case of this problem too:

1) The optimizer statistics the EXPLAIN PLAN ends up using are different
from the statistics the other session ended up using

2) Explain plan does not use bind variable peeking thus will not optimize
for current bind variable values

3) Explain plan treats all bind variables as VARCHAR2, thus you ma have
implicit datatype conversion happening during the plan execution, (meaning
to_char,to_number functions are added around variables/columns) and this for
example may make optimizer to ignore some indexes if you get unlucky.

…Of course explain plan doesn’t really
execute the plan so the implicit datatype conversion you see is in the
explained plan only, but if you actually execute the statement (with correct
bind datatypes) then there’s no implicit datatype conversion. And that’s
where the difference comes from…

And here comes an example of condition number 3 above. Lets use a little bit of bad design out there and put numeric values into varchar2 columns:

SQL> create table t (id varchar2(10), name varchar2(100));

Table created.

SQL> insert into t select to_char(object_id), object_name from dba_objects;

51449 rows created.

Now we add a little index for lookup performance and gather stats:

SQL> create index i on t(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.

Now lets define a bind variable of NUMBER type and set a value for it:

SQL> var x number
SQL>
SQL> exec :x:=99999

PL/SQL procedure successfully completed.

Now lets use “explain plan for” to estimate the execution plan:

SQL> explain plan for
 2  select sum(length(name)) from t where id >  :x;

Explained.

SQL> select * from table(dbms_xplan.display) ;

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 3694077449

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    29 |    56   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |      |     1 |    29 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |  2572 | 74588 |    56   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I    |   463 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 3 - access("ID">:X)

15 rows selected.

Explain plan command nicely reports that we’d be using an index range scan, which would be a good thing to do given my test data and search condition.

Now lets actually run the statement and see the REAL execution plan actually used for the execution. I’ll use dbms_xplan.display_CURSOR for this. If you don’t pass SQL_ID/child into that function it will just report the last SQL statement executed in your current session. But the key difference between the dbms_xplan.DISPLAY and DISPLAY_CURSOR is that the latter goes to library cache and fetches the actual SQL plan used from there. The explain plan command just reparses the statement and estimates a plan, ignoring any bind variable values and assuming that all bind variables are of type varchar2:

SQL> select sum(length(name)) from t where id >  :x;

SUM(LENGTH(NAME))
-----------------

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID  7zm570j6kj597, child number 0
-------------------------------------
select sum(length(name)) from t where id >  :x

Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    60 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    29 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  2572 | 74588 |    60   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - filter(TO_NUMBER("ID")>:X)

19 rows selected.

Whatta? We actually used a full table scan!

Also check out a related article by Kerry Osborne

Select COUNT(*) and COUNT(column) are different things!

Every now and then I see someone wondering why Oracle is “returning wrong results” for some count queries when counting using COUNT(column_name) instead of COUNT(*) or COUNT(<constant>).

Oracle is actually returning correct results, its just that sometimes the people asking the questions haven’t realized that COUNT(column) is something semantically different from COUNT(*).

COUNT(*) operation counts all rows fed to it by execution plan branch under it.

COUNT(column) operation on the other hand counts all non-null values in that column from rows fed to it by execution plan branch under it.

And here’s a little example:

[Read more...]

Scripts for showing execution plans via plain SQL and also in Oracle 9i

Hi all,

Here are few scripts which allow you to query SQL execution plans and their execution statistics out from V$SQL_PLAN and V$SQL_PLAN_STATISTICS yourself.

Why would you want to do this as there’s the DBMS_XPLAN.DISPLAY_CURSOR() function in Oracle 10g?

Well, my scripts work also on Oracle 9.2. Also they give you better detail (and flexibility) compared to DBMS_XPLAN.DISPLAY_CURSOR.

Of course in 9i there’s the DBMS_XPLAN.DISPLAY function which you can use in conjuction with EXPLAIN PLAN FOR command, but that approach has problems. EXPLAIN PLAN treats all bind variables as varchar2 datatype, possibly causing the reported execution plan to be different from reality. Also, if you have a long running report from last night and you run explain plan today morning, then if optimizer statistics have changed meanwhile, the explain plan command might come up with a different execution plan again, which is different from reality.

So that’s why in 10g we have the DBMS_XPLAN.DISPLAY_CURSOR which goes directly to required library cache child cursor and extracts (unparses) the execution plan from there. The function uses V$SQL_PLAN% views as its data source. And guess what – these views are there in version 9.2 already! And thats’ where my scripts come in:

Here’s an example. Let’s set statistics_level=all so we get rowsource level execution stats for the cursor (note that this parameter makes your query consume much more CPU so it should only be used at session level for troubleshooting a specific performance issue):

SQL> alter session set statistics_level = all;

Session altered.

SQL> select count(*) from all_users;

  COUNT(*)
----------
        36

I know the hash value of this query, so lets report its execution plan, directly from library cache. This is the REAL execution plan inside that child cursor, not some estimate like EXPLAIN PLAN command gives:

[Read more...]

60000 bind variables?! Maybe it’s time to use a temporary table instead…

I just noticed a bug 8277300 filed in Metalink with following description:

ORA-7445[XTYQBCB] OCCURS DURING EXECUTING SQL THAT USES 60000 BIND VARIABLES.

Wow! That’s about 100 times more bind variables in a single query than what I’ve seen in past. And I thought that query was bad!!! :)

I suspect this is a massive IN list passed to a query. Maybe it’s time to use a temporary table or a collection for passing in the IN values instead?

Why does even a small difference in SQL text cause a hard parse?

I just replied to an Oracle Forum Thread about why does even a small difference in SQL statement text cause it to be hard parsed and loaded as a different cursor. The reason is actually very simple – and I’m posting it into my blog too:

 

The reason why a statement with even a minor difference in text is parsed as a separate cursor is due how Oracle looks up statements from library cache.

 

First, a little background:

Advanced Oracle Troubleshooting Guide, Part 6: Understanding Oracle execution plans with os_explain

Get ready for some more adventures in Oracle process stack!

Before proceeding though, please read this post about safety of different stack sampling approaches.

I have had few non-trivial Oracle troubleshooting cases, related to query hangs and bad performance, where I’ve wanted to know where exactly in execution plan the current execution is.
Remember, Oracle is just another program executing instructions clustered in functions on your server, so stack sampling can help out here as well.

So, I was looking into the following stack trace taken from an Oracle 10.1 database on Solaris SPARC, running a SQL with this execution plan.

[Read more...]

Generating lots of rows using connect by – safely!

Every now and then I need to generate a large number of rows into some table.

Not that I have any personal urge to do so, but you know, this is needed for some test cases and so on ;-)

It’s quite well known that it’s possible to generate lots of rows using CONNECT BY LEVEL <= #rows syntax.

However as the CONNECT BY LEVEL syntax is recursive by nature, it will consume more memory the more rows you query in one shot ( I think it was Mark Bobak who once posted this observation to Oracle-L, but I didn’t manage to find that thread anymore ).

So, here’s a test case:

SQL> select count(r)
  2  from (
  3     select rownum r from dual connect by rownum <= 100000000
  4  )
  5  /
        select rownum r from dual connect by rownum <= 100000000
                             *
ERROR at line 3:
ORA-04030: out of process memory when trying to allocate 44 bytes (kxs-heap-w,cursor work heap)

After running for a while the server process run out of private memory, used for the CONNECT BY cursor work heap.

Let’s investigate:

[Read more...]

Advanced Oracle Troubleshooting Guide, Part 5: Sampling V$ stuff with WaitProf. Really fast. Using SQL!

I bet you thought I’ll be writing about direct SGA access?! ;)

Nope!

Direct SGA access has excellent troubleshooting potential (as long as you know the shared memory data structures), but it has one major drawback – very few companies have such tools already in place in their production systems.

I have occasionally been called in to solve an urgent performance problem, happening right now and it needs solving immediately! And did I mention, these are critical production systems. Where you can’t just install binary executables freshly downloaded off internet. In fact you would want to diagnose the issue with minimal impact and changes required to those production environments (and that leaves sql tracing out the first round troubleshooting tools for me as well!)

So, I’ve developed myself a toolset for such purpose, Snapper and sw.sql and some process stack reading techniques I already have introduced in my blog.

Next in line is waitprof.sql which is a high-frequency V$SESSION_WAIT sampler – implemented in plain SQL (not PL/SQL).

Waitprof is basically a sampling session wait profiler. It’s like running a select against V$SESSION_WAIT in a very tight loop and aggregating results – but I have used a trick to do all this in plain SQL, which gives me performance advantage over PL/SQL based loops. Waitprof is able to sample V$SESSION_WAIT for a session up to 100 000 times per second!

This depends on your hardware of course and Oracle version too, but normally you’ll get 50-70kHz sampling rate with it.

Ok, you want to see an example? ;-)

[Read more...]

Excellent article on Oracle 11g PL/SQL function result cache

I have so far avoided writing such pointer blog posts which only refer you to another article, but I have to do it with this one. Adrian Billington has written an excellent article on performance of Oracle 11g PL/SQL function result cache. His article is a good example of a thorough, well organized and well written technical content. I really enjoyed reading it and thanks to his thoroughness, he has just saved me some precious time doing that research on my own.

Get the article here: http://www.oracle-developer.net/display.php?id=504

Can you write a working SQL statement without using any whitespace?

I read this post by Laurent Schneider yesterday. In the comment section Tom Kyte already explained what the issue was about, but I’ll expand this explanation a little.

The question was why should the apparently invalid statement below work? I mean there is no such column nor number as “1.x”), yet the statement works ok:

SQL> select 1.x from dual;

         X
----------
         1

The column header gives a good hint what happened above. Oracle has treated the X as the column alias.

Let’s remove the “x” and see:

SQL> select 1. from dual;

        1.
----------
         1

Now all works as expected, “1.” is treated as number ( 1. = 1.0 )

But the question remains, how come Oracle splits up the string “1.x” and decides that the “x” is the column and “1.” is the number part – considering that there’s no whitespace between the 1. and x?

The answer is that apparently the string tokenizer used by Oracle SQL parser is smart enough that it doesn’t rely only on whitespace for recognizing token delimiters. It is also able to use character class analysis for understanding where a literal ends and the next token (like column alias) starts.

So, it is possible to write valid SQL statements without using any whitespace at all. For example:

[Read more...]