Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

Hello all fellow Oracle geeks and technology enthusiasts! Long time no see ;-)

In the hacking session about Oracle full table scans and direct path reads I explained how the direct path read decision is not done by the optimizer, but instead during every execution, separately for every single segment (partition) scanned in the query. I also explained how the _small_table_threshold parameter and the X$KCBOQH.NUM_BUF(which keeps track of how many buffers of any segment are currently cached) are used for determining whether to scan using direct path reads or not.

If you don’t know what the _small_table_threshold parameter is and how it relates to the direct path read decision, then read this post by Alex Fatkulin first.

In addition to the abovementioned values, Oracle needs to know how many blocks there are to scan (under the HWM) in a segment. This was traditionally done by reading in the segment header block first (using a regular buffered single block read – “db file sequential read”).

Starting from Oracle 11.2.0.2, things have changed a little. Instead of making the direct path read decision based on the actual segment block count extracted from the segment header, Oracle actually takes this number from TAB$.BLKCNT (dba_tables.blocks) or TABPART$.BLKCNT, IND$.LEAFCNT etc.

There’s a new parameter _direct_read_decision_statistics_driven which controls this:

 SQL> @pd direct_read_decision
 Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                                               VALUE    DESCRIPTION
 -------------------------------------------------- -------- ----------------------------------------------------------
 _direct_read_decision_statistics_driven            TRUE     enable direct read decision based on optimizer statistics
  • When this parameter is FALSE, the direct path read decision is done based on the segment header’s block count (actual block count).
  • When TRUE (default in 11.2.0.2+), the direct path read decision is done based on the block count stored in one of the base tables (TAB$, IND$) – the optimizer statistics

Note that even though the block counts are taken from the optimizer statistics in data dictionary, it’s not the optimizer who does the direct path read decision in the traditional sense (it’s not a cost-based decision).

Here’s an example from my test database:

SQL> CREATE TABLE t AS SELECT * FROM dba_source;

Table created.

SQL> @gts t
Gather Table Statistics for table t...

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT blocks FROM user_tables WHERE table_name = 'T';

BLOCKS
----------
10704

SQL>

The table uses 10704 blocks (up HWM).

SQL> SELECT name,block_size,buffers FROM v$buffer_pool;

NAME                 BLOCK_SIZE    BUFFERS
-------------------- ---------- ----------
DEFAULT                    8192       9424

The table (10704 blocks) is bigger than the entire buffer cache (9424 buffers). And this is way bigger than the _small_table_threshold value of 168 in my instance (watch the full table scans and direct path reads) hacking session for more about this parameter.

So whenever I run a SELECT COUNT(*) FROM t, I see direct path reads show up:

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99
Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,         .2,
     99, SYS       , STAT, db block gets from cache                                  ,             1,         .2,
     99, SYS       , STAT, consistent gets                                           ,         11867,      2.37k,
     99, SYS       , STAT, consistent gets from cache                                ,             7,        1.4,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,             6,        1.2,
     99, SYS       , STAT, consistent gets - examination                             ,             1,         .2,
     99, SYS       , STAT, consistent gets direct                                    ,         11860,      2.37k,
     99, SYS       , STAT, no work - consistent read gets                            ,         11859,      2.37k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,         .2,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,         .2,
     99, SYS       , STAT, table scans (direct read)                                 ,             1,         .2,
     99, SYS       , STAT, table scan rows gotten                                    ,        739834,    147.97k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11860,      2.37k,
     99, SYS       , TIME, parse time elapsed                                        ,            46,      9.2us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,         79988,       16ms,     1.6%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        254990,       51ms,     5.1%, |@         |
     99, SYS       , TIME, DB time                                                   ,        255375,    51.08ms,     5.1%, |@         |
     99, SYS       , WAIT, enq: KO - fast object checkpoint                          ,        174947,    34.99ms,     3.5%, |@         |
     99, SYS       , WAIT, direct path read                                          ,          1280,      256us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             9,      1.8us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       4672912,   934.58ms,    93.5%, |@@@@@@@@@@|
     99, SYS       , WAIT, events in waitclass Other                                 ,             6,      1.2us,      .0%, |          |
--  End of Stats snap 1, end=2012-09-02 20:03:55, seconds=5

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 88r4qn9mwhcf5   | enq: KO - fast object checkpoint    | Application
     2% | 88r4qn9mwhcf5   | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2012-09-02 20:03:55, seconds=5, samples_taken=43

Let’s now fake the table stats so it looks like that there’s only 5 blocks in it – way below the _small_table_threshold value

SQL> EXEC DBMS_STATS.SET_TABLE_STATS(user,'T',numblks=>5);

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*) FROM t /* attempt 2 */;

  COUNT(*)
----------
      1000

The direct path reads are gone – we are doing regular buffered reads now!

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99

Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,        .17,
     99, SYS       , STAT, db block gets from cache                                  ,             1,        .17,
     99, SYS       , STAT, consistent gets                                           ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache                                ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,         11528,      1.92k,
     99, SYS       , STAT, consistent gets - examination                             ,             1,        .17,
     99, SYS       , STAT, no work - consistent read gets                            ,         11851,      1.98k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,        .17,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,        .17,
     99, SYS       , STAT, table scan rows gotten                                    ,        738834,    123.14k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11852,      1.98k,
     99, SYS       , TIME, parse time elapsed                                        ,            84,       14us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,        109983,    18.33ms,     1.8%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        116709,    19.45ms,     1.9%, |@         |
     99, SYS       , TIME, DB time                                                   ,        117102,    19.52ms,     2.0%, |@         |
     99, SYS       , WAIT, db file scattered read                                    ,         63956,    10.66ms,     1.1%, |@         |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             8,     1.33us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       5119722,   853.29ms,    85.3%, |@@@@@@@@@ |
--  End of Stats snap 1, end=2012-09-02 20:06:19, seconds=6

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 07sgczqj432mr   | db file scattered read              | User I/O

--  End of ASH snap 1, end=2012-09-02 20:06:19, seconds=5, samples_taken=46

Note that I deliberately forced a hard parse (with the “attempt 2″ comment) to compile a new cursor. The _direct_read_decision_statistics_driven parameter is not part of the optimizer environment, so a new child cursor would not be automatically created after the parameter change (the same applies to the _small_table_threshold and _serial_direct_read parameters, by the way). But when I change the SQL text, then an entirely new (parent and child) cursor will be compiled anyway.

But wait a minute! Why do I need to compile a new cursor to get Oracle to read the new block count value from optimizer stats?!

I have said in the beginning of this post (and in many other places) that the direct path read decision is not done by the optimizer anyway and is a runtime decision done during every execution, every time any segment (including individual partitions) is scanned during query runtime. This is true for the old (up to 11.2.0.1) Oracle versions, where a direct path decision is done based on the actual, current block count in the segment header, thus the decision can suddenly change when a segment grows by a few blocks, crossing the _small_table_threshold calculation threshold. Perhaps due to performance stability reasons, this seems to have changed.

My tests on 11.2.0.2 have so far shown that when using the new statistics-driven direct path read decisions, each segments’ block counts are stored somewhere in the compiled cursor and reused during next executions of it, even if the block count of the segment changes in the optimizer stats later on! This might result in somewhat better stability as long as you don’t gather new stats – and your buffer cache size (and already cached block counts) don’t change. However if the amount of cached blocks of a segment does change (due to other, index-based accesses for example), then the direct path decision can still change during runtime. It’s just the block counts which are stored in the cursor, but the other factors affecting the decision (buffer cache size, cached block counts) can still change.

This topic is especially relevant on Exadata, as the entire Smart Scanning functionality depends on whether a direct path read IO method gets picked for full segment scans. When experimenting with this, you’ve got to be pretty careful and thorough (to not come to wrong conclusions) as there multiple moving parts and factors involved in the decisions:

  1. Are block counts taken from optimizer stats or segment header
  2. Do the segment header and/or optimizer stats block counts change
  3. Does the buffer cache size change (thus the _small_table_threshold too)
  4. Does the amount of cached blocks of a segment change
  5. Parallel vs Serial execution
  6. Are buffered parallel full scans allowed (the in-memory PX feature of 11.2)
  7. Did a new child cursor get created or the old one reused
  8. etc :)

Update: there’s an interesting comment in another page about how the incorrectly handled subpartitions statistics cause Oracle to not choose a direct path read for some subpartitions.

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Exadata, Oracle. Bookmark the permalink.

28 Responses to Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

  1. Tanel Poder says:

    By the way, I haven’t ran these tests on 11.2.0.3 so feel free to test this out (but make sure you don’t fall into any of the above-mentioned traps when interpreting results)

  2. You say that you think the segment’s block counts are stored with a cursor but other changes could still cause a switch to (or away from) direct path reads. Do you have a test case that shows that behavior? Flushing the buffer cache between runs could do it I think. It still seems odd to me that the optimizer picks a plan without considering the ramifications of smart scans on Exadata, but that’s a whole other topic.

    • Tanel Poder says:

      Yes, I tested this too. Flushing buffer cache caused sudden flipping to direct path reads, then manually forcing buffered reads *in another session* to read some blocks back into cache, caused buffered reads to be picked again in the original session.

      So, it seems like the direct path read *decision* is still done in runtime, for every full segment scan, but in 11.2.0.2+ one of the inputs is just “cached” inside the cursor, while other inputs are queried in real time from “the source” again and again.

      This optimizer-statistics based direct path read decision is not done by the optimizer, it’s still the full table scan row source (qertbFetch function for tables) which does this decision in runtime, so the optimizer has nothing to do with this, but optimizer *stats* do :)

    • Tanel Poder says:

      And it’d be more correct to say that not the segment’s block counts are stored, but the optimizer stats block counts are stored in the cursor when _direct_read_decision_statistics_driven = TRUE. If this parameter is false (or you’re on 11.2.0.1) then you can do this test:

      1) Insert 10 rows into a table
      2) SELECT COUNT(*) FROM it – you get buffered read
      3) Insert let’s say 10 Million rows more into the table
      4) Run exactly the same SELECT COUNT(*) FROM it again – you’ll get direct path read

      So, the old way is that we get the block counts again “from the source” every time we scan. The new way “caches” this block count info in a cursor (and “the source” is not segment header, but optimizer stats stored in data dictionary).

  3. Tanel Poder says:

    One more comment – I was talking about SERIAL direct path reads here, the parallel full scan direct path reads have an entirely different logic (and it has radically changed thanks to the in-memory PX in 11.2.0.2 and then even more in 11.2.0.3 thanks to improvements to the parallel cluster caching policy).

  4. GregG says:

    Thanks Tanel for investigation.
    Could You point me to some info about 6. Are buffered parallel full scans allowed (the in-memory PX feature of 11.2) ?
    Regards
    GregG

  5. Björn says:

    Hi

    Does not this execution to set the statistics on the table EXEC DBMS_STATS.SET_TABLE_STATS(user,’T',numblks=>5); invalid all the cursors on the table and that why it force a new compile of a cursors ?

    BR

    Björn

    • Tanel Poder says:

      Nope, the default is still the “auto_invalidate” for setting stats too… Which does not immediately invalidate the cursors, but distributes the potential invalidations over a 5 hour period.

  6. Houri says:

    Have you investigated the effect of the 10049 event on the direct path read decision?

    mohamed@mhouri> select * from v$version;

    BANNER
    ——————————————————————————
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – Production
    PL/SQL Release 11.2.0.1.0 – Production
    CORE 11.2.0.1.0 Production
    TNS for 32-bit Windows: Version 11.2.0.1.0 – Production
    NLSRTL Version 11.2.0.1.0 – Production

    mohamed@mhouri> SELECT blocks FROM user_tables WHERE table_name = ‘TA1′;

    BLOCKS
    ———-
    26308

    mohamed@mhouri> SELECT name,block_size,buffers FROM v$buffer_pool;

    NAME BLOCK_SIZE BUFFERS
    ——————– ———- ———-
    DEFAULT 8192 24304

    mohamed@mhouri> SELECT COUNT(*) FROM ta1;

    COUNT(*)
    ———-
    1637840

    ———————————————————————–
    Active% | SQL_ID | EVENT | WAIT_CLASS
    ———————————————————————–
    67% | 1fnj2210p4q4v | direct path read | User I/O
    22% | 1fnj2210p4q4v | enq: KO – fast object che | Application
    11% | 1fnj2210p4q4v | ON CPU | ON CPU

    mohamed@mhouri> alter session set events ’10949 trace name context forever, level 1′;

    Session altered.

    mohamed@mhouri> SELECT COUNT(*) FROM ta1;

    COUNT(*)
    ———-
    1637840
    ———————————————————————–
    Active% | SQL_ID | EVENT | WAIT_CLASS
    ———————————————————————–
    89% | 1fnj2210p4q4v | db file scattered read | User I/O
    11% | 1fnj2210p4q4v | ON CPU | ON CPU

    Thanks a lot for your videos

    Mohamed Houri

    • Tanel Poder says:

      Hi Mohamed,

      I haven’t looked deeper into the effects of this event, as it’s not something I’d use in production anyway … and starting from 11.2.0.2 the _serial_direct_read would give enough control over the direct path read behavior.

  7. somu says:

    Sir,
    I have a doubt on the above usage of your snapper code. you have used it like this
    @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99
    Can you please tell me how will snapper behave for the stats=sw option. Sorry if my question is silly sir.

  8. Pingback: Upgraded to 11GR2? Congrats! You are in Direct Reads trouble! | The Pythian Blog

  9. GregG says:

    Hi Somu,
    as far as I know snapper with stats=sw will report statistics as well as waits .
    Regards
    GregG

    • Tanel Poder says:

      Yeah all you need to do is read the documentation included in snapper.sql to know what the SW options mean :) Yep I only wanted v$sesstat Statistics and Wait events (and not Time model).

  10. daniesh shaikh says:

    Hi Tanel,

    Thanks for this post.

    I am facing the same issue. We created a Materialized view which is taking time. Refresh is taking more than 1 hr. I see lots of dbfile scattered read and sequential read. when i see the execution plan it shows full tablescan. In my case below.

    SQL> select blocks from dba_tables where table_name=’GL_BALANCES’;

    BLOCKS
    ———-
    5254041

    SQL> SELECT name,block_size,buffers FROM v$buffer_pool;

    NAME BLOCK_SIZE BUFFERS
    ——————– ———- ———-
    DEFAULT 8192 163789

    The table is bigger than the buffer cache.

    If i recreate the materialized view with more pctfree and extents will it help to improve the MV refresh.

    Please help me if i raise an SR with oracle. They say that they don’t help in tuning new query.

    Thanks

    • Tanel Poder says:

      I don’t think you’re facing exactly the same issue here … as I just explained how the direct path read decision is done in latest Oracle versions, but you seem to have a general SQL optimization problem at hand, which is not necessarily related to this blog entry’s topic.

      You need to approach this as a SQL performance problem – you need to *measure* where most of the time of the materialized view refresh is spent (and the “where” would consist of two main parts: #1 which execution plan lines, #2 which wait events take the most time).

      There are tools for that (SQL Trace + tkprof (or some other profiler), ASH reports, DBMS_XPLAN for displaying execution plan – or if you have Diag+Tuning Pack licenses, then the SQL Monitoring report would be a perfect starting point). It may be just an inefficient execution plan and that SQL plan should be optimized. Or, assuming that this is an infrequently refreshed mview, then it should be done in parallel (which enables direct path reads with asynchronous prefetching for greater IO throughput – assuming that your IO subsystem can deliver it).

      I don’t have time for tuning individual SQL statements for other people here in this blog, but rather publish information about systematic optimization/troubleshooting approach and interesting Oracle internals details – but you need to do the work of putting this (and other information) into use on your problems yourself.

  11. jc nars says:

    Thx much Tanel for your correspondence via Twitter…how does DIRECT PATH READ TEMP waitevent work? I mean, I understand DIRECT PATH READ for a session’s workspace to directly grab “table data” on disk bypassing SGA.

    As far as “temp data” is concerned, Isn’t the default behaviour itself DIRECT PATH READ TEMP?
    I mean, when a foreground wants to do an action (sort, LOB operations) that will go to the disk’s temp file, doesn’t it directly go to the disk(‘s temp file) from PGA?

  12. Pingback: Small Tables | Oracle Scratchpad

  13. Pingback: Oracle SQL

  14. Tanel, please, could you tell me – is my test correct? I tested with changing blocks count through INDEX_STATS hint: http://orasql.org/2013/03/07/just-test-of-adaptive-direct-path-reads-with-index_stats/

  15. Pingback: Direct Path Reads Again | Coskan's Approach to Oracle

  16. Pingback: Confluence: Oracle

  17. Pingback: Forcing Smart Scans on Exadata – is _serial_direct_read parameter safe to use in production? | Tanel Poder's blog: IT & Mobile for Geeks and Pros

  18. Pingback: Direct path read and fast full index scans | Frits Hoogland Weblog

  19. Pingback: Confluence: Productie

  20. Pingback: Cache anomaly | Oracle Scratchpad

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>