Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

Tanel Poder

2012/09/04

Hello all fellow Oracle geeks and technology enthusiasts! Long time no see ;-)

In the hacking session about Oracle full table scans and direct path reads I explained how the direct path read decision is not done by the optimizer, but instead during every execution, separately for every single segment (partition) scanned in the query. I also explained how the __small_table_threshold_ parameter and the X$KCBOQH.NUM_BUF(which keeps track of how many buffers of any segment are currently cached) are used for determining whether to scan using direct path reads or not.

If you don’t know what the _small_table_threshold parameter is and how it relates to the direct path read decision, then read this post by Alex Fatkulin first.

In addition to the abovementioned values, Oracle needs to know how many blocks there are to scan (under the HWM) in a segment. This was traditionally done by reading in the segment header block first (using a regular buffered single block read – “db file sequential read”).

Starting from Oracle 11.2.0.2, things have changed a little. Instead of making the direct path read decision based on the actual segment block count extracted from the segment header, Oracle actually takes this number from TAB$.BLKCNT (_dbatables.blocks) or TABPART$.BLKCNT, IND$.LEAFCNT etc.

There’s a new parameter __direct_read_decision_statisticsdriven which controls this:

 SQL> @pd direct_read_decision
 Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                                               VALUE    DESCRIPTION
 -------------------------------------------------- -------- ----------------------------------------------------------
 _direct_read_decision_statistics_driven            TRUE     enable direct read decision based on optimizer statistics

Note that even though the block counts are taken from the optimizer _statistics_ in data dictionary, it’s not the optimizer who does the direct path read decision in the traditional sense (it’s not a cost-based decision).

Here’s an example from my test database:

SQL> CREATE TABLE t AS SELECT * FROM dba_source;

Table created.

SQL> @gts t
Gather Table Statistics for table t...

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT blocks FROM user_tables WHERE table_name = 'T';

BLOCKS
----------
10704

SQL>

The table uses 10704 blocks (up HWM).

SQL> SELECT name,block_size,buffers FROM v$buffer_pool;

NAME                 BLOCK_SIZE    BUFFERS
-------------------- ---------- ----------
DEFAULT                    8192       9424

The table (10704 blocks) is bigger than the entire buffer cache (9424 buffers). And this is way bigger than the __small_tablethreshold value of 168 in my instance (watch the full table scans and direct path reads) hacking session for more about this parameter.

So whenever I run a SELECT COUNT(*) FROM t, I see direct path reads show up:

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99
Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder (  )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,         .2,
     99, SYS       , STAT, db block gets from cache                                  ,             1,         .2,
     99, SYS       , STAT, consistent gets                                           ,         11867,      2.37k,
     99, SYS       , STAT, consistent gets from cache                                ,             7,        1.4,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,             6,        1.2,
     99, SYS       , STAT, consistent gets - examination                             ,             1,         .2,
     99, SYS       , STAT, consistent gets direct                                    ,         11860,      2.37k,
     99, SYS       , STAT, no work - consistent read gets                            ,         11859,      2.37k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,         .2,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,         .2,
     99, SYS       , STAT, table scans (direct read)                                 ,             1,         .2,
     99, SYS       , STAT, table scan rows gotten                                    ,        739834,    147.97k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11860,      2.37k,
     99, SYS       , TIME, parse time elapsed                                        ,            46,      9.2us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,         79988,       16ms,     1.6%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        254990,       51ms,     5.1%, |@         |
     99, SYS       , TIME, DB time                                                   ,        255375,    51.08ms,     5.1%, |@         |
     99, SYS       , WAIT, enq: KO - fast object checkpoint                          ,        174947,    34.99ms,     3.5%, |@         |
     99, SYS       , WAIT, direct path read                                          ,          1280,      256us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             9,      1.8us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       4672912,   934.58ms,    93.5%, |@@@@@@@@@@|
     99, SYS       , WAIT, events in waitclass Other                                 ,             6,      1.2us,      .0%, |          |
--  End of Stats snap 1, end=2012-09-02 20:03:55, seconds=5

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 88r4qn9mwhcf5   | enq: KO - fast object checkpoint    | Application
     2% | 88r4qn9mwhcf5   | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2012-09-02 20:03:55, seconds=5, samples_taken=43

Let’s now fake the table stats so it looks like that there’s only 5 blocks in it – way below the __small_tablethreshold value

SQL> EXEC DBMS_STATS.SET_TABLE_STATS(user,'T',numblks=>5);

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*) FROM t /* attempt 2 */;

  COUNT(*)
----------
      1000

The direct path reads are gone – we are doing regular buffered reads now!

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99

Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder (  )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,        .17,
     99, SYS       , STAT, db block gets from cache                                  ,             1,        .17,
     99, SYS       , STAT, consistent gets                                           ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache                                ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,         11528,      1.92k,
     99, SYS       , STAT, consistent gets - examination                             ,             1,        .17,
     99, SYS       , STAT, no work - consistent read gets                            ,         11851,      1.98k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,        .17,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,        .17,
     99, SYS       , STAT, table scan rows gotten                                    ,        738834,    123.14k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11852,      1.98k,
     99, SYS       , TIME, parse time elapsed                                        ,            84,       14us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,        109983,    18.33ms,     1.8%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        116709,    19.45ms,     1.9%, |@         |
     99, SYS       , TIME, DB time                                                   ,        117102,    19.52ms,     2.0%, |@         |
     99, SYS       , WAIT, db file scattered read                                    ,         63956,    10.66ms,     1.1%, |@         |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             8,     1.33us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       5119722,   853.29ms,    85.3%, |@@@@@@@@@ |
--  End of Stats snap 1, end=2012-09-02 20:06:19, seconds=6

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 07sgczqj432mr   | db file scattered read              | User I/O

--  End of ASH snap 1, end=2012-09-02 20:06:19, seconds=5, samples_taken=46

Note that I deliberately forced a hard parse (with the “attempt 2” comment) to compile a new cursor. The __direct_read_decision_statisticsdriven parameter is not part of the optimizer environment, so a new child cursor would not be automatically created after the parameter change (the same applies to the __small_table_threshold_ and __serial_directread parameters, by the way). But when I change the SQL text, then an entirely new (parent and child) cursor will be compiled anyway.

But wait a minute! Why do I need to compile a new cursor to get Oracle to read the new block count value from optimizer stats?!

I have said in the beginning of this post (and in many other places) that the direct path read decision is not done by the optimizer anyway and is a runtime decision done during every execution, every time any segment (including individual partitions) is scanned during query runtime. This is true for the old (up to 11.2.0.1) Oracle versions, where a direct path decision is done based on the actual, current block count in the segment header, thus the decision can suddenly change when a segment grows by a few blocks, crossing the __small_tablethreshold calculation threshold. Perhaps due to performance stability reasons, this seems to have changed.

My tests on 11.2.0.2 have so far shown that when using the new statistics-driven direct path read decisions, each segments’ block counts are stored somewhere in the compiled cursor and reused during next executions of it, even if the block count of the segment changes in the optimizer stats later on! This might result in somewhat better stability as long as you don’t gather new stats – and your buffer cache size (and already cached block counts) don’t change. However if the amount of cached blocks of a segment _does_ change (due to other, index-based accesses for example), then the direct path decision can still change during runtime. It’s just the block counts which are stored in the cursor, but the other factors affecting the decision (buffer cache size, cached block counts) can still change.

This topic is especially relevant on Exadata, as the entire Smart Scanning functionality depends on whether a direct path read IO method gets picked for full segment scans. When experimenting with this, you’ve got to be pretty careful and thorough (to not come to wrong conclusions) as there multiple moving parts and factors involved in the decisions:

  1. Are block counts taken from optimizer stats or segment header
  2. Do the segment header and/or optimizer stats block counts change
  3. Does the buffer cache size change (thus the __small_tablethreshold too)
  4. Does the amount of cached blocks of a segment change
  5. Parallel vs Serial execution
  6. Are buffered parallel full scans allowed (the in-memory PX feature of 11.2)
  7. Did a new child cursor get created or the old one reused
  8. etc :)

_Update: there’s an interesting comment in another page about how the incorrectly handled subpartitions statistics cause Oracle to not choose a direct path read for some subpartitions._