Advanced Oracle Troubleshooting Guide – Part 10: Index unique scan doing multiblock reads?!

When you troubleshoot an Oracle (performance) problem you usually want to find out what are the problem sessions doing (wrong) and then why are they doing it.

The “what” question is usually answered using some wait interface based method – like ASH or SQL*Trace, which both add plenty of extra details to the plain wait event data.

My normal session troubleshooting data collection sequence consists of only three steps:

  1. Interpret session-level top wait and top SQL data (ASH, SQL*Trace or custom tools sampling session-level V$views)
  2. Interpret session-level performance counter data (V$SESSTAT)
  3. Sample the problem process stack and list the top stack branches where execution spent the most time

Only looking into #1 is enough in 90% of problem cases. But for advanced troubleshooting of complex and exotic problems I regularly have to dive into the #2 – V$SESSTAT performance counters (That’s the reason why I wrote Snapper for example).

Maybe only once out of every 40-50 troubleshooting cases I dive down into #3 – process stack sampling. While process stack sampling doesn’t lie – it helps especially in these cases where the Oracle’s instrumentation itself doesn’t have a clue what a session/process is doing – it’s like a microscope. It gives you very detailed overview of one part of the problem, but not the big picture. That’s why I never start from #3 when troubleshooting (and if you do, you are doing it wrong – with an exception of a completely hung database ;) I always start from #1 and proceed to #2 only when needed.

Snapper lists both #1 and #2 data for a process by the way.

Anyway, the reason why I’m writing this post is that a few days ago I received a question about an issue which nicely illustrates a case where step #1 shows that there’s an issue and step #2 explains why it’s happening.

There was an execution plan, which ran slightly slower on a new server compared to the old one – and it did read 36 blocks per execution as opposed to only 5:

select ...
from TABLE_X
where ...
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.07         36          5          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.09         36          5          0           1

The “disk” column above shows 36 blocks being read in via physical IO – while only 5 buffers had actually been accessed after that (logical IOs are shown in the “query” column).

Now that’s weird – as logical IOs are the ones which cause physical IOs, so the LIO count should normally be equal or greater than the physical IO count.

What’s more stranger, the execution plan is an index unique scan:

 
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID TABLE_X (cr=5 pr=36 pw=0 time=0 us cost=4 size=26 card=1)
      1   INDEX UNIQUE SCAN TABLE_X_IDX (cr=4 pr=35 pw=0 time=0 us cost=3 size=0 card=1)(object id 51063)
 

… but if you look into the “pr=” column, which means “blocks read via physical read”, then you see that 35 out of the 36 blocks were read in by the INDEX UNIQUE SCAN row source (pr=35). These counters are cumulative, so the pr=36 in the parent TABLE ACCESS BY INDEX ROWID row source means that there was only one block read by physical IO at the table access level. So why does the index access read 35 blocks?

Is it index height? Can’t be – as the max index height is 24 levels (including the leaf level). And such an index would be ridiculous anyway :)

Let’s look into the wait profile:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                        4        0.00          0.00
  db file sequential read                         4        0.01          0.04
  db file scattered read                          1        0.03          0.03
  SQL*Net message from client                     2        0.15          0.15
********************************************************************************

Wow, an index unique scan which is supposed to traverse the index tree with precise single block reads, has done 4 single block reads and one multiblock read!

Now this is where the wait interface and ASH-style data would not help us any further. We know what’s happening – a multiblock read happening where “it shouldn’t”. We still don’t know why this is happening. We would have to guess from here. And this is where V$SESSTAT comes in – there’s over 600 performance counters maintained for each session and many of them show valuable information about what kind of operations (and how many) the RDBMS kernel has done. So I asked the client to run Snapper on a test session and here’s the output (note that the below output is from a different test run where the queries were executed many times in a loop):


----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
   1612, ORACLE    , STAT, opened cursors cumulative                                 ,          6617,      1.32k,
   1612, ORACLE    , STAT, user calls                                                ,         13234,      2.65k,
   1612, ORACLE    , STAT, pinned cursors current                                    ,            -1,        -.2,
   1612, ORACLE    , STAT, session logical reads                                     ,          5293,      1.06k,
   1612, ORACLE    , STAT, CPU used when call started                                ,             7,        1.4,
   1612, ORACLE    , STAT, CPU used by this session                                  ,             7,        1.4,
   1612, ORACLE    , STAT, DB time                                                   ,           306,       61.2,
   1612, ORACLE    , STAT, user I/O wait time                                        ,           223,       44.6,
   1612, ORACLE    , STAT, non-idle wait time                                        ,           224,       44.8,
   1612, ORACLE    , STAT, non-idle wait count                                       ,          6775,      1.36k,
   1612, ORACLE    , STAT, process last non-idle time                                ,             5,          1,
   1612, ORACLE    , STAT, physical read total IO requests                           ,           158,       31.6,
   1612, ORACLE    , STAT, physical read total multi block requests                  ,            31,        6.2,
   1612, ORACLE    , STAT, physical read total bytes                                 ,       8265728,      1.65M,
   1612, ORACLE    , STAT, cell physical IO interconnect bytes                       ,       8265728,      1.65M,
   1612, ORACLE    , STAT, consistent gets                                           ,          5293,      1.06k,
   1612, ORACLE    , STAT, consistent gets from cache                                ,          5293,      1.06k,
   1612, ORACLE    , STAT, consistent gets - examination                             ,          5293,      1.06k,
   1612, ORACLE    , STAT, physical reads                                            ,          1009,      201.8,
   1612, ORACLE    , STAT, physical reads cache                                      ,          1009,      201.8,
   1612, ORACLE    , STAT, physical read IO requests                                 ,           158,       31.6,
   1612, ORACLE    , STAT, physical read bytes                                       ,       8265728,      1.65M,
   1612, ORACLE    , STAT, free buffer requested                                     ,          1008,      201.6,
   1612, ORACLE    , STAT, physical reads cache prefetch                             ,           851,      170.2,
   1612, ORACLE    , STAT, physical reads prefetch warmup                            ,           851,      170.2,
   1612, ORACLE    , STAT, shared hash latch upgrades - no wait                      ,           188,       37.6,
   1612, ORACLE    , STAT, calls to get snapshot scn: kcmgss                         ,          1319,      263.8,
   1612, ORACLE    , STAT, file io service time                                      ,            25,          5,
   1612, ORACLE    , STAT, file io wait time                                         ,       2219641,    443.93k,
   1612, ORACLE    , STAT, min active SCN optimization applied on CR                 ,             1,         .2,
   1612, ORACLE    , STAT, index fetch by key                                        ,          1292,      258.4,
   1612, ORACLE    , STAT, session cursor cache hits                                 ,          6440,      1.29k,
   1612, ORACLE    , STAT, buffer is not pinned count                                ,          1277,      255.4,
   1612, ORACLE    , STAT, parse time elapsed                                        ,             6,        1.2,
   1612, ORACLE    , STAT, parse count (total)                                       ,          6380,      1.28k,
   1612, ORACLE    , STAT, execute count                                             ,          6378,      1.28k,
   1612, ORACLE    , STAT, bytes sent via SQL*Net to client                          ,       2195633,    439.13k,
   1612, ORACLE    , STAT, bytes received via SQL*Net from client                    ,       2721339,    544.27k,
   1612, ORACLE    , STAT, SQL*Net roundtrips to/from client                         ,          6374,      1.27k,
   1612, ORACLE    , TIME, parse time elapsed                                        ,        114215,    22.84ms,     2.3%, |@         |
   1612, ORACLE    , TIME, PL/SQL execution elapsed time                             ,         67004,     13.4ms,     1.3%, |@         |
   1612, ORACLE    , TIME, DB CPU                                                    ,        630000,      126ms,    12.6%, |@@        |
   1612, ORACLE    , TIME, sql execute elapsed time                                  ,       2595410,   519.08ms,    51.9%, |@@@@@@    |
   1612, ORACLE    , TIME, DB time                                                   ,       3190919,   638.18ms,    63.8%, |@@@@@@@   |
   1612, ORACLE    , WAIT, db file sequential read                                   ,        978601,   195.72ms,    19.6%, |@@        |
   1612, ORACLE    , WAIT, db file scattered read                                    ,       1213550,   242.71ms,    24.3%, |@@@       |
   1612, ORACLE    , WAIT, SQL*Net message to client                                 ,          9212,     1.84ms,      .2%, |          |
   1612, ORACLE    , WAIT, SQL*Net message from client                               ,       1667368,   333.47ms,    33.3%, |@@@@      |
--  End of Stats snap 1, end=2012-03-09 14:47:02, seconds=5

See the highlighted lines above and scroll right if you don’t see the metric values…

  • The physical read total multi block requests statistic confirms that indeed there were some multiblock reads done by the monitored session.
  • The physical reads cache prefetch metric indicates that there was some data block prefetching happening! This explains why we had multiblock reads done where only single block reads were really needed.
  • And the physical reads prefetch warmup metric explains why did we do any prefetching at all – it’s for buffer cache warmup, where after a restart or a significant buffer cache size increase Oracle decides to populate the empty buffer cache with any data as fast as possible. Oracle assumes that a random multiblock read doesn’t take that much more time than a random single block read anyway, so why not just read a bunch of additional blocks from around that single block that the index access wanted to read. The read operation will be slightly slower, but it might help to avoid multiple future single block reads as they are already in cache. This should affect things only when you restart the instance or significantly increase the buffer cache size. So on a steady state production system you shouldn’t see it (and it’s for a good cause anyway), but if you do performance tests on some test database, change something, bounce the instance and immediately run a test – this may affect your test results. Btw, this is controlled by a hidden parameter _db_cache_pre_warm, but I’m mentioning this just for educational purposes here, I have never fixed a problem by changing this parameter so you should touch it only in your sandbox hacking database :)

And now back to the original topic – this post wasn’t really about this specific prewarm feature, but rather a yet another example of how the V$SESSTAT metrics tell you much more than the usual wait interface and TOP-sql based approach. And you do not need to enable any tracing or undocumented events for getting these detailed metrics out of Oracle, just query the V$SESSION views at the right time. As I said in the beginning, I don’t jump to the microscopic view unless the higher level views aren’t helpful enough, so I recommend you to follow the 3-step “Advanced Oracle Troubleshooting” method when trying to diagnose what the heck some session is doing and why ;^)

This entry was posted in Oracle. Bookmark the permalink.

11 Responses to Advanced Oracle Troubleshooting Guide – Part 10: Index unique scan doing multiblock reads?!

  1. Christoph says:

    Wow! Great detective work,as usual. Well explained, too.
    Thanks!

  2. Chris says:

    Explained clearly and concisely, I think Christoph sums it up well. Great detective work Tanel Poder “Holmes” :-)

  3. Marc says:

    Nice article, but what do you suggest to prevent the multi block read? pin index in memory? :)

    • Tanel Poder says:

      Hi Marc,

      No need to turn this off – just got to be aware of that this may be happening when doing performance tests.

      This should only happen when you bounce the instance or significantly increase the buffer cache , so you pretty much should not see this after the instance has been up for a while (in busy production environments this shouldn’t take more than a few minutes).

      And although it may make an individual SQL execution slower, it’s for greater good – thanks to that multiblock read some future SQL executions (which look for other values) may be able to avoid physical reads as these blocks are already prefetched to cache. Of course there may be some very specific workloads where this prefetching may not help, however once the buffer cache is “warm”, then this prefetching is turned off automatically.

      Another interesting detail which I didn’t fully explain in the blog entry is that how come we had more physical reads than logical reads. Normally physical reads are always preceded by a logical read. It’s the logical read function which invokes the physical read function if it doesn’t find the block it needs from the buffer cache (or PGA buffer in case of the direct path reads). So, while the query execution only did 5 logical reads, the prefetching “engine” which is probably somewhere in the buffer cache layer, decided to read more blocks in… but that execution didn’t visit all of them, so there was less logical reads than blocks physically read…

      • Prashanth says:

        Hi,

        So increasing the set arraysize parameter in sqlplus/client tool
        will decrease the logical reads in case of full table scans .

        Thanks,
        Prashanth

        • Tanel Poder says:

          In the simplest cases yes, but if you have a SORT ORDER BY in the execution plan, or a filter which filters out most of rows, then every user fetch may end up processing millions of blocks first before returning an arrayful of rows back, so the difference might be very small.

          Btw, the same would apply to index range scans etc too.

  4. Saurabh Manroy says:

    Nice article… Looks like oracle is imitating intelligent prefetch algorithm already used in san controller cache technology ?

    • Tanel Poder says:

      It’s somewhat different as it kicks in only when some of the buffer cache is still unused – but the point, that it’s only slightly more expensive to read multiple blocks worth of data from the same disk location compared to only one block only, is the same…

  5. Yousuf says:

    Hi Tanel,

    One small question out of box.

    My Oracle Block size is (8KB). As per calculation Oracle using only 4KB space per block instead of 8KB..

    4KB + 1 KB PCTFREE = 5KB.

    Not able to figure out where it’s missing 3KB…

    Below are some quick details for reference.

    The Table TEST having 271 columns with below data types.

    Data Types as follow:

    1) CHAR(1) having 37 columns
    2) CHAR(11) having 1 column.
    3) Date having 30 columns
    4) NUMBER having 92 columns
    5) VARCHAR2 having 111 columns.

    Create Table Storage clause

    tablespace USERS
    pctfree 10
    initrans 100
    maxtrans 255
    storage
    (
    initial 5M
    next 5M
    minextents 1
    maxextents unlimited
    pctincrease 0
    )

    Each Extent size 5MB.

    SQL> select SEGMENT_SPACE_MANAGEMENT,initial_extent,NEXT_EXTENT,MIN_EXTENTS,EXTENT_MANAGEMENT,ALLOCATION_TYPE from dba_tablespaces where tablespace_name=’FCCDATALAR';

    SEGMEN INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS EXTENT_MAN ALLOCATIO

    —— ————– ———– ———– ———- ———
    AUTO 5242880 5242880 1 LOCAL UNIFORM

    Below are query to find “Number of rows in each block”

    SQL> select
    dbms_rowid.rowid_relative_fno(rowid) File#,
    dbms_rowid.rowid_block_number(rowid) Block#,
    count(1)
    from TEST
    group by
    dbms_rowid.rowid_relative_fno(rowid),
    dbms_rowid.rowid_block_number(rowid)
    order by
    1,2 ;

    FILE# BLOCK# COUNT(1)
    ———- ———- ———-
    168 3869580 6
    168 3869581 7

    To Find Average row length…

    SQL> select AVG_ROW_LEN,AVG_SPACE,NUM_ROWS,BLOCKS from user_tables where table_name=’TEST';

    AVG_ROW_LEN AVG_SPACE NUM_ROWS BLOCKS
    ———– ———- ———- ———-
    673 0 515960 81234

    Calculation:

    SQL> select AVG_ROW_LEN*7 “Bytes” from user_tables where table_name=’TEST';

    Bytes
    ———-
    4711

    4 KB + 1 KB PCTFREE + 1KB overhead = 6 KB. (am considering maximum 1 KB block overhead & db having table space ASSM AUTO + LOCAL)

    Still 2KB is missing per block or is there any thing missing in calculation?

    Appreciate your response…

    Thanks,
    Yousuf

    • Tanel Poder says:

      So if you do a CREATE TABLE tmp AS SELECT * FROM your_table WHERE rownum < = 100000; --what do you see for the new table?

      This could be about deleted rows. The thing with ASSM is that it has 4 "freeness" levels, so there's a chance that it won't be a candidate for further inserts (after it got full and some rows deleted again) before the free space in it increases to more than 25% (sort of like PCTUSED used to be).

      You could also be hitting some ASSM bug, I did an insert-select to one of my tables (in 12.1.0.2) and I see plenty of formatted, but almost unused blocks for some reason:

      SQL> SELECT * FROM TABLE(space_tools.get_space_usage(‘SYSTEM’, ‘T_WIDE’, ‘TABLE’));

      COLUMN_VALUE
      —————————————————————
      Full blocks /MB 9654 75
      Unformatted blocks/MB 240 2
      Free Space 0-25% 9171 72
      Free Space 25-50% 1 0
      Free Space 50-75% 0 0
      Free Space 75-100% 211 2

      I used my space_tools script: http://blog.tanelpoder.com/files/scripts/tools/space_tools/space_usage.sql

      So, I would run some manual tests like I mentioned above… use just plain INSERT SELECT to a test table to see if you have this unused space problem immediately (and use INSERT APPEND to see if you get different results). If just an insert select (without deletes) wastes space, it looks like a bug. Then it’s either SR the test case or trace it further yourself with diag events 10612 & 10613

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>