Exadata CAN do smart scans on bitmap indexes

As I’m finishing up a performance chapter for the Exadata book (a lot of work!), I thought to take a quick break and write a blog entry.

This is not really worth putting into my Oracle Exadata Performance series (which so far has only 1 article in it anyway) .. so this is a little stand-alone article …

Everybody knows that the Exadata smart scan can be used when scanning tables (and table partitions). You should also know that smart scan can be used with fast full scan on Oracle B-tree indexes (a fast full scan on an index segment is just like a full table scan, only on the index segment (and ignoring branch blocks)).

For some reason there’s a (little) myth circulating that smart scans aren’t used for scanning bitmap indexes.

So, here’s evidence, that smart scan can be used when scanning bitmap indexes:

SQL> select /*+ tanel3 */ count(*) from t1 where owner like '%XYZXYZ%';

...

Plan hash value: 39555139

-----------------------------------------------------------------------------------
| Id  | Operation                             | Name        | E-Rows | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |        |   505 (100)|
|   1 |  SORT AGGREGATE                       |             |      1 |            |
|   2 |   BITMAP CONVERSION COUNT             |             |    400K|   505   (0)|
|*  3 |    BITMAP INDEX STORAGE FAST FULL SCAN| BI_T1_OWNER |        |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - storage(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))
       filter(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))

So, as you see the execution plan sure shows a FAST FULL SCAN on a BITMAP INDEX segment, which happens to be on Exadata STORAGE.

Also, you see a storage() predicate applied on the line 3 of the execution plan, which means that Oracle will attempt to use a smart scan predicate offload – but this can’t always be done!

So, you can’t really determine whether a smart scan happened during execution just by looking into the execution plan, you should really check some V$SESSION statistics too. That’s where my Snapper script becomes handy.

I started Snapper on my session just before running the above query. The “smart table scan” and “smart index scan” performance counters are updated right after Oracle has opened the segment header and determines, from the number of blocks in the segment, whether to call the smart scan codepath or not. In other words, the smart scan counters are inremented in the beginning of the segment scan.

The output is following (some irrelevant counters are stripped for brevity):


@snapper all 5 1 "301"
Sampling SID 301 with interval 5 seconds, taking 1 snapshots...
setting stats to all due to option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
    301, TANEL     , STAT, physical read total IO requests                           ,         13,        2.6,
    301, TANEL     , STAT, physical read total multi block requests                  ,          4,         .8,
    301, TANEL     , STAT, physical read requests optimized                          ,          1,         .2,
    301, TANEL     , STAT, physical read total bytes optimized                       ,      8.19k,      1.64k,
    301, TANEL     , STAT, physical read total bytes                                 ,      4.63M,     925.7k,
    301, TANEL     , STAT, cell physical IO interconnect bytes                       ,     10.02k,         2k,
    301, TANEL     , STAT, physical reads                                            ,        565,        113,
    301, TANEL     , STAT, physical reads cache                                      ,          1,         .2,
    301, TANEL     , STAT, physical reads direct                                     ,        564,      112.8,
    301, TANEL     , STAT, physical read IO requests                                 ,         13,        2.6,
    301, TANEL     , STAT, physical read bytes                                       ,      4.63M,     925.7k,
    301, TANEL     , STAT, db block changes                                          ,          1,         .2,
    301, TANEL     , STAT, cell physical IO bytes eligible for predicate offload     ,      4.62M,    924.06k,
    301, TANEL     , STAT, cell physical IO interconnect bytes returned by smart scan,      1.82k,      364.8,
    301, TANEL     , STAT, cell blocks processed by cache layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by txn layer                        ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by index layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks helped by minscn optimization                 ,        564,      112.8,
    301, TANEL     , STAT, cell index scans                                          ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (full)                              ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (direct read)                       ,          1,         .2,
    301, TANEL     , STAT, bytes sent via SQL*Net to client                          ,        334,       66.8,
    301, TANEL     , STAT, bytes received via SQL*Net from client                    ,        298,       59.6,
    301, TANEL     , STAT, SQL*Net roundtrips to/from client                         ,          2,         .4,
    301, TANEL     , STAT, cell flash cache read hits                                ,          1,         .2,
    301, TANEL     , TIME, hard parse elapsed time                                   ,     1.17ms,    233.8us,      .0%, |          |
    301, TANEL     , TIME, parse time elapsed                                        ,      1.5ms,    300.2us,      .0%, |          |
    301, TANEL     , TIME, DB CPU                                                    ,       11ms,      2.2ms,      .2%, |          |
    301, TANEL     , TIME, sql execute elapsed time                                  ,     82.2ms,    16.44ms,     1.6%, |@         |
    301, TANEL     , TIME, DB time                                                   ,    84.36ms,    16.87ms,     1.7%, |@         |
    301, TANEL     , WAIT, enq: KO - fast object checkpoint                          ,    16.18ms,     3.24ms,      .3%, |          |
    301, TANEL     , WAIT, gc cr grant 2-way                                         ,      223us,     44.6us,      .0%, |          |
    301, TANEL     , WAIT, gc current grant 2-way                                    ,      136us,     27.2us,      .0%, |          |
    301, TANEL     , WAIT, cell smart index scan                                     ,    56.04ms,    11.21ms,     1.1%, |@         |
    301, TANEL     , WAIT, SQL*Net message to client                                 ,        7us,      1.4us,      .0%, |          |
    301, TANEL     , WAIT, SQL*Net message from client                               ,      4.42s,   884.47ms,    88.4%, |@@@@@@@@@ |
    301, TANEL     , WAIT, cell single block physical read                           ,      541us,    108.2us,      .0%, |          |
    301, TANEL     , WAIT, events in waitclass Other                                 ,     2.22ms,    443.2us,      .0%, |          |
--  End of Stats snap 1, end=2011-03-13 19:36:31, seconds=5

As you see from the above “cell index scans” statistic – indeed one index segment was scanned using the cell smart scan method.

So, I would rather call this feature “smart segment scan” to reflect that smart scan can scan more than just tables…

I guess one of the reasons why few people have seen smart bitmap index scans in action is that (single-column) bitmap indexes tend to be small. Smaller than corresponding table segments and B-tree index segments. On partitioned tables they’re much more likely going to be under the “_small_table_threshold” calculation which is used for determining whether to do a direct path full segment scan or not (yes, the _small_table_threshold applies to fast full index scan and fast full bitmap index scan too, not just table scans). So, it’s likely that Oracle chooses to do a regular, buffered full bitmap segment scan and thus won’t even consider using smart scan (as smart scans require direct path reads).

By the way – the direct path read (or not) decision is done per segment – not per object (like a table or index). So if you have 10 partitions in a table (or index), half of them are large, half are smaller, then Oracle may end up using direct path reads (and smart scan) on 5 of them and buffered (dumb) scan on the other 5. If you run something like Snapper on the session, then you’d see the smart scan counters go up by 5 only. As written above, Oracle decides whether to do direct path reads (and smart scan) right after opening the header block of a segment (partition) and reading out how many blocks this partition’s segment has below HWM.

The above applied to serial direct path reads – the Parallel Execution slaves should always read using direct path mode, right? …. Wrong :)

Well, partially wrong… In 11.2.0.2, if the parallel_degree_policy = manual, then yes, PX slaves behave like usual and always force a direct path read (and try to use a smart scan). However, with parallel_degree_policy = AUTO, which is the future of PX auto-management, Oracle can decide to do a buffered parallel scan instead, again disabling the use of smart scan…

One more note – I didn’t say anything about whether you should or should not use (bitmap) indexes on Exadata, it’s an entirely different discussion. I just brought out that the smart scan is used for scanning table segments, B-tree index segments and bitmap index segments if conditions are right.

And in the end I have to say…. that even with this evidence you can’t be fully sure that a smart scan was used throughout the entire segment, but more about this in the book and perhaps in a later blog article. We have interesting times ahead ;-)

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Uncategorized and tagged , , , . Bookmark the permalink.

20 Responses to Exadata CAN do smart scans on bitmap indexes

  1. Tom says:

    To leverage In-Memory PX initialization parameter PARALLEL_DEGREE_POLICY must be set to AUTO (default MANUAL). Once this is set, the database controls which objects are eligible to be read into the buffer cache and which object will reside there at any point in time. It is not possible to manual control the behavior for specific statements.

  2. Tanel Poder says:

    @Tom
    Yep, that’s pretty much what I said in my above post :-)

    There is a “_parallel_cluster_cache_policy” parameter which probably controls this … But I haven’t played with it at all.. And don’t recommend hacking this anyway. Not that many clients (on 11.2.0.2) dare to use parallel_adaptive_policy (yet) anyway. But that will be the future – and should relieve us from all that crap of manually planning & setting the PX degrees etc. Just like the automatic undo relieved us from manual rollback segment tuning and the automatic workarea size policy relieved us from the sort/hash area size tuning. Yet, in some rare cases it’s needed :-)

  3. Tom says:

    We said the same thing. Just a little bit differently is all. :) I was just more or less clarifying what you said and chalking it up to the new In-memory PX feature.

  4. Hey Tanel,

    Aren’t you supposed to be working on a chapter. ;)

    The In-memory Parallel behavior is controlled by “_parallel_cluster_cache_policy” as you suspected. It has two valid values “cached” which enables the feature and “adaptive” which appears to disable it (although the value makes one think it might do something complicated than just disabling it). These values are set automatically when switching between MANUAL and AUTO for parallel_degree_policy but can also be set manually (not that I’m recommending do so).

    Also there is at least one other object type that can benefit from smart scans besides tables and indexes, materialized views.

  5. Tanel Poder says:

    @Kerry Osborne

    Yep that’s a good point – materialized views too!

    They’re really regular table segments physically, so can be scanned with the same table access codepath, unlike indexes which have the slight difference of having branch blocks in them too… Anyway, this still goes together with what I said – that the smart scan can be called a smart *segment* scan …

    Btw, this assumes that your database won’t crash with ORA-600 when smart scanning a mview ;-)

  6. Tom says:

    You have to hit a 600 at some point. Just make sure you open an SR and get a bug filed so it can be fixed by 11.2.0.3 :)

  7. Sundar says:

    Hi Tannel,

    There was a performacnce problem when we use bit map indexes with advanced compression due to the hakan factor

    Ref: http://oracle-randolf.blogspot.com/2010/07/compression-restrictions.html

    Is there any impact of this on Exadata ? Especially with HCC ?

    Thanks
    Sundar R

  8. Pingback: Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything? « Kevin Closson's Blog: Platforms, Databases and Storage

  9. gary says:

    Hi Tanel
    I have OLTP compression enabled on the table

    CREATE BITMAP INDEX
    I1 ON
    T1(
    col 1Asc
    )
    NOLOGGING TABLESPACE ts_index01

    taking very long time.

    The wait event is : cell single block physical read.

    Any ideas what I can do?

    Thank u

    • Tanel Poder says:

      Hi Gary,

      Can you run snapper on the slow session and see whether it reports any “table fetch continued row” numbers? You could be just hitting migrated rows.

      • gary says:

        Hi Tanel
        HEre is output from snapper

        – Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

        —————————————————————————————————————————————-
        SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
        —————————————————————————————————————————————-
        2527, RXI , STAT, session logical reads , 20139, 4.03k,
        2527, RXI , STAT, user I/O wait time , 370, 74,
        2527, RXI , STAT, non-idle wait time , 370, 74,
        2527, RXI , STAT, non-idle wait count , 9986, 2k,
        2527, RXI , STAT, session uga memory , 65488, 13.1k,
        2527, RXI , STAT, session pga memory , 65536, 13.11k,
        2527, RXI , STAT, physical read total IO requests , 9987, 2k,
        2527, RXI , STAT, physical read total multi block requests , 1, .2,
        2527, RXI , STAT, physical read total bytes , 82853888, 16.57M,
        2527, RXI , STAT, cell physical IO interconnect bytes , 82853888, 16.57M,
        2527, RXI , STAT, consistent gets , 20143, 4.03k,
        2527, RXI , STAT, consistent gets direct , 20143, 4.03k,
        2527, RXI , STAT, physical reads , 10115, 2.02k,
        2527, RXI , STAT, physical reads direct , 10115, 2.02k,
        2527, RXI , STAT, physical read IO requests , 9988, 2k,
        2527, RXI , STAT, physical read bytes , 82862080, 16.57M,
        2527, RXI , STAT, file io wait time , 3697302, 739.46k,
        2527, RXI , STAT, Number of read IOs issued , 1, .2,
        2527, RXI , STAT, no work – consistent read gets , 20134, 4.03k,
        2527, RXI , STAT, table scan rows gotten , 991961, 198.39k,
        2527, RXI , STAT, table scan blocks gotten , 10150, 2.03k,
        2527, RXI , STAT, table fetch continued row , 618, 123.6,
        2527, RXI , STAT, buffer is pinned count , 5, 1,
        2527, RXI , STAT, buffer is not pinned count , 9984, 2k,
        2527, RXI , STAT, workarea memory allocated , 58, 11.6,
        2527, RXI , TIME, DB CPU , 841871, 168.37ms, 16.8%, |@@ |
        2527, RXI , TIME, sql execute elapsed time , 3998042, 799.61ms, 80.0%, |@@@@@@@@ |
        2527, RXI , TIME, DB time , 3998042, 799.61ms, 80.0%, |@@@@@@@@ |
        2527, RXI , WAIT, cell single block physical read , 3699316, 739.86ms, 74.0%, |@@@@@@@@ |
        – End of Stats snap 1, end=2012-09-01 19:58:05, seconds=5

        ———————————————————————————
        Active% | SQL_ID | EVENT | WAIT_CLASS
        ———————————————————————————
        90% | gyb77qgwctnmk | cell single block physical read | User I/O
        10% | gyb77qgwctnmk | ON CPU | ON CPU

        – End of ASH snap 1, end=2012-09-01 19:58:05, seconds=5, samples_taken=41

        • Tanel Poder says:

          Indeed, looks like chained or migrated rows are at least a part of this problem (“table fetch continued row” shows how many times a next row-piece of a chained/migrated row was fetched).

          But for DW/Analytics workloads on Exadata, I would put my effort on achieving smart scans (and hash joins) instead of (bitmap)indexes and star transformations anyway …

    • Tanel Poder says:

      By the way, on Exadata there should be much less need for bitmap indexing (and star transformations), I usually try to get the queries take advantage of just full table (smart) scans and proper partition pruning…

      • gary says:

        Hi Tanel
        This is a Siebel Analytics application. COTS product. I have been trying to convince them of the need for testing without bitmap indexes.

        thanks

  10. sasanka ghosh says:

    Not pretending .. speaking from my heart ..please spare my audacity to post a comment in this blog ..

    i am very angry that Exadata has given liberty to write crap sql’s and in process people having more PR ( public relation) than technical know will survive in this cruel world.

    These are my findings in General for Exadata Tuning in DWH Env for Fact tables. (Query only not Loading). Most important is the first one.
    1. When , How( i.e. the parameters), frequency , no of users firing the same query with diff parameters in nearly same time. DA’s, DBA’s needs to be aware of and should have some authority over it .
    2. Partition , partition, Partition. If possible Sub Partition carefully.
    3. Compress.compress.compress.
    4.Do not use Btree index except exceptional condition and in that case also make it GLOBAL PARTITIONED though DBA’s will cringe. Local Btree indexes are Junk.
    5. Local BITMAP indexes are workable or may give some benefit when the overall table size is under 30-40 GB or at anytime though the table may be of X GB but the query never uses more than 35-40 GB of the table after Partition Pruning.
    6.Initally drop indexes but monitor and if at all create indexes do it on case by case basis.

    7. Try to use partition in clever way though the partitiuon column is not in the predicate list.

    Please provide a reason why average Joe like me will can survive and throw some light on BITMAP indexes..

    • Tanel Poder says:

      Hi Sasanka,

      There are some common patterns, but it always depends. Exadata doesn’t change most of the DW design patterns (like partitioning), the greatest changes are around indexing philosophy. And this still mostly depends on how frequently you have to execute your SQL statements – if it’s many times per second, you need some indexes to reduce te amount of IO (and take advantage of buffer cache to reduce it even further).

      A well designed dimensional DW can be loaded directly into newest HCC partitions and they don’t get touched anymore.

      A not-so-well designed DW may end up updating and merging a large amount of rows after their initial insert – and it will take a few days/weeks until these rows end up in their final state. I would not use HCC for such partitions until they reach their steady state.

      Bitmap indexes can hurt performance on Exadata if after the bitmap merging there are still tens or hundreds of thousands (or millions of) fact table ROWIDs left to examine – these blocks are read in via regular buffered, possibly random single block, reads.

      Anyway, it always depends – the best bet is to understand the capabilities and fundamentals of your database system (and the hardware) and what the application wants to do – and make intelligent decisions based on understanding these facts

      • Tanel Poder says:

        But yep, more powerful systems allow developers get away with even crappier SQL. You can enable resource capping for developers in their dev Exadatas, so that they’d never get more than let’s say 2% of the full power of the Exadata box (or whatever is the likely resource they could use in production, minus some), thus they’d notice inefficient query right in development.

        If technically crap people can grab all the glory (with management) for others achievements just because their PR skills, then it’s time to either:

        1) Get some serious self-PR skills and go to war :)
        2) Fire your boss
        3) Switch your employer/team

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>