Combining Bloom Filter Offloading and Storage Indexes on Exadata

Here’s a little known feature of Exadata – you can use a Bloom filter computed from a join column of a table to skip disk I/Os against another table it is joined to. This not the same as the Bloom filtering of the datablock contents in Exadata storage cells, but rather avoiding reading in some storage regions from the disks completely.

So, you can use storage indexes to skip I/Os against your large fact table, based on a bloom filter calculated from a small dimension table!

This is useful especially for dimensional star schemas, as your SQL statements might not have direct predicates on your large fact tables at all, all results will be determined by looking up relevant dimension records and then performing a hash join to the fact table (whether you should have some direct predicates against the fact tables, for performance reasons, is a separate topic for some other day :-)

Let me show an example using the SwingBench Order Entry schema. The first output is from Oracle 11.2.0.3 BP21 on Cellsrv 12.1.1.1.0:

SQL> ALTER SESSION SET "_serial_direct_read"=ALWAYS;

Session altered.

SQL> SELECT
  2      /*+ LEADING(c)
  3          NO_SWAP_JOIN_INPUTS(o)
  4          INDEX_RS_ASC(c(cust_email))
  5          FULL(o)
  6          MONITOR
  7      */
  8      *
  9  FROM
 10      soe.customers c
 11    , soe.orders o
 12  WHERE
 13      o.customer_id = c.customer_id
 14  AND c.cust_email = 'florencio@ivtboge.com'
 15  /

CUSTOMER_ID CUST_FIRST_NAME                CUST_LAST_NAME  ...
----------- ------------------------------ --------------
  399999199 brooks                         laxton        

Elapsed: 00:00:55.81

You can ignore the hints in the query, I just used these for get the plan I wanted for my demo. I also forced the serial full segment scans to use direct path reads (and thus attempt Smart Scans on Exadata) using the _serial_direct_read parameter.

Note that while I do have a direct filter predicate on the “small” CUSTOMERS table, I don’t have any predicates on the “large” ORDERS table, so there’s no filter predicate to offload to storage layer on the ORDERS table (but the column projection and HCC decompression can still be offloaded on that table too). Anyway, the query ran in over 55 seconds.

Let’s run this now with PARALLEL degree 2 and compare the results:

SQL> ALTER SESSION FORCE PARALLEL QUERY PARALLEL 2;

Session altered.

SQL> SELECT
  2      /*+ LEADING(c)
  3          NO_SWAP_JOIN_INPUTS(o)
  4          INDEX_RS_ASC(c(cust_email))
  5          FULL(o)
  6          MONITOR
  7      */
  8      *
  9  FROM
 10      soe.customers c
 11    , soe.orders o
 12  WHERE
 13      o.customer_id = c.customer_id
 14  AND c.cust_email = 'florencio@ivtboge.com'
 15  /

CUSTOMER_ID CUST_FIRST_NAME                CUST_LAST_NAME       
----------- ------------------------------ ---------------------
  399999199 brooks                         laxton               

Elapsed: 00:00:03.80

Now the query ran in less than 4 seconds. How come did the same query run close to 15 times faster than in serial? The parallel degree 2 for this simple query should give me max 4 slaves doing the work… 15x speedup indicates that something else has changed as well.

The first thing I would normally suspect is that perhaps the direct path reads were not attempted for the serial query (and Smart Scans did not kick in). That would sure explain the big performance difference… but I did explicitly force the serial direct path reads in my session. Anyway, let’s stop guessing and instead know for sure by measuring these experiments!

SQL Monitoring reports are a good starting point (I have added the MONITOR hint to the query so that the SQL monitoring would kick in immediately also for serial queries).

Here’s the slow serial query:

Global Stats
====================================================================================================
| Elapsed |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read  | Read  |  Cell   |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs  | Bytes | Offload |
====================================================================================================
|      56 |      49 |     6.64 |        0.01 |     0.01 |     2 |     4M | 53927 |  29GB |  14.53% |
====================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2205859845)
==========================================================================================================================================
| Id |               Operation               |     Name      | Execs |   Rows   | Read  |  Cell   | Activity |      Activity Detail      |
|    |                                       |               |       | (Actual) | Bytes | Offload |   (%)    |        (# samples)        |
==========================================================================================================================================
|  0 | SELECT STATEMENT                      |               |     1 |        1 |       |         |          |                           |
|  1 |   HASH JOIN                           |               |     1 |        1 |       |         |    42.86 | Cpu (24)                  |
|  2 |    TABLE ACCESS BY GLOBAL INDEX ROWID | CUSTOMERS     |     1 |        1 |       |         |          |                           |
|  3 |     INDEX RANGE SCAN                  | CUST_EMAIL_IX |     1 |        1 |       |         |          |                           |
|  4 |    PARTITION HASH ALL                 |               |     1 |     458M |       |         |          |                           |
|  5 |     TABLE ACCESS STORAGE FULL         | ORDERS        |    64 |     458M |  29GB |  14.53% |    57.14 | Cpu (29)                  |
|    |                                       |               |       |          |       |         |          | cell smart table scan (3) |
==========================================================================================================================================

So, the serial query issued 29 GB worth of IO to the Exadata storage cells, but only 14.53% less data was sent back… not that great reduction in the storage interconnect traffic. Also, looks like all 458 Million ORDERS table rows were sent back from the storage cells (as the ORDERS table didn’t have any direct SQL predicates against it). Those rows were then fed to the HASH JOIN parent row source (DB CPU usage!) and it just threw all of the rows away but one. Talk about inefficiency!

Ok, this is the fast parallel query:

Global Stats
==============================================================================================================
| Elapsed | Queuing |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read  | Read  |  Cell   |
| Time(s) | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs  | Bytes | Offload |
==============================================================================================================
|    4.88 |    0.00 |    0.84 |     4.03 |        0.00 |     0.01 |     2 |     4M | 30056 |  29GB |  99.99% |
==============================================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2121763430)
===============================================================================================================================================
| Id |                 Operation                  |     Name      | Execs |   Rows   | Read  |  Cell   | Activity |      Activity Detail      |
|    |                                            |               |       | (Actual) | Bytes | Offload |   (%)    |        (# samples)        |       
===============================================================================================================================================
|  0 | SELECT STATEMENT                           |               |     3 |        1 |       |         |          |                           |       
|  1 |   PX COORDINATOR                           |               |     3 |        1 |       |         |          |                           |       
|  2 |    PX SEND QC (RANDOM)                     | :TQ10001      |     2 |        1 |       |         |          |                           |       
|  3 |     HASH JOIN                              |               |     2 |        1 |       |         |          |                           |       
|  4 |      BUFFER SORT                           |               |     2 |        2 |       |         |          |                           |       
|  5 |       PX RECEIVE                           |               |     2 |        2 |       |         |          |                           |       
|  6 |        PX SEND BROADCAST                   | :TQ10000      |     1 |        2 |       |         |          |                           |       
|  7 |         TABLE ACCESS BY GLOBAL INDEX ROWID | CUSTOMERS     |     1 |        1 |       |         |          |                           |       
|  8 |          INDEX RANGE SCAN                  | CUST_EMAIL_IX |     1 |        1 |       |         |          |                           |       
|  9 |      PX PARTITION HASH ALL                 |               |     2 |     4488 |       |         |          |                           |       
| 10 |       TABLE ACCESS STORAGE FULL            | ORDERS        |    64 |     4488 |  29GB |  99.99% |   100.00 | cell smart table scan (4) |
===============================================================================================================================================

The fast parallel query still issued 29 GB of IO, but only 0.01% of data (compared to the amount of IOs issued) was sent back from the storage cells, so the offload efficiency for that table scan is 99.99%. Also, only a few thousand rows were returned back from the full table scan (so the throw-away by the hash join later in the plan is smaller).

So, where does the big difference in runtime and offload efficiency metrics come from? It’s the same data, same query (looking for the same values) so how come do we have so different Offload Efficiency %?

The problem with looking into a single metric showing some percentage (of what?!) is that it hides a lot of detail. So, lets get systematic and look into some detailed metrics :-)

I’ll use the Exadata Snapper for this purpose, although I could just list the V$SESSTAT metrics it uses as its source. The ExaSnapper lists various Exadata I/O metrics of the monitored session all in one “chart”, in the same unit (MB) & scale – so it will be easy to compare these different cases. If you do not know what the Exadata Snapper is, then check out this article and video.

Here’s the slow serial query (you may need to scroll the output right to see the MB numbers):

SQL> SELECT * FROM TABLE(exasnap.display_snap(:t1,:t2));

NAME
-------------------------------------------------------------------------------------------------------------------------------------------
-- ExaSnapper v0.81 BETA by Tanel Poder @ Enkitec - The Exadata Experts ( http://www.enkitec.com )
-------------------------------------------------------------------------------------------------------------------------------------------
DB_LAYER_IO                    DB_PHYSIO_BYTES               |##################################################|    29291 MB    365 MB/sec
DB_LAYER_IO                    DB_PHYSRD_BYTES               |##################################################|    29291 MB    365 MB/sec
DB_LAYER_IO                    DB_PHYSWR_BYTES               |                                                  |        0 MB      0 MB/sec
AVOID_DISK_IO                  PHYRD_FLASH_RD_BYTES          |################################################# |    29134 MB    363 MB/sec
AVOID_DISK_IO                  PHYRD_STORIDX_SAVED_BYTES     |                                                  |        0 MB      0 MB/sec
REAL_DISK_IO                   SPIN_DISK_IO_BYTES            |                                                  |      157 MB      2 MB/sec
REAL_DISK_IO                   SPIN_DISK_RD_BYTES            |                                                  |      157 MB      2 MB/sec
REAL_DISK_IO                   SPIN_DISK_WR_BYTES            |                                                  |        0 MB      0 MB/sec
REDUCE_INTERCONNECT            PRED_OFFLOADABLE_BYTES        |##################################################|    29291 MB    365 MB/sec
REDUCE_INTERCONNECT            TOTAL_IC_BYTES                |##########################################        |    24993 MB    312 MB/sec
REDUCE_INTERCONNECT            SMART_SCAN_RET_BYTES          |##########################################        |    24993 MB    312 MB/sec
REDUCE_INTERCONNECT            NON_SMART_SCAN_BYTES          |                                                  |        0 MB      0 MB/sec
CELL_PROC_DEPTH                CELL_PROC_DATA_BYTES          |##################################################|    29478 MB    368 MB/sec
CELL_PROC_DEPTH                CELL_PROC_INDEX_BYTES         |                                                  |        0 MB      0 MB/sec
CLIENT_COMMUNICATION           NET_TO_CLIENT_BYTES           |                                                  |        0 MB      0 MB/sec
CLIENT_COMMUNICATION           NET_FROM_CLIENT_BYTES         |                                                  |        0 MB      0 MB/sec

The DB_PHYSDR_BYTES says this session submitted 29291 MB of I/O as far as the DB layer sees it. The PRED_OFFLOADABLE_BYTES says that all of this 29291 MB of I/O was attempted to be done in a smart way (smart scan). The SMART_SCAN_RET_BYTES tells us that 24993 MB worth of data was sent back from the storage cells as a result of the smart scan (about 14.5% less than the amount of IO issued).

However, the PHYRD_STORIDX_SAVED_BYTES shows zero, we couldn’t avoid doing (skip) any I/Os when scanning the ORDERS table. We couldn’t use the storage index as we did not have any direct filter predicates on the ORDERS table.

Anyway, the story is different for the fast parallel query:

SQL> SELECT * FROM TABLE(exasnap.display_snap(:t1,:t2));     

NAME
-------------------------------------------------------------------------------------------------------------------------------------------
-- ExaSnapper v0.81 BETA by Tanel Poder @ Enkitec - The Exadata Experts ( http://www.enkitec.com )              
-------------------------------------------------------------------------------------------------------------------------------------------
DB_LAYER_IO                    DB_PHYSIO_BYTES               |##################################################|    29291 MB   2212 MB/sec
DB_LAYER_IO                    DB_PHYSRD_BYTES               |##################################################|    29291 MB   2212 MB/sec
DB_LAYER_IO                    DB_PHYSWR_BYTES               |                                                  |        0 MB      0 MB/sec
AVOID_DISK_IO                  PHYRD_FLASH_RD_BYTES          |#######################                           |    13321 MB   1006 MB/sec
AVOID_DISK_IO                  PHYRD_STORIDX_SAVED_BYTES     |###########################                       |    15700 MB   1186 MB/sec
REAL_DISK_IO                   SPIN_DISK_IO_BYTES            |                                                  |      270 MB     20 MB/sec
REAL_DISK_IO                   SPIN_DISK_RD_BYTES            |                                                  |      270 MB     20 MB/sec
REAL_DISK_IO                   SPIN_DISK_WR_BYTES            |                                                  |        0 MB      0 MB/sec
REDUCE_INTERCONNECT            PRED_OFFLOADABLE_BYTES        |##################################################|    29291 MB   2212 MB/sec
REDUCE_INTERCONNECT            TOTAL_IC_BYTES                |                                                  |        4 MB      0 MB/sec
REDUCE_INTERCONNECT            SMART_SCAN_RET_BYTES          |                                                  |        4 MB      0 MB/sec
REDUCE_INTERCONNECT            NON_SMART_SCAN_BYTES          |                                                  |        0 MB      0 MB/sec
CELL_PROC_DEPTH                CELL_PROC_DATA_BYTES          |#######################                           |    13591 MB   1027 MB/sec
CELL_PROC_DEPTH                CELL_PROC_INDEX_BYTES         |                                                  |        0 MB      0 MB/sec
CLIENT_COMMUNICATION           NET_TO_CLIENT_BYTES           |                                                  |        0 MB      0 MB/sec
CLIENT_COMMUNICATION           NET_FROM_CLIENT_BYTES         |                                                  |        0 MB      0 MB/sec

The DB_PHYSIO_BYTES is still the same (29GB) as in the previous case – as this database layer metric knows only about the amount of I/Os it has requested, not what actually gets (or doesn’t get) done inside the storage cells. SMART_SCAN_RET_BYTES is only 4MB (compared to almost 25GB previously) so evidently there must be some early filtering going on somewhere in the lower layers.

And now to the main topic of this article – the PHYRD_STORIDX_SAVED_BYTES metric (that comes from the cell physical IO bytes saved by storage index metric in V$SESSTAT) shows that we have managed to avoid doing 15700 MB worth of IO completely thanks to the storage indexes. And this is without having any direct SQL filter predicates on the large ORDERS table! How is this possible? Keep reading :-)

Before going deeper, let’s look into the predicate section of these cursors.

The slow serial query:

--------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | E-Rows | Pstart| Pstop |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               |        |       |       |
|*  1 |  HASH JOIN                          |               |      1 |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| CUSTOMERS     |      1 | ROWID | ROWID |
|*  3 |    INDEX RANGE SCAN                 | CUST_EMAIL_IX |      1 |       |       |
|   4 |   PARTITION HASH ALL                |               |    506M|     1 |    64 |
|   5 |    TABLE ACCESS STORAGE FULL        | ORDERS        |    506M|     1 |    64 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("O"."CUSTOMER_ID"="C"."CUSTOMER_ID")
   3 - access("C"."CUST_EMAIL"='florencio@ivtboge.com')

The fast parallel query:

-------------------------------------------------------------------------------------------
| Id  | Operation                                | Name          | E-Rows | Pstart| Pstop |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |               |        |       |       |
|   1 |  PX COORDINATOR                          |               |        |       |       |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10001      |      1 |       |       |
|*  3 |    HASH JOIN                             |               |      1 |       |       |
|   4 |     BUFFER SORT                          |               |        |       |       |
|   5 |      PX RECEIVE                          |               |      1 |       |       |
|   6 |       PX SEND BROADCAST                  | :TQ10000      |      1 |       |       |
|   7 |        TABLE ACCESS BY GLOBAL INDEX ROWID| CUSTOMERS     |      1 | ROWID | ROWID |
|*  8 |         INDEX RANGE SCAN                 | CUST_EMAIL_IX |      1 |       |       |
|   9 |     PX PARTITION HASH ALL                |               |    506M|     1 |    64 |
|* 10 |      TABLE ACCESS STORAGE FULL           | ORDERS        |    506M|     1 |    64 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("O"."CUSTOMER_ID"="C"."CUSTOMER_ID")
   8 - access("C"."CUST_EMAIL"='florencio@ivtboge.com')
  10 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"O"."CUSTOMER_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"O"."CUSTOMER_ID"))

See how the parallel plan has the storage() predicate with SYS_OP_BLOOM_FILTER in it and the bloom filter :BF0000 will be compared to the hashed CUSTOMER_ID column values when scanning the ORDERS table in storage cells. So, this shows that your session attempts to push the Bloom filter down to the storage layer.

However, merely seeing the Bloom filter storage predicate in the plan doesn’t tell you whether any IOs were avoided thanks to the storage indexes and Bloom filters. The only way to know would be to run the query and look into the cell physical IO bytes saved by storage index metric (or PHYRD_STORIDX_SAVED_BYTES in Exadata Snapper). Unfortunately it will not be easy to measure this at rowsource level when you have multiple smart scans happening in different locations of your execution plan (more about this in the next blog entry :-)

Anyway, how is Oracle able to avoid reading in blocks of one table based on the output of another table involved in the join?

The answer lies in the parameter _bloom_minmax_enabled. It’s description says “enable or disable bloom min max filtering”. This parameter is true by default and on Exadata it means that in addition to computing the Bloom filter bitmap (based on hashed values in the join column) we also keep track of the smallest and biggest value retrieved from the driving table of the join.

In our example, both the Bloom filter bitmap of all matching CUSTOMER_IDs retrieved from the CUSTOMERS table (after any direct predicate filtering) would be sent to the storage cells and also the biggest and smallest CUSTOMER_ID retrieved from the driving table of the join.

In our hand-crafted example where only a single customer was taken from the driving table, only one bit in the bloom filter would be set and both the MIN and MAX CUSTOMER_ID of interest (in the joined ORDERS table) was set to 399999199 (I had about 400 Million customers generated in my benchmark dataset).

So, knowing that we were looking for only CUSTOMER_IDs in the single value “range” of 399999199, the smart scan could start skipping IOs thanks to the storage indexes in memory! You can think of this as runtime BETWEEN predicate generation from a driving table in a join that then gets offloaded to the storage cell and applied when scanning the other table in the join. IOs can be avoided thanks to knowing the range of values we are looking for and then further row-level filtering can be done using the usual Bloom filter bitmap comparison.

 

Note that the above examples are from Oracle 11.2.0.3 – in this version the Bloom filtering features should kick in only for parallel queries (although Jonathan Lewis has blogged about a case when this happens also on 11.2.0.3).

And this is the ultimate reason why the serial execution was so much slower and less efficient than the parallel run – in my demos, on 11.2.0.3 the Bloom filter usage kicked in only when running the query in parallel.

However, this has changed on Oracle 11.2.0.4 – now also serial queries frequently compute bloom filters and push these to the storage, for the usual bloom filtering reasons and for IO skipping by comparing the bloom filter min/max values to the Exadata storage index memory structures.

This is an example from Oracle 11.2.0.4, the plan is slightly different because in this database my test tables are not partitioned (and I didn’t have any indexes on the tables):

------------------------------------------------------------------
| Id  | Operation                   | Name      | E-Rows |E-Bytes|
------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |        |       |
|*  1 |  HASH JOIN                  |           |      1 |   114 |
|   2 |   JOIN FILTER CREATE        | :BF0000   |      1 |    64 |
|*  3 |    TABLE ACCESS STORAGE FULL| CUSTOMERS |      1 |    64 |
|   4 |   JOIN FILTER USE           | :BF0000   |   4581K|   218M|
|*  5 |    TABLE ACCESS STORAGE FULL| ORDERS    |   4581K|   218M|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("O"."CUSTOMER_ID"="C"."CUSTOMER_ID")
   3 - storage("C"."CUST_EMAIL"='florencio@ivtboge.com')
       filter("C"."CUST_EMAIL"='florencio@ivtboge.com')
   5 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"O"."CUSTOMER_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"O"."CUSTOMER_ID"))

The Bloom bitmap (and MIN/MAX value) would be computed in the step #2 and would then be used by the full table scan in the step #5 – having the filtering info pushed all the way to the storage layer (assuming that the smart scan did kick in).

If you look into the Outline hints section of this plan, you’ll see the PX_JOIN_FILTER hint that has showed up there, this instructs the optimizer to set up and use the bloom filters (and despite the name, it now can be used for serial queries):

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "C"@"SEL$1")
      FULL(@"SEL$1" "O"@"SEL$1")
      LEADING(@"SEL$1" "C"@"SEL$1" "O"@"SEL$1")
      USE_HASH(@"SEL$1" "O"@"SEL$1")
      PX_JOIN_FILTER(@"SEL$1" "O"@"SEL$1")
      END_OUTLINE_DATA
  */

So if the CBO doesn’t choose it automatically, you can use this hint for testing whether your queries could potentially benefit from this feature (note that it controls the whole join filter propagation logic, not only the Exadata stuff). You can also use the _bloom_serial_filter parameter to disable this behavior on Exadata (not that you should).

Note that I usually treat the storage index IO savings as just an added performance benefit at the system level, something that allows to keep the disk devices a little less busy, therefore yielding higher overall throughput. I do not design my applications’ performance overly dependent on the storage indexes as it’s not possible to easily control which columns end up in the storage index hashtables and that may give you unpredictable results.

That’s all – hopefully this shed some light on this cool combination of multiple different features (Smart Scans + Hash Joins + Bloom Filters + Storage Indexes).

 

This entry was posted in Exadata, Oracle. Bookmark the permalink.

10 Responses to Combining Bloom Filter Offloading and Storage Indexes on Exadata

  1. Arup Nanda says:

    Excellent, just superb. Thank you, Tanel, for this detailed treatise on these unique features that make Exadata exactly that-Exadata.

  2. Tanel Poder says:

    I forgot to mention that starting from 11.2.0.4 this serial bloom filter offloading can be controlled using the _bloom_serial_filter parameter.

  3. Tanel,
    Thanks a lot for this superb post.
    Let me just ask one question.
    Looking both at the serial and the parallel plan I can point out that for the serial plan E-Rows = 506M and A-Rows = 408M. While for the parallel plan E-Rows is still equal to 506M and A-Rows, this time, equals 4488. This big difference between E-Rows (*Starts) and A-Rows for the parallel run cannot be due to a statistic problem because the serial plan shows that the CBO is almost doing good estimations. At this point, I would have suspected that something happened when full scanning the ORDERS table in the parallel plan. How the A-Rows in the parallel run can differ so much from the E-Rows? And of course I would have looked to the predicate part where I would have pointed out the bloom filter applied exactly at this operation.
    The parallel plan has been honoured via a BROADCAST distribution method this is why the Bloom Filter is invisible in the plan (may be it is possible now in 12c I have to check). If the distribution between the parallel slave processes has been done via a HASH-HASH method we would have observed an indication like JOIN FILTER CREATE: BF000 and JOIN FILTER USE: BF000 which would have immediately show us that the bloom filter has been used.
    So my question: do this E-Rows and A-Rows reasoning look appropriate for you in this particular case at least to explain the bloom filter effect?

    Best Regards

  4. Pavol Babel says:

    Tanel it would be interesting to keep data from both tables in Flash Cache (by using cell_flash_cache_keep of course ) and compare time with disabled serial bloom filtering against enablef BF on hard disk.
    Well, now checking you are running 12.1.1.0 on cells, were already all IO from flash (as 12.1.1.0 and 11.3.3.0 storage SW populates flash cache also by smart scans)

    • Tanel Poder says:

      The flash cache IO decision operates at a different level – at a later stage *after* the storage index check. If the storage index avoids I/O, there’s no need to go to flash cache either (nor even check it).

    • Tanel Poder says:

      Ok I guess you meant just the performance difference – brute force scanning of flash disks vs skipping IOs on hard disks… But why not have both? ;-)

      • Pavol Babel says:

        Yes I ment just the differenfe. storage index is interesting feature, however the FC brute force is also huge (for read IO) which means smart scans, SI etc. are not so cool on FC. They are still good of course, but not making such a huge differenfe as on hard disk based IO. That’s why I was interested in those numbers ;)

        • Pavol Babel says:

          Which also means for comparable price with HW price + licences of Storage Servers, you can buy 2 Vilion 6600 or FlashSystem 840 and beat Exadata in lpenty of applications, since Flash is at least twice slower on Exadata (F40 F80) when compared to mentioned solutions.
          And then many of exadata features isntantly disappear in many cases

  5. Evgeny says:

    Hi, Tanel. Thank you for great post! As always you do great job :)
    Just to add to your research, I’ve done the same exercise on my test system with 11.2.0.4 on board and can confirm behavior was exactly like you described above, so I wasn’t able to force direct path unless set parallel degree forced – 2 slaves give 8 times improvement. Perhaps not much changed 11.2.0.3 -> 11.2.0.4 in regards to this behavior.

    • Tanel Poder says:

      Thanks Evgeny. There’s 2 separate issues here 1) whether the direct path read kicks in (that hasn’t changed in 11.2.0.3->11.2.0.4) and 2) whether the bloom filter usage kicks in for serial queries (that has changed in 11.2.0.4)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>