cell flash cache read hits vs. cell writes to flash cache statistics on Exadata

When the Smart Flash Cache was introduced in Exadata, it cached reads only. So there were only read “optimization” statistics like cell flash cache read hits and physical read requests/bytes optimized in V$SESSTAT and V$SYSSTAT (the former counted the read IO requests that got their data from the flash cache and the latter counted the disk IOs avoided thanks to both the flash cache and storage indexes). So if you wanted to measure the benefit of the flash cache alone, you had to use the cell flash cache read hits metric.

This was all fine until you enabled the Write-Back flash cache in a newer version of cellsrv. We still had only the “read hits” statistic in the V$ views! And on closer investigation, both the read hits and write hits were accumulated in the same read hits statistic! (I can’t reproduce this anymore now that we are patched with the latest cellsrv, but it was definitely the behavior earlier, as I demoed it in various places).

Side-note: This is likely because it’s not so easy to just add more statistics to Oracle code within a single small patch. The statistic counters are referenced by other modules using macros with their direct numeric IDs (and memory offsets into the v$sesstat array), and the IDs and addresses would change when more statistics get added. So, you can pretty much add new statistic counters only with new full patchsets. It’s the same with instance parameters, by the way. That’s why the “spare” statistics and spare parameters exist: they’re placeholders for temporary use, until the new parameter or statistic gets added permanently with a full patchset update.

So, this is probably the reason why both the flash cache read and write hits initially got accumulated under the cell flash cache read hits statistic. Later on this seemed to get “fixed”, so that the read hits statistic only showed read hits and the flash write hits were not accounted for anywhere. You can test this easily by measuring your DBWR’s v$sesstat metrics, with snapper for example: if you get way more cell flash cache read hits than physical read total IO requests, then you’re probably accumulating both read and write hits in the same metric.
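The check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the `factor` threshold and the sample numbers are my own assumptions (they are not part of Snapper or any Oracle tool), and the input is assumed to be a dict of statistic deltas you have already sampled from v$sesstat for a DBWR session.

```python
# Hypothetical helper: the name and the default threshold are illustrative assumptions.
def read_hits_include_writes(stat_deltas, factor=2):
    """Return True when flash 'read hits' far exceed the read requests issued.

    stat_deltas maps statistic names to their deltas over a sampling interval.
    factor is an arbitrary stand-in for "way more" in the text above.
    """
    read_hits = stat_deltas.get("cell flash cache read hits", 0)
    read_requests = stat_deltas.get("physical read total IO requests", 0)
    return read_hits > factor * read_requests


# Made-up deltas for a DBWR session (not measured on a real system).
dbwr_sample = {
    "cell flash cache read hits": 5200,
    "physical read total IO requests": 40,
}

print(read_hits_include_writes(dbwr_sample))
```

A DBWR session mostly writes, so a large "read hits" delta next to a tiny read request count is the suspicious pattern. On a version where the counters are separated, the two numbers should be of similar magnitude and the function returns False.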

Let’s look into a few different database versions:

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  db12c1       enkdb03.enkitec.com       1497  20671 20131127

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell flash cache read hits                                                          1874361

In the database above, we still have only the read hits metric. But in the Oracle output below, we finally have the flash cache IOs broken down by reads and writes, plus a few special metrics indicating if the block written to already existed in the flash cache (cell overwrites in flash cache) and when the block range written to flash was only partially cached in flash already when the DB issued the write (cell partial writes in flash cache):

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com       199   607 20131201

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell writes to flash cache                                                           711439
cell overwrites in flash cache                                                       696661
cell partial writes in flash cache                                                        9
cell flash cache read hits                                                           699240

So, this probably means that the upcoming Oracle will have the flash cache write hit metrics in it too. So in the newer versions there’s no need to get creative when estimating the write-back flash cache hits in our performance scripts (the Exadata Snapper currently tries to derive this value from other metrics, relying on the bug where both read and write hits accumulated under the same metric, so I will need to update it based on the DB version we are running on).

So, when I look into one of the DBWR processes in a DB on Exadata, I see the breakdown of flash read vs write hits:

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com       199   607 20131201

SQL> @exadata/cellver
Show Exadata cell versions from V$CELL_CONFIG....

-------------------- -------------------- -------------------- -------------------- ----------
enkcel01              WriteBack            16
enkcel02              WriteBack            16
enkcel03              WriteBack            16

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" flash

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 cell writes to flash cache                                            50522
       296 cell overwrites in flash cache                                        43998
       296 cell flash cache read hits                                               36

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" optimized

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 physical read requests optimized                                         36
       296 physical read total bytes optimized                                  491520
       296 physical write requests optimized                                     25565
       296 physical write total bytes optimized                              279920640

If you are wondering why the cell writes to flash cache metric is roughly 2x bigger than physical write requests optimized, it’s because of the ASM double mirroring we use. The physical writes metrics are counted at the database-scope IO layer (KSFD), but the ASM mirroring is done at a lower layer in the Oracle process codepath (KFIO). So when the DBWR issues a 1 MB write, the v$sesstat metrics record a 1 MB IO for it, but the ASM layer below actually does 2x or 3x as much IO due to double- or triple-mirroring. As the cell writes to flash cache metric is sent back from all storage cells involved in the actual (ASM-mirrored) write IOs, we will see around 2-3x more storage flash write hits than physical writes issued at the database level (depending on which mirroring level you use). Another way of saying this: the “physical writes” metrics are measured at a higher level, “above” the ASM mirroring, and the “flash hits” metrics are measured at a lower level, “below” the ASM mirroring in the IO stack.
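To put a number on the "roughly 2x", here is a small Python sketch that divides the cell-level flash write count by the database-level write request count. The two values are copied from the DBW0 output above (SID 296); the function and variable names are only illustrative.

```python
# Values copied from the DBW0 session output above (SID 296).
flash_writes = 50522        # cell writes to flash cache (counted below ASM mirroring)
db_write_requests = 25565   # physical write requests optimized (counted above ASM mirroring)


def write_amplification(cell_level_writes, db_level_writes):
    """Ratio of writes seen by the storage cells to writes issued by the database."""
    return cell_level_writes / db_level_writes


ratio = write_amplification(flash_writes, db_write_requests)
print(f"{ratio:.2f}")  # prints 1.98
```

A ratio close to 2 is what you would expect with double mirroring. With triple mirroring the same calculation should land closer to 3.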


4 Responses to cell flash cache read hits vs. cell writes to flash cache statistics on Exadata

  1. Jared says:

    Nice info, thanks Tanel.
    Perhaps another way to state it: “physical writes” are write calls at the database level, “flash hits” are measured at the IO level.

  2. Pavol Babel says:

    Hi Tanel,

    it seems the smart scan IO size is always 1MB on Exadata X3-2 (and I believe it is the same story on X2-2 and X4-2). When a smart scan is performed on data situated in flash cache, it seems the cell storage SW splits each 1MB IO into 16 IO requests of 64kB (since 64kB seems to be the largest block size with sufficient read latency). If I check the metric CD_IO_BY_R_LG_SEC on flash disks, it is always 0 (which is consistent with my previous observation). However, CD_IO_BY_W_LG_SEC is not always 0. What kind of IO operation could cause IO size > 64kB on a Flash Card?
    My first thought was TEMP WRITE, but writes to the temporary tablespace go to hard disks (and most often they hit the disk controller's 512MB writeback cache) and the response time for LARGE IO is awful. What else? Direct writes (INSERT /*+ APPEND */ ) always bypass Flash Cards as well. Should I focus on FlashLog writes, which could count towards the statistic at CELL LEVEL?

    Pavol Babel
    OCM 10g/11g

  3. Tadeu Camargo says:


    What do you recommend regarding CELL_FLASH_CACHE for tables?

    Set it to KEEP for the most-used tables?

    Thank you very much!

    • Tanel Poder says:

      The most important thing would be to upgrade the storage cells to at least *or later*, as in this version big changes to flash cache management were done (and IORM also started to control flash cache IO). If you’re on this version or later, Oracle will automatically smart scan tables from flash without having to explicitly mark these as KEEP.
