cell flash cache read hits vs. cell writes to flash cache statistics on Exadata

When the Smart Flash Cache was introduced in Exadata, it was caching reads only. So there were only read “optimization” statistics like cell flash cache read hits and physical read requests/bytes optimized in V$SESSTAT and V$SYSSTAT (the former accounted for the read IO requests that got its data from the flash cache and the latter ones accounted the disk IOs avoided both thanks to the flash cache and storage indexes). So if you wanted to measure the benefit of flash cache only, you’d have to use the cell flash cache read hits metric.

This all was fine until you enabled the Write-Back flash cache in a newer version of cellsrv. We still had only the “read hits” statistic in the V$ views! And when investigating it closer, both the read hits and write hits were accumulated in the same read hits statistic! (I can’t reproduce this on our patched 11.2.0.3 with latest cellsrv anymore, but it was definitely the behavior earlier, as I demoed it in various places).

Side-note: This is likely because it’s not so easy to just add more statistics to Oracle code within a single small patch. The statistic counters are referenced by other modules using macros with their direct numeric IDs (and memory offsets to v$sesstat array) and the IDs & addresses would change when more statistics get added. So, you can pretty much add new statistic counters only with new full patchsets, like 11.2.0.4. It’s the same with instance parameters by the way, that’s why the “spare” statistics and spare parameters exist, they’re placeholders for temporary use, until the new parameter or statistic gets added permanently with a full patchset update.

So, this is probably the reason why both the flash cache read and write hits got initially accumulated under the cell flash cache read hits statistic, but later on this seemed to get “fixed”, so that the read hits only showed read hits and the flash write hits were not accounted anywhere. You can test this easily by measuring your DBWR’s v$sesstat metrics with snapper for example, if you get way more cell flash cache read hits than physical read total IO requests, then you’re probably accumulating both read and write hits in the same metric.

Let’s look into a few different database versions:

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  db12c1       enkdb03.enkitec.com       1497  20671    12.1.0.1.0 20131127

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell flash cache read hits                                                          1874361

In the 12.1.0.1 database above, we still have only the read hits metric. But in the Oracle 11.2.0.4 output below, we finally have the flash cache IOs broken down by reads and writes, plus a few special metrics indicating if the block written to already existed in the flash cache (cell overwrites in flash cache) and when the block range written to flash was only partially cached in flash already when the DB issued the write (cell partial writes in flash cache):

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com       199   607      11.2.0.4.0 20131201

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell writes to flash cache                                                           711439
cell overwrites in flash cache                                                       696661
cell partial writes in flash cache                                                        9
cell flash cache read hits                                                           699240

So, this probably means that the upcoming Oracle 12.1.0.2 will have the flash cache write hit metrics in it too. So in the newer versions there’s no need to get creative when estimating the write-back flash cache hits in our performance scripts (the Exadata Snapper currently tries to derive this value from other metrics, relying on the bug where both read and write hits accumulated under the same metric, so I will need to update it based on the DB version we are running on).

So, when I look into one of the DBWR processes in a 11.2.0.4 DB on Exadata, I see the breakdown of flash read vs write hits:

SQL> @i

USERNAME             INST_NAME    HOST_NAME                 SID   SERIAL#  VERSION    STARTED 
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com       199   607      11.2.0.4.0 20131201

SQL> @exadata/cellver
Show Exadata cell versions from V$CELL_CONFIG....

CELL_PATH            CELL_NAME            CELLSRV_VERSION      FLASH_CACHE_MODE     CPU_COUNT 
-------------------- -------------------- -------------------- -------------------- ----------
192.168.12.3         enkcel01             11.2.3.2.1           WriteBack            16        
192.168.12.4         enkcel02             11.2.3.2.1           WriteBack            16        
192.168.12.5         enkcel03             11.2.3.2.1           WriteBack            16        

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" flash

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 cell writes to flash cache                                            50522
       296 cell overwrites in flash cache                                        43998
       296 cell flash cache read hits                                               36

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" optimized

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 physical read requests optimized                                         36
       296 physical read total bytes optimized                                  491520
       296 physical write requests optimized                                     25565
       296 physical write total bytes optimized                              279920640

If you are wondering that why is the cell writes to flash cache metric roughly 2x bigger than the physical write requests optimized, it’s because of the ASM double mirroring we use. The physical writes metrics are counted at the database-scope IO layer (KSFD), but the ASM mirroring is done at a lower layer in the Oracle process codepath (KFIO). So when the DBWR issues a 1 MB write, v$sesstat metrics would record a 1 MB IO for it, but the ASM layer at the lower level would actually do 2 or 3x more IO due to double- or triple-mirroring. As the cell writes to flash cache metric is actually sent back from all storage cells involved in the actual (ASM-mirrored) write IOs, then we will see more around 2-3x storage flash write hits, than physical writes issued at the database level (depending on which mirroring level you use). Another way of saying this would be that the “physical writes” metrics are measured at higher level, “above” the ASM mirroring and the “flash hits” metrics are measured at a lower level, “below” the ASM mirroring in the IO stack.

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Exadata, Oracle. Bookmark the permalink.

2 Responses to cell flash cache read hits vs. cell writes to flash cache statistics on Exadata

  1. Jared says:

    Nice info, thanks Tanel.
    Perhaps another way to state it: “physical writes” are write calls at the database level, “flash hits” are measured at the IO level.

  2. Pavol Babel says:

    Hi Tanel,

    it seems when smart can IO size is always 1MB on Exadata X3-2 (and I believe it is the same story on X2-2,X4-2). When smart can is performed on data situated in flash cache, it seems cell storage SW is splitting 1MB IO to 16 io requests of size 64kB (since 64kB seems to be the largest block with sufficient read latency). If I check metric CD_IO_BY_R_LG_SEC on flash disks, it is always 0 (which is consistent with by previous observation). However, CD_IO_BY_W_LG_SEC is not always 0. What kind of IO operation could cause IO size > 64kB on Flash Card?
    My first thought was TEMP WRITE, but writes to temporary tablespace are written to hard disks (and most often they hit disk controller 512MB writeback cache) and the repsonse time for LARGE IO is awfull. What else? Direct writes (INSERT /*+ APPEND */ ) always bypass Flash Cards as well. Should I focus on FlashLog writes, which could count to statistic at CELL LEVEL?

    Regards
    Pavol Babel
    OCM 10g/11g

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>