Search Results for: snapper

Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

Hello all fellow Oracle geeks and technology enthusiasts! Long time no see ;-)

In the hacking session about Oracle full table scans and direct path reads I explained how the direct path read decision is not done by the optimizer, but instead during every execution, separately for every single segment (partition) scanned in the query. I also explained how the _small_table_threshold parameter and the X$KCBOQH.NUM_BUF(which keeps track of how many buffers of any segment are currently cached) are used for determining whether to scan using direct path reads or not.

If you don’t know what the _small_table_threshold parameter is and how it relates to the direct path read decision, then read this post by Alex Fatkulin first.

In addition to the abovementioned values, Oracle needs to know how many blocks there are to scan (under the HWM) in a segment. This was traditionally done by reading in the segment header block first (using a regular buffered single block read – “db file sequential read”).

Starting from Oracle 11.2.0.2, things have changed a little. Instead of making the direct path read decision based on the actual segment block count extracted from the segment header, Oracle actually takes this number from TAB$.BLKCNT (dba_tables.blocks) or TABPART$.BLKCNT, IND$.LEAFCNT etc.

There’s a new parameter _direct_read_decision_statistics_driven which controls this:

 SQL> @pd direct_read_decision
 Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                                               VALUE    DESCRIPTION
 -------------------------------------------------- -------- ----------------------------------------------------------
 _direct_read_decision_statistics_driven            TRUE     enable direct read decision based on optimizer statistics
  • When this parameter is FALSE, the direct path read decision is done based on the segment header’s block count (actual block count).
  • When TRUE (default in 11.2.0.2+), the direct path read decision is done based on the block count stored in one of the base tables (TAB$, IND$) – the optimizer statistics

 
Note that even though the block counts are taken from the optimizer statistics in data dictionary, it’s not the optimizer who does the direct path read decision in the traditional sense (it’s not a cost-based decision).

Here’s an example from my test database:

SQL> CREATE TABLE t AS SELECT * FROM dba_source;

Table created.

SQL> @gts t
Gather Table Statistics for table t...

PL/SQL procedure successfully completed.

SQL>
SQL> SELECT blocks FROM user_tables WHERE table_name = 'T';

BLOCKS
----------
10704

SQL>

The table uses 10704 blocks (up HWM).

SQL> SELECT name,block_size,buffers FROM v$buffer_pool;

NAME                 BLOCK_SIZE    BUFFERS
-------------------- ---------- ----------
DEFAULT                    8192       9424

The table (10704 blocks) is bigger than the entire buffer cache (9424 buffers). And this is way bigger than the _small_table_threshold value of 168 in my instance (watch the full table scans and direct path reads) hacking session for more about this parameter.

So whenever I run a SELECT COUNT(*) FROM t, I see direct path reads show up:

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99
Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,         .2,
     99, SYS       , STAT, db block gets from cache                                  ,             1,         .2,
     99, SYS       , STAT, consistent gets                                           ,         11867,      2.37k,
     99, SYS       , STAT, consistent gets from cache                                ,             7,        1.4,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,             6,        1.2,
     99, SYS       , STAT, consistent gets - examination                             ,             1,         .2,
     99, SYS       , STAT, consistent gets direct                                    ,         11860,      2.37k,
     99, SYS       , STAT, no work - consistent read gets                            ,         11859,      2.37k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,         .2,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,         .2,
     99, SYS       , STAT, table scans (direct read)                                 ,             1,         .2,
     99, SYS       , STAT, table scan rows gotten                                    ,        739834,    147.97k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11860,      2.37k,
     99, SYS       , TIME, parse time elapsed                                        ,            46,      9.2us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,         79988,       16ms,     1.6%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        254990,       51ms,     5.1%, |@         |
     99, SYS       , TIME, DB time                                                   ,        255375,    51.08ms,     5.1%, |@         |
     99, SYS       , WAIT, enq: KO - fast object checkpoint                          ,        174947,    34.99ms,     3.5%, |@         |
     99, SYS       , WAIT, direct path read                                          ,          1280,      256us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             9,      1.8us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       4672912,   934.58ms,    93.5%, |@@@@@@@@@@|
     99, SYS       , WAIT, events in waitclass Other                                 ,             6,      1.2us,      .0%, |          |
--  End of Stats snap 1, end=2012-09-02 20:03:55, seconds=5

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 88r4qn9mwhcf5   | enq: KO - fast object checkpoint    | Application
     2% | 88r4qn9mwhcf5   | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2012-09-02 20:03:55, seconds=5, samples_taken=43

Let’s now fake the table stats so it looks like that there’s only 5 blocks in it – way below the _small_table_threshold value

SQL> EXEC DBMS_STATS.SET_TABLE_STATS(user,'T',numblks=>5);

PL/SQL procedure successfully completed.

SQL> SELECT COUNT(*) FROM t /* attempt 2 */;

  COUNT(*)
----------
      1000

The direct path reads are gone – we are doing regular buffered reads now!

SQL> @snapper ash,stats=sw,sinclude=table.scan|gets 5 1 99

Sampling SID 99 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.54 by Tanel Poder ( http://blog.tanelpoder.com )

----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
     99, SYS       , STAT, db block gets                                             ,             1,        .17,
     99, SYS       , STAT, db block gets from cache                                  ,             1,        .17,
     99, SYS       , STAT, consistent gets                                           ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache                                ,         11865,      1.98k,
     99, SYS       , STAT, consistent gets from cache (fastpath)                     ,         11528,      1.92k,
     99, SYS       , STAT, consistent gets - examination                             ,             1,        .17,
     99, SYS       , STAT, no work - consistent read gets                            ,         11851,      1.98k,
     99, SYS       , STAT, cleanouts only - consistent read gets                     ,             1,        .17,
     99, SYS       , STAT, table scans (long tables)                                 ,             1,        .17,
     99, SYS       , STAT, table scan rows gotten                                    ,        738834,    123.14k,
     99, SYS       , STAT, table scan blocks gotten                                  ,         11852,      1.98k,
     99, SYS       , TIME, parse time elapsed                                        ,            84,       14us,      .0%, |          |
     99, SYS       , TIME, DB CPU                                                    ,        109983,    18.33ms,     1.8%, |@         |
     99, SYS       , TIME, sql execute elapsed time                                  ,        116709,    19.45ms,     1.9%, |@         |
     99, SYS       , TIME, DB time                                                   ,        117102,    19.52ms,     2.0%, |@         |
     99, SYS       , WAIT, db file scattered read                                    ,         63956,    10.66ms,     1.1%, |@         |
     99, SYS       , WAIT, SQL*Net message to client                                 ,             8,     1.33us,      .0%, |          |
     99, SYS       , WAIT, SQL*Net message from client                               ,       5119722,   853.29ms,    85.3%, |@@@@@@@@@ |
--  End of Stats snap 1, end=2012-09-02 20:06:19, seconds=6

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
     2% | 07sgczqj432mr   | db file scattered read              | User I/O

--  End of ASH snap 1, end=2012-09-02 20:06:19, seconds=5, samples_taken=46

Note that I deliberately forced a hard parse (with the “attempt 2″ comment) to compile a new cursor. The _direct_read_decision_statistics_driven parameter is not part of the optimizer environment, so a new child cursor would not be automatically created after the parameter change (the same applies to the _small_table_threshold and _serial_direct_read parameters, by the way). But when I change the SQL text, then an entirely new (parent and child) cursor will be compiled anyway.

But wait a minute! Why do I need to compile a new cursor to get Oracle to read the new block count value from optimizer stats?!

I have said in the beginning of this post (and in many other places) that the direct path read decision is not done by the optimizer anyway and is a runtime decision done during every execution, every time any segment (including individual partitions) is scanned during query runtime. This is true for the old (up to 11.2.0.1) Oracle versions, where a direct path decision is done based on the actual, current block count in the segment header, thus the decision can suddenly change when a segment grows by a few blocks, crossing the _small_table_threshold calculation threshold. Perhaps due to performance stability reasons, this seems to have changed.

My tests on 11.2.0.2 have so far shown that when using the new statistics-driven direct path read decisions, each segments’ block counts are stored somewhere in the compiled cursor and reused during next executions of it, even if the block count of the segment changes in the optimizer stats later on! This might result in somewhat better stability as long as you don’t gather new stats – and your buffer cache size (and already cached block counts) don’t change. However if the amount of cached blocks of a segment does change (due to other, index-based accesses for example), then the direct path decision can still change during runtime. It’s just the block counts which are stored in the cursor, but the other factors affecting the decision (buffer cache size, cached block counts) can still change.

This topic is especially relevant on Exadata, as the entire Smart Scanning functionality depends on whether a direct path read IO method gets picked for full segment scans. When experimenting with this, you’ve got to be pretty careful and thorough (to not come to wrong conclusions) as there multiple moving parts and factors involved in the decisions:

  1. Are block counts taken from optimizer stats or segment header
  2. Do the segment header and/or optimizer stats block counts change
  3. Does the buffer cache size change (thus the _small_table_threshold too)
  4. Does the amount of cached blocks of a segment change
  5. Parallel vs Serial execution
  6. Are buffered parallel full scans allowed (the in-memory PX feature of 11.2)
  7. Did a new child cursor get created or the old one reused
  8. etc :)

Advanced Oracle Troubleshooting Guide – Part 10: Index unique scan doing multiblock reads?!

When you troubleshoot an Oracle (performance) problem you usually want to find out what are the problem sessions doing (wrong) and then why are they doing it.

The “what” question is usually answered using some wait interface based method – like ASH or SQL*Trace, which both add plenty of extra details to the plain wait event data.

My normal session troubleshooting data collection sequence consists of only three steps:

  1. Interpret session-level top wait and top SQL data (ASH, SQL*Trace or custom tools sampling session-level V$views)
  2. Interpret session-level performance counter data (V$SESSTAT)
  3. Sample the problem process stack and list the top stack branches where execution spent the most time

Only looking into #1 is enough in 90% of problem cases. But for advanced troubleshooting of complex and exotic problems I regularly have to dive into the #2 – V$SESSTAT performance counters (That’s the reason why I wrote Snapper for example).

Maybe only once out of every 40-50 troubleshooting cases I dive down into #3 – process stack sampling. While process stack sampling doesn’t lie – it helps especially in these cases where the Oracle’s instrumentation itself doesn’t have a clue what a session/process is doing – it’s like a microscope. It gives you very detailed overview of one part of the problem, but not the big picture. That’s why I never start from #3 when troubleshooting (and if you do, you are doing it wrong – with an exception of a completely hung database ;) I always start from #1 and proceed to #2 only when needed.

Snapper lists both #1 and #2 data for a process by the way.

Anyway, the reason why I’m writing this post is that a few days ago I received a question about an issue which nicely illustrates a case where step #1 shows that there’s an issue and step #2 explains why it’s happening.

There was an execution plan, which ran slightly slower on a new server compared to the old one – and it did read 36 blocks per execution as opposed to only 5:

select ...
from TABLE_X
where ...
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.07         36          5          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.09         36          5          0           1

The “disk” column above shows 36 blocks being read in via physical IO – while only 5 buffers had actually been accessed after that (logical IOs are shown in the “query” column).

Now that’s weird – as logical IOs are the ones which cause physical IOs, so the LIO count should normally be equal or greater than the physical IO count.

What’s more stranger, the execution plan is an index unique scan:

 
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID TABLE_X (cr=5 pr=36 pw=0 time=0 us cost=4 size=26 card=1)
      1   INDEX UNIQUE SCAN TABLE_X_IDX (cr=4 pr=35 pw=0 time=0 us cost=3 size=0 card=1)(object id 51063)
 

… but if you look into the “pr=” column, which means “blocks read via physical read”, then you see that 35 out of the 36 blocks were read in by the INDEX UNIQUE SCAN row source (pr=35). These counters are cumulative, so the pr=36 in the parent TABLE ACCESS BY INDEX ROWID row source means that there was only one block read by physical IO at the table access level. So why does the index access read 35 blocks?

Is it index height? Can’t be – as the max index height is 24 levels (including the leaf level). And such an index would be ridiculous anyway :)

Let’s look into the wait profile:

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                        4        0.00          0.00
  db file sequential read                         4        0.01          0.04
  db file scattered read                          1        0.03          0.03
  SQL*Net message from client                     2        0.15          0.15
********************************************************************************

Wow, an index unique scan which is supposed to traverse the index tree with precise single block reads, has done 4 single block reads and one multiblock read!

Now this is where the wait interface and ASH-style data would not help us any further. We know what’s happening – a multiblock read happening where “it shouldn’t”. We still don’t know why this is happening. We would have to guess from here. And this is where V$SESSTAT comes in – there’s over 600 performance counters maintained for each session and many of them show valuable information about what kind of operations (and how many) the RDBMS kernel has done. So I asked the client to run Snapper on a test session and here’s the output (note that the below output is from a different test run where the queries were executed many times in a loop):


----------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------------------------
   1612, ORACLE    , STAT, opened cursors cumulative                                 ,          6617,      1.32k,
   1612, ORACLE    , STAT, user calls                                                ,         13234,      2.65k,
   1612, ORACLE    , STAT, pinned cursors current                                    ,            -1,        -.2,
   1612, ORACLE    , STAT, session logical reads                                     ,          5293,      1.06k,
   1612, ORACLE    , STAT, CPU used when call started                                ,             7,        1.4,
   1612, ORACLE    , STAT, CPU used by this session                                  ,             7,        1.4,
   1612, ORACLE    , STAT, DB time                                                   ,           306,       61.2,
   1612, ORACLE    , STAT, user I/O wait time                                        ,           223,       44.6,
   1612, ORACLE    , STAT, non-idle wait time                                        ,           224,       44.8,
   1612, ORACLE    , STAT, non-idle wait count                                       ,          6775,      1.36k,
   1612, ORACLE    , STAT, process last non-idle time                                ,             5,          1,
   1612, ORACLE    , STAT, physical read total IO requests                           ,           158,       31.6,
   1612, ORACLE    , STAT, physical read total multi block requests                  ,            31,        6.2,
   1612, ORACLE    , STAT, physical read total bytes                                 ,       8265728,      1.65M,
   1612, ORACLE    , STAT, cell physical IO interconnect bytes                       ,       8265728,      1.65M,
   1612, ORACLE    , STAT, consistent gets                                           ,          5293,      1.06k,
   1612, ORACLE    , STAT, consistent gets from cache                                ,          5293,      1.06k,
   1612, ORACLE    , STAT, consistent gets - examination                             ,          5293,      1.06k,
   1612, ORACLE    , STAT, physical reads                                            ,          1009,      201.8,
   1612, ORACLE    , STAT, physical reads cache                                      ,          1009,      201.8,
   1612, ORACLE    , STAT, physical read IO requests                                 ,           158,       31.6,
   1612, ORACLE    , STAT, physical read bytes                                       ,       8265728,      1.65M,
   1612, ORACLE    , STAT, free buffer requested                                     ,          1008,      201.6,
   1612, ORACLE    , STAT, physical reads cache prefetch                             ,           851,      170.2,
   1612, ORACLE    , STAT, physical reads prefetch warmup                            ,           851,      170.2,
   1612, ORACLE    , STAT, shared hash latch upgrades - no wait                      ,           188,       37.6,
   1612, ORACLE    , STAT, calls to get snapshot scn: kcmgss                         ,          1319,      263.8,
   1612, ORACLE    , STAT, file io service time                                      ,            25,          5,
   1612, ORACLE    , STAT, file io wait time                                         ,       2219641,    443.93k,
   1612, ORACLE    , STAT, min active SCN optimization applied on CR                 ,             1,         .2,
   1612, ORACLE    , STAT, index fetch by key                                        ,          1292,      258.4,
   1612, ORACLE    , STAT, session cursor cache hits                                 ,          6440,      1.29k,
   1612, ORACLE    , STAT, buffer is not pinned count                                ,          1277,      255.4,
   1612, ORACLE    , STAT, parse time elapsed                                        ,             6,        1.2,
   1612, ORACLE    , STAT, parse count (total)                                       ,          6380,      1.28k,
   1612, ORACLE    , STAT, execute count                                             ,          6378,      1.28k,
   1612, ORACLE    , STAT, bytes sent via SQL*Net to client                          ,       2195633,    439.13k,
   1612, ORACLE    , STAT, bytes received via SQL*Net from client                    ,       2721339,    544.27k,
   1612, ORACLE    , STAT, SQL*Net roundtrips to/from client                         ,          6374,      1.27k,
   1612, ORACLE    , TIME, parse time elapsed                                        ,        114215,    22.84ms,     2.3%, |@         |
   1612, ORACLE    , TIME, PL/SQL execution elapsed time                             ,         67004,     13.4ms,     1.3%, |@         |
   1612, ORACLE    , TIME, DB CPU                                                    ,        630000,      126ms,    12.6%, |@@        |
   1612, ORACLE    , TIME, sql execute elapsed time                                  ,       2595410,   519.08ms,    51.9%, |@@@@@@    |
   1612, ORACLE    , TIME, DB time                                                   ,       3190919,   638.18ms,    63.8%, |@@@@@@@   |
   1612, ORACLE    , WAIT, db file sequential read                                   ,        978601,   195.72ms,    19.6%, |@@        |
   1612, ORACLE    , WAIT, db file scattered read                                    ,       1213550,   242.71ms,    24.3%, |@@@       |
   1612, ORACLE    , WAIT, SQL*Net message to client                                 ,          9212,     1.84ms,      .2%, |          |
   1612, ORACLE    , WAIT, SQL*Net message from client                               ,       1667368,   333.47ms,    33.3%, |@@@@      |
--  End of Stats snap 1, end=2012-03-09 14:47:02, seconds=5

See the highlighted lines above and scroll right if you don’t see the metric values…

  • The physical read total multi block requests statistic confirms that indeed there were some multiblock reads done by the monitored session.
  • The physical reads cache prefetch metric indicates that there was some data block prefetching happening! This explains why we had multiblock reads done where only single block reads were really needed.
  • And the physical reads prefetch warmup metric explains why did we do any prefetching at all – it’s for buffer cache warmup, where after a restart or a significant buffer cache size increase Oracle decides to populate the empty buffer cache with any data as fast as possible. Oracle assumes that a random multiblock read doesn’t take that much more time than a random single block read anyway, so why not just read a bunch of additional blocks from around that single block that the index access wanted to read. The read operation will be slightly slower, but it might help to avoid multiple future single block reads as they are already in cache. This should affect things only when you restart the instance or significantly increase the buffer cache size. So on a steady state production system you shouldn’t see it (and it’s for a good cause anyway), but if you do performance tests on some test database, change something, bounce the instance and immediately run a test – this may affect your test results. Btw, this is controlled by a hidden parameter _db_cache_pre_warm, but I’m mentioning this just for educational purposes here, I have never fixed a problem by changing this parameter so you should touch it only in your sandbox hacking database :)

And now back to the original topic – this post wasn’t really about this specific prewarm feature, but rather a yet another example of how the V$SESSTAT metrics tell you much more than the usual wait interface and TOP-sql based approach. And you do not need to enable any tracing or undocumented events for getting these detailed metrics out of Oracle, just query the V$SESSION views at the right time. As I said in the beginning, I don’t jump to the microscopic view unless the higher level views aren’t helpful enough, so I recommend you to follow the 3-step “Advanced Oracle Troubleshooting” method when trying to diagnose what the heck some session is doing and why ;^)

An index of my TPT scripts

A lot of people have asked me whether there’s some sort of index or “table of contents” of my TPT scripts (there’s over 500 scripts in the tpt_public.zip file – http://tech.e2sn.com/oracle-scripts-and-tools )

I have planned to create such index for years, but never got to it. I probably never will :) So a good way to extract the descriptions of some scripts is this (run the command in the directory where you extracted my scripts to):

Note: the single and double-quotes may get messed up when the browser tries to be smart and replace them with nicer looking characters (which Unix doesn’t recognize then). When copying & pasting this command, make sure that the single & double-quotes are the regular ones Unix shell can accept):

$ grep -i Purpose: *.sql | awk -F: ‘{ printf(“%20s %-50s\n”, $1, $3) }’
            bhla.sql      Report which blocks are in buffer cache, protected by a cache
         bufprof.sql      Display buffer gets done by a session and their reason
            calc.sql      Basic calculator and dec/hex converter       
        channels.sql      Report KSR channel message counts by channel endpoints
        curheaps.sql      Show main cursor data block heap sizes and their contents
             dba.sql      Convert Data Block Address (a 6 byte hex number) to file#, block#
             ddl.sql      Extracts DDL statements for specified objects
              df.sql  Show Oracle tablespace free space in Unix df style
        diag_sid.sql      Display current Session Wait info            
        diag_sid.sql      An easy to use Oracle session-level performance snapshot utility
           disco.sql      Generates commands for disconnecting selected sessions
     getplusparm.sql      get sqlplus parameter value (such linesize, pagesize, sqlcode,
            grpn.sql      Quick group by query for aggregating Numeric columns
            hash.sql      Show the hash value, SQL_ID and child number of previously
             i2h.sql      Advanced Oracle Troubleshooting Seminar demo script
              im.sql      Display In-Memory Undo (IMU) buffer usage    
            init.sql Initializes sqlplus variables for 156 character terminal
       kglbroken.sql                                                   
       kglbroken.sql      Report broken kgl locks for an object this can be used for 
            kill.sql      Generates commands for killing selected sessions
              la.sql      Show which latch occupies a given memory address and its stats
     lastchanged.sql      Detect when a datablock in table was last changed
       latchprof.sql      Perform high-frequency sampling on V$LATCHHOLDER
   latchprof_old.sql      Perform high-frequency sampling on V$LATCHHOLDER
      latchprofx.sql      Perform high-frequency sampling on V$LATCHHOLDER
              lh.sql      Show latch holding SIDs and latch details from V$LATCHHOLDER
             lhp.sql      Perform high-frequency sampling on V$LATCHHOLDER
            lhpx.sql      Perform high-frequency sampling on V$LATCHHOLDER
     lotshparses.sql      Generate Lots of hard parses and shared pool activity 
    lotshparses2.sql      Generate Lots of hard parses and shared pool activity 
        lotslios.sql      Generate Lots of Logical IOs for testing purposes
        lotspios.sql      Generate Lots of Physical IOs for testing purposes
     lotssparses.sql      Generate Lots of soft parses and library cache/mutex activity 
    lotssparses2.sql      Generate Lots of soft parses and library cache/mutex activity 
       mutexprof.sql      Display KGX mutex sleep history from v$mutex_sleep_history
       nonshared.sql      Print reasons for non-shared child cursors from v$sql_shared_cursor
      nonshared2.sql      Show the reasons why more child cursors were created instead of
      ostackprof.sql      Take target process stack samples and show an execution profile
            pmem.sql      Show process memory usage breakdown – lookup by process SPID
     pmem_detail.sql      Show process memory usage breakdown details – lookup by process SPID
        prefetch.sql      Show KCB layer prefetch                      
          pvalid.sql      Show valid parameter values from V$PARAMETER_VALID_VALUES
        rowcache.sql      Show parent rowcache entries mathcing an object name
              rs.sql      Display available Redo Strands               
               s.sql      Display current Session Wait and SQL_ID info (10g+)
          sample.sql      Sample any V$ view or X$ table and display aggregated results
      sampleaddr.sql      High-frequency sampling of contents of a SGA memory address
             ses.sql      Display Session statistics for given sessions, filter by
            ses2.sql      Display Session statistics for given sessions, filter by
        sgastatx.sql      Show shared pool stats by sub-pool from X$KSMSS
            smem.sql      Show process memory usage breakdown – lookup by session ID
     smem_detail.sql      Show process memory usage breakdown details – lookup by session ID
         snapper.sql      An easy to use Oracle session-level performance snapshot utility
     snapper3.15.sql      An easy to use Oracle session-level performance snapshot utility
      snapper_v1.sql      An easy to use Oracle session-level performance snapshot utility
      snapper_v2.sql      An easy to use Oracle session-level performance snapshot utility
            stat.sql      Execute SQL statement in script argument and report basic
              sw.sql      Display current Session Wait info            
             sw2.sql      Display current Session Wait info            
             swg.sql      Display given Session Wait info grouped by state and event
             swo.sql      Display current Session Wait info            
          topsql.sql      Show TOP SQL ordered by user-provided criteria
            usql.sql      Show another session’s SQL directly from library cache
           usqlx.sql      Show another session’s SQL directly from library cache
        waitprof.sql      Sample V$SESSION_WAIT at high frequency and show resulting 
              xb.sql      Explain a SQL statements execution plan with execution 
            xde2.sql      Describe X$ tables, column offsets and report indexed fixed table
              xm.sql      Explain a SQL statements execution plan directly from library cache
             xma.sql      Explain a SQL statements execution plan directly from library cache
            xmai.sql      Explain a SQL statements execution plan with execution 
             xms.sql      Explain your last SQL statements execution plan with execution 
            xmsh.sql      Explain a SQL statements execution plan with execution 
            xmsi.sql      Explain a SQL statements execution plan with execution

 

Exadata CAN do smart scans on bitmap indexes

As I’m finishing up a performance chapter for the Exadata book (a lot of work!), I thought to take a quick break and write a blog entry.

This is not really worth putting into my Oracle Exadata Performance series (which so far has only 1 article in it anyway) .. so this is a little stand-alone article …

Everybody knows that the Exadata smart scan can be used when scanning tables (and table partitions). You should also know that smart scan can be used with fast full scan on Oracle B-tree indexes (a fast full scan on an index segment is just like a full table scan, only on the index segment (and ignoring branch blocks)).

For some reason there’s a (little) myth circulating that smart scans aren’t used for scanning bitmap indexes.

So, here’s evidence, that smart scan can be used when scanning bitmap indexes:

SQL> select /*+ tanel3 */ count(*) from t1 where owner like '%XYZXYZ%';

...

Plan hash value: 39555139

-----------------------------------------------------------------------------------
| Id  | Operation                             | Name        | E-Rows | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |        |   505 (100)|
|   1 |  SORT AGGREGATE                       |             |      1 |            |
|   2 |   BITMAP CONVERSION COUNT             |             |    400K|   505   (0)|
|*  3 |    BITMAP INDEX STORAGE FAST FULL SCAN| BI_T1_OWNER |        |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - storage(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))
       filter(("OWNER" LIKE '%XYZXYZ%' AND "OWNER" IS NOT NULL))

So, as you see the execution plan sure shows a FAST FULL SCAN on a BITMAP INDEX segment, which happens to be on Exadata STORAGE.

Also, you see a storage() predicate applied on the line 3 of the execution plan, which means that Oracle will attempt to use a smart scan predicate offload – but this can’t always be done!

So, you can’t really determine whether a smart scan happened during execution just by looking into the execution plan, you should really check some V$SESSION statistics too. That’s where my Snapper script becomes handy.

I started Snapper on my session just before running the above query. The “smart table scan” and “smart index scan” performance counters are updated right after Oracle has opened the segment header and determines, from the number of blocks in the segment, whether to call the smart scan codepath or not. In other words, the smart scan counters are inremented in the beginning of the segment scan.

The output is following (some irrelevant counters are stripped for brevity):


@snapper all 5 1 "301"
Sampling SID 301 with interval 5 seconds, taking 1 snapshots...
setting stats to all due to option = all

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
    301, TANEL     , STAT, physical read total IO requests                           ,         13,        2.6,
    301, TANEL     , STAT, physical read total multi block requests                  ,          4,         .8,
    301, TANEL     , STAT, physical read requests optimized                          ,          1,         .2,
    301, TANEL     , STAT, physical read total bytes optimized                       ,      8.19k,      1.64k,
    301, TANEL     , STAT, physical read total bytes                                 ,      4.63M,     925.7k,
    301, TANEL     , STAT, cell physical IO interconnect bytes                       ,     10.02k,         2k,
    301, TANEL     , STAT, physical reads                                            ,        565,        113,
    301, TANEL     , STAT, physical reads cache                                      ,          1,         .2,
    301, TANEL     , STAT, physical reads direct                                     ,        564,      112.8,
    301, TANEL     , STAT, physical read IO requests                                 ,         13,        2.6,
    301, TANEL     , STAT, physical read bytes                                       ,      4.63M,     925.7k,
    301, TANEL     , STAT, db block changes                                          ,          1,         .2,
    301, TANEL     , STAT, cell physical IO bytes eligible for predicate offload     ,      4.62M,    924.06k,
    301, TANEL     , STAT, cell physical IO interconnect bytes returned by smart scan,      1.82k,      364.8,
    301, TANEL     , STAT, cell blocks processed by cache layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by txn layer                        ,        564,      112.8,
    301, TANEL     , STAT, cell blocks processed by index layer                      ,        564,      112.8,
    301, TANEL     , STAT, cell blocks helped by minscn optimization                 ,        564,      112.8,
    301, TANEL     , STAT, cell index scans                                          ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (full)                              ,          1,         .2,
    301, TANEL     , STAT, index fast full scans (direct read)                       ,          1,         .2,
    301, TANEL     , STAT, bytes sent via SQL*Net to client                          ,        334,       66.8,
    301, TANEL     , STAT, bytes received via SQL*Net from client                    ,        298,       59.6,
    301, TANEL     , STAT, SQL*Net roundtrips to/from client                         ,          2,         .4,
    301, TANEL     , STAT, cell flash cache read hits                                ,          1,         .2,
    301, TANEL     , TIME, hard parse elapsed time                                   ,     1.17ms,    233.8us,      .0%, |          |
    301, TANEL     , TIME, parse time elapsed                                        ,      1.5ms,    300.2us,      .0%, |          |
    301, TANEL     , TIME, DB CPU                                                    ,       11ms,      2.2ms,      .2%, |          |
    301, TANEL     , TIME, sql execute elapsed time                                  ,     82.2ms,    16.44ms,     1.6%, |@         |
    301, TANEL     , TIME, DB time                                                   ,    84.36ms,    16.87ms,     1.7%, |@         |
    301, TANEL     , WAIT, enq: KO - fast object checkpoint                          ,    16.18ms,     3.24ms,      .3%, |          |
    301, TANEL     , WAIT, gc cr grant 2-way                                         ,      223us,     44.6us,      .0%, |          |
    301, TANEL     , WAIT, gc current grant 2-way                                    ,      136us,     27.2us,      .0%, |          |
    301, TANEL     , WAIT, cell smart index scan                                     ,    56.04ms,    11.21ms,     1.1%, |@         |
    301, TANEL     , WAIT, SQL*Net message to client                                 ,        7us,      1.4us,      .0%, |          |
    301, TANEL     , WAIT, SQL*Net message from client                               ,      4.42s,   884.47ms,    88.4%, |@@@@@@@@@ |
    301, TANEL     , WAIT, cell single block physical read                           ,      541us,    108.2us,      .0%, |          |
    301, TANEL     , WAIT, events in waitclass Other                                 ,     2.22ms,    443.2us,      .0%, |          |
--  End of Stats snap 1, end=2011-03-13 19:36:31, seconds=5

As you see from the above “cell index scans” statistic – indeed one index segment was scanned using the cell smart scan method.

So, I would rather call this feature “smart segment scan” to reflect that smart scan can scan more than just tables…

I guess one of the reasons why few people have seen smart bitmap index scans in action is that (single-column) bitmap indexes tend to be small. Smaller than corresponding table segments and B-tree index segments. On partitioned tables they’re much more likely going to be under the “_small_table_threshold” calculation which is used for determining whether to do a direct path full segment scan or not (yes, the _small_table_threshold applies to fast full index scan and fast full bitmap index scan too, not just table scans). So, it’s likely that Oracle chooses to do a regular, buffered full bitmap segment scan and thus won’t even consider using smart scan (as smart scans require direct path reads).

By the way – the direct path read (or not) decision is done per segment – not per object (like a table or index). So if you have 10 partitions in a table (or index), half of them are large, half are smaller, then Oracle may end up using direct path reads (and smart scan) on 5 of them and buffered (dumb) scan on the other 5. If you run something like Snapper on the session, then you’d see the smart scan counters go up by 5 only. As written above, Oracle decides whether to do direct path reads (and smart scan) right after opening the header block of a segment (partition) and reading out how many blocks this partition’s segment has below HWM.

The above applied to serial direct path reads – the Parallel Execution slaves should always read using direct path mode, right? …. Wrong :)

Well, partially wrong… In 11.2.0.2, if the parallel_degree_policy = manual, then yes, PX slaves behave like usual and always force a direct path read (and try to use a smart scan). However, with parallel_degree_policy = AUTO, which is the future of PX auto-management, Oracle can decide to do a buffered parallel scan instead, again disabling the use of smart scan…

One more note – I didn’t say anything about whether you should or should not use (bitmap) indexes on Exadata, it’s an entirely different discussion. I just brought out that the smart scan is used for scanning table segments, B-tree index segments and bitmap index segments if conditions are right.

And in the end I have to say…. that even with this evidence you can’t be fully sure that a smart scan was used throughout the entire segment, but more about this in the book and perhaps in a later blog article. We have interesting times ahead ;-)

Snapper 3.52 – With Oracle 9.2 support!

As I promised last year, I have 2 christmas gifts for you. I have already forgotten what the other one was supposed to be :), but the first one is Snapper v3.52 which has (the much requested) Oracle 9.2 support!

The syntax is the same, with Snapper you can now sample ASH-like data on Oracle 9.2 too. Instead of SQL_IDs it will display you SQL hash values:

SQL> @snapper ash,ash1=user+sql_id,ash2=sid+event 5 1 all
Sampling SID all with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.52 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

--------------------------------------------------
Active% | SQL_HASH_VAL | EVENT
--------------------------------------------------
    19% | 867131449    | db file scattered read
    19% | 1458866975   | db file scattered read
    13% | 1267657304   | db file scattered read
     6% | 884811952    | ON CPU
     6% | 581825634    | db file scattered read
     3% | 867131449    | ON CPU
     3% | 1267657304   | ON CPU
     3% | 1866659945   | ON CPU
     3% | 1671194465   | db file scattered read
     3% | 3021169464   | ON CPU

------------------------------------------------
Active% | USERNAME             | SQL_ID
------------------------------------------------
    23% | SYSTEM               | 867131449
    19% | SYSTEM               | 1458866975
    16% | SYSTEM               | 1267657304
     6% | SYSTEM               | 581825634
     6% | SYSTEM               | 884811952
     3% | SYSTEM               | 1558333473
     3% | SYSTEM               | 1671194465
     3% | SYSTEM               | 1866659945
     3% | SYSTEM               | 1927486197
     3% | SYSTEM               | 2700565926

--------------------------------------------
Active% |    SID | EVENT
--------------------------------------------
    77% |     18 | db file scattered read
    19% |     18 | ON CPU
     3% |     18 | db file sequential read

--  End of ASH snap 1, end=2011-01-10 03:02:58, seconds=5, samples_taken=31

PL/SQL procedure successfully completed.


Other than the 9i change, the rest of the snapper is pretty much the same as earlier, with some minor bugfixes and additions.

You can download it from here.

If you want to get the most out of snapper, read this article here (and make sure you look inside the script!)

Note: Big thanks to Marcus Mönnig who who wrote the additional 9i support code for Snapper first and Jamey Johnston for his additions (and myself for some final polishing fixes ;-)

Note2: I've agreed with Marcus that he could add Snapper into his free performance tool distribution in unchanged form, check out his Mumbai tool which could be useful for Oracle performance monitoring...

Oracle Session Snapper v3.10

Hi all, long time no see!  =8-)

Now as I’m done with the awesome Hotsos Symposium (and the training day which I delivered) and have got some rest, I’ll start publishing some of the cool things I’ve been working on over the past half a year or so.

The first is Oracle Session Snapper version 3!

There are some major improvements in Snapper 3, like ASH style session activity sampling!

When you troubleshoot a session’s performance (or instance performance) then the main things you want to know first are very very simple:

  1. Which SQL statements are being executed
  2. What are they doing, are they working on CPU or waiting.
  3. If waiting, then for what

Often this is enough for troubleshooting what’s wrong. For example, if a session is waiting for a lock, then wait interface will show you that. If a single SQL statement is taking 99% of total response time, the V$SESSION (ASH style) samples will point out the problem SQL and so on. Simple stuff.

However there are cases where you need to go beyond wait interface and use V$SESSTAT (and other) counters and even take a “screwdriver” and open Oracle up from outside by stack tracing :-)

When I wrote the first version of Snapper for my own use some 4-5 years ago I wrote it mainly having the “beyond wait interface” part in mind. So I focused on V$SESSTAT and various other counters and left the basic troubleshooting to other tools. I used to manually sample V$SESSION/V$SESSION_WAIT a few times in a row to get a rough overview of what a session was doing or some other special-purpose scripts.

However after Snapper got more popular and I started getting some feedback about it I saw the need for covering more with Snapper, not just the “beyond wait interface” part, but also the “wait interface” and “which SQL” part too.

So, now I’m presenting Snapper v3 which does all the 3 points above using ASH style V$SESSION sampling and it still has the first step to “beyond wait interface” part in it, which is very useful for advanced performance troubleshooting and diagnosis – I’m talking about the V$SESSTAT counters above.

I’ve made some syntax changes in Snapper too and right now the v3 doesn’t work on Oracle 9.2 (it will work some day :)

To give you an idea of the new ASH style sampling capabilities, heres some example output:

SQL> @snapper ash=sid+event+wait_class,ash1=plsql_object_id+plsql_subprogram_id+sql_id,ash2=program+module+action 5 1 all
Sampling...
-- Session Snapper v3.10 by Tanel Poder @ E2SN ( http://tech.e2sn.com ) --------------------------------------------------------------
Active% |    SID | EVENT                     | WAIT_CLASS
-------------------------------------------------------------- 100% |    133 | db file scattered read    | User I/O 5% |    165 | control file parallel wri | System I/O 2% |    162 | ON CPU                    | ON CPU 2% |    167 | db file parallel write    | System I/O 2% |    166 | log file parallel write   | System I/O
--------------------------------------------------- Active% | PLSQL_OBJE | PLSQL_SUBP | SQL_ID --------------------------------------------------- 77% |            |            | a5xyjp9gt796s 23% |            |            | 4g4u44bk830ms 12% |            |            |
------------------------------------------------------------------------------------------- Active% | PROGRAM                   | MODULE                    | ACTION ------------------------------------------------------------------------------------------- 100% | sqlplus@mac01 (TNS V1-V3) | sqlplus@mac01 (TNS V1-V3) | 5% | oracle@solaris02 (CKPT)   |                           | 2% | oracle@solaris02 (DBW0)   |                           | 2% | oracle@solaris02 (CJQ0)   |                           | 2% | oracle@solaris02 (LGWR)   |                           | --  End of ASH snap 1, end=2010-03-22 17:35:06, seconds=5, samples_taken=43

You can read some usage examples and download it here:

P.S. People who attended Hotsos Symposium Training Day where I demoed the initial version of Snapper v3 – download the new version from above link (v3.10), it’s much more flexible than the one I demoed couple of weeks ago!


Beyond Oracle Wait Interface – Part 2

This is the second part of the joint blog “project” with James Morle, called “The Wait Interface Is Useless (Sometimes)”.

We already did a joint presentation on this topic at UKOUG and more conferences will follow :) Read the first part by James here for intro.

So, where do we go when Oracle’s wait interface doesn’t help us? We will show multiple techniques over time, but here’s where I normally continue when wait interface is “useless”.

I use V$SESSTAT.

Oh, were you expecting something more “advanced” instead of boring old V$SESSTAT’s performance counters which has been available in Oracle for ages? ;-)

Well, there is a reason why the V$SESSTAT has been available in Oracle for ages, very likely even before Wait Interface was introduced (but I wouldn’t know for sure, I was still in elementary school back then or something :).

And the reason is that session level performance counters are VERY useful about finding out WHAT Oracle is doing. Every time you parse, a counter goes up by one in V$SESSTAT for the session. Every time you execute, a counter goes up. Every time you do a logical IO, a counter goes up. Every time you commit, a counter goes up. Every time an index block is split, a counter goes up. You get the point.

Oracle’s V$SESSTAT tells you WHAT Oracle is doing. Wait interface tells you how much TIME is spent waiting for something, but V$SESSTAT counters just tell you how many times some operation was done. You can not conclude how much time was spent by looking just at the number of times something has happened (as Cary will tell you) but nevertheless, the V$SESSTAT counters definitely give you a good clue into WHAT THE [FU|HE]CK is an Oracle session doing – especially when wait interface says you’re not waiting for anything and SQL trace doesn’t print a line.

How many different operations are counted for each session in Oracle? Let’s check (Oracle 11.2):


SQL> select count(*) from v$sesstat where sid = 14;

COUNT(*)
----------
611

Oracle 11.2 keeps track of the counts of 611 different operations, for each session!

That’s VERY valuable source of information for understanding what Oracle is doing and its relatively easy to use.

And that’s exactly the reason why I wrote my Snapper script – I wanted this information to be VERY EASY to use!

And here’s an example – session 14 is stuck, doesn’t respond, user complains. Let’s check the wait interface:

If I want to know what a session is doing, I sample V$SESSION_WAIT first, with my sw.sql script (or I could just query ASH which gives me the same data with history):

SQL> @sw 14

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT P1                 P2                 P3                 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
     14 WORKING On CPU / runqueue                              3486         130

1 row selected.

SQL>
SQL> @sw 14

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT P1                 P2                 P3                 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
     14 WORKING On CPU / runqueue                              3486         137

1 row selected.

SQL>
SQL>
SQL> @sw 14

    SID STATE   EVENT                                          SEQ# SEC_IN_WAIT P1                 P2                 P3                 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
     14 WORKING On CPU / runqueue                              3486         139

1 row selected.

SQL>

From here we see 2 things:

  1. 3 samples out of 3 the session was ON CPU (not waiting). 3 samples is not too good for precise statistical sampling but there’s something else I spot in the output
  2. SEC_IN_WAIT (read: seconds in current state) is over 130. This means that this session has been in the current state (ON CPU) for over 130 seconds in row, without waiting for anything in between (the moment the wait state changes, the SEC_IN_WAIT goes back to 0).

So, it’s pretty evident that this session is just burning CPU and wait interface is useless here (as we are not waiting for anything, as far as Oracle sees it of course). Alternatively I could just run top or prstat and check whether the process is 100% on CPU or not.

So, before going on to Snapper, lets look into what kind of SQL this session executes right now (we could sample this also multiple times, to check whether we are constantly running the same statement):

SQL> <strong>select * from table(select dbms_xplan.display_cursor(sql_id,sql_child_number) from v$session where sid = 14);</strong>

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5vy5qjd3fsn5c, child number 0
-------------------------------------
SELECT     MIN(t1.created), MAX(t1.created) FROM     t1   , t2   , t3
WHERE     t1.object_id = t2.object_id AND t2.object_id = t3.object_id
AND t1.owner = :v AND t2.owner = :v AND t3.owner = :v

Plan hash value: 3271631391

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE                 |      |     1 |    66 |            |          |
|*  2 |   HASH JOIN                     |      |     7 |   462 |     4  (25)| 00:00:01 |
|   3 |    NESTED LOOPS                 |      |       |       |            |          |
|   4 |     NESTED LOOPS                |      |     7 |   329 |     2   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID| T1   |     7 |   196 |     1   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | I1   |     7 |       |     1   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN           | I2   |    27 |       |     1   (0)| 00:00:01 |
|*  8 |     TABLE ACCESS BY INDEX ROWID | T2   |     1 |    19 |     1   (0)| 00:00:01 |
|   9 |    TABLE ACCESS BY INDEX ROWID  | T3   |    40 |   760 |     1   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN            | I3   |    40 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
6 - access("T1"."OWNER"=:V)
7 - access("T2"."OWNER"=:V)
8 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
10 - access("T3"."OWNER"=:V)

33 rows selected.

SQL>

Hmm… can anyone reliably tell why this SQL statement is slow and burns all the CPU time?

The answer is no. An execution PLAN is a plan of actions which would be executed when someone runs the execution plan. It doesn’t tell you anything about what’s exactly going on right now, it doesn’t tell you what exactly is using all the CPU time. Anything we would say at this point, would be a guess! Lets use V$SESSTAT instead, for gathering more hard evidence!

(The syntax of Snapper is documented inside the script or just search for snapper in my blog, plenty of examples :)

SQL> <strong>@snapper out 5 1 14
</strong>
-- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com )

--
----------------------------------------------------------------------------------------------------------------------
  SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
   14, SYS       , STAT, session logical reads                   ,         81351,     16.27k,
   14, SYS       , STAT, consistent gets                         ,         81351,     16.27k,
   14, SYS       , STAT, consistent gets from cache              ,         81351,     16.27k,
   14, SYS       , STAT, consistent gets from cache (fastpath)   ,         81352,     16.27k,
   14, SYS       , STAT, no work - consistent read gets          ,         81381,     16.28k,
   14, SYS       , STAT, table fetch by rowid                    ,       2904784,    580.96k,
   14, SYS       , STAT, index scans kdiixs1                     ,            93,       18.6,
   14, SYS       , STAT, buffer is pinned count                  ,       5736560,      1.15M,
   14, SYS       , STAT, buffer is not pinned count              ,         75248,     15.05k,
   14, SYS       , STAT, no buffer to keep pinned count          ,             1,         .2,
   14, SYS       , TIME, DB CPU                                  ,       6050000,      1.21s,   121.0%, |@@@@@@@@@@|
   14, SYS       , TIME, sql execute elapsed time                ,       6059922,      1.21s,   121.2%, |@@@@@@@@@@|
   14, SYS       , TIME, DB time                                 ,       6059922,      1.21s,   121.2%, |@@@@@@@@@@|
--  End of snap 1, end=2010-01-15 17:33:22, seconds=5

PL/SQL procedure successfully completed.

As our previous sw.sql output showed  – we are not waiting for anything. If we were, then snapper would show you WAIT lines from wait interface (V$SESSION_EVENT) as well. But we don’t see any waits.

So, now its the time to go through all the V$SESSTAT stats reported in Snapper! These are the lines with STAT in them (TIME lines are V$SESS_TIME_MODEL breakdown available since 10g, but they are not detailed enough for diagnosing complex problems).

One thing we see, there is a statistic session logical reads and from the last column of Snapper output we see that this session did over 16000 logical reads (buffer gets) per second during snapper run time (of 5 seconds). That’s already a good indication of why we burn so much CPU time – we are doing lots of logical IOs.

Also, we see that we did 18.6 index range scans per second (the statistic index scans kdiixs1 shows that). That doesn’t seem “much” does it. However, check the value for buffer is pinned count statistic! This counter is updated every time we go to a buffer to get some data from it AND it already happens to be open by my session! Oracle keeps buffers pinned  and relevant buffer handles cached during a database call in some cases (like nested loop joins and index scans) to avoid reopening the buffer again (getting a buffer again if its closed would mean another logical IO).

Nevertheless, we can see that we re-visited some buffers again and again and again – over a million times per second! When checking my index structures I see that every index has only a few hundred blocks, so having millions and millions of buffer visits (buffer is pinned count + session logical reads) means we are heavily re-visiting the same blocks again and again and again! This points directly to NESTED LOOPS (and also INDEX RANGE SCANS) in the execution plan. Nested loops, as the name says loops through inner rowsource and revisits some of its data as many times as the number of rows coming from the outer rowsource (with few special cases, as usual).

And there’s one more statistic in Snapper output which I do NOT see – execute count. This statistic shows how many times the session executed cursors (executed new ones or re-executed the same one). NB! Snapper only reports these V$SESSTAT statistics which changed during the snapshot – as execute count is not reported, it means that the execute count did not increase during the snapshot, thus the session still executed the same statement! (so for example we were not hard parsing and lots of different statements to “justify” this CPU usage, Snapper would have shown parse count (hard) going up if this session had done so).

So, using the above diagnosis, this is the place where I would take a serious look whether the NESTED LOOPS in the above execution plan are a right thing to do. I would check how many rows would actually match the bind variable values (more about that later) in the SQL statement predicates – NESTED LOOPS is not a good operation for joining large number of rows. More about SQL tuning very soon ;-)

To finish this blog entry, this is the sequence of troubleshooting which I usually use (if a user complains or “something” is slow):

  1. Identify the session(s) servicing the user/task with the problem
  2. Check wait interface for these sessions (sw.sql, ASH query) – this gives quick overview whether the session is 100% stuck waiting for something, 100% busy working and burning CPU or somewhere in between
  3. Run Snapper on the sessions to (very easily) see which V$SESSTAT counters have gone up (some counters such memory usage and open cursors current can go down too)
  4. If all this fails or doesn’t give enough evidence of the problem for whatever reason, then it’s time to take a screwdriver and open up Oracle from outside – using stack traces, truss, DTrace etc. We will be blogging about this ;-)

Note that Snapper is just an anonymous PL/SQL block and it doesn’t require any changes to the database!

Ok, back to James now!

Finding the reasons for excessive logical IOs

There’s another interesting thread going on in Oracle-L, about understanding logical IOs and drilling down into their reasons. Of course sometimes (or rather usually) the excessive logical IOs come from a bad execution plan (when a nested loop loops over lots of datablocks again and again or a wrong index is used for driving a query etc), but sometimes the excessive LIOs are caused by some internal issues, like space management etc.

A convenient tool I use for reporting logical IO reasons is (again) my Snapper! It has  an option “b” for reporting Buffer get reasons or as I use below – option “a” shows All information Snapper can show.

There are couple of gotchas though which make this approach imperfect:

  1. The X$ tables Snapper uses for LIO reason reporting contain instance-wide counters, not specific to a single testing session. Thus you either need to be the single user in your database when experimenting and even then the background activity may increment some counters while you are testing too. I have sometimes suspended all other processes (kill -STOP and kill -CONT to resume)  or used Flash Freeze (oradebug ffbegin and ffresumeinst) to hang the whole instance that there would be no other activity going on.
  2. These buffer get reason counters are not maintained properly in Oracle 11g, probably due an optimization effort and some changes for faster pinning of buffer cache buffers (there’s a parameter called _fastpin_enable which is set to 1 in 11g and it enables so called fastpath buffer gets. If you see v$sesstat statistics such “consistent gets from cache (fastpath) or “db block gets from cache (fastpath)” being inremented, then fastpath buffer gets/pins are used. Note that I do have a script which works also on 11g but I’ll write about that one some time in the future :)

Anyway, if you are testing in an environment exclusively used by you, on Oracle 10.2 or lower, then you can run snapper with the gather=a option to report a bunch instance-level statistics in addition to the standard session-level stats:

  1. BUFG – Buffer get reasons (both consistent and current mode gets)
  2. LATG – Latch gets (both willing to wait and immediate gets)
  3. ENQG – Enqueue gets

Here’s an example, prepare for long output:

[Read more...]

Detect chained and migrated rows in Oracle – Part 1

I received a question about migrated rows recently.

It was about how to detect migrated rows in a 200TB data warehouse, with huge tables – as the ANALYZE TABLE xyz LIST CHAINED ROWS INTO command can not be automatically parallelized at table level (as DBMS_STATS can be, but oh, DBMS_STATS doesn’t gather the migrated/chained row info). Therefore the analyze command would pretty much run forever before returning (and committing) the chained row info in the output table. Also as there are regular maintenance jobs running on these tables (I suspect partition maintentance for example), then it wouldn’t be nice to keep running ANALYZE on the whole table constantly.

So, is there any faster or better way for finding the amount of migrated rows?

Ihave two answers to this.

Answer 1:

As we are dealing with a huge 200+ TB data warehouse its tables/indexes are most likely partitioned. Thus you could use the ANALYZE TABLE xyz PARTITION (abc) LIST CHAINED ROWS command to analyze individual partitions, even in parallel (sqlplus sessions) if you like. This would allow you to focus only on the partitions of interest (the latest ones, with the heaviest activity perhaps).

SQL> create table CHAINED_ROWS (
2    owner_name         varchar2(30),
3    table_name         varchar2(30),
4    cluster_name       varchar2(30),
5    partition_name     varchar2(30),
6    subpartition_name  varchar2(30),
7    head_rowid         rowid,     -- actual chained row's head piece address in the segment
8    analyze_timestamp  date
9  );
Table created.
SQL>
SQL> analyze table tmp partition (sys_p501) list chained rows; -- the default table name used for output is "CHAINED_ROWS"
Table analyzed.
SQL> analyze table tmp partition (sys_p502) list chained rows;
Table analyzed.
SQL> select partition_name, count(*) from chained_rows group by partition_name;
PARTITION_NAME                   COUNT(*)
------------------------------ ----------
SYS_P502                              252
SYS_P501                             5602

SQL>

So, from above you see its possible to find out partition (or even sub-partition level row chaining).

However this above command lists you both CHAINED rows and MIGRATED rows (even though Oracle calls them all chained rows internally, as the chaining mechanism is the same for both cases).

Chained row is a row which is too large to fit into a block, so will always have to be split between multiple different blocks – with an exception of intra-block chaining which is used for rows with more than 255 columns. Migrated row on the other hand is a row which has been updated larger than it initially was – and if as a result it doesn’t fit into its original block, the row itself is moved to a new block, but the header (kind of a stub pointer) of the row remains in original location. This is needed so that any indexes on the table would still be able to find that row using original ROWIDs stored in them). If Oracle didn’t leave the row head piece in place then it would always go and update all indexes which have the ROWID of the migrating row in them.

Why should we care whether a row is a real chained row or just a migrated row?

It’s because if the row is chained, then any reorgs would not help you – if a row is too big to fit into a block, its too big to fit into a block no matter how many times you move around the table. (Note that if you have large tables full of rows longer than 8KB there’s likely something wrong with your design).

But migrated rows on the other hand are “chained” into another block due some update which made them not fit into existing block anymore. This happens when PCTFREE is set too low compared to real row growth factor and sometimes you may want to fix it by reorganizing the table/partition with ALTER TABLE/PARTITION MOVE or by backing the rows up, deleting them and reinserting them back to the table (that one makes sense when only a small amount of rows in a table are migrated).

If you are completely sure that you don’t have any rows longer than the free space in an empty block (thus all individual rows would fit into a block and would need to be split among multiple blocks) then you can conclude that all the rows reported were migrated due their growth.

Another option would be to query out all or a sample of these chained/migrated rows and actually measure how long they are if all columns are put together. This could be done using vsize() function (or also dump() and lengthb() in some cases). Of course the column and row header overhead would need to be accounted in as well.

So, this already gets pretty complex and there are more tiny details which we should take into account… thus I will introduce another way to look into the row migration/chaining thing:

Answer 2: (Alternatively called “should we care?”)

[Read more...]

Oracle hidden costs revealed, Part2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others

I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.

Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?

Here’s an example, I’ll create two similar tables, one in USERS tablespace, other in SYSTEM, and inset into the first one (Oracle 10.2.0.3 on Solaris x64):

SQL> create table t1(a int) tablespace USERS;

Table created.

SQL> create table t2(a int) tablespace SYSTEM;

Table created.

SQL> exec for i in 1..100000 loop insert into t1 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.09

Insert into table in USERS tablespace took 3 seconds.

Ok, let’s commit and flush dirty buffers that they wouldn’t get on the way of next insert.

SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

SQL> alter system checkpoint; -- checkpointing to flush dirty buffers from previous inserts

System altered.

Elapsed: 00:00:01.34
SQL>

And now to the insert into the SYSTEM tablespace table:

SQL> exec for i in 1..100000 loop insert into t2 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.98
SQL>

What?! The same insert took 3 times longer, almost 9 seconds?

Fine! Let’s troubleshoot it!

[Read more...]