Oracle Troubleshooting with Snapper – detecting who’s causing excessive redo generation

My friend asked today a question that how to identify why his Oracle 9.2 database has suddenly started generating loads more redo than usual.

So obviously I recommended him Snapper as first thing, it’s perfect for ad-hoc analysis like that! ( I know I sound biased but if you haven’t used Snapper yet, then now is the time! :)

So, I asked him to run Snapper on all sessions of the instance with 10 second interval and find the session with highest “redo size” delta figure.
However as a normal production database has loads of sessions in it and as each session can return tens of statistic rows, then you might need to do some post-processing of output data to get only the “redo size” numbers.

So I figured that why not do the filtering in Snapper itself, so I made the small additions and released Snapper v1.07.

Usage is simple:

SQL> @snapper out,gather=s,sinclude=%redo_size% 10 1 "select sid from v$session"

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     129, 20080530 09:07:20,       10, STAT, redo size                               ,     180595100,   18059510,     180.6M,     18.06M
DATA,     166, 20080530 09:07:20,       10, STAT, redo size                               ,        790912,      79091,    790.91k,     79.09k
DATA,     167, 20080530 09:07:20,       10, STAT, redo size                               ,        889796,      88980,     889.8k,     88.98k
--  End of snap 1

PL/SQL procedure successfully completed.

And in the result you already see that there is a single session which generates significantly more redo than others (btw this is an artificially generated example). In my friend’s case there also happened to be a single session , scheduled through DBMS_JOB, contributing to most of the redo generation. Once the troublemaker was identified, the problem could be fixed.

There are few things to note:

  1. The filtering is applied on V$SESSTAT stats only, as it doesn’t make much sense filtering out timed events based just on their name (and not the actual amount of time they contributed to your response time).
  2. As the troubleshooting goal was to identify the top redo generator, I wasn’t interested in Waits and DB Time, so I used gather=s option, which gathers only V$SESSTAT stuff.
  3. There is a new “sinclude” keyword, which specifies the filter condition for V$SESSTAT stats and “linclude” for filtering V$LATCH stats (if latch stat calculation is enabled using gather=l). If you look into Snapper source code, you see that these filter conditions are just used in a LIKE predicate and if on 10g+ then there’s a REGEXP_LIKE used additionally, therefore you can specify more complex filtering expressions.
  4. Note how instead of a single SID I pass “select sid from v$session” to measure all sessions in the instance. You can write any SQL for specifying SIDs to analyze, as long as that SQL returns a list of numbers, fits on one line and is put between doublequotes.

Another example, utilizing simple form of 10g+ regular expressions, where you can specify multiple different patterns for filtering:

SQL> @snapper "out,gather=s,sinclude=redo size|commit|rollback" 10 1 "select sid from v$session"

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 21:43:07,       10, STAT, user rollbacks                          ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit cleanouts                        ,            14,          1,         14,        1.4
DATA,     123, 20080530 21:43:07,       10, STAT, commit cleanouts successfully completed ,            14,          1,         14,        1.4
DATA,     123, 20080530 21:43:07,       10, STAT, redo size                               ,      33351344,    3335134,     33.35M,      3.34M
DATA,     123, 20080530 21:43:07,       10, STAT, rollback changes - undo records applied ,          9786,        979,      9.79k,      978.6
DATA,     123, 20080530 21:43:07,       10, STAT, transaction rollbacks                   ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit batch/immediate requested        ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit immediate requested              ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit batch/immediate performed        ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit immediate performed              ,             1,          0,          1,         .1
DATA,     161, 20080530 21:43:07,       10, STAT, redo size                               ,            68,          7,         68,        6.8
--  End of snap 1

PL/SQL procedure successfully completed.

Yet another example where I display few logical IO stats and cache buffers chains latches stats together:

SQL> @snapper out,gather=sl,sinclude=gets,linclude=cache 5 1 123

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 21:37:33,        5,  LAT, cache buffers lru chain                 ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, cache buffers chains                    ,        488174,      97635,    488.17k,     97.63k
DATA,     123, 20080530 21:37:33,        5,  LAT, logical standby cache                   ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, change notification client cache latch  ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, row cache objects                       ,            19,          4,         19,        3.8
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets                         ,        487586,      97517,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets from cache              ,        487586,      97517,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets from cache (fastpath)   ,        487588,      97518,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, no work - consistent read gets          ,        484876,      96975,    484.88k,     96.98k
--  End of snap 1

PL/SQL procedure successfully completed.

So this filtering can be used for easily identifying stuff like highest redo or LIO generator, the heaviest parser… or when used with “%sql*net%” filter, you could identify who’s using the network bandwidth the most:

SQL> @snapper "out,gather=s,sinclude=%sql*net%" 5 1 123

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 22:07:47,        5, STAT, bytes sent via SQL*Net to client        ,       1509441,     301888,      1.51M,    301.89k
DATA,     123, 20080530 22:07:47,        5, STAT, bytes received via SQL*Net from client  ,           374,         75,        374,       74.8
DATA,     123, 20080530 22:07:47,        5, STAT, SQL*Net roundtrips to/from client       ,            34,          7,         34,        6.8
--  End of snap 1

PL/SQL procedure successfully completed.

I hope this helps – and happy Snapping! :)

This entry was posted in Oracle and tagged , . Bookmark the permalink.

14 Responses to Oracle Troubleshooting with Snapper – detecting who’s causing excessive redo generation

  1. Karen Morton says:

    Nice additions to an already excellent tool.

    By the way, I was reading through some of your older posts for the first time and came across your post on running SQL scripts via HTTP. I can’t believe I didn’t know this. It amazes me how many most helpful “little things” there are that miss hitting my radar.

    So I wanted to say thanks and I also gave you a big thanks on my blog. I’ll make sure I keep up on your posts from now on so I don’t miss any other little gems. Thanks!

  2. Chen Shapira says:

    Amazing!
    Elegant solution to a problem we have all the time.

    Can you believe that until now we used logminer to find the redo generating sessions?

  3. tanelp says:

    Karen, thanks! :)

    Chen, yeah I believe! Logminer used to be in top of my toolset when diagnosing such issues – until I realized how much easier these things can be done! And it’s even nothing advanced, just sampling v$sesstat.. Makes me wonder, why the heck didn’t I use such simple things already 10 years ago (but focused on “advanced” undocumented parameters and events instead :)

    Thanks for the feedback! :)

  4. Chen Shapira says:

    I wondered why you don’t order the result by the delta, so I checked out the code…

    Nice manual join :)

  5. tanelp says:

    Yep, the main design goal of Snapper was that it should not require creating any database objects, or require any changes as matter of fact.

    So, if I wanted to order in a single anonymous PL/SQL block (without having to create a TYPE object) then I’d need to do it manually with some procedural sort algorithm.

    Initially it was in my TODO list but by now I’ve understood it’s not that much needed, as:

    1) ordering v$sesstat stats doesnt make sense, as every statistic represents a completely different thing (you can’t compare redo size with parse count for example)

    2) the number of different wait / time stats shown per session are is normally low enough for doing quick visual comparison

    3) that’s why snapper has such fixed output format – that you could easily paste it to Excel for post processing and further analysis :)

    But yeah – in snapper code I have implemented a manual nested loop join, with full table scans on both “tables”. So it’s not efficient at all but serves the purpose well and was simplest to implement!

  6. Chen Shapira says:

    The main design goal is the reason you are getting comments on the weekend :)

    Since you don’t change anything, I feel reasonably comfortable running the new version on a production system on a weekend. I know that in worse case, I kill the session and everything is gone.

    But it worked fine. Thanks again.

  7. CJ says:

    Hi Tanel,

    First of all, congrats and thanks for a brilliant weblog and website.

    The snapper tool is quite good, but I am finding it a bit ‘different’. I am used to using runstats, so it might just be a case of getting familiarized.

    I open 2 SQL windows. I take snap session A from session B, and the run the problematic piece of code in session A. Once the code finishes, I take another snap in session A and print the deltas. This is normally good enough for me, and only in rare cases would I require more frequent snaps.

    Right I don’t think now you can get snapper to function in the above-mentioned way, but correct me if I am wrong.

  8. CJ says:

    Hi Tanel,

    Probably my earlier post didn’t make sense. I will try to simplify the question I was asking:

    Is it possible to use the snapper tool such that you generate a snapshot before starting the process, take another snapshot when the process ends and get the report on the changed stats?

  9. tanelp says:

    Snapper doesn’t have this functionality for two reasons:

    1) i haven’t figured out an easy way to persist the snapshot information *without having to create any stored objects in db*

    2) my Sesspack tool has this functionality and more..

  10. Dakshin says:

    Hi Tanel,

    Is there a way to know why my DB was generating 3 times the ‘normal’ redo 3 days back? Want to diagnose so that I can prevent, if possible, such occurrences in the future. Thanks.

  11. Tanel Poder says:

    Yes, you need to access the old archivelogs from past with LogMiner. Read the documentation about DBMS_LOGMNR and V$LOGMNR_CONTENTS

  12. Dakshin says:

    Thanks. But how do I know which sessions / SQLs were causing more redo? Is it that I go through the whole day’s transactions by txn id and try to gauge their impact ? (Mainly look at DML statements)

  13. Tanel Poder says:

    Well if you describe V$LOGMNR_CONTENTS you see there’s a session_id column in there, this gives you session info. And with something like ASH or manual v$session history you could see which dml statements these sessions did at the time. There is no direct way though to link redo generation to a SQL statement as Oracle doesn’t measure redo generation by SQL statement.

  14. Anjul says:

    Excellent Tool – It surprises me everyday :-)

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>