Oracle Session Snapper – real-time session-level performance stats for DBAs

A post by Jonathan Lewis inspired me to finally complete my version of the Oracle session performance snapper script, which main characteristics are

  1. it reports Oracle session level performance counter and wait information in real time
  2. it does NOT require any database objects to be created

If you are a DBA or consultant working on ad-hoc performance issues, you will like it!

Are you familiar with following situation?

(Monday morning)

Customer: Hey, we need your help! We have serious performance issues in our production environment. It’s a database with x000 online users, several parallel data feeds, continuous batch jobs and reporting going on.
Users have started experiencing occasional bad performance and some batch jobs as well.

You: Ok, lets see it. Do you have any performance monitoring tools installed?

Customer: Yes, we have Xxxxx Xxxxxxxxx installed which shows us a nice green or red light on big screen depending whether there are any problems or not.

You: What color is it showing now?

Customer: Green

You: But you still have the performance problem you described earlier?

Customer: Yes

You: Do you have ANY performance monitoring tools installed?

Customer: Well, we also have a statspack snapshot taken every morning and evening to capture the business workload.

You: Um… Ok… you know forget about it. Can I create a small package to capture some useful performance info on the problematic sessions?

Customer: Yes, but we need to put the DDL scripts through the change review board which gathers every Thursday… but we can’t wait that long!

You: Can we enable tracing?

Customer: Enabling tracing is a change and all changes must go through review board. Also, we don’t really know which exact sessions are affected, it just happens for seemingly random ones… and we can’t just trace every session, can we?

You: Ok, give me a sqlplus window and Excel, we’ll figure something out.

…And now follows a tedious manual SQL execution and copy & paste exercise from various V$ views for getting some meaningful performance information out of Oracle.

Well, not anymore, because The Oracle Session Snapper is in town!

If you know vmstat for Unix, you know it reports you various system level statistic counter deltas over a period you choose.

Well, the Oracle Session Snapper output looks somewhat similar, but it reports you session level deltas of Oracle v$sesstat counters, wait events and starting from 10g also session time model statistics.

All info can be reported in real time, without a need for running and timing multiple SQL scripts and manual calculation of deltas.

And the key unique point of the Session Snapper is – it does not require creation of any database objects, thus no changes in the database at all! Everything is done from within a sqlplus script or anonymous PL/SQL block.
This means that you will be able to get quick session-level performance snapshots even in heavily change-controlled environments, where no object creation whatsoever is allowed without going through a long process.

You can read all the usage details from the script header, but here’s one example of its output:

Tanel@Sol01> @snapper out 1 3 475

-- Session Snapper v1.03 by Tanel Poder ( http://www.tanelpoder.com )

--------------------------------------------------------------------------------------------------------------------------------------------
--        SID, SNAPSHOT START   , SECONDS  , TYPE, STATISTIC                               ,         DELTA,      D/SEC,     HDELTA,   HD/SEC
--------------------------------------------------------------------------------------------------------------------------------------------
DATA,     475, 20070820 01:17:47,         1, STAT, session logical reads                   ,         88232,      88232,     88.23k    88.23k
DATA,     475, 20070820 01:17:47,         1, STAT, consistent gets                         ,         88233,      88233,     88.23k    88.23k
DATA,     475, 20070820 01:17:47,         1, STAT, consistent gets from cache              ,         88232,      88232,     88.23k    88.23k
DATA,     475, 20070820 01:17:47,         1, STAT, calls to get snapshot scn: kcmgss       ,           556,        556,        556       556
DATA,     475, 20070820 01:17:47,         1, STAT, no work - consistent read gets          ,         87677,      87677,     87.68k    87.68k
DATA,     475, 20070820 01:17:47,         1, STAT, table scans (short tables)              ,           139,        139,        139       139
DATA,     475, 20070820 01:17:47,         1, STAT, table scan rows gotten                  ,       7429598,    7429598,      7.43M     7.43M
DATA,     475, 20070820 01:17:47,         1, STAT, table scan blocks gotten                ,         87676,      87676,     87.68k    87.68k
DATA,     475, 20070820 01:17:47,         1, STAT, buffer is pinned count                  ,           138,        138,        138       138
--  End of snap 1
DATA,     475, 20070820 01:17:48,         1, STAT, session logical reads                   ,         87779,      87779,     87.78k    87.78k
DATA,     475, 20070820 01:17:48,         1, STAT, consistent gets                         ,         87772,      87772,     87.77k    87.77k
DATA,     475, 20070820 01:17:48,         1, STAT, consistent gets from cache              ,         87772,      87772,     87.77k    87.77k
DATA,     475, 20070820 01:17:48,         1, STAT, calls to get snapshot scn: kcmgss       ,           552,        552,        552       552
DATA,     475, 20070820 01:17:48,         1, STAT, no work - consistent read gets          ,         87210,      87210,     87.21k    87.21k
DATA,     475, 20070820 01:17:48,         1, STAT, table scans (short tables)              ,           138,        138,        138       138
DATA,     475, 20070820 01:17:48,         1, STAT, table scan rows gotten                  ,       7389897,    7389897,      7.39M     7.39M
DATA,     475, 20070820 01:17:48,         1, STAT, table scan blocks gotten                ,         87211,      87211,     87.21k    87.21k
DATA,     475, 20070820 01:17:48,         1, STAT, buffer is pinned count                  ,           136,        136,        136       136
--  End of snap 2
DATA,     475, 20070820 01:17:49,         1, STAT, session logical reads                   ,         87580,      87580,     87.58k    87.58k
DATA,     475, 20070820 01:17:49,         1, STAT, consistent gets                         ,         87587,      87587,     87.59k    87.59k
DATA,     475, 20070820 01:17:49,         1, STAT, consistent gets from cache              ,         87587,      87587,     87.59k    87.59k
DATA,     475, 20070820 01:17:49,         1, STAT, calls to get snapshot scn: kcmgss       ,           552,        552,        552       552
DATA,     475, 20070820 01:17:49,         1, STAT, no work - consistent read gets          ,         87046,      87046,     87.05k    87.05k
DATA,     475, 20070820 01:17:49,         1, STAT, table scans (short tables)              ,           138,        138,        138       138
DATA,     475, 20070820 01:17:49,         1, STAT, table scan rows gotten                  ,       7375781,    7375781,      7.38M     7.38M
DATA,     475, 20070820 01:17:49,         1, STAT, table scan blocks gotten                ,         87041,      87041,     87.04k    87.04k
DATA,     475, 20070820 01:17:49,         1, STAT, buffer is pinned count                  ,           137,        137,        137       137
--  End of snap 3

PL/SQL procedure successfully completed.

The output contains 3 x 1 second snapshot of session 475 doing heavy nested looping. Note that even though the CPU time used was not updated, the logical IO counts for that session had still increased.
So this tool can be very valuable diagnosing what’s going on when the session seems to be 100% on CPU doing something.

And yes, as I mentioned earlier, this tool can also capture wait and session time model statistics, as seen below:

Tanel@Sol01> @snapper out,gather=wt 30 1 475

-- Session Snapper v1.03 by Tanel Poder ( http://www.tanelpoder.com )

--------------------------------------------------------------------------------------------------------------------------------------------
--        SID, SNAPSHOT START   , SECONDS  , TYPE, STATISTIC                               ,         DELTA,      D/SEC,     HDELTA,   HD/SEC
--------------------------------------------------------------------------------------------------------------------------------------------
DATA,     475, 20070820 01:24:26,        30, TIME, hard parse elapsed time                 ,          1041,         35,     1.04ms    34.7us
DATA,     475, 20070820 01:24:26,        30, TIME, parse time elapsed                      ,         15383,        513,    15.38ms  512.77us
DATA,     475, 20070820 01:24:26,        30, TIME, failed parse elapsed time               ,          1614,         54,     1.61ms    53.8us
DATA,     475, 20070820 01:24:26,        30, TIME, DB CPU                                  ,       3217049,     107235,      3.22s  107.23ms
DATA,     475, 20070820 01:24:26,        30, TIME, sql execute elapsed time                ,      18862882,     628763,     18.86s  628.76ms
DATA,     475, 20070820 01:24:26,        30, TIME, DB time                                 ,      18903962,     630132,      18.9s  630.13ms
DATA,     475, 20070820 01:24:26,        30, WAIT, db file sequential read                 ,        512218,      17074,   512.22ms   17.07ms
DATA,     475, 20070820 01:24:26,        30, WAIT, db file scattered read                  ,      17562414,     585414,     17.56s  585.41ms
DATA,     475, 20070820 01:24:26,        30, WAIT, SQL*Net message to client               ,           762,         25,      762us    25.4us
DATA,     475, 20070820 01:24:26,        30, WAIT, SQL*Net message from client             ,       9549187,     318306,      9.55s  318.31ms
DATA,     475, 20070820 01:24:26,        30, WAIT, SQL*Net break/reset to client           ,          1070,         36,     1.07ms   35.67us
--  End of snap 1

PL/SQL procedure successfully completed.

The output is in CSV and is deliberately formatted the way it could be loaded into a spreadsheet or fed into text manipulation utilities should further processing be required. You should have the sqlplus window at least 140 characters wide when outputting to sqlplus client.

With “trace” option you can also output to your server process tracefile, which allows you to get convenient real-time output using tail -f on the tracefile. However you need execute rights on dbms_system for trace output as the script is using dbms_output.ksdwrt() for writing to tracefile.

The first 5 columns of output should be self-explanatory, the last four columns mean:

DELTA – counter raw deltas between snapshots
D/SEC – counter raw deltas per second
HDELTA – human-readable delta (e.g. M instead of million and ms instead of thousand microseconds etc)
HD/SEC – human-readable deltas per second

The script doesn’t aim to be an example of good programming style (as it started out as just a ad-hoc hack for quickly diagnosing a performance problem).
Also there are quite a few things still in TODO list, but I think this script already fills its purpose – providing quick and simple session level performance snapshots, without any changes to database whatsoever.

So, if you’re still interested, feel free to download The Session Snapper (in form of a single SQL script) from here: http://www.tanelpoder.com/files/scripts/snapper.sql (Obviously you need to proofread it before using in any of your databases).

I have already used a predecessor of this tool in many places, so I hope you find the Snapper useful too. Feel free to feed me back any shortcomings or desired improvements so I could improve it further.

Update: Over time I have improved Snapper and optimized its output format. Here are the update and new features articles:

This entry was posted in Cool stuff, Oracle and tagged , , . Bookmark the permalink.

9 Responses to Oracle Session Snapper – real-time session-level performance stats for DBAs

  1. tanelp says:

    Note that there’s one bug in my script which causes it to not work on Oracle 9i. It will be fixed in version 1.04, to be released as soon as I have some spare time – in a day or two.

  2. Paul says:

    How cool is that?!? I will use it.
    I wrote some programs to store session snapshots in tables, they are very useful, getting more useful all the time, I think, but this quick snapshot thing will also be useful.

    The ascii art is good, you didn’t type it in, I hope. I might have to add some to my programs..

    -a : prints out the ascii art.

    That would be a good addition, I think. New programing standard, every program must emit ascii art on demand.

    All joking aside,

    In 10g, would it be a good idea to also maybe grab the last N minutes worth of v$active_session_history for the specified sessions? Sort of like the OEM last 10 wait events thing… even supply some p1/p2/p3 decodes.

    And how about a version that hits a specific instance via gv$? Sometimes I find more instances than there are sqlnet aliases… maybe the command line would include

    sid:inst_id sid:inst_id …

    I guess that’s pretty lazy of me, but then again you asked. I’m having to do that to many of my own single-instance utilities, especially with server side load balancing that sends workload to instances that don’t even show up in the public tnsnames…

    I will finish by just saying thanks for the script, I’m sure it will be useful.

    Oh and one more unrelated question, what kind of software system is it where the optimizer requires increasingly complex software to adjust its output to provide satisfactory performance?

  3. tanelp says:

    Ok, the bug which caused the snapper to fail on Oracle 9.2 is fixed.

    It was just a PLS_INTEGER overflow in my code, which Oracle 10g didn’t complain about, but 9.2 couldn’t handle.

    The fix was just matter of subtracting 1 from the pls_integer constant I used there.

  4. tanelp says:

    Thanks Paul for good ideas. Making the script “global” using GV$ views is definitely worth considering.

    On the other hand I intend to keep the script and its output simple, as this is just an ad-hoc quick performance tool.

    I’ll implement more sophisticated functionality into sesspack.

  5. tanelp says:

    ..and I did not “draw” the ASCII snapper myself, I used a JGP to ASCII converter. If you google for “GIF to ASCII converter” you’ll find multiple sites which do it for you. You’ll just need to submit the URL of the image you want to convert.

  6. paulk says:

    One problem with gv$ utilities – if an instance is in trouble some of the gv$ views ( not exactly sure which ones, not so much gv$session_wait as some of the others, I’m pretty sure that gv$instances is a bad one and perhaps gv$session, too) can freeze up on you. Sometimes the problem instance is also frozen, however being able to collect information from other instances via v$ is useful. “Yes, we know M8 is crawling, but that’s a backup instance, the others are fine”, when in fact the others aren’t fine, the users are suffering, the problems on one instance are impacting the others, and you can prove it with data collected from v$, but maybe not from gv$. Not a happy scenario, wish it never happened, but for me smetimes it’s necessary to have both gv$ and v$ versions.

  7. tanelp says:

    I think when I add the gv$ sampling feature, I would make it optional, e.g. there’s a parameter global, if it’s omitted the v$ views are used, but when you specify either instance names or inst_id’s then those instances would be snapped through gv$

  8. This script is way cool, but even cooler is the fact that you’re willing to share it with the community! Thanks a lot, Tanel – from now on you may be called by your new nick “Red Snapper”! ;-)
    I’ll fiddle around with it in the next days and may come back for some suggestions.
    Best Regards,
    –==/ Uwe \==–

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>