The simplest query for checking what’s happening in a database

When someone asks you to take a quick look into database performance and for whatever reason you can’t run your usual scripts or performance tools on there, ), then what query would you run first?
Yeah sometimes I’ve been not allowed to run custom scripts nor even touch the keyboard due security policies in effect.

Whenever you’re in such situation you want to be the command both short and effective for showing the database state.

The simplest query for determining database state performance wise would be this:

SQL> select event, state, count(*) from v$session_wait group by event, state order by 3 desc;

EVENT                                                            STATE                 COUNT(*)
---------------------------------------------------------------- ------------------- ----------
rdbms ipc message                                                WAITING                      9
SQL*Net message from client                                      WAITING                      8
log file sync                                                    WAITING                      6
gcs remote message                                               WAITING                      2
PL/SQL lock timer                                                WAITING                      2
PL/SQL lock timer                                                WAITED KNOWN TIME            2
Streams AQ: qmn coordinator idle wait                            WAITING                      1
smon timer                                                       WAITING                      1
log file parallel write                                          WAITING                      1
ges remote message                                               WAITING                      1
SQL*Net message to client                                        WAITED SHORT TIME            1
DIAG idle wait                                                   WAITING                      1
pmon timer                                                       WAITING                      1
db file sequential read                                          WAITING                      1
Streams AQ: waiting for messages in the queue                    WAITING                      1
rdbms ipc message                                                WAITED KNOWN TIME            1
jobq slave wait                                                  WAITING                      1
Streams AQ: qmn slave idle wait                                  WAITING                      1
Streams AQ: waiting for time management or cleanup tasks         WAITING                      1

19 rows selected.

It uses the Oracle wait interface to report what all database sessions are currently doing wait/CPU usage wise. Whenever there’s a systemic issue (like extremely slow log file writes) this query will give good hint towards the cause of problem. Of course just running couple of queries against wait interface doesn’t give you the full picture (as these kinds of database wide “healthchecks” can be misleading as we should be really measuring end user response time breakdown at session level and asking questions like what throughput/response time do you normally get) but nevertheless, if you want to see an instance sessions state overview, this is the simplest query I know.

Interpreting this query output should be combined with reading some OS performance tool output (like vmstat or perfmon), in order to determine whether the problem is induced by CPU overload. For example, if someone is running a parallel backup compression job on the server which is eating all CPU time, some of these waits may be just a side-effect of CPU overload).

Below is a cosmetically enhanced version of this command, as one thing I decode the “WAITED FOR xyz TIME” wait states to “WORKING” and “On CPU / runqueue” as event name as otherwise it’s easy to miss by accident that some sessions are not actually waiting on previous event anymore:

SQL> select
  2     count(*),
  3     CASE WHEN state != 'WAITING' THEN 'WORKING'
  4          ELSE 'WAITING'
  5     END AS state,
  6     CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
  7          ELSE event
  8     END AS sw_event
  9  FROM
 10     v$session_wait
 11  GROUP BY
 12     CASE WHEN state != 'WAITING' THEN 'WORKING'
 13          ELSE 'WAITING'
 14     END,
 15     CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
 16          ELSE event
 17     END
 18  ORDER BY
 19     1 DESC, 2 DESC
 20  /

  COUNT(*) STATE   EVENT
---------- ------- ----------------------------------------
        11 WAITING log file sync
         9 WAITING rdbms ipc message
         4 WAITING SQL*Net message from client
         3 WAITING PL/SQL lock timer
         2 WORKING On CPU / runqueue
         2 WAITING gcs remote message
         1 WAITING ges remote message
         1 WAITING pmon timer
         1 WAITING smon timer
         1 WAITING jobq slave wait
         1 WAITING Streams AQ: waiting for messages in the
         1 WAITING DIAG idle wait
         1 WAITING Streams AQ: qmn slave idle wait
         1 WAITING Streams AQ: waiting for time management
         1 WAITING db file sequential read
         1 WAITING log file parallel write
         1 WAITING Streams AQ: qmn coordinator idle wait

17 rows selected.

SQL>

Also, sometimes you might want to exclude the background processes and idle sessions from the picture:

SQL> select
  2     count(*),
  3     CASE WHEN state != 'WAITING' THEN 'WORKING'
  4          ELSE 'WAITING'
  5     END AS state,
  6     CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
  7          ELSE event
  8     END AS sw_event
  9  FROM
 10     v$session
 11  WHERE
 12      type = 'USER'
 13  AND status = 'ACTIVE'
 14  GROUP BY
 15     CASE WHEN state != 'WAITING' THEN 'WORKING'
 16          ELSE 'WAITING'
 17     END,
 18     CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
 19          ELSE event
 20     END
 21  ORDER BY
 22     1 DESC, 2 DESC
 23  /

  COUNT(*) STATE   EVENT
---------- ------- ----------------------------------------
         6 WAITING PL/SQL lock timer
         4 WORKING On CPU / runqueue
         3 WAITING db file sequential read
         1 WAITING read by other session
         1 WAITING Streams AQ: waiting for messages in the
         1 WAITING jobq slave wait

6 rows selected.

By the way, the above scripts report quite similar data what ASH is actually using (especially the instance performance graph which shows you the instance wait summary). ASH nicely puts the CPU count of server into the graph as well (that you would be able to put the number of “On CPU” sessions into perspective), so another useful command to run after this script is “show parameter cpu_count” or better yet, check at OS level to be sure :)

Note that you can use similar technique for easily viewing the instance activity from other perspectives/dimensions, like which SQL is being executed:

SQL> select sql_hash_value, count(*) from v$session
  2  where status = 'ACTIVE' group by sql_hash_value order by 2 desc;

SQL_HASH_VALUE   COUNT(*)
-------------- ----------
             0         20
     966758382          8
    2346103937          2
    3393152264          1
    3349907142          1
    2863564559          1
    4030344732          1
    1631089791          1

8 rows selected.

SQL> select sql_text,users_executing from v$sql where hash_value = 966758382;

SQL_TEXT                                                     USERS_EXECUTING
------------------------------------------------------------ ---------------
BEGIN :1 := orderentry.neworder(:2,:3,:4); END;                           10

So while these queries are nothing advanced nor really new, I hope this helps with the question “The whole database is suddenly slow – where to start?”

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Oracle and tagged , , . Bookmark the permalink.

10 Responses to The simplest query for checking what’s happening in a database

  1. Tom says:

    Tanel, on a related note concerning “On CPU/runqueue”, is it accurate to say that (for simplicity) on a 1 cpu server Oracle could record more than one hour of CPU usage? Specifically, I’m seeing on occasion, 2 hours of “CPU used by this session” in a 1 hour period. Could I be correct in attributing this extra hour to waiting on cpu, ie…run queue? Thanks much for any guidance you can provide!

  2. Jerry says:

    Hi Tanel,

    Do you think it would be useful to grab SUM(SECONDS_IN_WAIT) as well?

    Jerry

    p.s. I really enjoy reading your blog, thanks for doing this.

  3. kenny r. says:

    hi tanel,

    thanks for the tips.

    your blog is always a great read.

  4. tanelp says:

    Hi Tom,

    Oracle asks the process CPU usage numbers from OS, using times() syscall for example. Thus the CPU usage numbers are relatively correct, no such measurement like 2x difference should happen at OS level.

    The issue you are experiencing may be due Oracle updating the session the CPU usage only at the end of the db call. Thus if you have a long-running SQL statement, the “CPU used by this session” is not updated before the statement execution/fetch finishes. If you happen to take a sample just when the execution (which has lasted for X hours) finishes, you’d see all this CPU time reported at once, thus it can be larger than wallclock time during your sampling period.

    Starting from 10g you can sample V$SESS_TIME_MODEL instead (as my Snapper and Sesspack tools do), this way the CPU time is updated every 5 seconds.

  5. tanelp says:

    Hi Jerry,

    I usually run the query few times to get a better statistic sample of what’s happening in the instance.

    The SUM(SECONDS_IN_WAIT) could be deceiving as if there’s a single session which has been waiting for some lock for long time (but there are no other waiters) then the significance of this wait might be pushed too high in the output report. When diagnosing *instance* performance state here, it makes sense to concentrate on the event on which the highest number of sessions happen to wait.. and that’s the philosophy behind ASH enterprise manager screens as well.

  6. mdinh says:

    Why not use SQL_ID versus SQL_HASH_VALUE which in turn can be used with dbms_xplan.display_cursor()

  7. tanelp says:

    Yep, you can use also SQL_ID (or whatever other column of interest). SQL_ID doesn’t exist in 9i so I’m still used to use SQL hash value.

    Btw, SQL_ID is just a fancy presentation of SQL hash value, the library cache is still organized by hash value internally.

  8. Nadine says:

    what is the best action to take to reduce the waiting and working count for CPU usage in the most cost effective manner without hampering user connections and why?

  9. Tanel Poder says:

    @Nadine
    Hi Nadine,

    CPU usage is high because some sessions use too much CPU due to some SQLs using too much CPU (or being executed too frequently). So, the best way to reduce CPU usage, is to identify the TOP CPU consuming SQL statements (from ASH or AWR reports or Statspack) and just optimize them (to use less logical IOs for example or avoid running some SQL at all if design allows caching of values etc)

  10. Pingback: The simplest query for checking what´s happening in a Oracle database « Oracle Logbook

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>