When someone asks you to take a quick look into database performance and for whatever reason you can’t run your usual scripts or performance tools on there, ), then what query would you run first?
Yeah sometimes I’ve been not allowed to run custom scripts nor even touch the keyboard due security policies in effect.
Whenever you’re in such situation you want to be the command both short and effective for showing the database state.
The simplest query for determining database state performance wise would be this:
SQL> select event, state, count(*) from v$session_wait group by event, state order by 3 desc; EVENT STATE COUNT(*) ---------------------------------------------------------------- ------------------- ---------- rdbms ipc message WAITING 9 SQL*Net message from client WAITING 8 log file sync WAITING 6 gcs remote message WAITING 2 PL/SQL lock timer WAITING 2 PL/SQL lock timer WAITED KNOWN TIME 2 Streams AQ: qmn coordinator idle wait WAITING 1 smon timer WAITING 1 log file parallel write WAITING 1 ges remote message WAITING 1 SQL*Net message to client WAITED SHORT TIME 1 DIAG idle wait WAITING 1 pmon timer WAITING 1 db file sequential read WAITING 1 Streams AQ: waiting for messages in the queue WAITING 1 rdbms ipc message WAITED KNOWN TIME 1 jobq slave wait WAITING 1 Streams AQ: qmn slave idle wait WAITING 1 Streams AQ: waiting for time management or cleanup tasks WAITING 1 19 rows selected.
It uses the Oracle wait interface to report what all database sessions are currently doing wait/CPU usage wise. Whenever there’s a systemic issue (like extremely slow log file writes) this query will give good hint towards the cause of problem. Of course just running couple of queries against wait interface doesn’t give you the full picture (as these kinds of database wide “healthchecks” can be misleading as we should be really measuring end user response time breakdown at session level and asking questions like what throughput/response time do you normally get) but nevertheless, if you want to see an instance sessions state overview, this is the simplest query I know.
Interpreting this query output should be combined with reading some OS performance tool output (like vmstat or perfmon), in order to determine whether the problem is induced by CPU overload. For example, if someone is running a parallel backup compression job on the server which is eating all CPU time, some of these waits may be just a side-effect of CPU overload).
Below is a cosmetically enhanced version of this command, as one thing I decode the “WAITED FOR xyz TIME” wait states to “WORKING” and “On CPU / runqueue” as event name as otherwise it’s easy to miss by accident that some sessions are not actually waiting on previous event anymore:
SQL> select
2 count(*),
3 CASE WHEN state != 'WAITING' THEN 'WORKING'
4 ELSE 'WAITING'
5 END AS state,
6 CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
7 ELSE event
8 END AS sw_event
9 FROM
10 v$session_wait
11 GROUP BY
12 CASE WHEN state != 'WAITING' THEN 'WORKING'
13 ELSE 'WAITING'
14 END,
15 CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
16 ELSE event
17 END
18 ORDER BY
19 1 DESC, 2 DESC
20 /
COUNT(*) STATE EVENT
---------- ------- ----------------------------------------
11 WAITING log file sync
9 WAITING rdbms ipc message
4 WAITING SQL*Net message from client
3 WAITING PL/SQL lock timer
2 WORKING On CPU / runqueue
2 WAITING gcs remote message
1 WAITING ges remote message
1 WAITING pmon timer
1 WAITING smon timer
1 WAITING jobq slave wait
1 WAITING Streams AQ: waiting for messages in the
1 WAITING DIAG idle wait
1 WAITING Streams AQ: qmn slave idle wait
1 WAITING Streams AQ: waiting for time management
1 WAITING db file sequential read
1 WAITING log file parallel write
1 WAITING Streams AQ: qmn coordinator idle wait
17 rows selected.
SQL>
Also, sometimes you might want to exclude the background processes and idle sessions from the picture:
SQL> select
2 count(*),
3 CASE WHEN state != 'WAITING' THEN 'WORKING'
4 ELSE 'WAITING'
5 END AS state,
6 CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
7 ELSE event
8 END AS sw_event
9 FROM
10 v$session
11 WHERE
12 type = 'USER'
13 AND status = 'ACTIVE'
14 GROUP BY
15 CASE WHEN state != 'WAITING' THEN 'WORKING'
16 ELSE 'WAITING'
17 END,
18 CASE WHEN state != 'WAITING' THEN 'On CPU / runqueue'
19 ELSE event
20 END
21 ORDER BY
22 1 DESC, 2 DESC
23 /
COUNT(*) STATE EVENT
---------- ------- ----------------------------------------
6 WAITING PL/SQL lock timer
4 WORKING On CPU / runqueue
3 WAITING db file sequential read
1 WAITING read by other session
1 WAITING Streams AQ: waiting for messages in the
1 WAITING jobq slave wait
6 rows selected.
By the way, the above scripts report quite similar data what ASH is actually using (especially the instance performance graph which shows you the instance wait summary). ASH nicely puts the CPU count of server into the graph as well (that you would be able to put the number of “On CPU” sessions into perspective), so another useful command to run after this script is “show parameter cpu_count” or better yet, check at OS level to be sure :)
Note that you can use similar technique for easily viewing the instance activity from other perspectives/dimensions, like which SQL is being executed:
SQL> select sql_hash_value, count(*) from v$session
2 where status = 'ACTIVE' group by sql_hash_value order by 2 desc;
SQL_HASH_VALUE COUNT(*)
-------------- ----------
0 20
966758382 8
2346103937 2
3393152264 1
3349907142 1
2863564559 1
4030344732 1
1631089791 1
8 rows selected.
SQL> select sql_text,users_executing from v$sql where hash_value = 966758382;
SQL_TEXT USERS_EXECUTING
------------------------------------------------------------ ---------------
BEGIN :1 := orderentry.neworder(:2,:3,:4); END; 10
So while these queries are nothing advanced nor really new, I hope this helps with the question “The whole database is suddenly slow – where to start?”




Tanel, on a related note concerning “On CPU/runqueue”, is it accurate to say that (for simplicity) on a 1 cpu server Oracle could record more than one hour of CPU usage? Specifically, I’m seeing on occasion, 2 hours of “CPU used by this session” in a 1 hour period. Could I be correct in attributing this extra hour to waiting on cpu, ie…run queue? Thanks much for any guidance you can provide!
Hi Tanel,
Do you think it would be useful to grab SUM(SECONDS_IN_WAIT) as well?
Jerry
p.s. I really enjoy reading your blog, thanks for doing this.
hi tanel,
thanks for the tips.
your blog is always a great read.
Hi Tom,
Oracle asks the process CPU usage numbers from OS, using times() syscall for example. Thus the CPU usage numbers are relatively correct, no such measurement like 2x difference should happen at OS level.
The issue you are experiencing may be due Oracle updating the session the CPU usage only at the end of the db call. Thus if you have a long-running SQL statement, the “CPU used by this session” is not updated before the statement execution/fetch finishes. If you happen to take a sample just when the execution (which has lasted for X hours) finishes, you’d see all this CPU time reported at once, thus it can be larger than wallclock time during your sampling period.
Starting from 10g you can sample V$SESS_TIME_MODEL instead (as my Snapper and Sesspack tools do), this way the CPU time is updated every 5 seconds.
Hi Jerry,
I usually run the query few times to get a better statistic sample of what’s happening in the instance.
The SUM(SECONDS_IN_WAIT) could be deceiving as if there’s a single session which has been waiting for some lock for long time (but there are no other waiters) then the significance of this wait might be pushed too high in the output report. When diagnosing *instance* performance state here, it makes sense to concentrate on the event on which the highest number of sessions happen to wait.. and that’s the philosophy behind ASH enterprise manager screens as well.
Why not use SQL_ID versus SQL_HASH_VALUE which in turn can be used with dbms_xplan.display_cursor()
Yep, you can use also SQL_ID (or whatever other column of interest). SQL_ID doesn’t exist in 9i so I’m still used to use SQL hash value.
Btw, SQL_ID is just a fancy presentation of SQL hash value, the library cache is still organized by hash value internally.
what is the best action to take to reduce the waiting and working count for CPU usage in the most cost effective manner without hampering user connections and why?
@Nadine
Hi Nadine,
CPU usage is high because some sessions use too much CPU due to some SQLs using too much CPU (or being executed too frequently). So, the best way to reduce CPU usage, is to identify the TOP CPU consuming SQL statements (from ASH or AWR reports or Statspack) and just optimize them (to use less logical IOs for example or avoid running some SQL at all if design allows caching of values etc)