I have published a new article to tech.e2sn.com about recursive sessions and ORA-00018: maximum number of sessions exceeded error message:
Note that I’m working on setting up RSS feed for tech.e2sn too, coming soon :)
Tanel Poder's blog: IT & Mobile for Geeks and Pros
Oracle, Exadata, Linux, Performance, Troubleshooting - Mobile Life and Productivity.
I have published a new article to tech.e2sn.com about recursive sessions and ORA-00018: maximum number of sessions exceeded error message:
Note that I’m working on setting up RSS feed for tech.e2sn too, coming soon :)
What you see below is a common problem. Someone sends you (or posts to a forum) a wide execution plan, which is unreadable because of wrapped lines. For example, this one below:
-------------------------------------------------------------------------------- ------------------- | Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem | -------------------------------------------------------------------------------- ------------------- | 0 | SELECT STATEMENT | | | | | | | 1 | SORT AGGREGATE | | 1 | | | | |* 2 | HASH JOIN | | 13 | 1102K| 1102K| 355K (0)| |* 3 | HASH JOIN | | 13 | 988K| 988K| 367K (0)| |* 4 | HASH JOIN | | 13 | 921K| 921K| 621K (0)| |* 5 | HASH JOIN OUTER | | 13 | 836K| 836K| 1224K (0)| |* 6 | HASH JOIN | | 13 | 821K| 821K| 501K (0)| |* 7 | HASH JOIN | | 13 | 1102K| 1102K| 501K (0)| | 8 | MERGE JOIN CARTESIAN| | 1 | | | | |* 9 | TABLE ACCESS FULL | PROFILE$ | 1 | | | | | 10 | BUFFER SORT | | 1 | 73728 | 73728 | | |* 11 | TABLE ACCESS FULL | PROFILE$ | 1 | | | | |* 12 | TABLE ACCESS FULL | USER$ | 36 | | | | | 13 | TABLE ACCESS FULL | PROFNAME$ | 1 | | | | |* 14 | TABLE ACCESS FULL | RESOURCE_GROUP_MAPPING$ | 1 | | | | | 15 | TABLE ACCESS FULL | TS$ | 7 | | | | | 16 | TABLE ACCESS FULL | TS$ | 7 | | | | | 17 | TABLE ACCESS FULL | USER_ASTATUS_MAP | 9 | | | | -------------------------------------------------------------------------------- -------------------
So now you either try to manually edit and fix the execution plan text so you could read it or ask the developer to send the execution plan again. Both approaches take time.
Well, sometimes things are easy – in this particular case I saved the above into a file called /tmp/x and ran the following command:
$ cat /tmp/x | awk '{ printf "%s", $0 ; if (NR % 3 == 0) print } END { print }'
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
| 1 | SORT AGGREGATE | | 1 | | | |
|* 2 | HASH JOIN | | 13 | 1102K| 1102K| 355K (0)|
|* 3 | HASH JOIN | | 13 | 988K| 988K| 367K (0)|
|* 4 | HASH JOIN | | 13 | 921K| 921K| 621K (0)|
|* 5 | HASH JOIN OUTER | | 13 | 836K| 836K| 1224K (0)|
|* 6 | HASH JOIN | | 13 | 821K| 821K| 501K (0)|
|* 7 | HASH JOIN | | 13 | 1102K| 1102K| 501K (0)|
| 8 | MERGE JOIN CARTESIAN| | 1 | | | |
|* 9 | TABLE ACCESS FULL | PROFILE$ | 1 | | | |
| 10 | BUFFER SORT | | 1 | 73728 | 73728 | |
|* 11 | TABLE ACCESS FULL | PROFILE$ | 1 | | | |
|* 12 | TABLE ACCESS FULL | USER$ | 36 | | | |
| 13 | TABLE ACCESS FULL | PROFNAME$ | 1 | | | |
|* 14 | TABLE ACCESS FULL | RESOURCE_GROUP_MAPPING$ | 1 | | | |
| 15 | TABLE ACCESS FULL | TS$ | 7 | | | |
| 16 | TABLE ACCESS FULL | TS$ | 7 | | | |
| 17 | TABLE ACCESS FULL | USER_ASTATUS_MAP | 9 | | | |
---------------------------------------------------------------------------------------------------
All I did here was that I stripped out line feeds from all lines except every 3rd line (which is the real end of the original line).
Note that if your linesize is very wide (and trimspool/trimout settings are ON) then this script would need some adjustment…
I’m sure this trivial approach doesn’t work in all situations, but with this article I wanted to illustrate that sometimes things which seem hard can be made much easier with a little scripting knowledge. If you are thinking which technology you should learn next – then better check out a Perl, Python or some shell+AWK book :)
By the way, if you want real flexibility displaying your execution plans (from library cache), then check this out:
This is the second part of the joint blog “project” with James Morle, called “The Wait Interface Is Useless (Sometimes)”.
We already did a joint presentation on this topic at UKOUG and more conferences will follow :) Read the first part by James here for intro.
So, where do we go when Oracle’s wait interface doesn’t help us? We will show multiple techniques over time, but here’s where I normally continue when wait interface is “useless”.
I use V$SESSTAT.
Oh, were you expecting something more “advanced” instead of boring old V$SESSTAT’s performance counters which has been available in Oracle for ages? ;-)
Well, there is a reason why the V$SESSTAT has been available in Oracle for ages, very likely even before Wait Interface was introduced (but I wouldn’t know for sure, I was still in elementary school back then or something :).
And the reason is that session level performance counters are VERY useful about finding out WHAT Oracle is doing. Every time you parse, a counter goes up by one in V$SESSTAT for the session. Every time you execute, a counter goes up. Every time you do a logical IO, a counter goes up. Every time you commit, a counter goes up. Every time an index block is split, a counter goes up. You get the point.
Oracle’s V$SESSTAT tells you WHAT Oracle is doing. Wait interface tells you how much TIME is spent waiting for something, but V$SESSTAT counters just tell you how many times some operation was done. You can not conclude how much time was spent by looking just at the number of times something has happened (as Cary will tell you) but nevertheless, the V$SESSTAT counters definitely give you a good clue into WHAT THE [FU|HE]CK is an Oracle session doing – especially when wait interface says you’re not waiting for anything and SQL trace doesn’t print a line.
How many different operations are counted for each session in Oracle? Let’s check (Oracle 11.2):
SQL> select count(*) from v$sesstat where sid = 14; COUNT(*) ---------- 611
Oracle 11.2 keeps track of the counts of 611 different operations, for each session!
That’s VERY valuable source of information for understanding what Oracle is doing and its relatively easy to use.
And that’s exactly the reason why I wrote my Snapper script – I wanted this information to be VERY EASY to use!
And here’s an example – session 14 is stuck, doesn’t respond, user complains. Let’s check the wait interface:
If I want to know what a session is doing, I sample V$SESSION_WAIT first, with my sw.sql script (or I could just query ASH which gives me the same data with history):
SQL> @sw 14
SID STATE EVENT SEQ# SEC_IN_WAIT P1 P2 P3 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
14 WORKING On CPU / runqueue 3486 130
1 row selected.
SQL>
SQL> @sw 14
SID STATE EVENT SEQ# SEC_IN_WAIT P1 P2 P3 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
14 WORKING On CPU / runqueue 3486 137
1 row selected.
SQL>
SQL>
SQL> @sw 14
SID STATE EVENT SEQ# SEC_IN_WAIT P1 P2 P3 P1TRANSL
------- ------- ---------------------------------------- ---------- ----------- ------------------ ------------------ ------------------ ------------------------------------------
14 WORKING On CPU / runqueue 3486 139
1 row selected.
SQL>
From here we see 2 things:
So, it’s pretty evident that this session is just burning CPU and wait interface is useless here (as we are not waiting for anything, as far as Oracle sees it of course). Alternatively I could just run top or prstat and check whether the process is 100% on CPU or not.
So, before going on to Snapper, lets look into what kind of SQL this session executes right now (we could sample this also multiple times, to check whether we are constantly running the same statement):
SQL> <strong>select * from table(select dbms_xplan.display_cursor(sql_id,sql_child_number) from v$session where sid = 14);</strong>
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5vy5qjd3fsn5c, child number 0
-------------------------------------
SELECT MIN(t1.created), MAX(t1.created) FROM t1 , t2 , t3
WHERE t1.object_id = t2.object_id AND t2.object_id = t3.object_id
AND t1.owner = :v AND t2.owner = :v AND t3.owner = :v
Plan hash value: 3271631391
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | SORT AGGREGATE | | 1 | 66 | | |
|* 2 | HASH JOIN | | 7 | 462 | 4 (25)| 00:00:01 |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 7 | 329 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T1 | 7 | 196 | 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | I1 | 7 | | 1 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | I2 | 27 | | 1 (0)| 00:00:01 |
|* 8 | TABLE ACCESS BY INDEX ROWID | T2 | 1 | 19 | 1 (0)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID | T3 | 40 | 760 | 1 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | I3 | 40 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
6 - access("T1"."OWNER"=:V)
7 - access("T2"."OWNER"=:V)
8 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")
10 - access("T3"."OWNER"=:V)
33 rows selected.
SQL>
Hmm… can anyone reliably tell why this SQL statement is slow and burns all the CPU time?
The answer is no. An execution PLAN is a plan of actions which would be executed when someone runs the execution plan. It doesn’t tell you anything about what’s exactly going on right now, it doesn’t tell you what exactly is using all the CPU time. Anything we would say at this point, would be a guess! Lets use V$SESSTAT instead, for gathering more hard evidence!
(The syntax of Snapper is documented inside the script or just search for snapper in my blog, plenty of examples :)
SQL> <strong>@snapper out 5 1 14 </strong> -- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com ) -- ---------------------------------------------------------------------------------------------------------------------- SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH ---------------------------------------------------------------------------------------------------------------------- 14, SYS , STAT, session logical reads , 81351, 16.27k, 14, SYS , STAT, consistent gets , 81351, 16.27k, 14, SYS , STAT, consistent gets from cache , 81351, 16.27k, 14, SYS , STAT, consistent gets from cache (fastpath) , 81352, 16.27k, 14, SYS , STAT, no work - consistent read gets , 81381, 16.28k, 14, SYS , STAT, table fetch by rowid , 2904784, 580.96k, 14, SYS , STAT, index scans kdiixs1 , 93, 18.6, 14, SYS , STAT, buffer is pinned count , 5736560, 1.15M, 14, SYS , STAT, buffer is not pinned count , 75248, 15.05k, 14, SYS , STAT, no buffer to keep pinned count , 1, .2, 14, SYS , TIME, DB CPU , 6050000, 1.21s, 121.0%, |@@@@@@@@@@| 14, SYS , TIME, sql execute elapsed time , 6059922, 1.21s, 121.2%, |@@@@@@@@@@| 14, SYS , TIME, DB time , 6059922, 1.21s, 121.2%, |@@@@@@@@@@| -- End of snap 1, end=2010-01-15 17:33:22, seconds=5 PL/SQL procedure successfully completed.
As our previous sw.sql output showed – we are not waiting for anything. If we were, then snapper would show you WAIT lines from wait interface (V$SESSION_EVENT) as well. But we don’t see any waits.
So, now its the time to go through all the V$SESSTAT stats reported in Snapper! These are the lines with STAT in them (TIME lines are V$SESS_TIME_MODEL breakdown available since 10g, but they are not detailed enough for diagnosing complex problems).
One thing we see, there is a statistic session logical reads and from the last column of Snapper output we see that this session did over 16000 logical reads (buffer gets) per second during snapper run time (of 5 seconds). That’s already a good indication of why we burn so much CPU time – we are doing lots of logical IOs.
Also, we see that we did 18.6 index range scans per second (the statistic index scans kdiixs1 shows that). That doesn’t seem “much” does it. However, check the value for buffer is pinned count statistic! This counter is updated every time we go to a buffer to get some data from it AND it already happens to be open by my session! Oracle keeps buffers pinned and relevant buffer handles cached during a database call in some cases (like nested loop joins and index scans) to avoid reopening the buffer again (getting a buffer again if its closed would mean another logical IO).
Nevertheless, we can see that we re-visited some buffers again and again and again – over a million times per second! When checking my index structures I see that every index has only a few hundred blocks, so having millions and millions of buffer visits (buffer is pinned count + session logical reads) means we are heavily re-visiting the same blocks again and again and again! This points directly to NESTED LOOPS (and also INDEX RANGE SCANS) in the execution plan. Nested loops, as the name says loops through inner rowsource and revisits some of its data as many times as the number of rows coming from the outer rowsource (with few special cases, as usual).
And there’s one more statistic in Snapper output which I do NOT see – execute count. This statistic shows how many times the session executed cursors (executed new ones or re-executed the same one). NB! Snapper only reports these V$SESSTAT statistics which changed during the snapshot – as execute count is not reported, it means that the execute count did not increase during the snapshot, thus the session still executed the same statement! (so for example we were not hard parsing and lots of different statements to “justify” this CPU usage, Snapper would have shown parse count (hard) going up if this session had done so).
So, using the above diagnosis, this is the place where I would take a serious look whether the NESTED LOOPS in the above execution plan are a right thing to do. I would check how many rows would actually match the bind variable values (more about that later) in the SQL statement predicates – NESTED LOOPS is not a good operation for joining large number of rows. More about SQL tuning very soon ;-)
To finish this blog entry, this is the sequence of troubleshooting which I usually use (if a user complains or “something” is slow):
Note that Snapper is just an anonymous PL/SQL block and it doesn’t require any changes to the database!
Ok, back to James now!
Copyright © 2013 · Minimum Theme on Genesis Framework · WordPress · Log in