There was a question in Oracle-L mailinglist recently, regarding excessive SQL*Net break/reset to client waiting by a session.
A quote is below:
We are seeing an insert statement reporting “SQL*Net break/reset to client” as over 1/3 of its time.
On the face of it this event suggests network issues but nothing else backs this up as the cause.
So I looked at the Java code in question and a trace of one of the sessions.
What is happening is that an attempt is made to insert a row, most of the time a duplicate error results, the code catches this exception and does an update.
I was wondering if its the duplicate error and the exception handling which results in this wait event showing up.
My answer to that was following:
Yes, a SQL*Net break/reset happens when an error/unhandled exception is raised during a call (which means that the call executed didn’t complete normally, thus the call state must be reset).
The approach of “insert -> if failed then update” is basically what MERGE does.
You need to change the application to use MERGE command. Alternatively you could use an “update -> if no-rows-updated then insert” approach, but MERGE makes much more sense nowadays.
So, these waits aren’t really indicating any network bottleneck, but rather just the fact that the client needs to be notified if it’s call has failed and Oracle has to clean up after the failed call. Apparently on some (network protocol) architectures or in some scenarios these operations have taken significant time (and may have required a system call), so a kernel developer has decided to wait instrument them.
In this blog entry I want to share a small test case which illustrates the point. First lets check how many times my session has waited for a SQL*Net break/reset wait event:
SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);
no rows selected
As V$SESSION_EVENT (like also V$SYSTEM_EVENT) doesn’t display events for which nobody has waited yet, I see that my session’s wait count for that event is zero.
Now let’s create an error, but also let’s handle it in a PL/SQL exception handler:
SQL> begin
2 execute immediate 'drop table non_existent';
3 exception
4 when others then null;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);
no rows selected
Still no breaks/resets as the error was handled and the OPI call completed properly.
Now let’s run the same PL/SQL block without exception handler:
SQL> begin
2 execute immediate 'drop table non_existent';
3 end;
4 /
begin
*
ERROR at line 1:
ORA-00942: table or view does not exist
ORA-06512: at line 2
SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);
EVENT TOTAL_WAITS
---------------------------------------------------------------- -----------
SQL*Net break/reset to client 2
You see, there are 2 break/reset waits registered.
Also normal SQL (not fired from PL/SQL but from top level instead) causes breaks/resets if it errors:
SQL> select * from non_existent;
select * from non_existent
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);
EVENT TOTAL_WAITS
---------------------------------------------------------------- -----------
SQL*Net break/reset to client 4
SQL>
Why 2 waits for a single error? I don’t know exact details, but the 10046 trace reveals that one of the waits is registered by a break operation and another one (apparently) not, thus it might be the call state reset operation just accounted separately:
=====================
PARSE ERROR #10:len=27 dep=0 uid=0 oct=3 lid=0 tim=1967444453 err=942
select * from non_existent
WAIT #10: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1967444525
WAIT #10: nam='SQL*Net break/reset to client' ela= 180 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1967444718
WAIT #10: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1967444752
Anyway, as I said in my Oracle-L reply, these breaks are caused by bad application design which allows too many unhandled exceptions to be propagated all the way up to the client.
The solutions would be to reduce or eliminate the number of errors occurring, or at least put that code into PL/SQL blocks where errors would be handled and not propagated back to client every time.



Very useful to know.
Thanks for the insight Tanel.
Comment by mohammed — April 10, 2008 @ 2:11 am
Yep, first the break is done to notify the client side to stop sending data, then the reset is done on both sides to cleanup the state and to resume from there. The next message to client is I think an “Successful completion of the ” of the call. So the client thinks that we are still talking.
Comment by Anjo Kolk — April 10, 2008 @ 9:42 pm
Nice! I guessed right :)
Thanks for the info, Anjo
Comment by tanelp — April 10, 2008 @ 9:54 pm
Very good blog entry, put together quickly as well.
Comment by John Hallas — April 10, 2008 @ 10:56 pm
We have a similar problem with top 2 wait events being bb waits and sqlnet break/reset to client event. We figured the cause being the sql something like this “select * form table where… for update nowait;” This table has one row in it. sessions not able to get locks returned “resource busy” message back to client triggering sqlnet break/reset event. Handling the exception at stored procedure worked out really well. however, wondering how to reduce the bbw waits on nowait sql. p3 value being 1 and table is in ASSM auto mode, 10gr2. Any ideas would be appreciated.
Steve
Comment by Steve — April 11, 2008 @ 4:05 am
Great post! (Although I hope that nobody reads this and thinks they should start adding “exception when others then null” to their code…)
Comment by Jeremy Schneider — April 13, 2008 @ 12:20 am
Steve, if the table has only one row and everybody is trying to update that row (some kind of locking, right?) then you can’t really split it up to multiple blocks.
Depending on what operations you serialize using that single-row lock, you may be able to break it down to multiple locks (each protecting a different operation or part of the structure) and have higher throughput like that.
I recommend you to check DBMS_LOCK user defined locks, you may be able to do your serialization better using that… or if you need better flexibility, then Advanced Queueing is for you.
Comment by tanelp — April 14, 2008 @ 10:54 pm
Thanks Jeremy, you’re right.. I think I’ll write a disclaimer to my blog that I only provide bits and pieces of information – but using it still requires the reader’s brain! :)
Comment by tanelp — April 14, 2008 @ 10:57 pm
Thanks Tanel for the suggestion.
Steve.
Comment by Steve — April 15, 2008 @ 12:07 am