SQL*Net break/reset to client

There was a question in Oracle-L mailinglist recently, regarding excessive SQL*Net break/reset to client waiting by a session.

A quote is below:

We are seeing an insert statement reporting “SQL*Net break/reset to client” as over 1/3 of its time.
On the face of it this event suggests network issues but nothing else backs this up as the cause.
So I looked at the Java code in question and a trace of one of the sessions.
What is happening is that an attempt is made to insert a row, most of the time a duplicate error results, the code catches this exception and does an update.
I was wondering if its the duplicate error and the exception handling which results in this wait event showing up.

My answer to that was following:

Yes, a SQL*Net break/reset happens when an error/unhandled exception is raised during a call (which means that the call executed didn’t complete normally, thus the call state must be reset).

The approach of “insert -> if failed then update” is basically what MERGE does.

You need to change the application to use MERGE command. Alternatively you could use an “update -> if no-rows-updated then insert” approach, but MERGE makes much more sense nowadays.

So, these waits aren’t really indicating any network bottleneck, but rather just the fact that the client needs to be notified if it’s call has failed and Oracle has to clean up after the failed call. Apparently on some (network protocol) architectures or in some scenarios these operations have taken significant time (and may have required a system call), so a kernel developer has decided to wait instrument them.

In this blog entry I want to share a small test case which illustrates the point. First lets check how many times my session has waited for a SQL*Net break/reset wait event:

SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);

no rows selected

As V$SESSION_EVENT (like also V$SYSTEM_EVENT) doesn’t display events for which nobody has waited yet, I see that my session’s wait count for that event is zero.

Now let’s create an error, but also let’s handle it in a PL/SQL exception handler:

SQL> begin
  2     execute immediate 'drop table non_existent';
  3  exception
  4     when others then null;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);

no rows selected

Still no breaks/resets as the error was handled and the OPI call completed properly.

Now let’s run the same PL/SQL block without exception handler:

SQL> begin
  2     execute immediate 'drop table non_existent';
  3  end;
  4  /
begin
*
ERROR at line 1:
ORA-00942: table or view does not exist
ORA-06512: at line 2


SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);

EVENT                                                            TOTAL_WAITS
---------------------------------------------------------------- -----------
SQL*Net break/reset to client                                              2

You see, there are 2 break/reset waits registered.

Also normal SQL (not fired from PL/SQL but from top level instead) causes breaks/resets if it errors:

SQL> select * from non_existent;
select * from non_existent
              *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);

EVENT                                                            TOTAL_WAITS
---------------------------------------------------------------- -----------
SQL*Net break/reset to client                                              4

SQL>

Why 2 waits for a single error? I don’t know exact details, but the 10046 trace reveals that one of the waits is registered by a break operation and another one (apparently) not, thus it might be the call state reset operation just accounted separately:

=====================
PARSE ERROR #10:len=27 dep=0 uid=0 oct=3 lid=0 tim=1967444453 err=942
select * from non_existent
WAIT #10: nam='SQL*Net break/reset to client' ela= 2 driver id=1413697536 break?=1 p3=0 obj#=-1 tim=1967444525
WAIT #10: nam='SQL*Net break/reset to client' ela= 180 driver id=1413697536 break?=0 p3=0 obj#=-1 tim=1967444718
WAIT #10: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1967444752

Anyway, as I said in my Oracle-L reply, these breaks are caused by bad application design which allows too many unhandled exceptions to be propagated all the way up to the client.

The solutions would be to reduce or eliminate the number of errors occurring, or at least put that code into PL/SQL blocks where errors would be handled and not propagated back to client every time.

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Oracle and tagged , , , . Bookmark the permalink.

15 Responses to SQL*Net break/reset to client

  1. mohammed says:

    Very useful to know.

    Thanks for the insight Tanel.

  2. Anjo Kolk says:

    Yep, first the break is done to notify the client side to stop sending data, then the reset is done on both sides to cleanup the state and to resume from there. The next message to client is I think an “Successful completion of the ” of the call. So the client thinks that we are still talking.

  3. tanelp says:

    Nice! I guessed right :)

    Thanks for the info, Anjo

  4. John Hallas says:

    Very good blog entry, put together quickly as well.

  5. Steve says:

    We have a similar problem with top 2 wait events being bb waits and sqlnet break/reset to client event. We figured the cause being the sql something like this “select * form table where… for update nowait;” This table has one row in it. sessions not able to get locks returned “resource busy” message back to client triggering sqlnet break/reset event. Handling the exception at stored procedure worked out really well. however, wondering how to reduce the bbw waits on nowait sql. p3 value being 1 and table is in ASSM auto mode, 10gr2. Any ideas would be appreciated.

    Steve

  6. Great post! (Although I hope that nobody reads this and thinks they should start adding “exception when others then null” to their code…)

  7. tanelp says:

    Steve, if the table has only one row and everybody is trying to update that row (some kind of locking, right?) then you can’t really split it up to multiple blocks.

    Depending on what operations you serialize using that single-row lock, you may be able to break it down to multiple locks (each protecting a different operation or part of the structure) and have higher throughput like that.

    I recommend you to check DBMS_LOCK user defined locks, you may be able to do your serialization better using that… or if you need better flexibility, then Advanced Queueing is for you.

  8. tanelp says:

    Thanks Jeremy, you’re right.. I think I’ll write a disclaimer to my blog that I only provide bits and pieces of information – but using it still requires the reader’s brain! :)

  9. Steve says:

    Thanks Tanel for the suggestion.

    Steve.

  10. Kaido says:

    So, I believe this is not an idle wait? Oracle SOAsuit produces very high amouont of this wait event on my production system. I created the servererror trigger — and within minutes, there was more than 5000 rows in log table. The reason is, that AQ dequeue method, returned an error message, that queue was empty. This means, that SOAsuit (compiled java pcode) does not handle this error. ADDM tells me: “Wait event “SQL*Net break/reset to client” in wait class “Application” was consuming significant database time.” Impact% 30-60. Should I be really worried about this event?

  11. Tanel Poder says:

    Tere Kaido,

    Yes, these waits are not idle as these cause extra work to be done inside the database. If wait interface says its significant, then something should be done about it.

    Instead of polling queues in a loop – you should poll in waiting mode, so instead of the error you just wait until something appears in the queue. Not sure how well this goes together with your application architecture though.

    Another option is, in case you are willing to (and can) change code you could wrap this dequeue operation into a pl/sql block or function which catches the exception if its raised and just returns a different return code (via return or bind variable) when there was nothing in the queue.

    This makes the break/reset wait event go away – but it may just shift the bottleneck elsewhere that instead of braking 100 times per second you will have just the PL/SQL call overhead instead.

  12. Ron Chennells says:

    May be worth checking this metalink note:

    SOA Suite AQ Messaging Generates Error ORA-25228 in the Database Logs [ID 963159.1]

  13. Thanks, usefull article.

  14. Pingback: SQL*Net break/reset to client « Pavel Ermakov's blog

  15. I currently face an environment with a rather high amount of “SQL*Net break/reset to client” waits.
    Thank you for sharing your insights with this useful article.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>