Tanel Poder’s blog: Core IT for geeks and pros

April 10, 2008

SQL*Net break/reset to client

Filed under: Internals, Networking, Oracle, Performance, Troubleshooting — tanelp @ 12:50 am

There was a question in Oracle-L mailinglist recently, regarding excessive SQL*Net break/reset to client waiting by a session.

A quote is below:

We are seeing an insert statement reporting “SQL*Net break/reset to client” as over 1/3 of its time.
On the face of it this event suggests network issues but nothing else backs this up as the cause.
So I looked at the Java code in question and a trace of one of the sessions.
What is happening is that an attempt is made to insert a row, most of the time a duplicate error results, the code catches this exception and does an update.
I was wondering if its the duplicate error and the exception handling which results in this wait event showing up.

My answer to that was following:

Yes, a SQL*Net break/reset happens when an error/unhandled exception is raised during a call (which means that the call executed didn’t complete normally, thus the call state must be reset).

The approach of “insert -> if failed then update” is basically what MERGE does.

You need to change the application to use MERGE command. Alternatively you could use an “update -> if no-rows-updated then insert” approach, but MERGE makes much more sense nowadays.

So, these waits aren’t really indicating any network bottleneck, but rather just the fact that the client needs to be notified if it’s call has failed and Oracle has to clean up after the failed call. Apparently on some (network protocol) architectures or in some scenarios these operations have taken significant time (and may have required a system call), so a kernel developer has decided to wait instrument them.

In this blog entry I want to share a small test case which illustrates the point. First lets check how many times my session has waited for a SQL*Net break/reset wait event:

SQL> select event, total_waits from v$session_event where event like '%reset%' and sid = (select sid from v$mystat where rownum = 1);

no rows selected

As V$SESSION_EVENT (like also V$SYSTEM_EVENT) doesn’t display events for which nobody has waited yet, I see that my session’s wait count for that event is zero.

Now let’s create an error, but also let’s handle it in a PL/SQL exception handler:

(more…)

March 6, 2008

Future appearances

Filed under: Uncategorized — tanelp @ 5:09 am

In addition to Hotsos Symposium this week, I will deliver my Advanced Oracle Troubleshooting presentation and demos at NYOUG DBA SIG this Thursday (06. March 2008) in Oracle’s Park Avenue office in Manhattan, NYC. If you are interested, see the details at http://www.nyoug.org/upcoming_events.htm#dbaname

Also, I will be speaking at Trivadis Performance Days in Zurich  23.-24. April 2008. I will deliver the following presentations/demos:

  • Advanced Oracle Troubleshooting
  • Performance and Scalability Improvements in Oracle 10g and 11g

 If you are interested, visit this link:

https://www.trivadis.com/shop/Kursdetail.aspx?KategorieID=2&SubkategorieID=78&KurseID=608

Hotsos Symposium 2008 presentations and files

Filed under: Administration, Cool stuff, Internals, Oracle, Performance, Tools, Troubleshooting — tanelp @ 4:48 am

I have uploaded the presentations I did at Hotsos Symposium this week to here

Both of my presentations are in that file, along with my TPT scripts, Snapper, Sesspack and PerfSheet.

I delivered two presentations:

  • Advanced Oracle Troubleshooting: No magic is needed, systematic approach will do
  • Oracle Performance Trending and Charting using Sesspack and Excel

If you have any follow up questions or have trouble getting any of my scripts running, please write it to the comments section of this blog.

February 10, 2008

SQL*Net message to client vs SQL*Net more data to client

Filed under: Internals, Networking, Oracle, Performance — tanelp @ 3:35 pm

After my last post about SQL*Net message to client wait event I had a follow-up question about what’s the difference between SQL*Net message to client and SQL*Net more data to client wait events. I’ll post the answer here:

The first session data unit (SDU) bufferful of return data is written to TCP socket buffer under “SQL*Net message to client” wait event.

If Oracle needs to write more data for a call than fits into the “first” SDU buffer, then further writes for that call are done under SQL*Net more data to client event.

So, whether and how much of the “SQL*Net more data to client” vs “SQL*Net message to client” waits you see depends on two things:

  • Amount of data returned to client per call
  • Oracle Net SDU size

A quick test with my Snapper performance tool illustrates this:

In one session run following:

SQL> set arraysize 1
SQL>
SQL> select rownum from dba_source;

    ROWNUM
----------
         1
         2
         3
         4
         5
         6
…many rows returned…

And I measure the waits in the other session:

SQL> @snapper out,gather=w 5 1 141

-- Session Snapper v1.06 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,      D/SEC,     HDELTA,     HD/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     141, 20080210 14:59:55,        5, WAIT, SQL*Net message to client               ,          6750,       1350,     6.75ms,     1.35ms
DATA,     141, 20080210 14:59:55,        5, WAIT, SQL*Net message from client             ,       4668258,     933652,      4.67s,   933.65ms
--  End of snap 1

No SQL*Net more data to client is evidenced above as due the arraysize = 1 setting Oracle fetches only 2 rows at a time. Therefore the returned resultset per fetch call always fits into the first SDU sized packet. Note that the reason why Oracle fetches 2 rows when arraysize is 1 is due an OCI optimization which tries to be clever and proactively detect end-of-resultset conditions when fetching a single row - that way there’s a chance to automatically cancel a cursor and release its pins without an explicit cursor cancel or close call. This should have positive effect reducing shared pool fragmentation in some conditions.

Anyway, now I set arraysize to 5000 and run the same SQL again:

(more…)

February 7, 2008

Cary Millsap is blogging

Filed under: Uncategorized — tanelp @ 7:05 pm

If you work with Oracle databases and are interested in system performance, then you probably want to know what Cary Millsap has to say. So, make sure you bookmark his blog: http://carymillsap.blogspot.com/

SQL*Net message to client wait isn’t really what it’s thought to be

Filed under: Internals, Performance, Troubleshooting — tanelp @ 4:35 pm

In a recent Oracle Forum thread a question came up how to use SQL*Net message to client wait events for measuring network latency between server and client. The answer is that you can’t use it for network latency measurements at all, due how TCP stack works and how Oracle uses it.

I’ll paste my answer here too, for people who don’t follow Oracle Forums:

As I wrote in that reply, “SQL*Net message to client” does NOT measure network latency! It merely measures how long it took to put the response message into TCP send buffer on the server!

Once the response packet is put into TCP send buffer, Oracle server process continues on and starts waiting for “SQL*Net message FROM client” again. It’s up to TCP stack to deliver this packet from this point and Oracle server process has no way for measuring directly when did the packet arrive (it would have to start intercepting TCP ACK packets at kernel level for that).

This behaviour also explains, why the “SQL*Net message TO client” waits are usually unbelievably short - like 1 microsecond etc. The light in vacuum travels only 300 meters in one microsecond, the signal in a cable travels much less - yet when sending a single packet from London to Hong-Kong, we see few microsecond “response times”…

Example:

solaris02$ fgrep “SQL*Net message to client” sol10g_ora_837.trc | head
WAIT #8: nam=’SQL*Net message to client’ ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16418611294
WAIT #7: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16418644515
WAIT #5: nam=’SQL*Net message to client’ ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16428803213
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16428880945
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16428927443
WAIT #5: nam=’SQL*Net message to client’ ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16428973661
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16429019250
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16429066742
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16429114761
WAIT #5: nam=’SQL*Net message to client’ ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=16429162471

Note that you can see longer times spent waiting for “SQL*Net message to client” when sending large amounts of data. This happens when your TCP send buffer gets full, thus TCP stack can not accept further packets. Sending will be blocked until the remote site sends back further ACK packets which state up to which byte in the TCP transmission stream it has received the data.

So, if you’re sending loads of data over a slow link or misconfigured TCP connection, the “SQL*Net message to client” wait time can be used as a low-confidence indicator of your SQL*Net throughput (in conjuction with “bytes sent via SQL*Net to client”), but never a measure of network latency!

Note that this is not OS platform specific, this is just how TCP and SQL*Net work.

February 5, 2008

Oracle hidden costs revealed, part 1 - Does a batch job run faster when executed locally?

Filed under: Internals, Performance, Troubleshooting, Unix/Linux — tanelp @ 12:09 am

This series is about revealing some Oracle’s internal execution costs and inefficiencies. I will analyze few situations and special cases where you can experience a performance hit where you normally wouldn’t expect to.

The first topic is about a question I saw in a recent Oracle Forum thread.

The question goes like this: “Is there any benefit if I run long sql queries from the server (by using telnet,etc) or from the remote by sql client.”

In order to leave out the network transfer cost of resultset for simplicity, I will rephrase the question like that: “Do I get better performance when I execute my server-side batch jobs (which don’t return any data to client) locally from the database server versus a remote application server or workstation?”

The obvious answer would be “NO, it does not matter where from you execute your batch job, as Oracle is a client server database system. All execution is done locally regardless of the client’s location, thus the performance is the same”.

While this sounds plausible in theory, there is (at least) one practical issue which can affect Oracle server performance depending on the clients platform and client libaries version.

It is caused by regular in-band break checking in client server communication channel where out of band break signalling is not available. A test case is below:

(more…)

February 3, 2008

Excellent article on Oracle 11g PL/SQL function result cache

Filed under: Oracle 11g, PL/SQL, Performance, SQL — tanelp @ 3:39 pm

I have so far avoided writing such pointer blog posts which only refer you to another article, but I have to do it with this one. Adrian Billington has written an excellent article on performance of Oracle 11g PL/SQL function result cache. His article is a good example of a thorough, well organized and well written technical content. I really enjoyed reading it and thanks to his thoroughness, he has just saved me some precious time doing that research on my own.

Get the article here: http://www.oracle-developer.net/display.php?id=504

January 14, 2008

Can you write a working SQL statement without using any whitespace?

Filed under: Cool stuff, Oracle, SQL — tanelp @ 12:42 pm

I read this post by Laurent Schneider yesterday. In the comment section Tom Kyte already explained what the issue was about, but I’ll expand this explanation a little.

The question was why should the apparently invalid statement below work? I mean there is no such column nor number as “1.x”), yet the statement works ok:

SQL> select 1.x from dual;

         X
----------
         1

The column header gives a good hint what happened above. Oracle has treated the X as the column alias.

Let’s remove the “x” and see:

SQL> select 1. from dual;

        1.
----------
         1

Now all works as expected, “1.” is treated as number ( 1. = 1.0 )

But the question remains, how come Oracle splits up the string “1.x” and decides that the “x” is the column and “1.” is the number part - considering that there’s no whitespace between the 1. and x?

The answer is that apparently the string tokenizer used by Oracle SQL parser is smart enough that it doesn’t rely only on whitespace for recognizing token delimiters. It is also able to use character class analysis for understanding where a literal ends and the next token (like column alias) starts.

So, it is possible to write valid SQL statements without using any whitespace at all. For example:

(more…)

January 9, 2008

Expensive calculator…

Filed under: Cool stuff, Tools — tanelp @ 5:53 pm

Oracle has evolved over time to much more than just a plain relational database. One option is to use Oracle as an expensive calculator.

When researching or demoing Oracle, it’s quite convenient to do number calculations directly on sqlplus prompt, especially if dealing with internals where lots of stuff is about addresses and offsets shown in hex.

Here’s the script what I use for such purposes: http://www.tanelpoder.com/files/scripts/calc.sql.

It usually saves me couple of seconds every calculation as I don’t have to reopen the calc.exe on my Windows box (I immediately stopped using it after I wrote the script).

But the main benefit is that your calculation outputs remain in the sqlplus output history, so you don’t have to redo them again and you’ll have somewhat better documentation of the output of research steps you’ve done.

Here’s a quick demo:

Add a decimal value to decimal value:

SQL> @calc 10 + 10

                 DEC                  HEX
-------------------- --------------------
                  20                   14

Add a hex value to a decimal:

SQL> @calc 10 + 0x10

                 DEC                  HEX
——————– ——————–
                  26                   1A

Divide a hex value with a decimal one:

SQL> @calc 0xFFFF / 64

                 DEC                  HEX
-------------------- --------------------
                1024                  400

Show a value in both hex and dec:

SQL> @calc 65536 + 0

                 DEC                  HEX
-------------------- --------------------
               65536                10000

Note that you do not need to use “0x” for indicating a hex value, instead just an “x” will do:

SQL> @calc xB0B * x142A2D

                 DEC                  HEX
-------------------- --------------------
          3735928559             DEADBEEF

So, you can use Oracle as a very expensive calculator - but on the other hand it can save some of your time…

Blog at WordPress.com.