Oracle hidden costs revealed, part 1 – Does a batch job run faster when executed locally?

This series is about revealing some Oracle’s internal execution costs and inefficiencies. I will analyze few situations and special cases where you can experience a performance hit where you normally wouldn’t expect to.

The first topic is about a question I saw in a recent Oracle Forum thread.

The question goes like this: “Is there any benefit if I run long sql queries from the server (by using telnet,etc) or from the remote by sql client.”

In order to leave out the network transfer cost of resultset for simplicity, I will rephrase the question like that: “Do I get better performance when I execute my server-side batch jobs (which don’t return any data to client) locally from the database server versus a remote application server or workstation?”

The obvious answer would be “NO, it does not matter where from you execute your batch job, as Oracle is a client server database system. All execution is done locally regardless of the client’s location, thus the performance is the same”.

While this sounds plausible in theory, there is (at least) one practical issue which can affect Oracle server performance depending on the clients platform and client libaries version.

It is caused by regular in-band break checking in client server communication channel where out of band break signalling is not available. A test case is below:

I have a little script called lotslios.sql which just joins OBJ$ table to itself and forces a long nested loop join. This causes lots of LIOs as the script name says.

In my tests I’m connecting to Oracle 10.2.0.3 database on Solaris x64 (and the server is otherwise idle).

I will connect to the same database using Windows sqlplus and Linux sqlplus clients (both on the same LAN network) and run the lotslios.sql script with the same parameter:

Using Windows sqlplus client (11.1.0.6):

SQL> @lotslios 10000

  COUNT(*)
----------
     10000

Elapsed: 00:00:29.28
SQL>

Using Linux sqlplus client (11.1.0.6):

SQL> @lotslios 10000

  COUNT(*)
----------
     10000

Elapsed: 00:00:27.24
SQL>

I ran multiple iterations of these tests to get better statistical sample, but running the same statement from Windows sqlplus client always gave worse response time:

( 29.28 / 27.24 ) - 1 = ~7.5%

So, running this “batch” from Windows sqlplus took 7.5% more time than from a Linux client.

So, how to understand where the extra time is going?

Here are the steps I usually go through such cases:

Check for differences in session’s wait event profile (using Snapper) – this would have clearly shown on which event the extra time was spent. However it didn’t show any differences (in fact, all test runs spinned 100% on CPU as expected in a single-threaded LIO burner test)
Check for differences in session’s statistic counters (using Snapper) – a consistent and reproducible difference in V$SESSTAT performance counters would have given some hints on where to look next – however there was virtually no difference
Check for differences in Oracle server process’es execution profile (using for example truss -c, DTrace profiler, OProfile or process stack sampling in a loop with either pstack or a debugger) – and this gave me a clear indication of the area where to look next (and huge amount of areas on what to NOT waste any time further)

Note that there are only three steps which usually are enough for seeing in which operation or kernel function the extra time is spent – and armed with the knowledge in which component the extra time is spent you can direct your investigation focus, narrow down your Metalink bug search, open a specific TAR or blame the Sysadmins ;-)

This is the beauty of a systematic approach and using right tool for the right problem (versus starting to peek around in random places, changing parameters, restarting instances or comparing all configuration files you can possibly find).

So, I used truss as the first tool for reporting system call execution profile for the server process.

For reference, this is what autotrace measured when running lotslios.sql (note the 4 million logical IOs):

SQL> @lotslios 10000

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    4089670  consistent gets
          0  physical reads
          0  redo size
        335  bytes sent via SQL*Net to client
        350  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

This is what the lotslios.sql generated when executed from Linux sqlplus:

solaris02$ truss -cp 18284
^C
syscall               seconds   calls  errors
read                     .000       2
write                    .000       2
times                    .000      27
yield                    .000     159
                     --------  ------   ----
sys totals:              .001     190      0
usr time:              27.191
elapsed:               31.080

This is what the lotslios.sql executed by Windows sqlplus generated:

solaris02$ truss -cp 19200
^C
syscall               seconds   calls  errors
read                     .000       2
write                    .000       2
times                    .000      45
yield                    .000     196
pollsys                 5.608 1355248
                     --------  ------   ----
sys totals:             5.610 1355493      0
usr time:              33.505
elapsed:               80.360
solaris02$

So, there is a big difference in number of pollsys() system calls, depending on which client was used for connecting. The pollsys syscall is normally used for checking whether there is any data that can be read from a file descriptor (or whether the file descriptor is ready for receiving more writes). As TCP sockets on Unix are also accessed through file descriptors, Oracle could be polling the client TCP connection file descriptor… but (without prior knowledge) we can not be sure.

So, can we dig deeper? Yes we can! Using my favourite digging shovel – pstack.

What we want to do, is stopping Oracle server process exactly when the pollsys system call is invoked. Solaris truss can help here (read the description for the switches I use from truss manpages if you’re not familiar with it).

So I attach to the server process with truss again and run the lotslios.sql:

solaris02$ truss -tpollsys -Tpollsys -p 19200
pollsys(0xFFFFFD7FFFDFB4F8, 1, 0xFFFFFD7FFFDFB4B0, 0x00000000) = 0
solaris02$

Thanks to the -T switch, truss suspends the process immediately when first pollsys syscall is made. And now, finally, we can run pstack to see which functions ended up calling pollsys():

solaris02$ pstack 19200
19200:  oracleSOL10G (LOCAL=NO)
 fffffd7ffdd52caa pollsys  (fffffd7fffdfb4f8, 1, fffffd7fffdfb4b0, 0)
 fffffd7ffdcf9dc2 poll () + 52
 000000000508d37b sntpoltsts () + 12b
 000000000507bc1b snttmoredata () + 2b
 0000000004f7bcdf nsmore2recv () + 25f
 0000000004f9a777 nioqts () + c7
 0000000003706bb0 kdst_fetch () + 4e0
 000000000375d14b kdstf0100101km () + 22b
 00000000036f9dba kdsttgr () + 68a
 00000000033c66bc qertbFetch () + 2ac
 00000000033bebb3 qerjotFetch () + b3
 000000000340edfc qercoFetch () + dc
 0000000003461963 qergsFetch () + 723
 0000000002932e9d opifch2 () + a0d
 00000000028cf6cb kpoal8 () + e3b
 0000000000e97c6c opiodr () + 41c
 0000000003d9f6da ttcpip () + 46a
 0000000000e939d3 opitsk () + 503
 0000000000e96f18 opiino () + 3a8
 0000000000e97c6c opiodr () + 41c
 0000000000e924d1 opidrv () + 2f1
 0000000000e8f90b sou2o () + 5b
 0000000000e552e4 opimai_real () + 84
 0000000000e551b4 main () + 64
 0000000000e54ffc ???????? ()
solaris02$

The main function of interest is snttmoredata(). SNTT prefix means (in my interpretation) System Network Transport TCP_?_ and moredata indicates that a check is made if there’s more data coming from the network connection/socket.

The snttmoredata function was called by nsmore2recv… which (in my interpretation again) means Network Service check for more data to receive.

So based on those two functions in stack it’s quite clear that the polls are caused by Oracle’s network layer. That’s is kind of unexpected as a client server database shouldn’t care much about the network while it’s still executing something for the client. This is the time to ask – what if this client wants to cancel the query?

The observations above are all about user’s ability to cancel a running query.

Oracle client server communication normally works in RPC fashion – for example a client sends a command to Oracle and Oracle doesn’t return anything until the command is completed.

Now if a user tries to cancel their query (using CTRL+C in sqlplus or calling OCIBreak in non-blocking OCI), a cancel packet is sent to server over TCP. The packet will be stored in the server side receive buffer of OS TCP stack and becomes available for reading for the server process (via a TCP socket). However if the server process is in a long-running loop executing a query, it needs to periodically check the TCP receive socket for any outstanding packets. And this is exactly what the pollsys() system call does.

This approach for cancelling an operation is called in-band break, as the break packet is sent in-band with all other traffic. The server process has to be programmed to periodically check for any newly arrived packets, even if it is already busy working on something else.

There are several functions in Oracle kernel where the developers have put the check for in-band breaks. This means that in some highly repetitive operations (like nested loop join) the same functions are hit again and again – causing frequent polling on the TCP socket. And too frequent polling is what causes the peformance degradation.

However Oracle network layer has a sqlnet.ora parameter called break_poll_skip, which can help in such situations. This parameters defines, how many times to just silently skip the TCP socket polling when the nsmore2recv() function is called. The parameter defaults to 3 in recent versions, which means that only 1 of 3 polls are actually executed ( from above test case it’s seen that for 4 million consistent gets roughly 1/3 = 1.3 million pollsys() calls were executed ).

Let’s change the break_poll_skip to 10:

solaris02$ echo break_poll_skip=10 > $ORACLE_HOME/network/admin/sqlnet.ora
solaris02$
solaris02$
solaris02$ truss -cp 3316
^C
syscall               seconds   calls  errors
read                     .000       2
write                    .000       2
times                    .000      33
yield                    .000     226
pollsys                 1.726  406574
                     --------  ------   ----
sys totals:             1.727  406837      0
usr time:              29.526
elapsed:               48.530

Only 1/10th of pollsys calls were executed at this time.

Let’s change the parameter that only 1/1000th of pollsys calls are executed:

solaris02$ echo break_poll_skip=1000 > $ORACLE_HOME/network/admin/sqlnet.ora
solaris02$
solaris02$ truss -cp 3429
^C
syscall               seconds   calls  errors
read                     .000       2
write                    .000       2
times                    .000      29
yield                    .000     178
pollsys                  .029    4066
                     --------  ------   ----
sys totals:              .030    4277      0
usr time:              27.572
elapsed:              107.850
solaris02$

So, setting break_poll_skip can help us reclaim some of the precious CPU time – however this is achieved at the expense of command cancellation responsiveness. If you set Oracle to skip 1000 break polls, then you need to wait longer until Oracle checks the TCP socket and realizes that a cancel packet has arrived. This may not be such a problem for a heavy nested loop where instead of 100 microseconds you get a response in 100 milliseconds. However if your session happens to wait for an enqueue then you might end up waiting for a loooong time for cancel to happen as Oracle calls the nsmore2recv() function once every timeout, which usually is 3 seconds. Multiply this with break_poll_skip value and you may end up waiting for almost an hour.

For now I recommend you NOT to use break_poll_skip – until you have read my 2nd article (which I’ll post in couple of weeks) about more details on in-band breaks and also out-of-band breaks which I haven’t covered so far.

Note that the polling is not a Solaris problem only, it happens in all Unixes as far as I know and likely on Windows as well.

An example from linux is below:

LIN11G1$ strace -cp 28730
Process 28730 attached - interrupt to quit
Process 28730 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.001383           0     54550           poll
  0.00    0.000000           0         1           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         3           times
  0.00    0.000000           0        12           getrusage
  0.00    0.000000           0        19           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.001383                 54587           total
LIN11G1$

Oracle hidden costs revealed, part 1 – Does a batch job run faster when executed locally?

Tanel Poder

2008-02-04