This series is about revealing some Oracle’s internal execution costs and inefficiencies. I will analyze few situations and special cases where you can experience a performance hit where you normally wouldn’t expect to.
The first topic is about a question I saw in a recent Oracle Forum thread.
The question goes like this: “Is there any benefit if I run long sql queries from the server (by using telnet,etc) or from the remote by sql client.”
In order to leave out the network transfer cost of resultset for simplicity, I will rephrase the question like that: “Do I get better performance when I execute my server-side batch jobs (which don’t return any data to client) locally from the database server versus a remote application server or workstation?”
The obvious answer would be “NO, it does not matter where from you execute your batch job, as Oracle is a client server database system. All execution is done locally regardless of the client’s location, thus the performance is the same”.
While this sounds plausible in theory, there is (at least) one practical issue which can affect Oracle server performance depending on the clients platform and client libaries version.
It is caused by regular in-band break checking in client server communication channel where out of band break signalling is not available. A test case is below:
I have a little script called lotslios.sql which just joins OBJ$ table to itself and forces a long nested loop join. This causes lots of LIOs as the script name says.
In my tests I’m connecting to Oracle 10.2.0.3 database on Solaris x64 (and the server is otherwise idle).
I will connect to the same database using Windows sqlplus and Linux sqlplus clients (both on the same LAN network) and run the lotslios.sql script with the same parameter:
Using Windows sqlplus client (22.214.171.124):
SQL> @lotslios 10000 COUNT(*) ---------- 10000 Elapsed: 00:00:29.28 SQL>
Using Linux sqlplus client (126.96.36.199):
SQL> @lotslios 10000 COUNT(*) ---------- 10000 Elapsed: 00:00:27.24 SQL>
I ran multiple iterations of these tests to get better statistical sample, but running the same statement from Windows sqlplus client always gave worse response time:
( 29.28 / 27.24 ) - 1 = ~7.5%
So, running this “batch” from Windows sqlplus took 7.5% more time than from a Linux client.
So, how to understand where the extra time is going?
Here are the steps I usually go through such cases:
- Check for differences in session’s wait event profile (using Snapper) – this would have clearly shown on which event the extra time was spent. However it didn’t show any differences (in fact, all test runs spinned 100% on CPU as expected in a single-threaded LIO burner test)
- Check for differences in session’s statistic counters (using Snapper) – a consistent and reproducible difference in V$SESSTAT performance counters would have given some hints on where to look next – however there was virtually no difference
- Check for differences in Oracle server process’es execution profile (using for example truss -c, DTrace profiler, OProfile or process stack sampling in a loop with either pstack or a debugger) – and this gave me a clear indication of the area where to look next (and huge amount of areas on what to NOT waste any time further)
Note that there are only three steps which usually are enough for seeing in which operation or kernel function the extra time is spent – and armed with the knowledge in which component the extra time is spent you can direct your investigation focus, narrow down your Metalink bug search, open a specific TAR or blame the Sysadmins ;-)
This is the beauty of a systematic approach and using right tool for the right problem (versus starting to peek around in random places, changing parameters, restarting instances or comparing all configuration files you can possibly find).
So, I used truss as the first tool for reporting system call execution profile for the server process.
For reference, this is what autotrace measured when running lotslios.sql (note the 4 million logical IOs):
SQL> @lotslios 10000 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4089670 consistent gets 0 physical reads 0 redo size 335 bytes sent via SQL*Net to client 350 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL>
This is what the lotslios.sql generated when executed from Linux sqlplus:
solaris02$ truss -cp 18284 ^C syscall seconds calls errors read .000 2 write .000 2 times .000 27 yield .000 159 -------- ------ ---- sys totals: .001 190 0 usr time: 27.191 elapsed: 31.080
This is what the lotslios.sql executed by Windows sqlplus generated:
solaris02$ truss -cp 19200 ^C syscall seconds calls errors read .000 2 write .000 2 times .000 45 yield .000 196 pollsys 5.608 1355248 -------- ------ ---- sys totals: 5.610 1355493 0 usr time: 33.505 elapsed: 80.360 solaris02$
So, there is a big difference in number of pollsys() system calls, depending on which client was used for connecting. The pollsys syscall is normally used for checking whether there is any data that can be read from a file descriptor (or whether the file descriptor is ready for receiving more writes). As TCP sockets on Unix are also accessed through file descriptors, Oracle could be polling the client TCP connection file descriptor… but (without prior knowledge) we can not be sure.
So, can we dig deeper? Yes we can! Using my favourite digging shovel – pstack.
What we want to do, is stopping Oracle server process exactly when the pollsys system call is invoked. Solaris truss can help here (read the description for the switches I use from truss manpages if you’re not familiar with it).
So I attach to the server process with truss again and run the lotslios.sql:
solaris02$ truss -tpollsys -Tpollsys -p 19200 pollsys(0xFFFFFD7FFFDFB4F8, 1, 0xFFFFFD7FFFDFB4B0, 0x00000000) = 0 solaris02$
Thanks to the -T switch, truss suspends the process immediately when first pollsys syscall is made. And now, finally, we can run pstack to see which functions ended up calling pollsys():
solaris02$ pstack 19200 19200: oracleSOL10G (LOCAL=NO) fffffd7ffdd52caa pollsys (fffffd7fffdfb4f8, 1, fffffd7fffdfb4b0, 0) fffffd7ffdcf9dc2 poll () + 52 000000000508d37b sntpoltsts () + 12b 000000000507bc1b snttmoredata () + 2b 0000000004f7bcdf nsmore2recv () + 25f 0000000004f9a777 nioqts () + c7 0000000003706bb0 kdst_fetch () + 4e0 000000000375d14b kdstf0100101km () + 22b 00000000036f9dba kdsttgr () + 68a 00000000033c66bc qertbFetch () + 2ac 00000000033bebb3 qerjotFetch () + b3 000000000340edfc qercoFetch () + dc 0000000003461963 qergsFetch () + 723 0000000002932e9d opifch2 () + a0d 00000000028cf6cb kpoal8 () + e3b 0000000000e97c6c opiodr () + 41c 0000000003d9f6da ttcpip () + 46a 0000000000e939d3 opitsk () + 503 0000000000e96f18 opiino () + 3a8 0000000000e97c6c opiodr () + 41c 0000000000e924d1 opidrv () + 2f1 0000000000e8f90b sou2o () + 5b 0000000000e552e4 opimai_real () + 84 0000000000e551b4 main () + 64 0000000000e54ffc ???????? () solaris02$
The main function of interest is snttmoredata(). SNTT prefix means (in my interpretation) System Network Transport TCP? and moredata indicates that a check is made if there’s more data coming from the network connection/socket.
The snttmoredata function was called by nsmore2recv… which (in my interpretation again) means Network Service check for more data to receive.
So based on those two functions in stack it’s quite clear that the polls are caused by Oracle’s network layer. That’s is kind of unexpected as a client server database shouldn’t care much about the network while it’s still executing something for the client. This is the time to ask – what if this client wants to cancel the query?
The observations above are all about user’s ability to cancel a running query.
Oracle client server communication normally works in RPC fashion – for example a client sends a command to Oracle and Oracle doesn’t return anything until the command is completed.
Now if a user tries to cancel their query (using CTRL+C in sqlplus or calling OCIBreak in non-blocking OCI), a cancel packet is sent to server over TCP. The packet will be stored in the server side receive buffer of OS TCP stack and becomes available for reading for the server process (via a TCP socket). However if the server process is in a long-running loop executing a query, it needs to periodically check the TCP receive socket for any outstanding packets. And this is exactly what the pollsys() system call does.
This approach for cancelling an operation is called in-band break, as the break packet is sent in-band with all other traffic. The server process has to be programmed to periodically check for any newly arrived packets, even if it is already busy working on something else.
There are several functions in Oracle kernel where the developers have put the check for in-band breaks. This means that in some highly repetitive operations (like nested loop join) the same functions are hit again and again – causing frequent polling on the TCP socket. And too frequent polling is what causes the peformance degradation.
However Oracle network layer has a sqlnet.ora parameter called break_poll_skip, which can help in such situations. This parameters defines, how many times to just silently skip the TCP socket polling when the nsmore2recv() function is called. The parameter defaults to 3 in recent versions, which means that only 1 of 3 polls are actually executed ( from above test case it’s seen that for 4 million consistent gets roughly 1/3 = 1.3 million pollsys() calls were executed ).
Let’s change the break_poll_skip to 10:
solaris02$ echo break_poll_skip=10 > $ORACLE_HOME/network/admin/sqlnet.ora solaris02$ solaris02$ solaris02$ truss -cp 3316 ^C syscall seconds calls errors read .000 2 write .000 2 times .000 33 yield .000 226 pollsys 1.726 406574 -------- ------ ---- sys totals: 1.727 406837 0 usr time: 29.526 elapsed: 48.530
Only 1/10th of pollsys calls were executed at this time.
Let’s change the parameter that only 1/1000th of pollsys calls are executed:
solaris02$ echo break_poll_skip=1000 > $ORACLE_HOME/network/admin/sqlnet.ora solaris02$ solaris02$ truss -cp 3429 ^C syscall seconds calls errors read .000 2 write .000 2 times .000 29 yield .000 178 pollsys .029 4066 -------- ------ ---- sys totals: .030 4277 0 usr time: 27.572 elapsed: 107.850 solaris02$
So, setting break_poll_skip can help us reclaim some of the precious CPU time – however this is achieved at the expense of command cancellation responsiveness. If you set Oracle to skip 1000 break polls, then you need to wait longer until Oracle checks the TCP socket and realizes that a cancel packet has arrived. This may not be such a problem for a heavy nested loop where instead of 100 microseconds you get a response in 100 milliseconds. However if your session happens to wait for an enqueue then you might end up waiting for a loooong time for cancel to happen as Oracle calls the nsmore2recv() function once every timeout, which usually is 3 seconds. Multiply this with break_poll_skip value and you may end up waiting for almost an hour.
For now I recommend you NOT to use break_poll_skip – until you have read my 2nd article (which I’ll post in couple of weeks) about more details on in-band breaks and also out-of-band breaks which I haven’t covered so far.
Note that the polling is not a Solaris problem only, it happens in all Unixes as far as I know and likely on Windows as well.
An example from linux is below:
LIN11G1$ strace -cp 28730 Process 28730 attached - interrupt to quit Process 28730 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.001383 0 54550 poll 0.00 0.000000 0 1 read 0.00 0.000000 0 2 write 0.00 0.000000 0 3 times 0.00 0.000000 0 12 getrusage 0.00 0.000000 0 19 gettimeofday ------ ----------- ----------- --------- --------- ---------------- 100.00 0.001383 54587 total LIN11G1$
NB! If you want to move to the "New World" - offload your data and workloads to Hadoop, without having to re-write your existing applications - check out Gluent. We are making history! ;-)