Peeking into Linux kernel-land using /proc filesystem for quick’n’dirty troubleshooting

This blog entry is about modern Linuxes. In other words RHEL6 equivalents with 2.6.3x kernels and not the ancient RHEL5 with 2.6.18 kernel (wtf?!), which is the most common in enterprises unfortunately. And no, I’m not going to use kernel debuggers or SystemTap scripts here, just plain old “cat /proc/PID/xyz” commands against some useful /proc filesystem entries.

Troubleshooting a “slow” process

Here’s one systematic troubleshooting example I reproduced in my laptop. A DBA was wondering why their find command had been running “much slower”, without returning any results for a while. Knowing the environment, we had a hunch, but I got asked about what would be the systematic approach for troubleshooting this – already ongoing – problem right now.

Luckily the system was running OEL6, so had a pretty new kernel. Actually the 2.6.39 UEK2.

So, let’s do some troubleshooting. First let’s see whether that find process is still alive:

[root@oel6 ~]# ps -ef | grep find
root     27288 27245  4 11:57 pts/0    00:00:01 find . -type f
root     27334 27315  0 11:57 pts/1    00:00:00 grep find

Yep it’s there – PID 27288 (I’ll use that pid throughout the troubleshooting example).

Let’s start from the basics and take a quick look what’s the bottleneck for this process – if it’s not blocked by anything (for example reading everything it needs from cache) it should be 100% on CPU. If it’s bottlenecked by some IO or contention issues, the CPU usage should be lower – or completely 0%.

[root@oel6 ~]# top -cbp 27288
top - 11:58:15 up 7 days,  3:38,  2 users,  load average: 1.21, 0.65, 0.47
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2026460k total,  1935780k used,    90680k free,    64416k buffers
Swap:  4128764k total,   251004k used,  3877760k free,   662280k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27288 root      20   0  109m 1160  844 D  0.0  0.1   0:01.11 find . -type f

Top tells me this process is either 0% on CPU or very close to zero percent (so it gets rounded to 0% in the output). There’s an important difference though, as a process that is completely stuck, not having a chance of getting onto CPU at all vs. a process which is getting out of its wait state every now and then (for example some polling operation times out every now and then and thee process chooses to go back to sleep). So, top on Linux is not a good enough tool to show that difference for sure – but at least we know that this process is not burning serious amounts of CPU time.

Let’s use something else then. Normally when a process seems to be stuck like that (0% CPU usually means that the process is stuck in some blocking system call – which causes the kernel to put the process to sleep) I run strace on that process to trace in which system call the process is currently stuck. Also if the process is actually not completely stuck, but returns from a system call and wakes up briefly every now and then, it would show up in strace (as the blocking system call would complete and be entered again a little later):

[root@oel6 ~]# strace -cp 27288
Process 27288 attached - interrupt to quit

^C
^Z
[1]+  Stopped                 strace -cp 27288

[root@oel6 ~]# kill -9 %%
[1]+  Stopped                 strace -cp 27288
[root@oel6 ~]# 
[1]+  Killed                  strace -cp 27288

Oops, the strace command itself got hung too! It didn’t print any output for a long time and didn’t respond to CTRL+C, so I had to put it into background with CTRL+Z and kill it from there. So much for easy diagnosis.

Continue reading

Posted in Oracle | 14 Comments

Even more Snapper – v4.03 now works in SQL Developer too!

If you already downloaded snapper v4, then better re-download it again as the v4.03 also runs in SQL Developer!

Snapper 4 inside SQL Developer

You might also want to check how to enable the DBMS_OUTPUT display in SQL Developer post by “That” Jeff Smith :-)

Update: Niall Litchfield pointed out that the “set serverout on” command was ignored on SQL Developer because I was using the short syntax (serverout instead of serveroutput). Once I changed this, the SQL Developer doesn’t print the warning anymore and there’s no need to explicitly open up the separate DBMS_OUTPUT. I have uploaded Snapper v4.04 with this fix included. Thanks Niall!

Also, thanks go to:

  • Hans-Peter Sloot – for building the first cut of RAC support into a previous Snapper version
  • Ronald Rood – for adding the customizations for SQL Developer support into a previous Snapper version

Note that the manual before/after snapshots don’t work inside SQL Developer right now, will try to fix this later …

Posted in Oracle | 32 Comments

Manual “before” and “after” snapshot support in Snapper v4

Snapper used to require access to DBMS_LOCK, so it could sleep for X seconds between the “before” and “after” performance data snapshots. Now it is possible to get away without using DBMS_LOCK. Instead you will run Snapper twice, once for taking the “before” snapshot, then run your workload and then run Snapper again for taking the “after” snapshot and print the output.

So, the usual way of running snapper is this:

@snapper4 all 5 1 152

This would take 1 5-second performance snapshot SID 152′s V$ views.

With Snapper4 you can use the old way or just add BEGIN or END keywords to the 1st parameter, like this:

@snapper4 all,begin 5 1 152

Whenever using the begin/end syntax, the 2nd and 3rd parameter specifying snapshot duration and count (“5 1″) are just ignored, but they still have to be there right now as snapper requires 4 parameters when called (I’m thinking about changing this in V4.1).

So, let’s try it:

SQL> @snapper4 all,begin 5 1 152
Sampling SID 152 with interval 5 seconds, taking 1 snapshots...
SP2-0552: Bind variable "SNAPPER" not declared.

Oops! There’s some error!

In this version of Snapper, if you want to use the before/after manual snapshot mode, you need to run one more command in your session before starting (you need to do it only once when logging in, not every time you run snapper in your session):

SQL> VAR SNAPPER REFCURSOR
SQL>
SQL> @snapper4 all,begin 5 1 152
Sampling SID 152 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.02 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

Taking BEGIN sample ...

PL/SQL procedure successfully completed.

The REFCURSOR trick is how I manage to store the “before” snapshot data even though the Snapper anonymous block exits and gives you back control. See the code for more details. I’ve figured out a way to do this in the snapper script itself (it’s not trivial), but haven’t gotten to this yet. Coming soon :)

Anyway, now as we have taken the before snapshot, we can run whatever commands in the session(s) we are measuring (it doesn’t have to be a single session, you can use the usual snapper syntax to pick which sessions to measure).

Once you are ready to take the end snapshot and print the deltas, run this:

SQL> @snapper4 all,end 5 1 152
Sampling SID 152 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.02 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    152, SYS       , STAT, opened cursors cumulative                                 ,         13690,      1.35k,         ,             ,          ,           ,         .7 per execution
    152, SYS       , STAT, opened cursors current                                    ,             1,         .1,         ,             ,          ,           ,  .05*10^-3 per execution
    152, SYS       , STAT, user commits                                              ,            18,       1.78,         ,             ,          ,           ,  .93*10^-3 per execution
    152, SYS       , STAT, recursive calls                                           ,        234274,     23.11k,         ,             ,          ,           ,      12.05 per execution
    152, SYS       , STAT, recursive cpu usage                                       ,           848,      83.64,         ,             ,          ,           ,        .04 per execution
    152, SYS       , STAT, session logical reads                                     ,        267980,     26.43k,         ,             ,          ,           ,      13.78 per execution

... some output removed for brevity ...

    152, SYS       , WAIT, control file sequential read                              ,          1078,   106.33us,      .0%, [          ],       130,      12.82,     8.29us average wait
    152, SYS       , WAIT, log file sync                                             ,         10969,     1.08ms,      .1%, [          ],        15,       1.48,   731.27us average wait
    152, SYS       , WAIT, db file sequential read                                   ,        444277,    43.82ms,     4.4%, [W         ],      1900,      187.4,   233.83us average wait
    152, SYS       , WAIT, db file scattered read                                    ,        812924,    80.18ms,     8.0%, [W         ],      1077,     106.23,    754.8us average wait
    152, SYS       , WAIT, direct path write temp                                    ,          1758,    173.4us,      .0%, [          ],        20,       1.97,     87.9us average wait

--  End of Stats snap 1, end=2013-02-18 23:46:25, seconds=10.1

--  End of ASH snap 1, end=, seconds=, samples_taken=0

And that’s it. Note that there’s no ASH output as if snapper returns control back to you between before & after snapshots, it can not sample the V$SESSION view for printing you the “ASH” top.

With the exception of ASH output and ignoring the snapshot_time and snapshot_count parameter values, Snapper allows you to use the full range of syntax for picking which stats do display (gather,sinclude,winclude,tinclude options) and which sessions to measure (sid@inst, user=joe@2, program=sqlplus%@* and even subqueries). You may want to sample some other session(s) this way or you may want to take before and after snapshots of your own session where snapper runs, there are no restrictions, just remember that when taking snapshots of your own session, there’s some snapper’s own activity which shows up in the session’s metrics. If you want a clean snapshot like autotrace is doing, then use one session for running snapper and another session for running the actual test cases.

Note that I just wrote a blog entry about Snapper v4 RAC GV$ view syntax too.

Enjoy! (and remember that Snapper v4 is still beta and likely has some issues)

Posted in Oracle | Leave a comment

Snapper v4.02 and the Snapper launch party video

I have fixed most of the bugs that showed up during the Snapper launch party session and uploaded the new version (v4) of Snapper here:

I have also uploaded the launch party hacking session video to enkitec.tv:

I have not updated the snapper documentation yet, but here are the main improvements:

  1. RAC support (query remote instance’s sessions performance data via GV$)
  2. Manually taken “before” and “after” snapshot support. No need to use DBMS_LOCK to sleep for a certain number of seconds between snapshots
  3. Useful averages in the right side of the output (e.g. average bytes sent/received per roundtrip etc). I will keep adding new averages over the coming weeks.

RAC support

I have kept the syntax backwards-compatible, so when you run Snapper like this …

@snapper ash 5 1 user=tanel

… it will still query only local instance’s sessions.

But if you want to measure all tanel’s sessions in let’s say instance #2 only, you can just add @2 to the session filter condition:

@snapper ash 5 1 user=tanel@2

Or if you want all instances then use @*:

@snapper ash 5 1 user=tanel@*

If you want to measure a single session with sid 1234 in instance #2, you can do it in two ways:

@snapper ash 5 1 1234@2

Or use the inst_id,sid tuple syntax:

@snapper ash 5 1 (2,1234)

Note that the brackets are important, as otherwise snapper thinks that you want to measure SID 2 and SID 1234 in the local instance. You can pass multiple inst_id,SID tuples to the snapper though:

@snapper ash 5 1 (2,1234),(4,5678),(3,999)

Snapper still supports the qc= syntax too, to list all sessions started by a PX query coordinator session:

@snapper ash 5 1 qc=1234@*

Note that the qc= syntax is a bit broken/inconsistent right now – what the qc=1234@* means right now is that Snapper looks up the SID 1234 from every instance and lists slave sessions from all instances. If you use qc=1234@2 for example, then Snapper would only look for QC in the instance #2 and any slaves that are also in instance #2 (but with inter-instance parallelism you may have slaves in other instances too). I will fix it in a near-future version.

I will write about the other major features in separate blog entries, but if you want to learn about it now, then grab a beer (or coffe) and watch the latest enkitec.tv show (it’s a bit over 2 hours :-))

P.S. Snapper 4 is still beta! Some queries (like the qc=) one may take a while to run, running dynamic queries with subqueries against GV$ views efficiently is tricky business. This is why I have kept the old snapper v3 (which is more stable and pretty efficient) in the standard (old) snapper address.

Update: If you want to see what Snapper is capable of (it’s about the old version 3), check this document about using Snapper for Oracle performance measurement.

Posted in Oracle, Productivity | Leave a comment

Drilling Deep Into Exadata Performance with ASH, SQL Monitoring and Exadata Snapper – slides and a hacking session!

The promised hacking session about Exadata performance troubleshooting will happen on Thursday 21st February 9am-11am PST (2 hours). I will show a number of examples and demos where the Exadata Snapper shows what was going on in storage cells when running Smart Scans or just doing IO.

Register here (free stuff!):

Slides:

I will post the scripts on the day of this session.

Note that you should read through the slides before the hacking session – as I’m not going to spend much time on presenting slides, as the name says – it’s a hacking session, so 95% demos. I will show a couple of important slides which explain the essence of Exadata Snapper (check slide 17, 21, 22), but otherwise it’s all demos.

This hacking session will also give a little preview of what’s going to happen in my new Advanced Exadata Performance seminars – see below ;-) As far as I know, there is no other seminar in the world (well at least available to public) where you can learn so deep knowledge about Exadata performance troubleshooting, optimization and internals – all arranged around a systematic approach for dealing with Exadata performance issues.

New Seminar – Advanced Exadata Performance: Troubleshooting and Optimization

This advanced seminar is meant for DBAs and developers who already have at least the basic experience of working with Exadata and have seen some of the common performance problems that may occur in production or performance tests. In this class we will dive deep into details of how Exadata works internally and then how to systematically troubleshoot performance problems caused either by application, database, storage cell software or Exadata hardware issues. The demos and troubleshooting examples are shown on Enkitec’s in-house Exadata X2-2 and V2 machines and we will use both standard Oracle tools and advanced Exadata troubleshooting scripts written by Tanel.

What students will learn:

  • Understand how Exadata features, the storage cell servers and the DB-cell communication work internally
  • How to use database, OS and cell metrics for systematically troubleshooting Exadata performance issues, from basic to complex problems involving multiple components, bugs and hardware issues
  • How to identify which workloads and SQL statements are not Exadata-friendly and what to do about it
  • How to design for optimal Exadata performance and avoid design and coding pitfalls

What this class is not:

  • This class is not an introductory “what is Exadata” class, but an advanced seminar about getting the most out of your Exadatas and resolving even complex performance problems systematically and fast
  • This class is not an introductory SQL or database tuning class, but focuses on troubleshooting only Exadata-related performance issues of your workloads
  • This class is not a general Exadata infrastructure troubleshooting class (like hardware, networking and Grid Infrastructure configuration problems), instead we focus entirely on performance

 

Day 1 – Exadata Internals, Systematic Troubleshooting and Smart Scans

    • Understanding the low-level internals of Exadata data flow all the way from cell disks to the database nodes – and the physics behind its performance
    • Using basic performance tools like V$SQLSTATS, ASH and SQL Monitoring views on Exadata
    • Troubleshooting why Smart Scans (and direct path reads) do not get used
    • Measuring the efficiency of Smart Scans and how much work gets offloaded to the storage cells
    • Troubleshooting advanced Smart Scan performance problems, like passthrough mode and selective fallback to block IO – with Exadata Snapper
    • Using CELLCLI and cell-level metrics where database & session metrics aren’t enough
Day 2 – Parallel Execution, Hybrid Columnar Compression and OLTP

    • PX issues, including the In-Memory Parallel Execution feature
    • Optimizing hash joins for efficient bloom filtering in the cells
    • Troubleshooting Exadata Hybrid Columnar Compression issues
    • Troubleshooting long commit and disk write latency issues on Exadata
    • Troubleshooting flash caching issues and optimizing for tiered caching
    • Learn to use the Exadata scripts, tools and OS commands Tanel regularly uses for Exadata performance troubleshooting
    • Q&A and free-form demo session and/or reviewing Exadata performance case studies

If you like this – check out my Oracle training and seminars page, the first sessions happen already in May! (And there’s even one on-site in Enkitec office in Irving, TX).

Posted in Exadata, Oracle | Leave a comment

Troubleshooting high CPU usage with poor-man’s stack profiler – in a one-liner!

Here’s an example of a quick’n’dirty way of profiling stack traces on your command line. This is an example from Solaris (but the script should work on Linux too plus other Unixes with minor modifications).

I created a problem case below, based on a case I once troubleshooted at a client site. Note that they had set optimizer_mode = FIRST_ROWS in their database and the optimized came up with a very inefficient execution plan for the select from DBA_LOCK_INTERNAL view below:

SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS;

Session altered.

SQL> SET TIMING ON

SQL> SELECT * FROM dba_lock_internal;
...
... the output data stripped ...
...

927 rows selected.

Elapsed: 00:23:27.14

It took over 23 minutes to return 927 rows from DBA_LOCK_INTERNAL!

I ran Snapper to see where the time is spent then:

SQL> @snapper4 all 5 1 222
Sampling SID 222 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.00 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    222, SYS       , STAT, non-idle wait count                                       ,             1,        .19,         ,             ,          ,           ,          ~ per execution
    222, SYS       , TIME, DB CPU                                                    ,       6028084,      1.17s,   117.2%, [@@@@@@@@@@],          ,           ,
    222, SYS       , TIME, sql execute elapsed time                                  ,       6032677,      1.17s,   117.3%, [##########],          ,           ,
    222, SYS       , TIME, DB time                                                   ,       6032677,      1.17s,   117.3%, [##########],          ,           ,          ~ unaccounted time
    222, SYS       , WAIT, library cache: mutex X                                    ,             6,     1.17us,      .0%, [          ],         1,        .19,        6us average wait

--  End of Stats snap 1, end=2013-02-14 21:30:45, seconds=5.1

---------------------------------------------------------------------------------------------
Active% | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------
   100% | c884zcqpv9y5h   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2013-02-14 21:30:45, seconds=5, samples_taken=50

The query is apparently 100% on CPU with no significant waits. Normally I would expect some other metrics to pop up in snapper output in such high-CPU cases, like thousands of session logical reads per second, sorts (rows) showing millions of rows sorted per second or parse count (hard) number in hundreds or thousands per second. These are the “usual suspects”.

But this time none of these additional metrics were incremented by the session. So it’s time to systematically drill down by other means.

Continue reading

Posted in Oracle | 12 Comments

Session Snapper v4 – The World’s Most Advanced Oracle Troubleshooting Script!

Snapper V4 Webinar

Snapper V4 is out!

The major new features include:

  1. RAC support – ability to query stats from remote instances
  2. Manual Before and After snapshot support – no need to use DBMS_LOCK sleeps anymore
  3. Show useful averages and ratios *in addition to* raw metrics for faster troubleshooting
  4. And more! :)

Snapper is still a free-to-use tool and it still does NOT require any object creation nor changes in your databases for use. Now even DBMS_LOCK access isn’t needed, although it’s still useful for convenience.

I have posted the hacking session video to here:

Oh, wait, there’s more! You’re going to love this, especially if you love reverse engineering code puzzles! ;-) 

New Kind of Snapper Logo

Update: If you’re too lazy to run the code below, here’s what the logo looks like :-)

snapper_logo

I have also done major updates to Snapper logo which you should check out. If you have ever looked into the Snapper source code you know the old logo looks like this:

--    +-----=====O=== Welcome to The Session Snapper! (Yes, you are looking at a cheap ASCII
--   /                                                 imitation of a fish and a fishing rod.
--   |                                                 Nevertheless the PL/SQL code below the
--   |                                                 fish itself should be helpful for quick
--   |                                                 catching of relevant Oracle performance
--   |                                                 information.
--   |                                                 So I wish you happy... um... snapping?
--   |                                                )
--   |                       ......
--   |                       iittii,,....
--   ¿                    iiffffjjjjtttt,,
--                ..;;ttffLLLLffLLLLLLffjjtt;;..
--            ..ttLLGGGGGGLLffLLLLLLLLLLLLLLffjjii,,                        ..ii,,
--            ffGGffLLLLLLjjttjjjjjjjjffLLLLLLLLLLjjii..                ..iijj;;....
--          ffGGLLiittjjttttttiittttttttttffLLLLLLGGffii..            ;;LLLLii;;;;..
--        ffEEGGffiittiittttttttttiiiiiiiittjjjjffLLGGLLii..      iiLLLLLLttiiii,,
--      ;;ffDDLLiiiitt,,ttttttttttttiiiiiiiijjjjjjffLLLLffttiiiiffLLGGLLjjtttt;;..
--    ..ttttjjiitt,,iiiiiittttttttjjjjttttttttjjjjttttjjttttjjjjffLLDDGGLLttii..
--    iittiitttt,   ;;iittttttttjjjjjjjjjjttjjjjjjffffffjjjjjjjjjjLLDDGGLLtt;;..
--    jjjjttttii:. ..iiiiffLLGGLLLLLLLLffffffLLLLLLLLLLLLLLLLffffffLLLLLLfftt,,
--    iittttii,,;;,,ttiiiiLLLLffffffjjffffLLLLLLLLffLLffjjttttttttttjjjjffjjii..
--    ,,iiiiiiiiiittttttiiiiiiiiiijjffffLLLLLLLLffLLffttttttii;;;;iiiitttttttt;;..
--    ..iittttttffffttttiiiiiiiiiittttffjjjjffffffffttiittii::    ....,,;;iittii;;
--      ..;;iittttttttttttttttiiiiiittttttttttjjjjjjtttttt;;              ..;;ii;;..
--          ..;;;;iittttttjjttiittttttttttttttjjttttttttii..                  ....
--                ....;;;;ttjjttttiiiiii;;;;;;iittttiiii..
--                      ..;;ttttii;;....      ..;;;;....
--                        ..iiii;;..
--                          ..;;,,
--                            ....
--

As the Snapper script allows you to easily take performance snapshots of Oracle sessions and fish for solutions to your performance problems from the ocean of possible root causes, it makes sense to use a fish in the logo somewhere. I have also added sea and a cruise boat (symbolizing longer vacations thanks to Snapper!) into the new Snapper script file and and have pasted it below.

Scroll down and check it out – I’m sure you see it’s special. Does it ring a bell – what should you do with this special kind of a logo now? Of course – copy&paste it into sqlplus and run the logo! ;-) Note that you should run this on Oracle 11.2 or later.

There are actually two different logos below. The first one (cruise boat) is a low-tech one, suitable for use with Windows CMD.EXE and GUI tools like SQL*Developer.

Low-tech version:

 
SET                                                                     LiNeSiZe 10000         PageSize 5000
SET                                                                                   TrimOut ON           Head Off
                                                                  /**/
                                                                SELECT
                                                               LISTAGG
                                                             (SUBSTR(s,
                                        MOD(r2-1,115)+1,1)) WITHIN GROUP(ORDER BY r) 
                                     FROM(SELECT rownum r,RPAD(RPAD(RPAD(RPAD(LPAD(LPAD
                               ('(',x,'('),20,' '),x+20,')'),40,' '),40+y,' '),50,'W')s FROM(
                                  SELECT CEIL(ABS(SIN(rownum/30)*13))x,CEIL(ABS(COS(rownum/9)*5))y
                                     FROM dual CONNECT BY LEVEL<=115)), (SELECT rownum r2 FROM (dual)
--~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                          CONNECT BY LEVEL <= 50) GROUP BY MOD(r2-1, 115)    +1;

Hi-tech version:

This version requires a modern xterm emulator like Putty (on Windows) or iTerm2 or Terminator on Mac OSX, if your current terminal ends up showing garbage. Go ahead, make your terminal window wide enough, paste and enjoy :-)

First change the sqlplus settings if you didn’t do so already:

SET LINESIZE 10000 TRIMOUT ON TRIMSPOOL OFF HEAD OFF PAGESIZE 5000

And now paste this in:

 
                                                      SELECT                      
                                                CHR(POWER(3,3))             
                                          ||'[38;5;0m'||LISTAGG(SUBSTR(
                                    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(s ,')',
                            CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL((SIN(r2/5)                                                 +
                    SIN(ROWNUM/150)+1)*3-1)+CEIL((-SIN(ROWNUM/150)+1)*3-1)*6+CEIL((COS(ROWNUM/150)+1)*3-1)                                 *6*6),
              '099'))||'m)'),'(',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL((SIN(r2/5)+SIN(ROWNUM/150)                   +1)*3-1)+
       CEIL((-SIN(ROWNUM   /150)+1)*3-1)*6+CEIL((COS(ROWNUM/150)+1)*3-1)*6*6),'099'))||'m('),' ',CHR(POWER(3,3))||'[48;5;'||TRIM      (TO_CHAR(
(POWER(2,4)+CEIL(4)+CEIL(4)*6+CEIL(5-(r2/8))*6*6),'099'))||'m '),'.',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL(5)+CEIL(4)*6+CEIL
       (3)*6*6),'099'))||'m~'),'W',CHR(POWER(3,3))||'[48;5;'||TRIM(TO_CHAR((POWER(2,4)+CEIL(4+ABS(SIN(r/25)))+CEIL(2-ABS(SIN          (r/5))*2)
           *6+CEIL(2-ABS(SIN(r/5))*2)*6*6),'099'))||'m '),(MOD(r2-1,115)+1)*12+1,12))WITHIN GROUP(ORDER BY r)||CHR(POWER                  (3,3))                        
                ||'[0m' v FROM(SELECT ROWNUM r, RPAD(RPAD(RPAD(RPAD(LPAD(LPAD('(',x,'('),20,' '), x+20,')'),40,' '),                        40+y,
                      '.'),50,'W')s FROM (SELECT CEIL(ABS(SIN(ROWNUM/30)*13))x,CEIL(ABS(COS(ROWNUM                                              *
                            1/DBMS_RANDOM.VALUE(7,7.5))*7))y FROM dual CONNECT BY LEVEL<=115)),
                                    (SELECT ROWNUM r2 FROM dual CONNECT BY LEVEL<=50)
                                               GROUP BY MOD(r2-1,115)+1;

 

So, how do you like the new Snapper logo? :)

Feel free to leave a comment – and if you like this, please share it – all the billions of people of the world deserve to know how cool sqlplus is! :-)

Also, check out my upcoming Advanced Oracle Troubleshooting and Advanced Oracle SQL Tuning seminars and see you all on Wednesday at Snapper V4 Launch Party ;-)

Posted in Cool stuff, Oracle | 41 Comments

Sqlplus is my second home: Part 7 – Downloading files via sqlplus :-)

Some years ago I wrote about how sqlplus allows you to run sqlplus scripts directly from HTTP and FTP locations instead of the local filesystem. By the way, I didn’t even notice – my blog is over 5 years old already! :)

I a recent email thread Marco Gralike just showed the simplest way I to open a HTTP URL and download + list its contents in a CLOB datatype. It’s the HTTPURITYPE and its getCLOB (and getBLOB) methods.

So, basically if you want to download and save a (text) file like a script without a browser for some reason – and assuming that your database server can make outgoing HTTP connections, you can use this trick:

SQL> SET LINES 1000 TRIMSPOOL ON TRIMOUT ON PAGESIZE 0 LONG 99999999 LONGCHUNKSIZE 99999999 FEEDBACK OFF HEAD OFF
SQL> SPOOL $HOME/snapper_download.sql
SQL> 
SQL> select httpuritype('http://blog.tanelpoder.com/files/scripts/snapper.sql').getCLOB() from dual;
--------------------------------------------------------------------------------
--
-- File name:   snapper.sql
-- Purpose:     An easy to use Oracle session-level performance measurement tool
--              which does NOT require any database changes nor creation of any
--              database objects!
--
--              This is very useful for ad-hoc performance diagnosis in environments
--              with restrictive change management processes, where creating

... lots of output snipped ...

SQL> SPOOL OFF

Now you have to open the spool file (snapper_download.sql) and remove any garbage (sqlplus commands) from the top of the spool file and there you go:

SQL> @snapper_download all 5 1 3
Sampling SID 3 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v3.62 by Tanel Poder ( http://blog.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      3, SYS       , STAT, opened cursors cumulative                                 ,            43,        8.6,         ,             ,          ,           ,
      3, SYS       , STAT, recursive calls                                           ,            44,        8.8,         ,             ,          ,           ,
      3, SYS       , STAT, recursive cpu usage                                       ,             9,        1.8,         ,             ,          ,           ,
      3, SYS       , STAT, CPU used by this session                                  ,             8,        1.6,         ,             ,          ,           ,
      3, SYS       , STAT, in call idle wait time                                    ,           412,       82.4,         ,             ,          ,           ,
      3, SYS       , STAT, session uga memory                                        ,         65512,      13.1k,         ,             ,          ,           ,
      3, SYS       , STAT, session pga memory                                        ,        327680,     65.54k,         ,             ,          ,           ,
      3, SYS       , STAT, calls to get snapshot scn: kcmgss                         ,            43,        8.6,         ,             ,          ,           ,
      3, SYS       , STAT, session cursor cache hits                                 ,            43,        8.6,         ,             ,          ,           ,
      3, SYS       , STAT, session cursor cache count                                ,             1,         .2,         ,             ,          ,           ,
      3, SYS       , STAT, workarea executions - optimal                             ,            48,        9.6,         ,             ,          ,           ,
      3, SYS       , STAT, execute count                                             ,            43,        8.6,         ,             ,          ,           ,
      3, SYS       , STAT, sorts (memory)                                            ,            45,          9,         ,             ,          ,           ,
      3, SYS       , STAT, sorts (rows)                                              ,          1867,      373.4,         ,             ,          ,           ,
      3, SYS       , TIME, PL/SQL execution elapsed time                             ,          5398,     1.08ms,      .1%, [          ],          ,           ,
      3, SYS       , TIME, DB CPU                                                    ,        435933,    87.19ms,     8.7%, [@         ],          ,           ,
      3, SYS       , TIME, sql execute elapsed time                                  ,        129913,    25.98ms,     2.6%, [#         ],          ,           ,
      3, SYS       , TIME, DB time                                                   ,        449166,    89.83ms,     9.0%, [#         ],          ,           ,
      3, SYS       , WAIT, PL/SQL lock timer                                         ,       4127427,   825.49ms,    82.5%, [WWWWWWWWW ],        41,        8.2,   100.67ms

--  End of Stats snap 1, end=2013-01-24 22:45:53, seconds=5

    

--  End of ASH snap 1, end=2013-01-24 22:45:53, seconds=5, samples_taken=41
 

Of course this technique is more for fun and is not reliable for binary files (unless you use something like UTL_ENCODE.BASE64_ENCODE or UUENCODE first). But still pretty fun :) I was happy to see that accessing a HTTP resource within the database has become so simple that it’s just a one-liner (as opposed to all the UTL_HTTP code lines needed for HTTP access).

Have fun (and thanks Marco for the tip :-)

Posted in Cool stuff, Oracle | 5 Comments

Japanese translation of some of my blog articles

Ryota Watabe has translated some of my articles into Japanese and intends to translate more in the future (thanks for that! :)

I have added a little “translations” section to the left sidebar of the blog (scroll down). Or just go to this link:

You might also want to follow Watabe-san on Twitter to get notified of any new Japanese translations that show up.

Posted in Oracle | Leave a comment

What the heck is the INTERNAL_FUNCTION in execution plan predicate section?

Sometimes you see something like this in an execution plan:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    22 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("B"=INTERNAL_FUNCTION("A"))

There’s quite a little information available about what the INTERNAL_FUNCTION really is and why does it show up, thus this blog entry.

There’s actually no function called INTERNAL_FUNCTION in Oracle and this is (partially) confirmed also by querying the V$SQLFN_METADATA view:

SQL> @sqlfn %internal%

no rows selected

The common understanding, coming from Oracle documentation is that the INTERNAL_FUNCTION is some sort of a special function doing (implicit) datatype conversion. This is only partially true and not the whole truth – but let’s examine the datatype conversion first and proceed to the explanation and other examples later on.

Implicit datatype conversion

I’m creating a table which stores a date in VARCHAR2 datatype in column A and a date in DATE datatype in column B:

SQL> CREATE TABLE t(a VARCHAR2(20), b DATE);

Table created.

SQL> @desc t
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      A                                        VARCHAR2(20)
    2      B                                        DATE

SQL> INSERT INTO t VALUES( TO_CHAR(sysdate), sysdate) ;

1 row created.

Now let’s run a simple select query and see its execution plan:

SQL> SELECT * FROM t WHERE a = b;

A                    B
-------------------- -----------------
20130116 17:41:49    20130116 17:41:49

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    21 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("B"=INTERNAL_FUNCTION("A"))

What happens here is that Oracle is forced to (implicitly) add a datatype conversion function around column A, to be able to physically compare two different datatypes. Internally Oracle is not running a comparison "WHERE a = b" anymore, but rather something like "WHERE TO_DATE(a) = b". This is one of the reasons why the INTERNAL_FUNCTION shows up – the code generating the human-readable execution plan from the actual “binary” execution plan is not able to convert the internal opcode to a corresponding human-readable function name, thus shows a default “INTERNAL_FUNCTION” string there instead.

This is Oracle 10g+ behavior, in 9i and before, Oracle just printed nothing as that function name, example output from 9i is below:

1 - filter(("DUAL"."DUMMY")=:TEST_VAR)

Do you see that there are seemingly unnecessary brackets around “DUAL”.”DUMMY” above? Why not just "DUAL.DUMMY"=:TEST_VAR, is this ("DUAL"."DUMMY"):=TEST_VAR usage some typo? It’s actually not a typo, before Oracle 10g you had to be careful to spot any such “unneccesary” brackets as they really indicated there was some function called, something like F(“DUAL”.”DUMMY”), only that the “F” was never printed. In Oracle 10g onwards, a generic “INTERNAL_FUNCTION” is printed instead of nothing in such cases – at least we know there’s some function applied to the column/variable. You need to look into the code (and possibly session-level NLS_ settings) to figure out what function it could logically be (like a function casting TIMESTAMP into a DATE when comparing these datatypes).

Un-unparseable Complex Expressions

Ok, the datatype conversion reason is already covered in Oracle docs and blogosphere. But there’s more.

Check out this example with an OR clause:

SQL> CREATE TABLE t AS SELECT * FROM dba_objects;

Table created.

SQL> SELECT COUNT(*) FROM t WHERE owner = 'SYS' OR owner = 'SYSTEM';

  COUNT(*)
----------
     32272

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 36652 |   608K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SYS' OR "OWNER"='SYSTEM'))

All is fine so far – the DBMS_XPLAN.DISPLAY_CURSOR function, which reads the plans directly from library cache is able to explain the predicate correctly.

Now let’s make the predicate a little more complex, I will add another OR to this predicate, but it’s against a different column:

SQL> SELECT COUNT(*) FROM t WHERE owner = 'SYS' OR owner = 'SYSTEM' OR object_id = 123;
...
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 36652 |  1073K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter((INTERNAL_FUNCTION("OWNER") OR "OBJECT_ID"=123))

Now, suddenly the two conditions on the OWNER table are gone and replaced by an INTERNAL_FUNCTION?

Let’s try an IN operator instead of the OR, but wait, we are checking for values in two different columns (so we can’t put them all into the same IN clause):

SQL> SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT') AND object_type = 'TABLE';

  COUNT(*)
----------
      1178

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    28 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   374 | 10472 |   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter((INTERNAL_FUNCTION("OWNER") AND "OBJECT_TYPE"='TABLE'))

Still no luck.
Let’s try a logically simpler operation, with just searching for 3 values in the same column:

SQL> SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT');

  COUNT(*)
----------
     32278

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   293 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|*  2 |   TABLE ACCESS FULL| T    | 31960 |   530K|   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SCOTT' OR "OWNER"='SYS' OR "OWNER"='SYSTEM'))

Works! Oracle has transformed (or at least shows it in the explanation) this IN predicate to a bunch of OR-ed conditions (against the same column).

You might already see what’s going on with the earlier examples – DBMS_XPLAN.DISPLAY_CURSOR is not able to explain “complex” composite predicates applied in a single execution plan step, which include multiple different columns AND at least one of the columns has multiple values to check for (like an in-list or OR-ed predicates).

Where does DISPLAY_CURSOR get its data from and an explanation

DBMS_XPLAN.DISPLAY_CURSOR gets its plan data from V$SQL_PLAN, the predicate section comes from ACCESS_PREDICATES and FILTER_PREDICATES columns. But when I query the V$SQL_PLAN directly, I still see the same problem:

SQL> SELECT id, filter_predicates FROM v$sql_plan WHERE sql_id = 'gcqgrmtna9g1u';

        ID FILTER_PREDICATES
---------- ------------------------------------------------------------
         0
         1
         2 (INTERNAL_FUNCTION("OWNER") AND "OBJECT_TYPE"='TABLE')

And you may have noticed that there are brackets() around the raw ORed conditions above too, which in 9i meant that there was an “unexplained” internal function in the “binary” execution plan around the reported predicates, but in this case (as 10g+ supports the internal_function naming), blank function names shouldn’t occur… Not really sure why there about this, but that’s too low level little detail for this post.

The V$SQL_PLAN view itself accesses the actual “binary” child cursor in library cache (after taking appropriate latches/pins/mutexes) and UNPARSES it. Why such term – well isn’t parsing something that takes a human readable input and translates it into computer-understandable “binary” format. Thus unparsing is the opposite – V$SQL_PLAN accesses the cursor’s “binary” execution plan memory structure and translates it to human-readable execution plan output. There’s even a parameter controlling this V$SQL_PLAN behavior, if it’s set to false, the ACCESS_PREDICATES and FILTER_PREDICATES columns will be empty there:

SQL> @pd unparse
Show all parameters and session values from x$ksppi/x$ksppcv...

NAME                             VALUE                                      DESCRIPTION
----------------------------- --------- -----------------------------------------------
_cursor_plan_unparse_enabled      TRUE          enables/disables using unparse to build
                                                                  projection/predicates

By the way, why do I keep saying “binary” execution plan and in double quotes? It’s because I want to emphasize that the real execution plan that Oracle executes is not in the text form like we see on the screen, the text is just generated for humans, for troubleshooting reasons. The execution plan is not a real executable binary (as in oracle.exe) either, it’s not directly fed to the CPUs for execution. The physical execution plan in the library cache child cursor is a bunch of opcodes, object_ids and pointers for defining the hierarchy and order of rowsource execution. It’s the SQL execution engine, which then loops through these opcodes, decodes them and knows what to do (which rowsource function to call) next.

So, as we’ve seen above, some predicates with complex AND/OR conditions chained together are displayed as INTERNAL_FUNCTION() by DBMS_XPLAN.DISPLAY_CURSOR and V$SQL_PLAN as they are unable to decode (unparse) the execution plan info fully.

Using the good old EXPLAIN PLAN

There’s some good news though! The good old EXPLAIN PLAN command can unparse (some of) these complex predicates properly! As the EXPLAIN PLAN operation parses the given SQL again, in a special, more instrumented way, it has more information at hand apparently (and it uses more memory too). Or it could just be that whoever wrote V$SQL_PLAN, didn’t write the piece of code for unparsing more complex predicates :)

Check this output, where I’m using the explain plan command instead of just running the SQL:

SQL> EXPLAIN PLAN FOR 
     SELECT COUNT(*) FROM t WHERE owner IN ('SYS','SYSTEM','SCOTT') AND object_type = 'TABLE';

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    28 |   293   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE    |      |     1 |    28 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |   374 | 10472 |   293   (1)| 00:00:04 |
---------------------------------------------------------------------------

   2 - filter(("OWNER"='SCOTT' OR "OWNER"='SYS' OR "OWNER"='SYSTEM')
              AND "OBJECT_TYPE"='TABLE')

It’s a miracle! The INTERNAL_FUNCTION is gone and all the predicate values are shown correctly. EXPLAIN PLAN command was very useful here.

So, while I usually don’t use the EXPLAIN PLAN command as explain plan can lie to you, then whenever I see an INTERNAL_FUNCTION in the DISPLAY_CURSOR/V$SQL_PLAN/SQL Monitor output, I run an explain plan command for the same query in hope of quickly finding out what the predicates in there really are.

Ok, it’s 2:30am here yet again – off to sleep! :)

Posted in Oracle | 6 Comments