Tanel Poder’s blog: Core IT for Geeks and Pros

May 1, 2009

I have been troubleshooting since I was a kid! :)

Filed under: Cool stuff, Troubleshooting — Tanel Poder @ 8:02 am

People often ask how come I know so much about Oracle (and some Unix) internals and how do I have such a passion for such things.
Also, another thing which you can notice is that for some reason a large amount of hardcore computer hackers, low level troubleshooters and various internals experts happen to come from Russia.

And I can answer why this is so :)

A little bit of history: I was born in Estonia and lived over 20 of my first years there. From 1940 to 1991 Estonia was occupied by Soviet Union (excluding few years during World War II when we were occupied by German forces).

So, as I was born in the end of seventies, most of my childhood we were still part of Soviet Union. And that applies to all russians and other nationalities in that empire too.

  • …This meant that all we could buy was Soviet Union products.
  • …And all these products usually had quality issues.
  • …Which meant that whenever I got some toy as present for christmas or my birthday – I first had to FIX it and only then I could play with it! :)
  • …That usually applied to toy cars, constructors and whatever non-trivial pieces of toys. Some things needed minor tweaking, some required complete disassembly and reassembly (sometimes you needed to improvise and come up with the missing pieces yourself :)
  • …Later on it also applied to bikes etc. The bike could be working in principle but you still needed to tune it to get it running smoothly (and to make sure that it wouldn’t break when you went downhill the max speed you could get).

So, there’s always something good in everything bad – I had to learn troubleshooting since I was a little kid! ;)

Bookmark and Share

April 24, 2009

Tracing Oracle SQL plan execution with DTrace

Filed under: Cool stuff, DTrace, Internals, Oracle, Troubleshooting — Tanel Poder @ 1:11 am

SQL is a declarative language – in other words you just declare what needs to be done and Oracle takes care of the part how it’s done.

However there’s nothing declarative about the actual SQL execution when it happens. SQL plan is just a tree of kernel rowsource functions executed in a specific order (defined in child cursor’s sql area).

The root of SQL plan is where the fetch function (opifch2 for example) gets the rows for passing back to the user (or PL/SQL engine).
The branches are operations like joins, union etc, which don’t have access to any data themselves and can just call other functions recursively to get rows
The leaves are the execution plan operations without any children, they call data layer to acces actual datablocks.

The first execution plan line (with lowest ID) without any children is the one where data access starts, that’s the place where first logical IO happens.

A commented exec plan is below:

SQL> select count(*) from all_users;

  COUNT(*)
----------
        35

SQL> @x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  b2zqhgr5tzbpk, child number 0
-------------------------------------
select count(*) from all_users

Plan hash value: 3268326079

--------------------------------------------------------------------------
| Id  | Operation            | Name  | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   1 |  SORT AGGREGATE      |       |      1 |       |       |          |       <- ROOT
|*  2 |   HASH JOIN          |       |     35 |  1517K|  1517K|  637K (0)| <- BRANCH
|*  3 |    HASH JOIN         |       |     35 |  1593K|  1593K| 1361K (0)| <- BRANCH
|   4 |     TABLE ACCESS FULL| TS$   |     13 |       |       |          |   <- LEAF
|*  5 |     TABLE ACCESS FULL| USER$ |     35 |       |       |          |    <- LEAF
|   6 |    TABLE ACCESS FULL | TS$   |     13 |       |       |          |    <- LEAF
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("U"."TEMPTS#"="TTS"."TS#")
   3 - access("U"."DATATS#"="DTS"."TS#")
   5 - filter("U"."TYPE#"=1)

I have written about how to map execution plan lines back to kernel functions here:

http://blog.tanelpoder.com/2008/06/15/advanced-oracle-troubleshooting-guide-part-6-understanding-oracle-execution-plans-with-os_explain/

The above approach is based on pstack, mostly useful for demonstrations but has helped me to diagnose one spinning condition in an execution plan once (that’s the whole reason I came up with this technique).

As I said above, SQL execution just means that the kernel’s rowsource functions are executed in a loop with order and hierarchy specified in the child cursor’s execution plan.

So, if you want to learn and really understand the sequence of SQL plan execution – it’s dead easy with DTrace. Here’s what happens when you fetch from the above execution plan:

(more…)

Bookmark and Share

April 23, 2009

Seminar in Edinburgh next week – discount to UKOUG members (and a little comment about Snapper)

Filed under: Oracle, Training, Troubleshooting — Tanel Poder @ 12:08 am

Just wanted to let you know that there are still places left for my troubleshooting seminar in Edinburgh next week and there’s a 20% discount for UKOUG members.

The event takes place 27-28 April – Check the PiSec seminar page if you’re interested!

Btw I will have whole 10 free days after the Edinburgh event so expect some interesting and useful blog entries then (instead of the marketing stuff what you’ve had to tolerate for last few times ;)

I have moved Snapper v2.01 (out of beta now) to http://www.tanelpoder.com/files/scripts/snapper.sql , it’s layout & output columns are configurable now btw (search for “config” in the script).
If anyone still misses the old version of snapper (v1.07), you can download it from here

Bookmark and Share

April 20, 2009

Oracle buys Sun – hooray!

Filed under: Cool stuff, Oracle — Tanel Poder @ 9:21 am

More details here:

http://www.oracle.com/sun/index.html

Not that I care too much from the business aspect, but I sure hope this means that Solaris will become a viable OS to run Oracle on x64 platform!!! Btw the OS itself is already more than good enough – but it just should be properly supported by Oracle!

Bookmark and Share

Seminar feedback and pictures from Singapore

Filed under: Oracle, Training — Tanel Poder @ 5:54 am

I’ve been busy with a series of seminars, so haven’t managed to blog much…

Karl Arao has posted some feedback from my Singapore seminar and also the pictures we took with attendees. By the way, he has other good Oracle stuff in his blog so check it out here.

Bookmark and Share

April 5, 2009

Done with my seminar in Netherlands – here’s some feedback and couple beta new versions of my scripts

Filed under: Cool stuff, Oracle, Performance, Training, Troubleshooting — Tanel Poder @ 5:09 pm

The Miracle Masterclass I delivered last week in Netherlands was awesome, I had lots of attendees and lots of good questions and feedback. As usual, I ran out of time in the end though. I will make this up with some articles about the stuff I didn’t have time to explain :)

I used Snapper v2.0 beta during my demos there and also an improved version of LatchProfX which can give some additional context information about reasons why latches were taken. As they are beta, not final versions, I have uploaded them into a temporary location:

Update: These scripts are out of beta now so here are their permanent locations:

Also, the Sesspack (session level statspack) v0.05 files are in my TPT script set, under tools/sesspack_v0.05 directory in the zip file:

Toine van Beckhoven attended the seminar as well and wrote a review of it, if you’re still hesitating if you should attend as well, read his review first :)

Bookmark and Share

March 27, 2009

Read OS environment variables using DBMS_SYSTEM.GET_ENV()

Filed under: Cool stuff, Oracle, PL/SQL — Tanel Poder @ 12:28 pm

Check out this article by Vikram Das about how to read OS environment variables using PL/SQL.

I did not know that!

Bookmark and Share

March 26, 2009

I’m an Oracle ACE Director now :)

Filed under: Cool stuff, Oracle — Tanel Poder @ 7:46 pm

Many people have asked me that how come I’m not an Oracle ACE yet. From this week I am an Oracle ACE Director.

Here’s a link to my profile.

Its pretty big honor to be recognized by Oracle Corp at such high level.

I started working with Oracle software in 1997 and got really interested in that stuff after realizing how powerful the database was and how complex some internal details could be. Lots of opportunities to learn and learn I did!

I have been an Oracle Certified Master DBA for almost seven years now ( I took the exam in 2002 ). This is the highest certificate of technical skill Oracle gives out.

I’m also an OakTable Network member since 2004. In my eyes this is the highest recognition by industry peers an Oracle specialist can get. I’m very honored to be part of the team, not only because of the kick-ass knowledge everybody there has, but every member I’ve personally met is actually a cool person!

So, I think Oracle ACE status nicely completes the previous two “credentials” I had (and worked hard for!!!), OCM is the official technical skill recognition by Oracle, OakTable membership is the industry peer recognition and now the ACE director is Oracle’s recognition for my contributions so far.

I’d like to say thanks to Porus Havewala who recommended me for the ACE director program. Check out his blog if you’re interested in Enterprise Manager & such, he’s a Grid Control expert ( http://enterprise-manager.blogspot.com )

Ok, enough of ego-tripping for this year, back to work. I have a mutex contention troubleshooting article to finish ;-)

Bookmark and Share

March 21, 2009

Oracle 11g: Reading alert log via SQL

Filed under: Oracle, Oracle 11g, Troubleshooting — Tanel Poder @ 8:38 pm

Oracle has done some major improvements in the diagnosability infrastructure in version 11g. Here’s one little detail.

Before Oracle 11g it is possible to access the alert log via SQL using an external table or a pipelined function which in turn uses utl_file.
After reading the text you need to parse it to extract the information you need from there.

Starting from 11g Oracle does all this work for you. There is a fixed table X$DBGALERTEXT, when you query it, Oracle reads the log.xml from alert directory (which contains all the data what alert.log does), parses it and returns the details back as rows:

SQL> select message_text from X$DBGALERTEXT where rownum <= 20;

MESSAGE_TEXT
-----------------------------------------------------------------------------------------------------------------
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in client-side pfile /u01/app/oracle/admin/LIN11G/pfile/init.ora on machine linux03
System parameters with non-default values:
  processes                = 150
  memory_target            = 404M
  control_files            = "/u01/oradata/LIN11G/control01.ctl"
  control_files            = "/u01/oradata/LIN11G/control02.ctl"
  control_files            = "/u01/oradata/LIN11G/control03.ctl"
  db_block_size            = 8192

20 rows selected.

This is the text representation, but you can get individual details from other columns as listed below:

(more…)

Bookmark and Share

March 20, 2009

Another LatchProfX use case

Filed under: Internals, Oracle, Performance, Tools, Troubleshooting — Tanel Poder @ 2:35 pm

Riyaj Shamsudeen wrote an excellent article about systematic latch contention troubleshooting.

Especially if the latch contention problem is ongoing, looking into system wide stats (like v$latch.sleep columns) is not the best idea in busy systems. This may sometimes lead you to fixing the wrong problem.

This is because sometimes the latch contention is not caused by some system wide inefficiency but rather by one or few sessions.

The right approach would be to measure the following things:

  1. Which latch (and exact child latches) the problem session is waiting for (query v$session_wait, ASH, sql_trace)
  2. Why is the problem session trying to get that latch so often (query v$sesstat counters for problem session and check execution plan if only single/few statements executed)
  3. Why this latch is busy: who’s holding it (query v$latchholder, LatchProf)
  4. Why is that someone holding the latch so much (query v$sesstat counters for that session or run LatchProfX)

Riyaj used this approach and successfully found out the troublemaker causing heavy library cache latch contention. He used my LatchProfX tool for part of the diagnosis (and I’m very happy to see that my advanced oracle troubleshooting tools find real world use in hands of other troubleshooters/tuners too!).

Here’s an excerpt from latchprofx output (from a little test case I put together), showing which session is holding which library cache latch how much, hold mode (shared/exclusive) and also the actual reason (function name) why the latch is held:

SQL> @latchprofx sid,name,laddr,ksllwnam,ksllwlbl,hmode 159 % 100000

-- LatchProfX 1.08 by Tanel Poder ( http://www.tanelpoder.com )

 SID NAME           LADDR            KSLLWNAM  KSLLWLBL  HMODE        Held  Gets  Held %  Held ms Avg hold ms
---- -------------- ---------------- --------- --------- ---------- ------ ----- ------- -------- -----------
 159 library cache  00000003A93513E0 kglic     child     exclusive   10044  9314   10.04  301.320        .032
 159 library cache  00000003A9351340 kglic     child     exclusive    9895  8458    9.90  296.850        .035
 159 library cache  00000003A9351660 kglic     child     exclusive    9761  8440    9.76  292.830        .035
 159 library cache  00000003A9351700 kglic     child     exclusive    9737  8924    9.74  292.110        .033
 159 library cache  00000003A9351480 kglic     child     exclusive    8999  7765    9.00  269.970        .035
 159 library cache  00000003A93515C0 kglic     child     exclusive    8553  7832    8.55  256.590        .033
 159 library cache  00000003A9351520 kglic     child     exclusive    6852  6828    6.85  205.560        .030

7 rows selected.

kglic means Kernel Generic Library cache Iterate Chain (AFAIK), it’s the function which is executed when you access most X$KGL tables. And this is where Riyaj got the evidence that the latch contention comes from inefficient scanning of library cache, caused by a session running queries against V$SQL_PLAN views with a bad plan.

So, check out Riyaj’s articele, my LatchProfX script, my Latch & Mutex contention troubleshooting conference slides and if you want more, then my Advanced Oracle Troubleshooting Seminar. I recently added Dallas, Denver, Salt Lake City and Hong-Kong to the list of cities where you can attend it this year!

Bookmark and Share
« Newer PostsOlder Posts »

Powered by WordPress