SystemTap is production supported in Redhat EL5.4

If you don’t know what SystemTap is – it’s the Linux world’s attempt to build Solaris DTrace style safe dynamic instrumentation into Linux kernel.

I’m not going into religious discussions which one is better here, I have used both SystemTap and DTrace successfully for diagnosing low level issues inside OS kernel, so both are good enough for me :)

The problem with SystemTap though has been that it’s not production quality, it’s rather been a technology preview.

But with RHEL 5.4 part of it has changed, Redhat says following in this article:

SystemTap is no longer a technology preview, and now has production support. Red Hat recommendeds that users run scripts on development machines before deployment in production environments. Since SystemTap is an optional diagnostic tool, users can easily stop using it in the event of a problem. Options such as -g for Guru mode, and -D* allow users to disable several security checks. Scripts using these options may not be supported.

Red Hat plans to fix problems in SystemTap, or the Linux kernel, as they arise in connection with new scripts. In some cases, a fix may include extending the blacklist for known areas of the Linux kernel that are unsafe to probe. All scripts that use probes targeting blacklisted areas will need to be revised.

SystemTap users are advised to upgrade to this version.


So, at least officially, SystemTap is now supported by Redhat. Well at least as long as you’re not doing crazy stuff with the -g option, the geek mode ;-)

I would still be very careful before using any SystemTap in production, in fact, in critial environments I wouldn’t run it at all, unless there is no other option (and you’re prepared to get a crash + kernel panic).

However the significance of this note is that SystemTap is accepted for production by Redhat in principle. So now it’s just matter of time until it gets stable and widely used enough to be as practical and useful as DTrace is on Solaris.

Tracing Oracle SQL plan execution with DTrace

SQL is a declarative language – in other words you just declare what needs to be done and Oracle takes care of the part how it’s done.

However there’s nothing declarative about the actual SQL execution when it happens. SQL plan is just a tree of kernel rowsource functions executed in a specific order (defined in child cursor’s sql area).

The root of SQL plan is where the fetch function (opifch2 for example) gets the rows for passing back to the user (or PL/SQL engine).
The branches are operations like joins, union etc, which don’t have access to any data themselves and can just call other functions recursively to get rows
The leaves are the execution plan operations without any children, they call data layer to acces actual datablocks.

The first execution plan line (with lowest ID) without any children is the one where data access starts, that’s the place where first logical IO happens.

A commented exec plan is below:

SQL> select count(*) from all_users;

  COUNT(*)
----------
        35

SQL> @x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  b2zqhgr5tzbpk, child number 0
-------------------------------------
select count(*) from all_users

Plan hash value: 3268326079

--------------------------------------------------------------------------
| Id  | Operation            | Name  | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   1 |  SORT AGGREGATE      |       |      1 |       |       |          |       <- ROOT
|*  2 |   HASH JOIN          |       |     35 |  1517K|  1517K|  637K (0)| <- BRANCH
|*  3 |    HASH JOIN         |       |     35 |  1593K|  1593K| 1361K (0)| <- BRANCH
|   4 |     TABLE ACCESS FULL| TS$   |     13 |       |       |          |   <- LEAF
|*  5 |     TABLE ACCESS FULL| USER$ |     35 |       |       |          |    <- LEAF
|   6 |    TABLE ACCESS FULL | TS$   |     13 |       |       |          |    <- LEAF
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("U"."TEMPTS#"="TTS"."TS#")
   3 - access("U"."DATATS#"="DTS"."TS#")
   5 - filter("U"."TYPE#"=1)

I have written about how to map execution plan lines back to kernel functions here:

http://blog.tanelpoder.com/2008/06/15/advanced-oracle-troubleshooting-guide-part-6-understanding-oracle-execution-plans-with-os_explain/

The above approach is based on pstack, mostly useful for demonstrations but has helped me to diagnose one spinning condition in an execution plan once (that’s the whole reason I came up with this technique).

As I said above, SQL execution just means that the kernel’s rowsource functions are executed in a loop with order and hierarchy specified in the child cursor’s execution plan.

So, if you want to learn and really understand the sequence of SQL plan execution – it’s dead easy with DTrace. Here’s what happens when you fetch from the above execution plan:

[Read more...]

Oracle hidden costs revealed, Part2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others

I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.

Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?

Here’s an example, I’ll create two similar tables, one in USERS tablespace, other in SYSTEM, and inset into the first one (Oracle 10.2.0.3 on Solaris x64):

SQL> create table t1(a int) tablespace USERS;

Table created.

SQL> create table t2(a int) tablespace SYSTEM;

Table created.

SQL> exec for i in 1..100000 loop insert into t1 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.09

Insert into table in USERS tablespace took 3 seconds.

Ok, let’s commit and flush dirty buffers that they wouldn’t get on the way of next insert.

SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

SQL> alter system checkpoint; -- checkpointing to flush dirty buffers from previous inserts

System altered.

Elapsed: 00:00:01.34
SQL>

And now to the insert into the SYSTEM tablespace table:

SQL> exec for i in 1..100000 loop insert into t2 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.98
SQL>

What?! The same insert took 3 times longer, almost 9 seconds?

Fine! Let’s troubleshoot it!

[Read more...]