Oracle hidden costs revealed, Part2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others

I have written two posts in one, about a performance issue with writes in system tablespace and introduction of a little DTrace stack sampling script.

Have you noticed that DML on tables residing in SYSTEM tablespace is slower than tables in other tablespaces?

Here’s an example, I’ll create two similar tables, one in USERS tablespace, other in SYSTEM, and inset into the first one (Oracle 10.2.0.3 on Solaris x64):

SQL> create table t1(a int) tablespace USERS;

Table created.

SQL> create table t2(a int) tablespace SYSTEM;

Table created.

SQL> exec for i in 1..100000 loop insert into t1 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.09

Insert into table in USERS tablespace took 3 seconds.

Ok, let’s commit and flush dirty buffers that they wouldn’t get on the way of next insert.

SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

SQL> alter system checkpoint; -- checkpointing to flush dirty buffers from previous inserts

System altered.

Elapsed: 00:00:01.34
SQL>

And now to the insert into the SYSTEM tablespace table:

SQL> exec for i in 1..100000 loop insert into t2 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.98
SQL>

What?! The same insert took 3 times longer, almost 9 seconds?

Fine! Let’s troubleshoot it!

First (as usual) I start from my Snapper for getting a performance snapshot of the session. I ran both inserts again and used Snapper from another session to monitor the insert activity (for brevity I didn’t monitor all stat counters, but used the sinclude filter to leave only various “gets” in from V$SESSTAT).

This is the insert to USERS tablespace table:

SQL> @snapper out,gather=tsw,sinclude=gets 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     148, 20080901 17:20:53,       10, STAT, db block gets                           ,        103448,      10345,    103.45k,     10.34k
DATA,     148, 20080901 17:20:53,       10, STAT, db block gets from cache                ,        103448,      10345,    103.45k,     10.34k
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets                         ,           495,         49,        495,       49.5
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets from cache              ,           495,         49,        495,       49.5
DATA,     148, 20080901 17:20:53,       10, STAT, consistent gets - examination           ,           219,         22,        219,       21.9
DATA,     148, 20080901 17:20:53,       10, STAT, no work - consistent read gets          ,            89,          9,         89,        8.9
DATA,     148, 20080901 17:20:53,       10, STAT, rollbacks only - consistent read gets   ,            18,          2,         18,        1.8
DATA,     148, 20080901 17:20:53,       10, STAT, cluster key scan block gets             ,            65,          6,         65,        6.5
DATA,     148, 20080901 17:20:53,       10, TIME, hard parse elapsed time                 ,          4090,        409,     4.09ms,      409us
DATA,     148, 20080901 17:20:53,       10, TIME, PL/SQL compilation elapsed time         ,          3139,        314,     3.14ms,    313.9us
DATA,     148, 20080901 17:20:53,       10, TIME, parse time elapsed                      ,          5131,        513,     5.13ms,    513.1us
DATA,     148, 20080901 17:20:53,       10, TIME, PL/SQL execution elapsed time           ,        174288,      17429,   174.29ms,    17.43ms
DATA,     148, 20080901 17:20:53,       10, TIME, DB CPU                                  ,       3108026,     310803,      3.11s,    310.8ms
DATA,     148, 20080901 17:20:53,       10, TIME, sql execute elapsed time                ,       3104607,     310461,       3.1s,   310.46ms
DATA,     148, 20080901 17:20:53,       10, TIME, hard parse (sharing criteria) elapsed ti,          4110,        411,     4.11ms,      411us
DATA,     148, 20080901 17:20:53,       10, TIME, DB time                                 ,       3108026,     310803,      3.11s,    310.8ms
DATA,     148, 20080901 17:20:53,       10, WAIT, log file switch completion              ,         71104,       7110,     71.1ms,     7.11ms
DATA,     148, 20080901 17:20:53,       10, WAIT, SQL*Net message to client               ,             3,          0,        3us,       .3us
DATA,     148, 20080901 17:20:53,       10, WAIT, SQL*Net message from client             ,       6684030,     668403,      6.68s,    668.4ms
--  End of snap 1

PL/SQL procedure successfully completed.

You see, the DB CPU equals to DB time (in other words all the time spent servicing my request in database was spent on CPU, thus there were no waits, blocks involved). And the insert still took roughly 3 seconds.

This is the insert to SYSTEM tablespace table:

SQL> @snapper out,gather=tsw,sinclude=gets 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     148, 20080901 17:21:23,       11, STAT, db block gets                           ,        102616,       9329,    102.62k,      9.33k
DATA,     148, 20080901 17:21:23,       11, STAT, db block gets from cache                ,        102616,       9329,    102.62k,      9.33k
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets                         ,           384,         35,        384,      34.91
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets from cache              ,           384,         35,        384,      34.91
DATA,     148, 20080901 17:21:23,       11, STAT, consistent gets - examination           ,           218,         20,        218,      19.82
DATA,     148, 20080901 17:21:23,       11, STAT, no work - consistent read gets          ,            94,          9,         94,       8.55
DATA,     148, 20080901 17:21:23,       11, STAT, rollbacks only - consistent read gets   ,            17,          2,         17,       1.55
DATA,     148, 20080901 17:21:23,       11, STAT, cluster key scan block gets             ,            65,          6,         65,       5.91
DATA,     148, 20080901 17:21:23,       11, TIME, hard parse elapsed time                 ,          2350,        214,     2.35ms,   213.64us
DATA,     148, 20080901 17:21:23,       11, TIME, PL/SQL compilation elapsed time         ,          1751,        159,     1.75ms,   159.18us
DATA,     148, 20080901 17:21:23,       11, TIME, parse time elapsed                      ,          3359,        305,     3.36ms,   305.36us
DATA,     148, 20080901 17:21:23,       11, TIME, PL/SQL execution elapsed time           ,        172153,      15650,   172.15ms,    15.65ms
DATA,     148, 20080901 17:21:23,       11, TIME, DB CPU                                  ,       8751163,     795560,      8.75s,   795.56ms
DATA,     148, 20080901 17:21:23,       11, TIME, sql execute elapsed time                ,       8749139,     795376,      8.75s,   795.38ms
DATA,     148, 20080901 17:21:23,       11, TIME, hard parse (sharing criteria) elapsed ti,          2367,        215,     2.37ms,   215.18us
DATA,     148, 20080901 17:21:23,       11, TIME, DB time                                 ,       8751163,     795560,      8.75s,   795.56ms
DATA,     148, 20080901 17:21:23,       11, WAIT, SQL*Net message to client               ,             3,          0,        3us,      .27us
DATA,     148, 20080901 17:21:23,       11, WAIT, SQL*Net message from client             ,      -1442849,    -131168,     -1.44s,  -131.17ms
--  End of snap 1

PL/SQL procedure successfully completed.

You see the DB Time is still equal to DB CPU, thus all response was spent on CPU. Moreover, the time model stats tell me that all the elapsed time was spent executing SQL statements (which makes sense as I’m running INSERT SQL in a tight loop).

So, for some reason the INSERT SQL statements take almost 3 times longer when inserting to SYSTEM tablespace.

In cases where there is no waiting going on (things like Oracle Wait Interface / WaitProf would not help here), but only difference in CPU time, I normally look into V$SESSTAT counter differences first. But as seen from above, there is no major difference in counters such logical IOs (db block gets, consistent gets) to justify 3x increase in CPU usage (also there was no major difference in other counters which I have excluded for brevity).

So, the obvious next step is to sample the process stack during execution of both statements and see where the difference is. In my previous posts I’ve used pstack for getting a few stack samples out of a process and a pstack loop could be used here as well.

But for demo purposes (and convenience) I will use my DTrace stack sampling script today.

It’s dead easy to sample user process and kernel stacks using DTrace in Solaris 10 (and MacOS X and FreeBSD too). However the DTrace stack aggregator also takes account of the current program counter offset inside functions when distinguishing between stacks (this means that CPU program counter location main() + 0×10 will be different from main() + 0×12). But I don’t want such granularity, I want to see where the execution was at function level, not instruction level.

So I wrote my script to strip off the PC function offsets from DTrace output and order the stacks the way that most frequently sampled stack “branch” is displayed last, in bottom of the output. So, whatever stack and function you see outputted last, is probably the “troublemaker”. I still recommend scrolling up and reviewing few other stacks too.

So, I ran the insert to USERS tablespace table again and ran my dstackprof script on the server process with SPID 859.

This is from the “well behaving” process, so normally you should just compare this stack profile to the “bad behaving” process stack profile and see if you can spot any major differences in these. I will explain few of the functions on the go.

# ./dstackprof.sh 859

DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

147 samples with stack below
__________________
kcbchg1_main
kcbchg1                 <- buffer change function in buffer cache layer
ktuchg                    <- undo layer generate change
ktbchg2                  <- block (header) change
kdtchg                    <- data layer generate change
kdtwrp
kdtInsRow              <- data layer row insert function
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe                   <- OPI insert execution entry point
opiexe                   <- OPI execute (executing my recursive INSERT INTO statement)
opipls
opiodr
rpidrus                  <- all the RPI functions belong to recursive program interface (which allows making recursive calls in db)
skgmstack
rpidru                    <- these RPI functions (and few OPI ones above) are what PL/SQL context switches physically are
rpiswu2
rpidrv
psddr0                  <- most of the functions starting with "p" are PL/SQL execution ones
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt                    <- PL/SQL execution engine entry point
kkxexe                  <- cursor to PL/SQL execution interface
opiexe                   <- OPI call execution dispatcher
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

193 samples with stack below
__________________
libc.so.1`times         <-- during these 193 samples the execution was in opiexe (note that
opiexe                   -- it's a recursive OPI call execution (as there's the RPI layer below in the
opipls                    -- stack and there's another opiexe below. The lower opiexe is for executing
opiodr                   -- my PL/SQL anonymous block and upper is for the recursive INSERT statement
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

3147 Total samples captured

Anyway, the two stacks which I have copied here don’t show anything obvious. And also, see that there were total 3147 samples taken when the process was on CPU. But the execution point was in the “top” stack during only 193 samples, which is around 6% of all samples.

As the above stacks come from a well-behaving process, let’s leave their deeper analysis and just compare them to the bad-behaving process. So, I ran the SYSTEM tablespace insert again (you still remember, we were troubleshooting why inserts into system tablespace were slower, right ;)

The top stack in profile was following:

# ./dstackprof.sh 859

DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

[...some output snipped...]


917 samples with stack below
__________________
libc.so.1`qsort
kd4_entries_overlap
kdb4chk
kd4chk
kcbchk
kco_blkchk    <-- what the heck is this? the name suggests that it's block checking of some kind
kcoapl
kcbapl
kcrfw_redo_gen             <-- also note this function, looks like the block change function (kcbchg) has called
kcbchg1_main                   <-- a function to generate redo on its behalf
kcbchg1
ktuchg
ktbchg2
kdtchg
kdtwrp
kdtInsRow
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe
opiexe
opipls
opiodr
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

6202 Total samples captured

As most of the top stacks from the bad behaving process were spending their time somewhere in kco_blkchk() function, it rang a bell – database block checking!

However it was weird as the db_block_checking parameter was set false in my instance:

SQL> show parameter db_block_checking

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_block_checking                    string      FALSE
SQL>

After a little digging, I found this parameter:

NAME                          VALUE     DESCRIPTION
----------------------------- --------- -------------------------------------------------------------
_db_always_check_system_ts    TRUE      Always perform block check and checksum for System tablespace

This parameter makes Oracle (since version 8.1.6) do block structure checking in SYSTEM tablespace even if db_block_checking itself is set to FALSE. Considering the criticality of data dictionary objects, it’s a good idea.

So, now you might say: so what that there’s a performance issue, you shouldn’t keep your application tables in SYSTEM tablespace anyway!

Well, I have three words for you: AUD$, FGA_LOG$, DEF$_AQCALL :)

If you rely on auditing functionality heavily and especially if you use advanced replication, having the crucial tables (and associated LOB segments) in SYSTEM tablespace can mean quite a hit to your CPUs and system throughput. In which cases you probably would want to explore the option of moving these tables out of system tablespace.

Warning: before moving anything, consult Metalink or Oracle Support as only few strictly defined ways for moving these object are supported. See notes 731908.1 for Audit and 1037317.6 for Advanced Replication.

So, the key point of this article is that I could have started guessing or setting various events or undocumented parameters for probing Oracle here and there, for figuring out what’s going on. But I used a simple systematic approach instead. When Oracle Wait Interface and V$SESSTAT counters didn’t help, looked into process stack next. And this gave me the vital clue (kdo_blkchk) that there’s block checking going on. And if I hadn’t figured out the cause myself, I would have had some solid evidence for support.

Note that you don’t have to use DTrace for this kind of stack profiling (even though it is very convenient). On non-Solaris 10 OS’es I would use whatever safe stack dump mechanism is available (pstack for example) and post-process the results. I probably wouldn’t take thousands of samples though, but probably 100 or even less.

Also, if you want to get a complete overview of the process execution profile, you would need to do much more complex aggregations, as you’d need to aggregate “child” branches of stack samples with their “parents” (currently the samples of a backtrace “a->b->c” are not accounted in “a->b” for example). But for my case (just checking if there’s any major difference between execution profiles) this approach was good enough.

By the way (if you haven’t given up yet :), check the differences in LatchProfX output. The top is from well-behaving process and bottom one is the bad one:

SQL> @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
       128 redo copy                           kcrfw_redo_gen: nowait                   latch                     49399      48937    4.94     674.790        .014
       128 cache buffers chains                kcbchg1: kslbegin: bufs not pinned                                  4574       2320     .46      62.481        .027
       128 cache buffers chains                kcbchg1: kslbegin: call CR func                                     2101       1445     .21      28.700        .020
       128 cache buffers chains                kcbgcur_2                                                           1323       1060     .13      18.072        .017
       128 cache buffers chains                kcbnew: new latch again                                              269        243     .03       3.675        .015
       128 simulator lru latch                 kcbs_simulate: simulate set              setid                       170        170     .02       2.322        .014
       128 redo allocation                     kcrfw_redo_gen: redo allocation 1        strand #                     68         13     .01        .929        .071
       128 cache buffers chains                kcbrls_2                                                              29         29     .00        .396        .014
       128 cache buffers chains                kcbgtcr: fast path                                                    25         25     .00        .342        .014
       128 cache buffers chains                kcbgcur: fast path (shr)                                               8          8     .00        .109        .014
       128 cache buffers lru chain             kcbzgws_1                                                              7          7     .00        .096        .014

SQL> @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000

-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                KSLLWNAM                                 KSLLWLBL                   Held       Gets  Held %     Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
       128 redo copy                           kcrfw_redo_gen: nowait                   latch                    356465      79839   35.65    6088.422        .076
       128 cache buffers chains                kcbchg1: kslbegin: bufs not pinned                                  4704       2233     .47      80.344        .036
       128 cache buffers chains                kcbchg1: kslbegin: call CR func                                     3542       1997     .35      60.497        .030
       128 cache buffers chains                kcbgcur_2                                                           1929       1448     .19      32.947        .023
       128 redo allocation                     kcrfw_redo_gen: redo allocation 1        strand #                   1018         34     .10      17.387        .511
       128 shared pool simulator               kglsim_unpin_simhp: fast path                                        713        713     .07      12.178        .017
       128 simulator lru latch                 kcbs_simulate: simulate set              setid                       266        266     .03       4.543        .017
       128 cache buffers chains                kcbnew: new latch again                                              182        171     .02       3.109        .018
       128 cache buffers chains                kcbgtcr: fast path (cr pin)                                           29         29     .00        .495        .017
       128 cache buffers chains                kcbgtcr: fast path                                                    15         15     .00        .256        .017

See, how the redo copy latches were held for much longer (both total time and average time of individual hold) when the block checking was enabled. This indicates that in addition to keeping the buffer pinned, also a redo copy latch is held during block checking (if not using an IMU transaction). If you use high-throughput advanced replication, think that CPU usage is too high for work done and can’t get rid of the redo copy latch contention, this may be a reason.

This entry was posted in Oracle and tagged , , , , . Bookmark the permalink.

26 Responses to Oracle hidden costs revealed, Part2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others

  1. Noons says:

    awesome, Tanel.
    Thanks a lot for that.
    It certainly puts a nice perspective on all the arguments about “block checking does not impact performance” and other similar nonsense.

    I am also shocked you used dba1.0 methods to dig and investigate these problems! :)

  2. tanelp says:

    Thanks Noons for the comment (and for reading through this long post! :)

    Hehe, I would use DBA2.0 methods if they weren’t useless for troubleshooting cases where Oracle’s internal instrumentation fails ;)

  3. Christian COMMARMOND says:

    A very good reason to repeat my customers: don(t put data in SYSTEM tablespace.
    And from now, thank to you, I’ll also check for audit and replication impacts too.

    Thanks again.

  4. Ricardo says:

    Wonderful lesson and explanation Tanel. Certainly you are one of the genius like Jonathan Lewis.

    Please, post new articles about Oracle internals and debugging methods.

    Thank you very very much Tanel!

  5. Senthil Murugan says:

    Hi Tanel,

    I’m fresh out of college and i know basics of oracle architecture and i’m having basic unix knowledge . I’m interested in becoming oracle DBA.What book you suggest for me to read to have indepth knowledge about oracle DBA (…No OTN documentations :( its too bulky and dry :()

    Thanks :)

  6. Selva says:

    Tanel nice posting, in-depth analysis. I like it. BTW do we have any AIX utility to dump kernel stack or trace active kernel routines as you do with dtrace on Sun.

    Thanks

    Selva

  7. tanelp says:

    Hi Selva,

    AIX should have procstack for user process stack dumping and they have their kernel trace for tracing some kernel stuff.

    AIX kernel tracing philosophy is different from DTRace though, in DTrace you can code your own D program which act on defined your defined events, in AIX kernel trace you can just enable/disable static probes.

    http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/trace_facility.htm

  8. tanelp says:

    Btw, AIX 6 has something called “probevue” which aims to be something like DTrace.. never used it though…

  9. Alberto says:

    Hi Tanel,
    great explanation, but i’ve solaris9 and dtrace is missing..it’s only for Solaris 10.
    is there another similar command for my Solaris 9?
    Thanks a lot
    Congratulation again great site…
    Alberto

  10. tanelp says:

    Alberto,

    As far as stack dumping is concerned, you can use pstack in Solaris before version 10. In fact I still use pstack in Solaris 10 for getting one or few samples as its more convenient to type.

  11. Milen Kulev says:

    Excellent article Tanel!
    Really very enlightening!
    Can I conduct the same experiment on Linux? What tool should I use ? I am using SUSE10 and RHEL5.

    The most critical part of being able to interpret the process stack is to know what the functions are doing( and deriving their functionality from their names ;)).
    Is there any way (no matter whether supported by Oracle or not) to get the map “Oracle C function name” -> “function functionality” ?
    The only source that I know is ML:175982.1 ?
    Any help in this direction is highly appreciated.

    Many thanks agian for the really excellent article.
    Milen

  12. tanelp says:

    Hi Milen, thanks.

    I don’t know any other definitive list than the metalink note you mentioned above. I have mentioned 1-2 more metalink notes in my previous blog entries.

    What I normally do when I don’t recognize a function name is google for it and search Metalink “all documents” (including bug descriptions).

    There are few other techniques for getting an idea what a name prefix may be about – using v$latch_misses and v$fixed_view_definition. But these techniques just give rough hint, not the definitive answer.

    But on the other hand – it’s usually the function prefix I’m interested in (e.g. whether its kcb, qer, kks or similar), that’s enough for understanding in which Oracle kernel component the execution is. And most interesting prefixes are documented in that metalink note.

    I plan to blog about it some day :)

  13. Sandro says:

    Excellent very good article Tanel!

    Have you veried if performance degradation are for SYSAUX too?

  14. tanelp says:

    Good question Sandro,

    I just checked on 11.1.0.6 on Windows, the SYSAUX behaved like normal tablespaces.

  15. tanelp says:

    Milen, on Linux you can use pstack (which is just a wrapper around gdb)

    An alternative is to use oprofile, but I haven’t tested it enough so can’t recommend it (especially for production). Few pstack traces have done the trick so far.

    Also, when using debugger-based stack tracing, read this first:
    http://blog.tanelpoder.com/2008/06/14/debugger-dangers/

    There’s another novel approach… but I’ll leave it for next blog entry ;)

  16. Steve says:

    Hi Tanel,

    Good that you did this post. i had similar issue with 10gr2 and 11g, where the inserts taking 3x times more cpu than 10gr2. Didn’t see any waits and DB time was equivalent to DB cpu. Ran the dtraceprof on 11g and noticed the following which didn’t occur in 10gr2. Any idea what these functions are? Wondering if this has anything to do with the 3x cpu usage.

    `qesltcLoadIndexList
    `qesltcLoadIndexes
    `qerltcNoKdtBufferedInsRowCBK
    `qerltcLoadStateMachine
    `qerltcInsertValuesRop
    ‘qerltcFetch

    Thanks.

  17. tanelp says:

    Hi Steve,

    Did you notice any significant differences in Oracle Wait Interface times or v$sesstat counters between the versions?

    I would always start from these before looking into stack sampling. Also, dstacktrace samples only the processes currently on CPU, so if your process waits/sleeps on something, dstackprof doesn’t show any samples for those sleeps.

    If you have already validated with Oracle Wait Interface data that your process is not waiting/sleeping majority of time, then can you post few of the bottom dstacktrace stack samples from both 11g and 10gR2?

  18. tanelp says:

    @Senthil

    I recommend reading this to understand Oracle and Unix touchpoint well:

    http://www.scaleabilities.co.uk/book/scalingOracle8i.pdf (it’s free – and ignore that the name says 8i in title, the core concepts have remained the same)

    And then get this book: http://antognini.ch/top/

  19. Senthil Murugan says:

    Thanks a lot tanel :) .. i really enjoy reading your blog..

    Thanks again.

    K.Senthil Murugan

  20. Ricardo says:

    Hi Tanel,

    I have tested the example and works as you say with the associated problem. However, when the insert is done like this:

    insert into t2 select rownum from dual connect by level <= 100000;

    The output generated is (the two last samples):

    221 samples with stack below
    __________________
    unix`hwblkclr
    unix`pagezero
    genunix`anon_zero
    genunix`segvn_faultpage
    genunix`segvn_fault
    genunix`as_fault
    unix`pagefault
    unix`trap
    unix`_cmntrap

    711 samples with stack below
    __________________
    kghbshrt
    kghalo
    kghgex
    kghfnd
    kghprmalo
    kghalp
    kxsWorkHeapAllocPerm
    qercbiPushState
    qercbiFetch
    qercoFetch
    rwsfcd
    insfch
    insdrv
    inscovexe
    insExecStmtExecIniEngine
    insexe
    opiexe
    kpoal8
    opiodr
    ttcpip
    opitsk
    opiino
    opiodr
    opidrv
    sou2o
    opimai_real
    main
    0xd7de4c

    2920 Total samples captured

    Obviously the way of the insert done makes sense, but why? IsnĀ“t there block checking in direct-path-insert as the checking is done in the buffer cache?

    Thank you for this wonderful blog Tanel!

  21. tanelp says:

    Hi Ricardo,

    The top of the stack says these 711 samples (of 2920) were spent allocating memory (and not doing block checking)

    The reason is that connect by queries are recursive and if you run a query with connect by level <100000, Oracle needs to keep allocating memory for cursor work heap.

    I’ve blogged about it previously:

    http://blog.tanelpoder.com/2008/06/08/generating-lots-of-rows-using-connect-by-safely/

    You could run your test with the query I proposed there to elimiate the memory allocation issue.

    Another reason why the block checking code may be less visible in your test case is that with bulk inserts the block checking can probably be done in a block at a time “bulk fashion” as well but for a tight PL/SQL loop it has to be done after each row insert separately.

    The second stack with 211 samples is Solaris kernel pagefault code, which initializes zero-filled pages for your process when they’re touched the first time.

  22. Shiv says:

    Awesome. I am not a oracle/DB person, but the analysis itself makes me interested in looking into it !

  23. Paresh says:

    This article is awesome, interesting finding. Thank you very much for sharing with Oracle community. Do you have any idea, is this true for sysaux tablespace as well?

  24. Tanel Poder says:

    Hi Paresh,

    As far as I know it should affect only SYSTEM tablespace, mainly because critical stuff – data dictionary is in there. In SYSAUX you don’t have such critical stuff but rather just some supporting things..

  25. luda says:

    I met a ora-00600 error about db_blcok_checksum in oracle 10204,
    ORA-00600[kddummy_blkchk][28][2756885][18019][][]

    when i set the paramter db_blcok_checksum to be false,the db go to normal
    and i moved the table which in datafile 28 and contain dblock 2756885,i also rebuilt
    the index on the table.

    It’s not a bug,but How did it happen??

  26. Kyle Hailey says:

    cool and fun blog ! Awesome
    Curious to see your DTrace script but the link doesn’t work.
    One possible fun visualization of the multiple stack traces is
    http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
    - Kyle Hailey

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>