Measuring what matters

Cary Millsap’s recent post prompted me to write down some of the related thoughts in my head.

Here are few of my mantras for systematic troubleshooting and performance tuning, which have materialized in my head over the years of work:

  • Picking the right starting point to troubleshooting and performance tuning is the most important decision in that process.
  • Pick the wrong starting point and you end up going in circles.
  • The scope of your performance data needs to match the scope of your problem, otherwise you end up going in circles.
  • If you don’t measure what matters, you may end up fixing what doesn’t matter.
  • If you’re not systematic in your troubleshooting, you may get lucky, but you don’t want to be dependent on luck! Moreover, you wont’t need to be lucky if you are systematic in your work!
  • Performance tuning is overrated. Fixing fundamental design and coding flaws via changing a magic configuration parameter is a dream just like is getting slim and healthy via eating magic diet pills bought from TV shop.
  • Your response times are too long for only two reasons:
  1. You are doing too much work
  2. You are waiting for too much

…both of the above things can be measured in Oracle…

  • There’s no such thing as slow database or slow system. How can it be slow independently, without anyone experiencing this slowness?
    • If users say that a database is slow, they must be experiencing that somehow! The only way to experience database slowness is via a connection to it, in which case you’ll have a session (to measure).
    • If a monitoring system says that a database is slow, then it must be running and measuring response time of some task just like users do, otherwise it can not reliably say something is slow.
  • Performance is about one thing and one thing only – time. And time is measured in seconds, not in CPU utilization, number of physical IOs or looks of an execution plan.

Here’s a link to a Cary Millsap’s awesome post, read it!

Finding the reasons for excessive logical IOs

There’s another interesting thread going on in Oracle-L, about understanding logical IOs and drilling down into their reasons. Of course sometimes (or rather usually) the excessive logical IOs come from a bad execution plan (when a nested loop loops over lots of datablocks again and again or a wrong index is used for driving a query etc), but sometimes the excessive LIOs are caused by some internal issues, like space management etc.

A convenient tool I use for reporting logical IO reasons is (again) my Snapper! It has  an option “b” for reporting Buffer get reasons or as I use below – option “a” shows All information Snapper can show.

There are couple of gotchas though which make this approach imperfect:

  1. The X$ tables Snapper uses for LIO reason reporting contain instance-wide counters, not specific to a single testing session. Thus you either need to be the single user in your database when experimenting and even then the background activity may increment some counters while you are testing too. I have sometimes suspended all other processes (kill -STOP and kill -CONT to resume)  or used Flash Freeze (oradebug ffbegin and ffresumeinst) to hang the whole instance that there would be no other activity going on.
  2. These buffer get reason counters are not maintained properly in Oracle 11g, probably due an optimization effort and some changes for faster pinning of buffer cache buffers (there’s a parameter called _fastpin_enable which is set to 1 in 11g and it enables so called fastpath buffer gets. If you see v$sesstat statistics such “consistent gets from cache (fastpath) or “db block gets from cache (fastpath)” being inremented, then fastpath buffer gets/pins are used. Note that I do have a script which works also on 11g but I’ll write about that one some time in the future :)

Anyway, if you are testing in an environment exclusively used by you, on Oracle 10.2 or lower, then you can run snapper with the gather=a option to report a bunch instance-level statistics in addition to the standard session-level stats:

  1. BUFG – Buffer get reasons (both consistent and current mode gets)
  2. LATG – Latch gets (both willing to wait and immediate gets)
  3. ENQG – Enqueue gets

Here’s an example, prepare for long output:

[Read more...]

Explain Plan For command may show you the wrong execution plan – Part 1

In Oracle-L mailing list a question was asked about under which conditions can the explain plan report a wrong execution plan (not the one which was actually used when a problem happened).

I replied this, but thought to show an example test case of this problem too:

1) The optimizer statistics the EXPLAIN PLAN ends up using are different
from the statistics the other session ended up using

2) Explain plan does not use bind variable peeking thus will not optimize
for current bind variable values

3) Explain plan treats all bind variables as VARCHAR2, thus you ma have
implicit datatype conversion happening during the plan execution, (meaning
to_char,to_number functions are added around variables/columns) and this for
example may make optimizer to ignore some indexes if you get unlucky.

…Of course explain plan doesn’t really
execute the plan so the implicit datatype conversion you see is in the
explained plan only, but if you actually execute the statement (with correct
bind datatypes) then there’s no implicit datatype conversion. And that’s
where the difference comes from…

And here comes an example of condition number 3 above. Lets use a little bit of bad design out there and put numeric values into varchar2 columns:

SQL> create table t (id varchar2(10), name varchar2(100));

Table created.

SQL> insert into t select to_char(object_id), object_name from dba_objects;

51449 rows created.

Now we add a little index for lookup performance and gather stats:

SQL> create index i on t(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.

Now lets define a bind variable of NUMBER type and set a value for it:

SQL> var x number
SQL>
SQL> exec :x:=99999

PL/SQL procedure successfully completed.

Now lets use “explain plan for” to estimate the execution plan:

SQL> explain plan for
 2  select sum(length(name)) from t where id >  :x;

Explained.

SQL> select * from table(dbms_xplan.display) ;

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 3694077449

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    29 |    56   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |      |     1 |    29 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |  2572 | 74588 |    56   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I    |   463 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 3 - access("ID">:X)

15 rows selected.

Explain plan command nicely reports that we’d be using an index range scan, which would be a good thing to do given my test data and search condition.

Now lets actually run the statement and see the REAL execution plan actually used for the execution. I’ll use dbms_xplan.display_CURSOR for this. If you don’t pass SQL_ID/child into that function it will just report the last SQL statement executed in your current session. But the key difference between the dbms_xplan.DISPLAY and DISPLAY_CURSOR is that the latter goes to library cache and fetches the actual SQL plan used from there. The explain plan command just reparses the statement and estimates a plan, ignoring any bind variable values and assuming that all bind variables are of type varchar2:

SQL> select sum(length(name)) from t where id >  :x;

SUM(LENGTH(NAME))
-----------------

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
SQL_ID  7zm570j6kj597, child number 0
-------------------------------------
select sum(length(name)) from t where id >  :x

Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    60 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    29 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  2572 | 74588 |    60   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - filter(TO_NUMBER("ID")>:X)

19 rows selected.

Whatta? We actually used a full table scan!

Also check out a related article by Kerry Osborne

Detect chained and migrated rows in Oracle – Part 1

I received a question about migrated rows recently.

It was about how to detect migrated rows in a 200TB data warehouse, with huge tables – as the ANALYZE TABLE xyz LIST CHAINED ROWS INTO command can not be automatically parallelized at table level (as DBMS_STATS can be, but oh, DBMS_STATS doesn’t gather the migrated/chained row info). Therefore the analyze command would pretty much run forever before returning (and committing) the chained row info in the output table. Also as there are regular maintenance jobs running on these tables (I suspect partition maintentance for example), then it wouldn’t be nice to keep running ANALYZE on the whole table constantly.

So, is there any faster or better way for finding the amount of migrated rows?

Ihave two answers to this.

Answer 1:

As we are dealing with a huge 200+ TB data warehouse its tables/indexes are most likely partitioned. Thus you could use the ANALYZE TABLE xyz PARTITION (abc) LIST CHAINED ROWS command to analyze individual partitions, even in parallel (sqlplus sessions) if you like. This would allow you to focus only on the partitions of interest (the latest ones, with the heaviest activity perhaps).

SQL> create table CHAINED_ROWS (
2    owner_name         varchar2(30),
3    table_name         varchar2(30),
4    cluster_name       varchar2(30),
5    partition_name     varchar2(30),
6    subpartition_name  varchar2(30),
7    head_rowid         rowid,     -- actual chained row's head piece address in the segment
8    analyze_timestamp  date
9  );
Table created.
SQL>
SQL> analyze table tmp partition (sys_p501) list chained rows; -- the default table name used for output is "CHAINED_ROWS"
Table analyzed.
SQL> analyze table tmp partition (sys_p502) list chained rows;
Table analyzed.
SQL> select partition_name, count(*) from chained_rows group by partition_name;
PARTITION_NAME                   COUNT(*)
------------------------------ ----------
SYS_P502                              252
SYS_P501                             5602

SQL>

So, from above you see its possible to find out partition (or even sub-partition level row chaining).

However this above command lists you both CHAINED rows and MIGRATED rows (even though Oracle calls them all chained rows internally, as the chaining mechanism is the same for both cases).

Chained row is a row which is too large to fit into a block, so will always have to be split between multiple different blocks – with an exception of intra-block chaining which is used for rows with more than 255 columns. Migrated row on the other hand is a row which has been updated larger than it initially was – and if as a result it doesn’t fit into its original block, the row itself is moved to a new block, but the header (kind of a stub pointer) of the row remains in original location. This is needed so that any indexes on the table would still be able to find that row using original ROWIDs stored in them). If Oracle didn’t leave the row head piece in place then it would always go and update all indexes which have the ROWID of the migrating row in them.

Why should we care whether a row is a real chained row or just a migrated row?

It’s because if the row is chained, then any reorgs would not help you – if a row is too big to fit into a block, its too big to fit into a block no matter how many times you move around the table. (Note that if you have large tables full of rows longer than 8KB there’s likely something wrong with your design).

But migrated rows on the other hand are “chained” into another block due some update which made them not fit into existing block anymore. This happens when PCTFREE is set too low compared to real row growth factor and sometimes you may want to fix it by reorganizing the table/partition with ALTER TABLE/PARTITION MOVE or by backing the rows up, deleting them and reinserting them back to the table (that one makes sense when only a small amount of rows in a table are migrated).

If you are completely sure that you don’t have any rows longer than the free space in an empty block (thus all individual rows would fit into a block and would need to be split among multiple blocks) then you can conclude that all the rows reported were migrated due their growth.

Another option would be to query out all or a sample of these chained/migrated rows and actually measure how long they are if all columns are put together. This could be done using vsize() function (or also dump() and lengthb() in some cases). Of course the column and row header overhead would need to be accounted in as well.

So, this already gets pretty complex and there are more tiny details which we should take into account… thus I will introduce another way to look into the row migration/chaining thing:

Answer 2: (Alternatively called “should we care?”)

[Read more...]

What’s a good way to learn some Oracle internals every day?

Sometimes when an attendee describes me some totally weird problem during a seminar, I am immediately able to answer something like “Hey this looks like a bug related to this Oracle configuration and can be influenced by xyz”.

And then people ask me “How the hell do you know all this stuff?”

Well, I haven’t been bitten by all of these bugs myself, but I have been doing something for many years, almost every day… reading my email!

Oh, and additionally I have configured Metalink to send me daily updates about new/updated notes, forum articles and… bug descriptions!

The last part is very important. Bug descriptions tell you something about new bugs found (and old bugs rediscovered) and sometimes their details tell you an interesting piece or two about Oracle internals related to them.

[Read more...]

SystemTap is production supported in Redhat EL5.4

If you don’t know what SystemTap is – it’s the Linux world’s attempt to build Solaris DTrace style safe dynamic instrumentation into Linux kernel.

I’m not going into religious discussions which one is better here, I have used both SystemTap and DTrace successfully for diagnosing low level issues inside OS kernel, so both are good enough for me :)

The problem with SystemTap though has been that it’s not production quality, it’s rather been a technology preview.

But with RHEL 5.4 part of it has changed, Redhat says following in this article:

SystemTap is no longer a technology preview, and now has production support. Red Hat recommendeds that users run scripts on development machines before deployment in production environments. Since SystemTap is an optional diagnostic tool, users can easily stop using it in the event of a problem. Options such as -g for Guru mode, and -D* allow users to disable several security checks. Scripts using these options may not be supported.

Red Hat plans to fix problems in SystemTap, or the Linux kernel, as they arise in connection with new scripts. In some cases, a fix may include extending the blacklist for known areas of the Linux kernel that are unsafe to probe. All scripts that use probes targeting blacklisted areas will need to be revised.

SystemTap users are advised to upgrade to this version.


So, at least officially, SystemTap is now supported by Redhat. Well at least as long as you’re not doing crazy stuff with the -g option, the geek mode ;-)

I would still be very careful before using any SystemTap in production, in fact, in critial environments I wouldn’t run it at all, unless there is no other option (and you’re prepared to get a crash + kernel panic).

However the significance of this note is that SystemTap is accepted for production by Redhat in principle. So now it’s just matter of time until it gets stable and widely used enough to be as practical and useful as DTrace is on Solaris.

KGL simulator, shared pool simulator and buffer cache simulator – what are these?

If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:

SQL> select * from v$sgastat where lower(name) like '%sim%' order by name;

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  kglsim alloc latch area          1700
shared pool  kglsim alloc latches               68
shared pool  kglsim count of pinned he        9248
shared pool  kglsim free heap list             204
shared pool  kglsim free obj list              204
shared pool  kglsim hash table                4104
shared pool  kglsim hash table bkts        2097152
shared pool  kglsim heap                    635536
shared pool  kglsim latch area                1700
shared pool  kglsim latches                     68
shared pool  kglsim main lru count           87040
shared pool  kglsim main lru size           174080
shared pool  kglsim object batch            909440
shared pool  kglsim pin list arr               816
shared pool  kglsim recovery area             2112
shared pool  kglsim sga                      22092
shared pool  kglsim size of pinned mem       18496
shared pool  ksim client list                   84
shared pool  log_simultaneous_copies           480
shared pool  sim cache nbufs                   640
shared pool  sim cache sizes                   640
shared pool  sim kghx free lists                 4
shared pool  sim lru segments                  640
shared pool  sim segment hits                 1280
shared pool  sim segment num bufs              640
shared pool  sim state object                   48
shared pool  sim trace buf                    5140
shared pool  sim trace buf context             120
shared pool  sim_knlasg                       1200
shared pool  simulator hash buckets          16512
shared pool  simulator latch/bucket st        4608

31 rows selected.

See, a bunch of “kgl sim” and then just “sim” allocations.

… or sometimes you can see latch contention on following latches:

SQL> select name from v$latch where name like '%sim%';

NAME
-------------------------------------------------------
ksim membership request latch
simulator lru latch
simulator hash latch
sim partition latch
shared pool simulator
shared pool sim alloc

6 rows selected.

Again, there seems to be some “simulation” work going on in Oracle instance.

So what are these simulators about?

[Read more...]

KGH: NO ACCESS allocations in V$SGASTAT – buffer cache within shared pool!

Since Oracle 10.2 it’s valid to say that buffer cache can be stored inside shared pool.

Now you may think I’m crazy, but read until the end of the post – no matter how crazy I may sound – I have proof!

Here it is:

Few years ago I started noticing a strange memory allocation in shared pool (in V$SGASTAT), called KGH: NO ACCESS.

SQL> select * from v$sgastat where name = 'KGH: NO ACCESS';
POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  KGH: NO ACCESS               10513696

SQL>

You see, some 10 MB of memory in shared pool has been allocated for something called KGH: NO ACCESS.

Ok, lets see where this memory resides inside shared pool. We can use x$ksmsp view for that, this view has a line in it for each chunk of memory allocated from it (and also the free chunks), along the reasons (or comments) for what reason these chunks were allocated.

[Read more...]

latch: cache buffers chains latch contention – a better way for finding the hot block

Here’s a treat for Oracle performance professionals and geeks who are looking for more systematic ways for cache buffers chains (CBC) latch contention troubleshooting. Cache buffers chains latches are taken when a process wants to walk through a cache buffer hash chain, looking if the block with required DBA (data block address) is in buffer cache. If the block happens to be in cache, then in most cases it has to be pinned first before use and unpinned after use, to make sure no-one else can perform an incompatible operation on that block at the same time. The modification of pin structures (pinning/unpinning) is also protected by CBC latches.

CBC latch contention can happen for multiple reasons, but one reason is that there is some really hot block in a SMP system with high number of CPUs (or CMT system with high number of threads like Sun T-series servers). Sometimes there happen to be multiple moderately hot blocks “under” the same CBC latch, which can result in latch contention again.

Traditionally DBAs used to look up the child latch address from V$SESSION_WAIT, sql_trace output or ASH and then look up all buffers protected by that latch from X$BH, using HLADDR column (HLADDR stands for Hash Latch Address). I also have a script for that, bhla.sql (Buffer Headers by Latch Address), which reports me all blocks currently in buffer cache, “under” that particular latch and the corresponding data block addresses and object names:

How to detect when a cursor was closed from SQL trace output?

After Randolf’s comment on my last post about identifying cursor SQL text from sql trace file I think one thing needs elaboration.

I mentioned earlier in this post that this cursor dumping technique works “as long as the cursor of interest is still open”.

So how do you know whether this cursor of interest is still open or has been closed and that slot reused by some other statement instead? You would not want to get misled to wrong SQL statement…

Luckily all the info you need is in SQL tracefile.

In 11g you would see “CLOSE #2″ line appearing in tracefile when a cursor is closed.

Before 11g you would see “STAT #2″ lines dumping out SQL plan execution stats – before 11g this is done when cursor is closed, so when you see STAT #2 lines, you know that at that location the cursor in slot #2 was closed.

There are few cases when the stat lines are not printed even if the cursor is closed – for example when the cursor’s library cache lookup (finding suitable shared cursor) happened before SQL trace was enabled. In this case the cursor would not use the extra instrumentation rowsources which generate the STAT lines.

In such case you would just grep the tracefile for PARSING IN CURSOR #2 after your WAIT #2′s. Whenever someone is parsing in a new cursor into slot #2, this means that existing cursor in slot #2 must have been closed somewhere between the PARSING IN CURSOR #2 line and last WAIT/FETCH/EXEC #2 line.

However, there’s one more special case; when the cursor was parsed in before SQL trace was enabled and another OPI call (like EXEC,FETCH) against this cursor is executed, then some Oracle versions actually retrieve the current SQL text and still dump it into trace just like before parsing. This means that if you just have lots of WAIT lines appearing in the trace for a SQL parsed in past, you won’t see the PARSING IN CURSOR lines dumped into tracefile, but if some OPI call is executed against this cursor, then the PARSING IN CURSOR line is dumped just before that OPI call.

I don’t in which exact version this appeared, but in my 10.2.0.1 test instance  this feature gives such output (I ran a select from dba_objects which started fetching lots of rows and then enabled SQL trace from another session):

PARSING IN CURSOR #1 len=29 dep=0 uid=0 oct=3 lid=0 tim=558589701689 hv=3336193532 ad='24f2e864'
select owner from dba_objects
END OF STMT
FETCH #1:c=0,e=5362,p=0,cr=102,cu=0,mis=0,r=500,dep=0,og=1,tim=558589701683
WAIT #1: nam='SQL*Net more data to client' ela= 23 driver id=1111838976 #bytes=2047 p3=0 obj#=39 tim=558589703066
WAIT #1: nam='SQL*Net message from client' ela= 57993 driver id=1111838976 #bytes=1 p3=0 obj#=39 tim=558589761409
WAIT #1: nam='SQL*Net message to client' ela= 3 driver id=1111838976 #bytes=1 p3=0 obj#=39 tim=558589761753
WAIT #1: nam='SQL*Net more data to client' ela= 13 driver id=1111838976 #bytes=2001 p3=0 obj#=39 tim=558589762331
FETCH #1:c=0,e=1193,p=0,cr=93,cu=0,mis=0,r=500,dep=0,og=1,tim=558589762865
WAIT #1: nam='SQL*Net message from client' ela= 60762 driver id=1111838976 #bytes=1 p3=0 obj#=39 tim=558589823801
WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1111838976 #bytes=1 p3=0 obj#=39 tim=558589824139
WAIT #1: nam='SQL*Net more data to client' ela= 19 driver id=1111838976 #bytes=2001 p3=0 obj#=39 tim=558589825284

Do you notice something missing?

The PARSE #1 and EXEC #1 lines are missing as the actual parsing and execute calls happened way before the SQL trace was enabled. But Oracle was helpful enough to dump out the SQL text when next OPI call (FETCH) was issued under SQL tracing mode (it “noticed” that the SQL text corresponding to cursor #1 had not been dumped to trace yet).

This is helpful, but of course can cause some confusion, as this PARSING IN CURSOR doesn’t really mean parsing as such (as parsing is done under PARSE call), the PARSING IN CURSOR should probably be named as “ABOUT TO PARSE CURSOR” or something like that.