Session-level statspack

Statspack is a useful tool for easily gathering and reporting some Oracle’s historical workload statistics. However it has its limitations and problems:

One of them that in past it used to record only Oracle statistics, measured from inside Oracle. This made Oracle and people using Oracle ignore other crucial statistics like OS workload below Oracle. For example statspack may show you heavy latch contention as the performance problem, while looking at vmstat output one would see that the server just has been heavily overloaded with numerous other jobs (like multiple backup, export and compress jobs overrunning their run-window) and the latch contention is just a symptom of CPU starvation.

This has been somewhat addressed by adding V$OSSTAT to Oracle 10g and statspack now gathers this info as well. Now we need people to start looking into those stats.

Another problem with statspack is which is described in Dan Fink’s blog, about cursor-level CPU accounting limitations before 10gR2.

However neither of those problems are the main limitations of statspack. The main problem is that statspack samples it’s statistic snapshots from system level views like V$SYSSTAT and V$SYSTEM_EVENT. If every single session in your database is performing exactly the same (kind) of operation, then system level aggregates might be quite OK to get an overview what’s going on in the database. But from the moment you start having different kind of sessions (e.g. OLTP vs batch vs data feed vs report) in your database, then system level aggregates used by statspack do heavily distort what the reality looks like for specific sessions.

There’s another gotcha – by default statspack excludes the “SQL*Net message from client” wait event from main report section, stating that it is not important to look into idle events. However by that we will throw away an important piece of information when diagnosing end-to-end performance, the end user experience. The bad thing is that even if we did include the SQL*Net wait event in our reports (by deleting the corresponding row from STATS$IDLE_EVENT), then we would still have no idea how many of those gazillions of system-wide SQL*Net client wait seconds do belong to our specific session or user.
Statspack doesn’t measure session-experience, it just gives you a system-wide aggregate, which cannot be translated back to individual session statistics (just as you can not convert a hash value back to original value – most of the information is just lost!).

How to get session-level detailed overview of database performance, with historical reporting capability then?

Oracle 10g addresses this partially with ASH and AWR and DBMS_MONITOR’s selective statistic sampling ( V$SERV_MOD_ACT_STATS and V$CLIENT_STATS )

While ASH and AWR are both awesome products, they are not usable in Standard Edition nor before 10g ( you may want to check out Kyle Hailey’s ASH simulator instead ) and for using them you need to pay a separate license fee.

So let me introduce my attempt to solve those problems ( NB! Free stuff!!! ;):

  • The poor-man’s version of Automatic Workload Repository: It’s called… um… Semi-Automatic Workload Repository :)
  • The clever-man’s addition to statspack, which takes session-level snapshots: In other words, Sesspack.

The idea is very simple, just take snapshots of V$SESSION_EVENT and some V$SESSTAT (and V$SESS_TIME_MODEL in 10g) statistics and store those in a repository just like statspack does. Snapshots are taken using sesspack.snap_xyz procedures. As a parameter to these procedures I can pass a specific SID, a group of SIDs, a Oracle db username a OS-username or whatever filtering fields can be found from V$SESSION.

Once you have taken your snapshots when running the workload, you can run a report which calculates statistic and wait event deltas between snapshots and that it.

A demo?

[Read more...]

A gotcha with parallel index builds, parallel degree and query plans

Reading the following article about PARALLEL hint by Jonathan Lewis made me remember a somewhat related gotcha with parallelism.

Often when creating (or rebuilding) an index on a large table, doing it with PARALLEL x option makes it go faster – usually in case when your IO subsystem is not the bottleneck and you have enough spare CPU capacity to throw in.

A small example below:



Tanel@Sol01> create table t1 as select * from all_objects;

Table created.

Tanel@Sol01> create index i1 on t1(object_id);

Index created.

Tanel@Sol01> exec dbms_stats.gather_table_stats(user, 'T1', cascade=>true, no_invalidate=>false);

PL/SQL procedure successfully completed.

Ok, for whatever reason I need to rebuild my index, and for speed I do it in parallel:



Tanel@Sol01> alter index i1 rebuild parallel 4;

Index altered.

Tanel@Sol01>
Tanel@Sol01>
Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 60000
  7  /

SUM(OBJECT_ID)
--------------
      13233374

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
Plan hash value: 3900446664

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |      |     1 |     5 |            |          |
|*  2 |   INDEX RANGE SCAN| I1   |  2923 | 14615 |     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">60000)

14 rows selected.

The execution plan shows a nice serial range scan for above query. Let’s run the same query with a different value for object_id:



Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 10000
  7  /

SUM(OBJECT_ID)
--------------
    1294174783

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------
Plan hash value: 2596547647

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |     5 |    19   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          | 42937 |   209K|    19   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|*  6 |       INDEX FAST FULL SCAN| I1       | 42937 |   209K|    19   (0)| 00:00:02 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("OBJECT_ID">10000)

18 rows selected.

What? Suddenly my query turned parallel !

I haven’t enabled parallelism for my table! How can Oracle go parallel without my consent?


Tanel@Sol01> select table_name, degree from user_tables  where table_name = 'T1';

TABLE_NAME                     DEGREE
------------------------------ ----------------------------------------
T1                                      1

The answer lies in the result of next query:


Tanel@Sol01> select index_name, degree from user_indexes where table_name = 'T1';

INDEX_NAME                     DEGREE
------------------------------ ----------------------------------------
I1                             4

Parallel index (re)build will persistently set the index parallel degree in data dictionary to the value used during build!

This enables the CBO to be free to consider also parallel query plans and in our second select case a parallel plan seemed to be the best.

Whether this parallel plan actually is the most efficient way to go is a separate question, however this kind of unplanned parallelism may destabilize your system performance, especially as it can kick in only for certain instantiations of your SQL statement. Note that even one parallel-enabled object in your execution plan can parallelize the whole query joining many tables (just as even one table with statistics in a join turns on CBO for the whole cursor).

Combined with bind variable peeking side-effects and way too high parallel_max_servers value (hey it’s just a max value, let’s set it to 500), this can bring your OLTP system to knees at very unexpected times.

So, as my database does not have parallelism planned into it, I will eliminate the troublemaker:

Tanel@Sol01> alter index i1 noparallel;

Index altered.

Tanel@Sol01>
Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 10000
  7  /

SUM(OBJECT_ID)
--------------
    1294174783

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 129980005

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     5 |    19   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE       |      |     1 |     5 |            |          |
|*  2 |   INDEX FAST FULL SCAN| I1   | 42937 |   209K|    19   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID">10000)

14 rows selected.

So the key point is that unless your databases have planned and managed parallelism used in them, it’s worth to run the following query to identify potential troublemakers and disable their parallelism:


SELECT
	'INDEX' OBJECT_TYPE, OWNER, INDEX_NAME, TRIM(DEGREE)
FROM
	DBA_INDEXES
WHERE
	TRIM(DEGREE) > TO_CHAR(1)
UNION ALL
SELECT
	'TABLE', OWNER, TABLE_NAME, TRIM(DEGREE)
FROM
	DBA_TABLES
WHERE
	TRIM(DEGREE) > TO_CHAR(1)
/

On my test environment it returned the following rows:

OBJEC OWNER                          INDEX_NAME                     TRIM(DEGREE)
----- ------------------------------ ------------------------------ -------------
INDEX SYS                            UTL_RECOMP_SORT_IDX1           DEFAULT
TABLE TANEL                          T                              4

From here we see two addtional things:

  • Parallel operations also persist their degree to tables (using alter table move parallel x or CTAS for example)
  • There’s a parallel degree DEFAULT – which is used when you let the appropriate degree to be decided by optimizer

On the other hand, if you have planned for parallelism, then you probably want to keep the parallelism for tables and indexes consistent, e.g. enable it for all tables requiring parallelism and their indexes, not just for couple of indexes by accident.

Update:
Thanks to Adrian Billington for reminding me that also NOLOGGING flag will stick in data dictionary should you perform nologging operations on an object. You should review those too after rebuilds and reorgs.

My version of SQL string to table tokenizer

This one’s a short post on a fairly random topic as unfortunately I don’t have time today to come up with anything deeper :)

I needed to come up with a delimited string to table tokenizer for an Oracle development project. There are quite a few examples out there how to do that, including Adrian Billington’s www.oracle-developer.net and the http://technology.amis.nl/blog/?p=1631.

So far the simplest solution I had seen was using a bunch of INSTR’s, SUBSTR’s and DECODE’s in a CONNECT BY loop.
However as this application is using Oracle 10g, I could use regular expressions to make the SQL even shorter:



Tanel@Sol01> define separator=":"
Tanel@Sol01> define mystring="A:BCD::EFG:H:IJKL"
Tanel@Sol01>
Tanel@Sol01> SELECT
  2             LEVEL,
  3             REGEXP_REPLACE(
  4                     REGEXP_SUBSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL )
  5                     , '&separator$'
  6                     , ''
  7             ) TOKEN
  8  FROM
  9     DUAL
 10  CONNECT BY
 11     REGEXP_INSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL ) > 0
 12  ORDER BY
 13     LEVEL ASC
 14  /

LEVEL TOKEN
----- ----------
    1 A
    2 BCD
    3
    4 EFG
    5 H
    6 IJKL

6 rows selected.

For anyone not yet familiar with regular expressions, I definitely recommend to get to know at least the basics – they can save you hours (or even days) of work in some cases. And the cool thing is that regular expressions are fairly standard across the development platforms too (e.g. if you know Oracle regexp, then it’s very easy to handle Java, perl or C# regexp too).

And regexp is not only a developer thing, it can be very useful for everyday production DBAs as well. For example, every decent “grep” utility has regexp support built in – this makes search for complex conditions from log or config files easier for you.

Advanced Oracle Troubleshooting Guide: When the wait interface is not enough [part 1]

Welcome to read my first real post on this blog!

If I ever manage to post any more entries, the type and style of content will be pretty much as this one: some Oracle problem diagnosis and troubleshooting techniques with some OS and hardware touch in it. Mmm… internals ;-)

Nevertheless I am also a fan of systematic approaches and methods so I plan to propose some less known OS and Oracle techniques for reducing guesswork in advanced troubleshooting even further.

Ok, to the topic.
Troubleshooting. Troubleshooting = finding out what is going on.

This post covers one unexplained issue I once had with Oracle external tables – which eventually turned out to be a problem with Oracle wait interface instrumentation. I used some of these “what’s going on” techniques to find out… what’s going on. Solaris 10 x64 / Oracle 10.2.0.2.


I worked on a project for which I needed to read data through an external table from an Unix pipe ( ever wanted to load compressed flat file contents to Oracle on-the-fly? ;-)

I created a Unix pipe:


$ mknod /tmp/tmp_pipe p

I created an Oracle external table, reading from that pipe:



Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

USERNAME     INSTANCE_NAME    HOST_NAME                 VER        STARTED      SID SERIAL# SPID
------------ ---------------- ------------------------- ---------- -------- ------- ------- -------
TANEL        SOL01            solaris01                 10.2.0.2.0 20070618     470      14 724

Tanel@Sol01> CREATE DIRECTORY dir AS '/tmp';

Directory created.

Tanel@Sol01> CREATE TABLE ext (
  2     value number
  3  )
  4  ORGANIZATION EXTERNAL (
  5    TYPE oracle_loader
  6    DEFAULT DIRECTORY dir
  7      ACCESS PARAMETERS (
  8      FIELDS TERMINATED BY ';'
  9      MISSING FIELD VALUES ARE NULL
 10      (value)
 11      )
 12      LOCATION ('tmp_pipe')
 13    )
 14  ;

Table created.

Tanel@Sol01> select * from ext;

So far so good… unfortunately this select statement never returned any results. As it turned out later, the gunzip over remote ssh link which should have fed the Unix pipe with flat file data, had got stuck.
Without realizing that, I approached this potential session hang condition with first obvious check – a select from V$SESSION_WAIT:


Tanel@Sol01> select sid, event, state, seq#, seconds_in_wait, p1,p2,p3
  2  from v$session_wait
  3  where sid = 470;

    SID EVENT                          STATE                     SEQ# SECONDS_IN_WAIT         P1         P2         P3
------- ------------------------------ ------------------- ---------- --------------- ---------- ---------- ----------
    470 db file sequential read        WAITED KNOWN TIME          164            7338          1       1892          1

Tanel@Sol01> /

    SID EVENT                          STATE                     SEQ# SECONDS_IN_WAIT         P1         P2         P3
------- ------------------------------ ------------------- ---------- --------------- ---------- ---------- ----------
    470 db file sequential read        WAITED KNOWN TIME          164            7353          1       1892          1

Tanel@Sol01> /

    SID EVENT                          STATE                     SEQ# SECONDS_IN_WAIT         P1         P2         P3
------- ------------------------------ ------------------- ---------- --------------- ---------- ---------- ----------
    470 db file sequential read        WAITED KNOWN TIME          164            7374          1       1892          1

Tanel@Sol01>

The STATE and SECONDS_IN_WAIT columns in V$SESSION_WAIT say we have been crunching the CPU for last two hours, right? (as WAITED… means NOT waiting on any event, in this case the EVENT just shows the last event on which we waited before getting on CPU)

Hmm.. let’s check it out:


$ prstat -p 724
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   724 oracle    621M  533M sleep   59    0   0:00:00 0.0% oracle/1

prstat reports that this process is currently in sleep state, is not using CPU and has used virtually no CPU during its 2-hour “run” time!

Let’s check with ps (which is actually a quite powerful tool):


$ ps -o user,pid,s,pcpu,time,etime,wchan,comm -p 724
    USER   PID S %CPU        TIME     ELAPSED            WCHAN COMMAND
  oracle   724 S  0.0       00:01    02:18:08 ffffffff8135cadc oracleSOL01

ps also confirms that the process 724 has existed for over 2 hours 18 minutes (ELAPSED), but has only used roughly 1 second of CPU time (TIME). The state column “S” also indicates the sleeping status.

So, either Oracle V$SESSION_WAIT or standard Unix tools are lying to us. From above evidence it is pretty clear that it’s Oracle who’s lying (also, in cases like that, lower-level instrumentation always has a better chance to know what’s really going on at the upper level than vice versa).

So, let’s use truss (or strace on Linux, tusc on HP-UX) to see if our code is making any system calls or is sleeping within a system call…


$ truss -p 724
read(14, 0xFFFFFD7FFD6FDE0F, 524273) (sleeping...)

Hmm, as no followup is printed to this line, it looks like the process is waiting for a read operation on a file descriptor 14 to complete.
Which file is this fd 14 about?


$ pfiles 724
724:    oracleSOL01 (LOCAL=NO)

...snip...

  14: S_IFIFO mode:0644 dev:274,2 ino:4036320452 uid:100 gid:300 size:0
      O_RDONLY|O_LARGEFILE
      /tmp/tmp_pipe
...snip...

So from here it’s already pretty obvious where the problem is. There is no data coming from the tmp_pipe. This led me to check what was my gunzip doing on the other end of the pipe and it was stuck, in turn waiting for ssh to feed more data into it. And ssh had got stuck due some network transport issue.

The baseline is that you can rely on low-level (OS) tools to identify what’s really going on when higher level tools (like Oracle wait interface) provide weird or contradicting information, in this case the Oracle wait interface was not recording external table read wait events. I reported this info to Oracle people and I think it has been filed as a bug by now.


This was only a simple demo, identifying a pretty clear case of a session hang, however with use of a pretty intrusive tool ( I would not attach truss to a busy production instance process without thinking twice ).

However there are other options. In the next part of this guide ( when I manage to write it ) I will deal with more complex problems like what to do when the session is not reporting significant waits and is spinning heavily on CPU. Using Oracle and Unix tools it is quite easy to figure out the execution profile of a spinning server process, even without connecting to Oracle at all ( do I hear pstack, mdb and stack tracing? ;-)

As I’ve just started blogging, I would appreciate any feedback, including about things like blog layout, font sizes, readability, understandability etc. Also I think it will take few days before I manage to post the Part 2 of this troubleshooting guide.

Thank you for your patience reading through this :-)

Just a test posting…

This is a test posting, no real content here...
-bash-3.00$ pstack 868
868:    ora_pmon_SOL01
 fffffd7ffc7dddca pollsys  (63726c0, 2, fffffd7fffdfdc90, 0)
 fffffd7ffc7845c2 poll () + 52
 0000000004eb8bd0 ntevpque () + 60
 0000000004eb66e6 ntevque () + b6
 0000000004e93968 nsevwait () + 178
 0000000001905453 ksnwait () + 63
 00000000017b0cc7 ksliwat () + 8a7
 00000000017b1425 kslwaitns () + 15
 00000000018dd89f kskthbwt () + bf
 00000000017b13ed kslwait () + 4d
 00000000017ba502 ksuclnwt () + b2
 00000000017ba8f7 ksucln () + 327
 000000000184ecb5 ksbrdp () + 325
 00000000028d3dd6 opirip () + 2a6
 0000000000e76aea opidrv () + 3ba
 0000000000e73e7b sou2o () + 5b
 0000000000e3a86c opimai_real () + 11c
 0000000000e3a6a4 main () + 64
 0000000000e3a4ec ???????? ()