Detect chained and migrated rows in Oracle – Part 1

I received a question about migrated rows recently.

It was about how to detect migrated rows in a 200TB data warehouse, with huge tables – as the ANALYZE TABLE xyz LIST CHAINED ROWS INTO command can not be automatically parallelized at table level (as DBMS_STATS can be, but oh, DBMS_STATS doesn’t gather the migrated/chained row info). Therefore the analyze command would pretty much run forever before returning (and committing) the chained row info in the output table. Also as there are regular maintenance jobs running on these tables (I suspect partition maintentance for example), then it wouldn’t be nice to keep running ANALYZE on the whole table constantly.

So, is there any faster or better way for finding the amount of migrated rows?

Ihave two answers to this.

Answer 1:

As we are dealing with a huge 200+ TB data warehouse its tables/indexes are most likely partitioned. Thus you could use the ANALYZE TABLE xyz PARTITION (abc) LIST CHAINED ROWS command to analyze individual partitions, even in parallel (sqlplus sessions) if you like. This would allow you to focus only on the partitions of interest (the latest ones, with the heaviest activity perhaps).

SQL> create table CHAINED_ROWS (
2    owner_name         varchar2(30),
3    table_name         varchar2(30),
4    cluster_name       varchar2(30),
5    partition_name     varchar2(30),
6    subpartition_name  varchar2(30),
7    head_rowid         rowid,     -- actual chained row's head piece address in the segment
8    analyze_timestamp  date
9  );

Table created.

SQL>
SQL> analyze table tmp partition (sys_p501) list chained rows; -- the default table name used for output is "CHAINED_ROWS"

Table analyzed.

SQL> analyze table tmp partition (sys_p502) list chained rows;

Table analyzed.

SQL> select partition_name, count(*) from chained_rows group by partition_name;

PARTITION_NAME                   COUNT(*)
------------------------------ ----------
SYS_P502                              252
SYS_P501                             5602

SQL>

So, from above you see its possible to find out partition (or even sub-partition level row chaining).

However this above command lists you both CHAINED rows and MIGRATED rows (even though Oracle calls them all chained rows internally, as the chaining mechanism is the same for both cases).

Chained row is a row which is too large to fit into a block, so will always have to be split between multiple different blocks – with an exception of intra-block chaining which is used for rows with more than 255 columns. Migrated row on the other hand is a row which has been updated larger than it initially was – and if as a result it doesn’t fit into its original block, the row itself is moved to a new block, but the header (kind of a stub pointer) of the row remains in original location. This is needed so that any indexes on the table would still be able to find that row using original ROWIDs stored in them). If Oracle didn’t leave the row head piece in place then it would always go and update all indexes which have the ROWID of the migrating row in them.

Why should we care whether a row is a real chained row or just a migrated row?

It’s because if the row is chained, then any reorgs would not help you – if a row is too big to fit into a block, its too big to fit into a block no matter how many times you move around the table. (Note that if you have large tables full of rows longer than 8KB there’s likely something wrong with your design).

But migrated rows on the other hand are “chained” into another block due some update which made them not fit into existing block anymore. This happens when PCTFREE is set too low compared to real row growth factor and sometimes you may want to fix it by reorganizing the table/partition with ALTER TABLE/PARTITION MOVE or by backing the rows up, deleting them and reinserting them back to the table (that one makes sense when only a small amount of rows in a table are migrated).

If you are completely sure that you don’t have any rows longer than the free space in an empty block (thus all individual rows would fit into a block and would need to be split among multiple blocks) then you can conclude that all the rows reported were migrated due their growth.

Another option would be to query out all or a sample of these chained/migrated rows and actually measure how long they are if all columns are put together. This could be done using vsize() function (or also dump() and lengthb() in some cases). Of course the column and row header overhead would need to be accounted in as well.

So, this already gets pretty complex and there are more tiny details which we should take into account… thus I will introduce another way to look into the row migration/chaining thing:

**Answer 2: (Alternatively called “should we care?”)

This answer looks into the problem from another angle (a more systematic angle by the way). And due my time constraints I will write this in 2 parts, here I will write the concepts and will follow up with a “packaged” use case some time later.

Instead of trying to figure out how many (or what percentage of) rows are migrated in the table, we rather try to measure how much extra work these migrated rows cause to our queries and workload. This is important because we don’t want to start reorganizing anything just because there are migrated rows in a table – maybe these rows don’t cause any significant trouble at all.

Migrated rows are not too bad when noone is accessing them. All that happens is that you use some extra disk space due extra row head pieces left behind in original blocks.

But the effect can be visible when migrated rows are accessed a lot via index access paths. What happens is that (let say you’re doing an index range scan) you find the ROWID of a row matching your search (whichs costs you some logical and maybe physical IOs on the index blocks), then you go to the table block corresponding to the ROWID (which means one more LIO and perhaps PIO) and oops the row is not there anymore! All you find is a row head piece which has another ROWID init, pointing to the actual location of the row. Here you go, you need one more LIO (and maybe PIO) to access that row.

There’s a statistic called “table fetch by rowid” which says how many times Oracle took a ROWID (for example from an index) and went to a table to lookup the actual row. This is a normal statistic which you’ll always see when there’s indexed table access going on.

Luckily Oracle has another statistic called “table fetch continued row” which tells you when we didn’t find all that we wanted from the original row piece and had to follow a pointer to the new location of the migrated row (or next row piece of a chained row).

And no, I’m not going to propose a “good” ratio between these two statistics! :)

I will run a query against the table of interest, measure it with Snapper and do some rough calculation instead.

First I will force the query to do a full segment/partition scan on the dataset (or I could use SAMPLE scan accessing part of the data only). The actual query isn’t even important as long as its doing a full segment scan on the table/partition.

The snapper output for the session doing the full table scan is following:

SQL> @sn 5 133

-- Session Snapper v2.02 by Tanel Poder (  )

--
----------------------------------------------------------------------------------------------------------------------
 SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
 133, TANEL     , STAT, opened cursors cumulative               ,             1,         .2,
 133, TANEL     , STAT, user calls                              ,             3,         .6,
 133, TANEL     , STAT, session logical reads                   ,           158,       31.6,
 133, TANEL     , STAT, CPU used when call started              ,             1,         .2,
 133, TANEL     , STAT, CPU used by this session                ,             1,         .2,
 133, TANEL     , STAT, DB time                                 ,             2,         .4,
 133, TANEL     , STAT, consistent gets                         ,           158,       31.6,
 133, TANEL     , STAT, consistent gets from cache              ,           158,       31.6,
 133, TANEL     , STAT, calls to get snapshot scn: kcmgss       ,             9,        1.8,
 133, TANEL     , STAT, no work - consistent read gets          ,           150,         30,
 133, TANEL     , STAT, table scans (short tables)              ,             1,         .2,
 133, TANEL     , STAT, table scan rows gotten                  ,         19539,      3.91k,
 133, TANEL     , STAT, table scan blocks gotten                ,           150,         30,
 133, TANEL     , STAT, session cursor cache hits               ,             1,         .2,
 133, TANEL     , STAT, session cursor cache count              ,             1,         .2,
 133, TANEL     , STAT, parse count (total)                     ,             1,         .2,
 133, TANEL     , STAT, execute count                           ,             1,         .2,
 133, TANEL     , STAT, bytes sent via SQL*Net to client        ,           342,       68.4,
 133, TANEL     , STAT, bytes received via SQL*Net from client  ,           175,         35,
 133, TANEL     , STAT, SQL*Net roundtrips to/from client       ,             2,         .4,
 133, TANEL     , TIME, parse time elapsed                      ,            33,      6.6us,      .0%, |          |
 133, TANEL     , TIME, DB CPU                                  ,         10052,     2.01ms,      .2%, |          |
 133, TANEL     , TIME, sql execute elapsed time                ,         13354,     2.67ms,      .3%, |          |
 133, TANEL     , TIME, DB time                                 ,         13503,      2.7ms,      .3%, |          |
 133, TANEL     , WAIT, SQL*Net message to client               ,             5,        1us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message from client             ,       8448078,      1.69s,   169.0%, |@@@@@@@@@@|
--  End of snap 1, end=2009-11-04 22:17:06, seconds=5

PL/SQL procedure successfully completed.

From above you see my table scan operation processed 19539 rows (the table scan rows gotten statistic shows how many rows were fetched from table segment during the scan) and we don’t see any table fetch continued row statistic reported (thus it didn’t increase!)

And this is something important to note. When you do a full table/partition scan then Oracle will silently skip the head (pointer) row pieces of migrated rows as it “knows” that the actual row is going to be somewhere towards the end of the segment (or has already been processed earlier in the segment). Thus, when you have only migrated rows, then full table scan won’t increment table fetch continued row statistic when it hits them. Therefore, if you see some value for table fetch continued row when full scanning through a table/partition, this count is the number of chained row pieces Oracle had to go through in order to find the individual pieces of the rows. If the statistic shows 100, it may mean that there were 100 rows chained into 2 pieces (head and one continuing piece). Or it could mean that there were let say 50 rows which were so large that they needed 3 pieces in total (head and 2 continuation pieces per row – so 50 rows x 2 pieces = 100 table fetch continued row operations). Again it doesn’t really matter how many rows exactly were chained, what matters more is how much extra work this chaining causes us – which is up 100 extra data block accesses (LIOs + potentially PIOs too).

So, when running full table/partition scan (or full scan with SAMPLE parameter to get only a sample of data) you can easily monitor the effect of chained rows by looking into table fetch continued row statistic. As (huge) chained rows are rather a data model design/physical design issue, I stop discussing these here.

So, how to detect the impact of migrated rows then?

As I mentioned above, migrated rows do not __cause huge overhead when full scanning through segments as Oracle can ignore the head rowpieces in the row’s old location.

Lets run the query with an index range scan instead (again I just modified my query against the table to use an index using a hint + query predicate). Again I measured what the query was doing using Snapper:

SQL> @snapper out 5 1 133

-- Session Snapper v2.02 by Tanel Poder (  )

--
----------------------------------------------------------------------------------------------------------------------
 SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
 133, TANEL     , STAT, opened cursors cumulative               ,             1,         .2,
 133, TANEL     , STAT, user calls                              ,             3,         .6,
 133, TANEL     , STAT, session logical reads                   ,           121,       24.2,
 133, TANEL     , STAT, CPU used when call started              ,             1,         .2,
 133, TANEL     , STAT, CPU used by this session                ,             1,         .2,
 133, TANEL     , STAT, consistent gets                         ,           121,       24.2,
 133, TANEL     , STAT, consistent gets from cache              ,           121,       24.2,
 133, TANEL     , STAT, consistent gets - examination           ,             2,         .4,
 133, TANEL     , STAT, calls to get snapshot scn: kcmgss       ,             1,         .2,
 133, TANEL     , STAT, no work - consistent read gets          ,           119,       23.8,
 133, TANEL     , STAT, table fetch by rowid                    ,           999,      199.8,
 133, TANEL     , STAT, table fetch continued row               ,            57,       11.4,
 133, TANEL     , STAT, index scans kdiixs1                     ,             1,         .2,
 133, TANEL     , STAT, cursor authentications                  ,             1,         .2,
 133, TANEL     , STAT, buffer is pinned count                  ,          1938,      387.6,
 133, TANEL     , STAT, buffer is not pinned count              ,           117,       23.4,
 133, TANEL     , STAT, parse count (total)                     ,             1,         .2,
 133, TANEL     , STAT, execute count                           ,             1,         .2,
 133, TANEL     , STAT, bytes sent via SQL*Net to client        ,           342,       68.4,
 133, TANEL     , STAT, bytes received via SQL*Net from client  ,           175,         35,
 133, TANEL     , STAT, SQL*Net roundtrips to/from client       ,             2,         .4,
 133, TANEL     , TIME, parse time elapsed                      ,            93,     18.6us,      .0%, |          |
 133, TANEL     , TIME, DB CPU                                  ,          1918,    383.6us,      .0%, |          |
 133, TANEL     , TIME, sql execute elapsed time                ,          1696,    339.2us,      .0%, |          |
 133, TANEL     , TIME, DB time                                 ,          1918,    383.6us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message to client               ,             5,        1us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message from client             ,       4213259,   842.65ms,    84.3%, |@@@@@@@@@ |
--  End of snap 1, end=2009-11-04 22:20:09, seconds=5

Check the bold stats above:

index scans kdiixs1 – This counter shows how many times we did an index range scan during the snapper run. I ran only one query once (which didn’t have any nested loop joins nor other loops in execution plan which may invoke range scans many times during a single query)
table fetch by rowid – During this index range scan we see that we have found a ROWID from an index and gone to look up the corresponding table row 999 times
table fetch continued row – We didnt find the row we wanted from its original location 57 times out of 999 attempts! This is the important part. We have had to do extra work for 5.7% of the rows accessed. This extra work may mean extra logical IOs and extra logical IOs may mean extra physical IOs (and extra physical IOs mean extra wait time!).You see where I’m going here? By knowing for how many table rows we had to go and look for next rowpiece and by knowing how many LIOs, PIOs and PIO time we spent during the test run, we can (very) roughly calculate how much PIO time and how many LIOs we could save if these rows weren’t migrated, if they were accessible right where the index ROWIDs point to.

I have to admit – there are many factors which influence how many LIOs and PIOs we would actually save. Caching effect is the first one, obviously when a block is in buffer cache, you avoid the associated physical IO when accessing the block again. Also, Oracle can cache buffer handles (and keep cache buffers pinned during the database call), so you could even avoid the cost of a logical IO if you revisit a buffer and get lucky (this is what the buffer is pinned count statistic shows).

So, its not possible to calculate exactly how many LIOs or PIOs you would save, but nevertheless, this table fetch continued row statistic (when compared to table fetch by rowid statistic) does give a good indication whether extra work due row migration is done or not and roughly how much.

So, before even analyzing your tables to see how many migrated/chained rows the tables have in them, I would look into the table fetch continued row/table fetch by rowid statistics in v$sesstat to estimate whether row migration is causing any overhead. You could even start by looking into these stats system wide, in v$sysstat, Statspack/AWR reports but I don’t like this approach as it loses the context of what kind of work did generate which statistic profile if you have diverse workloads.

Phew… I thought this article would be shorter… Here I showed the concept and idea, I do have some scripts which I’ve used in past plus there are some gotchas (I’m sure they’ll come up in questions) which I will discuss in a future post (which may come in few weeks time as I’m veeeery busy right now).

By the way, I hope this article also shows the value of Oracle’s V$SESSTAT performance counters, not everything can be measured using Wait interface, ASH or SQL Trace! In fact this article is a prelude to a series of articles I’ll be writing with James Morle so check out his blog soon!