Detect chained and migrated rows in Oracle – Part 1

I received a question about migrated rows recently.

It was about how to detect migrated rows in a 200TB data warehouse, with huge tables – as the ANALYZE TABLE xyz LIST CHAINED ROWS INTO command can not be automatically parallelized at table level (as DBMS_STATS can be, but oh, DBMS_STATS doesn’t gather the migrated/chained row info). Therefore the analyze command would pretty much run forever before returning (and committing) the chained row info in the output table. Also as there are regular maintenance jobs running on these tables (I suspect partition maintentance for example), then it wouldn’t be nice to keep running ANALYZE on the whole table constantly.

So, is there any faster or better way for finding the amount of migrated rows?

Ihave two answers to this.

Answer 1:

As we are dealing with a huge 200+ TB data warehouse its tables/indexes are most likely partitioned. Thus you could use the ANALYZE TABLE xyz PARTITION (abc) LIST CHAINED ROWS command to analyze individual partitions, even in parallel (sqlplus sessions) if you like. This would allow you to focus only on the partitions of interest (the latest ones, with the heaviest activity perhaps).

SQL> create table CHAINED_ROWS (
2    owner_name         varchar2(30),
3    table_name         varchar2(30),
4    cluster_name       varchar2(30),
5    partition_name     varchar2(30),
6    subpartition_name  varchar2(30),
7    head_rowid         rowid,     -- actual chained row's head piece address in the segment
8    analyze_timestamp  date
9  );
Table created.
SQL>
SQL> analyze table tmp partition (sys_p501) list chained rows; -- the default table name used for output is "CHAINED_ROWS"
Table analyzed.
SQL> analyze table tmp partition (sys_p502) list chained rows;
Table analyzed.
SQL> select partition_name, count(*) from chained_rows group by partition_name;
PARTITION_NAME                   COUNT(*)
------------------------------ ----------
SYS_P502                              252
SYS_P501                             5602

SQL>

So, from above you see its possible to find out partition (or even sub-partition level row chaining).

However this above command lists you both CHAINED rows and MIGRATED rows (even though Oracle calls them all chained rows internally, as the chaining mechanism is the same for both cases).

Chained row is a row which is too large to fit into a block, so will always have to be split between multiple different blocks – with an exception of intra-block chaining which is used for rows with more than 255 columns. Migrated row on the other hand is a row which has been updated larger than it initially was – and if as a result it doesn’t fit into its original block, the row itself is moved to a new block, but the header (kind of a stub pointer) of the row remains in original location. This is needed so that any indexes on the table would still be able to find that row using original ROWIDs stored in them). If Oracle didn’t leave the row head piece in place then it would always go and update all indexes which have the ROWID of the migrating row in them.

Why should we care whether a row is a real chained row or just a migrated row?

It’s because if the row is chained, then any reorgs would not help you – if a row is too big to fit into a block, its too big to fit into a block no matter how many times you move around the table. (Note that if you have large tables full of rows longer than 8KB there’s likely something wrong with your design).

But migrated rows on the other hand are “chained” into another block due some update which made them not fit into existing block anymore. This happens when PCTFREE is set too low compared to real row growth factor and sometimes you may want to fix it by reorganizing the table/partition with ALTER TABLE/PARTITION MOVE or by backing the rows up, deleting them and reinserting them back to the table (that one makes sense when only a small amount of rows in a table are migrated).

If you are completely sure that you don’t have any rows longer than the free space in an empty block (thus all individual rows would fit into a block and would need to be split among multiple blocks) then you can conclude that all the rows reported were migrated due their growth.

Another option would be to query out all or a sample of these chained/migrated rows and actually measure how long they are if all columns are put together. This could be done using vsize() function (or also dump() and lengthb() in some cases). Of course the column and row header overhead would need to be accounted in as well.

So, this already gets pretty complex and there are more tiny details which we should take into account… thus I will introduce another way to look into the row migration/chaining thing:

Answer 2: (Alternatively called “should we care?”)

This answer looks into the problem from another angle (a more systematic angle by the way). And due my time constraints I will write this in 2 parts, here I will write the concepts and will follow up with a “packaged” use case some time later.

Instead of trying to figure out how many (or what percentage of) rows are migrated in the table, we rather try to measure how much extra work these migrated rows cause to our queries and workload. This is important because we don’t want to start reorganizing anything just because there are migrated rows in a table – maybe these rows don’t cause any significant trouble at all.

Migrated rows are not too bad when noone is accessing them. All that happens is that you use some extra disk space due extra row head pieces left behind in original blocks.

But the effect can be visible when migrated rows are accessed a lot via index access paths. What happens is that (let say you’re doing an index range scan) you find the ROWID of a row matching your search (whichs costs you some logical and maybe physical IOs on the index blocks), then you go to the table block corresponding to the ROWID (which means one more LIO and perhaps PIO) and oops the row is not there anymore! All you find is a row head piece which has another ROWID init, pointing to the actual location of the row. Here you go, you need one more LIO (and maybe PIO) to access that row.

There’s a statistic called “table fetch by rowid” which says how many times Oracle took a ROWID (for example from an index) and went to a table to lookup the actual row. This is a normal statistic which you’ll always see when there’s indexed table access going on.

Luckily Oracle has another statistic called “table fetch continued row” which tells you when we didn’t find all that we wanted from the original row piece and had to follow a pointer to the new location of the migrated row (or next row piece of a chained row).

And no, I’m not going to propose a “good” ratio between these two statistics! :)

I will run a query against the table of interest, measure it with Snapper and do some rough calculation instead.

First I will force the query to do a full segment/partition scan on the dataset (or I could use SAMPLE scan accessing part of the data only). The actual query isn’t even important as long as its doing a full segment scan on the table/partition.

The snapper output for the session doing the full table scan is following:

SQL> @sn 5 133

-- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com )
--
----------------------------------------------------------------------------------------------------------------------
 SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
 133, TANEL     , STAT, opened cursors cumulative               ,             1,         .2,
 133, TANEL     , STAT, user calls                              ,             3,         .6,
 133, TANEL     , STAT, session logical reads                   ,           158,       31.6,
 133, TANEL     , STAT, CPU used when call started              ,             1,         .2,
 133, TANEL     , STAT, CPU used by this session                ,             1,         .2,
 133, TANEL     , STAT, DB time                                 ,             2,         .4,
 133, TANEL     , STAT, consistent gets                         ,           158,       31.6,
 133, TANEL     , STAT, consistent gets from cache              ,           158,       31.6,
 133, TANEL     , STAT, calls to get snapshot scn: kcmgss       ,             9,        1.8,
 133, TANEL     , STAT, no work - consistent read gets          ,           150,         30,
 133, TANEL     , STAT, table scans (short tables)              ,             1,         .2,
 133, TANEL     , STAT, table scan rows gotten                  ,         19539,      3.91k,
 133, TANEL     , STAT, table scan blocks gotten                ,           150,         30,
 133, TANEL     , STAT, session cursor cache hits               ,             1,         .2,
 133, TANEL     , STAT, session cursor cache count              ,             1,         .2,
 133, TANEL     , STAT, parse count (total)                     ,             1,         .2,
 133, TANEL     , STAT, execute count                           ,             1,         .2,
 133, TANEL     , STAT, bytes sent via SQL*Net to client        ,           342,       68.4,
 133, TANEL     , STAT, bytes received via SQL*Net from client  ,           175,         35,
 133, TANEL     , STAT, SQL*Net roundtrips to/from client       ,             2,         .4,
 133, TANEL     , TIME, parse time elapsed                      ,            33,      6.6us,      .0%, |          |
 133, TANEL     , TIME, DB CPU                                  ,         10052,     2.01ms,      .2%, |          |
 133, TANEL     , TIME, sql execute elapsed time                ,         13354,     2.67ms,      .3%, |          |
 133, TANEL     , TIME, DB time                                 ,         13503,      2.7ms,      .3%, |          |
 133, TANEL     , WAIT, SQL*Net message to client               ,             5,        1us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message from client             ,       8448078,      1.69s,   169.0%, |@@@@@@@@@@|
--  End of snap 1, end=2009-11-04 22:17:06, seconds=5

PL/SQL procedure successfully completed.

From above you see my table scan operation processed 19539 rows (the table scan rows gotten statistic shows how many rows were fetched from table segment during the scan) and we don’t see any table fetch continued row statistic reported (thus it didn’t increase!)

And this is something important to note. When you do a full table/partition scan then Oracle will silently skip the head (pointer) row pieces of migrated rows as it “knows” that the actual row is  going to be somewhere towards the end of the segment (or has already been processed earlier in the segment). Thus, when you have only migrated rows, then full table scan won’t increment table fetch continued row statistic when it hits them. Therefore, if you see some value for table fetch continued row when full scanning through a table/partition, this count is the number of chained row pieces Oracle had to go through in order to find the individual pieces of the rows. If the statistic shows 100, it may mean that there were 100 rows chained into 2 pieces (head and one continuing piece). Or it could mean that there were let say 50 rows which were so large that they needed 3 pieces in total (head and 2 continuation pieces per row – so 50 rows x 2 pieces = 100 table fetch continued row operations). Again it doesn’t really matter how  many rows exactly were chained, what matters more is how much extra work this chaining causes us – which is up 100 extra data block accesses (LIOs + potentially PIOs too).

So, when running full table/partition scan (or full scan with SAMPLE parameter to get only a sample of data) you can easily monitor the effect of chained rows by looking into table fetch continued row statistic. As (huge) chained rows are rather a data model design/physical design issue, I stop discussing these here.

So, how to detect the impact of migrated rows then?

As I mentioned above, migrated rows do not cause huge overhead when full scanning through segments as Oracle can ignore the head rowpieces in the row’s old location.

Lets run the query with an index range scan instead (again I just modified my query against the table to use an index using a hint + query predicate). Again I measured what the query was doing using Snapper:

SQL> @snapper out 5 1 133

-- Session Snapper v2.02 by Tanel Poder ( http://www.tanelpoder.com )

--
----------------------------------------------------------------------------------------------------------------------
 SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
 133, TANEL     , STAT, opened cursors cumulative               ,             1,         .2,
 133, TANEL     , STAT, user calls                              ,             3,         .6,
 133, TANEL     , STAT, session logical reads                   ,           121,       24.2,
 133, TANEL     , STAT, CPU used when call started              ,             1,         .2,
 133, TANEL     , STAT, CPU used by this session                ,             1,         .2,
 133, TANEL     , STAT, consistent gets                         ,           121,       24.2,
 133, TANEL     , STAT, consistent gets from cache              ,           121,       24.2,
 133, TANEL     , STAT, consistent gets - examination           ,             2,         .4,
 133, TANEL     , STAT, calls to get snapshot scn: kcmgss       ,             1,         .2,
 133, TANEL     , STAT, no work - consistent read gets          ,           119,       23.8,
 133, TANEL     , STAT, table fetch by rowid                    ,           999,      199.8,
 133, TANEL     , STAT, table fetch continued row               ,            57,       11.4,
 133, TANEL     , STAT, index scans kdiixs1                     ,             1,         .2,
 133, TANEL     , STAT, cursor authentications                  ,             1,         .2,
 133, TANEL     , STAT, buffer is pinned count                  ,          1938,      387.6,
 133, TANEL     , STAT, buffer is not pinned count              ,           117,       23.4,
 133, TANEL     , STAT, parse count (total)                     ,             1,         .2,
 133, TANEL     , STAT, execute count                           ,             1,         .2,
 133, TANEL     , STAT, bytes sent via SQL*Net to client        ,           342,       68.4,
 133, TANEL     , STAT, bytes received via SQL*Net from client  ,           175,         35,
 133, TANEL     , STAT, SQL*Net roundtrips to/from client       ,             2,         .4,
 133, TANEL     , TIME, parse time elapsed                      ,            93,     18.6us,      .0%, |          |
 133, TANEL     , TIME, DB CPU                                  ,          1918,    383.6us,      .0%, |          |
 133, TANEL     , TIME, sql execute elapsed time                ,          1696,    339.2us,      .0%, |          |
 133, TANEL     , TIME, DB time                                 ,          1918,    383.6us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message to client               ,             5,        1us,      .0%, |          |
 133, TANEL     , WAIT, SQL*Net message from client             ,       4213259,   842.65ms,    84.3%, |@@@@@@@@@ |
--  End of snap 1, end=2009-11-04 22:20:09, seconds=5

Check the bold stats above:

  1. index scans kdiixs1 – This counter shows how many times we did an index range scan during the snapper run. I ran only one query once (which didn’t have any nested loop joins nor other loops in execution plan which may invoke range scans many times during a single query)
  2. table fetch by rowid – During this index range scan we see that we have found a ROWID from an index and gone to look up the corresponding table row 999 times
  3. table fetch continued row – We didnt find the row we wanted from its original location 57 times out of 999 attempts! This is the important part. We have had to do extra work for 5.7% of the rows accessed. This extra work may mean extra logical IOs and extra logical IOs may mean extra physical IOs (and extra physical IOs mean extra wait time!).You see where I’m going here? By knowing for how many table rows we had to go and look for next rowpiece and by knowing how many LIOs, PIOs and PIO time we spent during the test run, we can (very) roughly calculate how much PIO time and how many LIOs we could save if these rows weren’t migrated, if they were accessible right where the index ROWIDs point to.

I have to admit – there are many factors which influence how many LIOs and PIOs we would actually save. Caching effect is the first one, obviously when a block is in buffer cache, you avoid the associated physical IO when accessing the block again. Also, Oracle can cache buffer handles (and keep cache buffers pinned during the database call), so you could even avoid the cost of a logical IO if you revisit a buffer and get lucky (this is what the buffer is pinned count statistic shows).

So, its not possible to calculate exactly how many LIOs or PIOs you would save, but nevertheless, this table fetch continued row statistic (when compared to table fetch by rowid statistic) does give a good indication whether extra work due row migration is done or not and roughly how much.

So, before even analyzing your tables to see how many migrated/chained rows the tables have in them, I would look into the table fetch continued row/table fetch by rowid statistics in v$sesstat to estimate whether row migration is causing any overhead. You could even start by looking into these stats system wide, in v$sysstat, Statspack/AWR reports but I don’t like this approach as it loses the context of what kind of work did generate which statistic profile if you have diverse workloads.

Phew… I thought this article would be shorter… Here I showed the concept and idea, I do have some scripts which I’ve used in past plus there are some gotchas (I’m sure they’ll come up in questions) which I will discuss in a future post (which may come in few weeks time as I’m veeeery busy right now).

By the way, I hope this article also shows the value of Oracle’s V$SESSTAT performance counters, not everything can be measured using Wait interface, ASH or SQL Trace! In fact this article is a prelude to a series of articles I’ll be writing with James Morle so check out his blog soon!

This entry was posted in Oracle and tagged , , , . Bookmark the permalink.

18 Responses to Detect chained and migrated rows in Oracle – Part 1

  1. Uwe Hesse says:

    Tanel, thank you for this excellent posting about chained resp. migrated rows!
    Especially the part why migrated rows don’t matter much in case of full table scans and the part about how to determine, whether row migration is a problem at all was very interesting to me.
    Keep up your good work!
    Kind regards
    Uwe

  2. Raj says:

    Tanel

    Thank you very much for this piece , especially method 2 and the ratio evaluation.

    When I got this issue for the first time a couple of days back , “table fetch row continued” stats in v$sesstat and “db file sequential read” in v$session_wait are the things
    I got when the table was merging about 2 million rows by comparing on the PK. Later on , I was able to confirm this by ‘analyze table …partition…list chained rows into
    chained_rows’ and then I rebuilt the table.

    So, it very much complies with your first method. Though I would admit the second method is very elegant and gives us better metrics.
    However, when the issue is till on , it would be very nice to do a block dump on the blocks appearing in P2 parameter of ‘db file sequential read’ waits and get the confirmation.
    I just found a link to detect row-chaining via block dump [by Steve Adams]. Just wanted to share.
    http://www.ixora.com.au/q+a/0107/27152941.htm

    —Raj

  3. Ron van Zijl says:

    Hey Tanel,

    Very interesting article. Have you looked into the effects of having ROW MOVEMENT enabled on the table. Especially the 11g option to combine that with an ALTER TABLE SHRINK SPACE COMPACT may relocate the header to where the block actually is.

  4. Tanel Poder says:

    Yes, that’s exactly what ENABLE ROW MOVEMENT does – it allows also row headers move around. Normally row headers don’t move around even if they migrate to other blocks, that way no index updates are needed when row migrates.

  5. Petr says:

    Hallo Tanel,

    I suppose AWR/Statspack is good starting point to find out which read intensive segments have many chained rows (and are candidates for rebuild). I wrote AWR script which shows read intensive segments with high correlation between logical reads and some system statistics (here it would be ‘table fetch continued row’) on some day(s).

    Best Regards,
    Petr

  6. Tanel Poder says:

    @Petr
    Yep, its possible to do correlation analysis – IF you have correlated fluctuation in LIOs per segment and system wide stats. I’ve done something similar in past with other stats, but found it to be not reliable enough in systems with lots of diverse workloads, batch jobs running etc…

  7. Donatello Settembrino says:

    Hi Tanel,
    Your interesting post has confirmed me a consideration that I did some time ago, or rather to consider effectively costs and benefits.For the migrated rows it could think to do an ALTER TABEL T MOVE to allow at a migrate row to fit it in a single block. But, I think it’s not very easy, because after an ALTER TABLE..MOVE I should do REBUILD of the index, because if I don’t do it, the query it will not be able to use the index that remains in a unusable status. And if the indexes are more than one? The operation could start to become costly…

    If I use ALTER TABLE..SHRINK SPACE I can bypass the indexes problem and reuse the space. But if my table, is a partitioned table with storical data and with several hundreds of millions of lines and I realize that there are migrated rows (and I have a performance problem) maybe I should isolate the problem at partition level and execute SHRINK on the single partition, provided that’s worth it.

    Surely, this conversation it’s not valid for the Chained, inasmuch if a row bigger than the dimension block is chained, it will remain chained also after an ALTER TABLE MOVE or SHRINK.

    Is it correct my interpretation?

    Thank you a lot.
    Best Regards

    Donatello Settembrino

  8. Andrew says:

    Hi Tanel,
    may I ask you when do plan to publish Part 2 of your very interesting article?

    Thanks.
    Andrew

  9. Marek says:

    Hi Tanel,
    thanks for this excellent work. The query I made based on your article for detecting row migration.
    Kind regards.

    SELECT
    NVL(VSE.USERNAME,’BACKGROUND’) USERNAME,
    VSE.SID,
    VSE.PROGRAM,
    TFCR.VALUE AS “TABLE FETCH CONTINUED ROW”,
    TFBR.VALUE AS “TABLE FETCH BY ROWID”,
    ROUND(((1-(TFBR.VALUE-TFCR.VALUE)/TFBR.VALUE))*100,2) AS RATIO
    FROM
    (
    SELECT
    VSE.SID, VSE.VALUE
    FROM
    V$SESSTAT VSE
    INNER JOIN V$STATNAME VSN
    ON VSE.STATISTIC#=VSN.STATISTIC#
    WHERE
    UPPER(VSN.NAME)=’TABLE FETCH CONTINUED ROW’
    ) TFCR
    INNER JOIN
    (
    SELECT
    VSE.SID, VSE.VALUE
    FROM
    V$SESSTAT VSE
    INNER JOIN V$STATNAME VSN
    ON VSE.STATISTIC#=VSN.STATISTIC#
    WHERE
    UPPER(VSN.NAME)=’TABLE FETCH BY ROWID’
    ) TFBR
    ON TFCR.SID=TFBR.SID
    INNER JOIN
    V$SESSION VSE
    ON TFCR.SID=VSE.SID
    WHERE
    TFCR.VALUE>0
    ORDER BY VSE.USERNAME

  10. Istvan Stahl says:

    Hi Tanel,

    nice post, as usual :)

    In my experience row migration can cause serious performance issues in 11.2 when updating tables (compared to 11.1 sometimes 100 times – or more – performance degradation).

    The reason behind this as per Oracle:
    “This increased time taken during the update is because in 11.2 we do “Head piece Scan” whereas in 11.1 we used to do “first piece” scan. This change is very important for correctness. If we don’t do this then deadlocks can happen.”

    Lesson learnt: better to do some row migration & design review before doing 11.2 upgrade.

    Best Regards,
    Istvan Stahl

  11. Sathish says:

    Hi,
    is Queries that use an Index to select Migrated rows , will perform additional I/Os ?

    • Tanel Poder says:

      Yes, when you want to retrieve a row from the block where the index ROWID points, then in case of a migrated row, only the row head piece is there with a pointer to the block where the rest of the row actually is – so you need to read this in if it’s not in buffer cache already.

  12. Pavol Babel says:

    Hi Tanel

    I ran into different issue with migrated rows, you can check details on Jonathan’s blog: http://jonathanlewis.wordpress.com/2014/02/10/row-migration/#comment-65762

    There is also mentioning in your article “When you do a full table/partition scan then Oracle will silently skip the head (pointer) row pieces of migrated rows as it “knows” that the actual row is going to be somewhere towards the end of the segment”, according to my experiences. So why the heck is DIRECT_PATH mode performing “table fetch continued row” while performing export (full table scan)?

    Regards
    Pavol Babel

    • Tanel Poder says:

      There are a few gotchas here. For example, if the parallelization is done by splitting the work to ROWID ranges at the application level (data pump?) and each “slave” will read data in that rowid (block) range, then each slave working on its rowid range would need to return all the rows logically in that rowid range, this includes the rows that have physically migrated away (but their rowid still stays as it was). You can test this behavior using the “rowid BETWEEN a AND b” syntax in a SQL query.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>