Where is LOB data stored?

There was a question in Oracle-L about where is the LOB data actually stored (in the row or the LOB segments) and what are the exact conditions when a switch from one to another may happen. The documentation isn’t fully clear about this and the “4000 bytes” number may mislead people to think that you can store 4000 bytes of your data in a LOB item before it must move out-of-line.

I clarified this in my post in Oracle-L, but will post it here too. First check this diagram:
 

 
If you create the LOB column with DISABLE STORAGE IN ROW, then the LOB data is always stored out-of-line. LOB index is always used. Only the LOB ID is stored inline, and the ID is looked up in LOB index, where you’ll get the pointers to actual LOB chunk blocks.
 
If you create the LOB column with ENABLE STORAGE IN ROW, then the LOB data may be stored in-line or out-of-line.
 
If the total LOB data + overhead <= 4000 bytes, then the LOB item will be stored in-line. No LOB index is used, even if you modify the LOB later on as everything is stored in-line with the row and versioning/rollback is achieved with undo data.
 
If the total LOB data + overhead > 4000 bytes, then the LOB item will be stored out-of-line. If the LOB fits into 12 x LOB_chunk_size, then no LOB index entries are created, because the in-line LOB locator can store up to 12 pointers to the LOB chunk blocks for each lob item. So if your LOB chunk size is 8kB, you can store LOB items up to 96kB in size without inserting anything to LOB index. However if the LOB item is bigger, then no pointers are stored in-row and all pointers will be put to the LOB index.
 
Note that once you modify an existing LOB item (which is bigger than 4000 bytes with its overhead), but smaller than 12 x chunk_size, then LOB index will still be used after the first LOB change operation as pointers to the old LOB chunk versions have to be stored in it (LOB segments don’t rely on undo for rollback & consistency, but just use LOB chunk versioning managed by LOB index).
 
The “overhead” of an in-line LOB item is 36 bytes, so the actual LOB data must be 4000 – 36 = 3964 bytes or less in order to fully fit in-row. And note that we are talking about bytes here, not characters. With multibyte character sets a character in a CLOB may take multiple bytes.
 
I wrote a presentation about LOB internals quite some time ago, back when Oracle 10.1 came out I think – check it here:
 

 
Enjoy! :-)
 
P.S. The next Advanced Oracle Troubleshooting v2.0 class will start in 3 weeks! (I just added the online payment option back too for convenience:)
 

NB! If you want to move to the "New World" - and benefit from the awesomeness of Hadoop, without having to re-engineer your existing applications - check out Gluent, my new startup that will make history! ;-)

This entry was posted in Oracle. Bookmark the permalink.

20 Responses to Where is LOB data stored?

  1. Bruno Lavoie says:

    Nice & Detailled information!!
    So, without these details in official docs, where do you find all these informations?
    Many thanks
    Bruno

  2. Kamus says:

    Thanks Tanel, your post always help us a lot. :-)

  3. Ron says:

    Thanks Tanel… the slides were VERY cool (if a little old).

    Is there a way to tell if the lob was stored in line or out of line (other than by looking at the size [using dbms_lob.getlength] and then guessing based on how close the size is to 4000 bytes)?

    I would like to select all rows from a table where the lob was stored out-of-line. Possible?

    • Tanel Poder says:

      Yeah I think the best way is to use DBMS_LOB.GETLENGTH and hope that the 3964 byte length assumption holds true :)

      I checked whether there’s an “ISINLINE” function in DBMS_LOB, like there’s ISSECUREFILE, ISTEMPORARY, ISOPEN … but there isn’t an ISINLINE one…

  4. Manish says:

    Tanel,
    This is a great post Thanks. do you have any updated material about how clobs stored as securefiles?

    Thanks

  5. fabrooms says:

    So how can I check where a LOB is stored? It’s not possible to use the IN_ROW column from dba_lobs… Any ideas?

    • Tanel Poder says:

      Do you mean where an individual LOB item is stored? I don’t know of any direct function call which would examine the LOB locator and tell you that. But you can use DBMS_LOB.GETLENGTH on every LOB item and check if it’s bigger than 3964 bytes.

      Something like this (assuming that the LOB was created with inline storage enabled, otherwise all items would be in the separate LOB segment anyway):

      SQL> SELECT 
               CASE WHEN DBMS_LOB.GETLENGTH(a) > 3964 THEN 'Out-of-line' 
                    ELSE 'Inline' 
               END x, COUNT(*) 
      FROM tl 
      GROUP BY 
           CASE WHEN DBMS_LOB.GETLENGTH(a) > 3964 THEN 'Out-of-line' 
           ELSE 'Inline' 
            END;
      
      X             COUNT(*)
      ----------- ----------
      Inline            1000
      Out-of-line       3001
      
      
  6. fabrooms says:

    Thanks Tanel but I am actually looking for a way to find out if a LOB is in row even if it is below 3964 bytes. This has been bugging me for a while…

    • Tanel Poder says:

      Actually using GETLENGTH can be problematic when using multibyte charactersets … the length may be reported 2000 chars for a CLOB, but it actually might use 4000 chars, thus going out of line…

      The sure way would be to dump the block where the row is in and check the contents of the row itself, if it’s out of line, its just the lob locator there (up to 48 bytes I think).

      Another way is to check v$sesstat (with snapper for example) and select that LOB item by that row’s ID. And then see whether the “lob reads” metric went up or not. If the “lob reads” goes up, the LOB was out-of-line. If it doesn’t then the LOB item was inline. Note that you have to select that lob (or DBMS_LOB.SUBSTR), not just GETLENGTH as the length number will be taken from the LOB locator in the row. Also, use DBMS_LOB.SUBSTR, not just SUBSTR as apparently the SQL engine’s SUBSTR causes also a “lob write” – might be because it creates a temporary LOB “slice” from the original LOB item and then gets its length. Or maybe it’s just a bug.

  7. Vahid Sadeghi says:

    Thanks Tanel …
    You talked about important issue about LOB that i can’t found it in the oracle documents .

  8. K.P says:

    Dear Tanel Poder,

    In AWR of my production database, the event “direct path read” appear at 1st position of top 5 events. I found information about it.

    So, I wonder that whether LOB segment cached on SGA ? Or only PGA read/write it (LOB segment) directly from disk when needed.

    Thanks,
    K.P

    • Tanel Poder says:

      You’ll need to check DBA_LOBS if your system if you are using LOBs and make sure they have the CACHE or CACHE READS attribute set (instead of NOCACHE). This is assuming that you do want to cache them in your buffer cache. Before changing anything, you should verify by measuring what causes these waits. Direct LOB reads would increment the “physical reads direct (lob)” statistic for example. The other way would be to use an ASH report (or my ashtop.sql script) to find out the TOP SQL_IDs waiting for these wait events and seeing which execution plan lines cause these, which tables are involved etch (search for asqlmon.sql in my blog).

      Starting from 11g also serial full table scans (and fast full index scans) can use direct path reads, so my first bet would be on that. Read more about this here (http://blog.tanelpoder.com/2012/09/03/optimizer-statistics-driven-direct-path-read-decision-for-full-table-scans-_direct_read_decision_statistics_driven/).

      The key is to pick one of the sessions and measure what is going on – before changing anything.

  9. Ahmed AANGOUR says:

    Hi Tanel,
    I have some queries that are stuck on the following wait event: securefile direct-write completion

    SID SERIAL# PROG ADDRESS HASH_VALUE SQL_ID CHILD PLAN_HASH_VALUE EXECS AVG_ETIME EVENT SQL_TEXT
    —– ———- ———- —————- ———- ————- —— ————— ———- ———– ——————– —————————————–
    243 24597 sqlplus@pa 00000014DCD21B88 2042300280 1cmat3jwvq1vs 0 1741250015 39 .01 enq: TM – contention DELETE FROM AGR_OP_TRANSCOM_DEAL WHERE TR
    1061 11 JDBC-ALM-F 0000001419A44718 3002849791 2gh9c1qtgrqgz 0 0 77 3,538.14 securefile direct-wr INSERT INTO TCM_MSG_TSK(SOURCE_TSK_ID, TA
    962 13179 JDBC-FXSTD 00000014691D8CA8 3332294959 7113zbz39xk9g 0 0 59 12,995.70 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    871 21 JDBC-FXO 00000014691D8CA8 3332294959 7113zbz39xk9g 0 0 59 12,995.70 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    485 6719 JDBC-FXSTD 00000014691D8CA8 3332294959 7113zbz39xk9g 0 0 59 12,995.70 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    1449 631 JDBC-GDS 00000014691D8CA8 3332294959 7113zbz39xk9g 1 0 129 5,461.79 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    582 11247 JDBC-FXO 00000014691D8CA8 3332294959 7113zbz39xk9g 1 0 129 5,461.79 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    776 34337 JDBC-GDS 00000014691D8CA8 3332294959 7113zbz39xk9g 1 0 129 5,461.79 securefile direct-wr insert into TCM_MSG_TSK (MSG_BDY_TYP_COD,
    249 8917 JDBC-CRD 0000001429340BF0 1579713576 9uqr07xg2j118 2 3896819816 11 24,694.82 securefile direct-wr UPDATE TCM_MSG_TSK SET SOURCE_TSK_ID = :1

    The SQL_ID 7113zbz39xk9g is the following one:
    INSERT
    INTO TCM_MSG_TSK
    (
    MSG_BDY_TYP_COD, CRE_DT, END_DT, EXEC_CT, LNK_SRC_APN_ID, LNK_TCM_ID, SOURCE_TSK_ID,
    STT_DT, STATUS, TARGET, ID, MSG_BDY, ERR_DSC )
    VALUES
    (
    :1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13
    );

    In the TCM_MSG_TSK table I have 2 lobs securefile :
    SQL> select column_name, in_row
    2 from dba_lobs where securefile = ‘YES’ and table_name=’TCM_MSG_TSK’;

    COLUMN_NAME IN_
    ————— —
    MSG_BDY NO
    ERR_DSC YES

    I could not find any valuable information about this event.

    The EXECUTION column in V$SQL is no longer incremented. it’s completely stuck.

    Do you have any idea on what is happening here?

    • Tanel Poder says:

      I don’t see the STATE column in above output.. Before looking into wait event, you need to look into STATE if you’re using V$SESSION to see if the session is on CPU or waiting. The EVENT column is misleading if the STATE is not = WAITING.

      • Ahmed AANGOUR says:

        Hi Tanel,

        The column event that you see is the result of the following select: decode(state, ‘WAITING’, a.event, ‘ON CPU’) EVENT

        So the event you see in the output is clearly something where the session is waiting on. We have killed the batch that generated this issue , but this afternoon the issue has come again. here is the new output:

        select sid, serial#, substr(program,1,19) prog, address, hash_value, b.sql_id, child_number child, plan_hash_value, executions execs,
        (elapsed_time/decode(nvl(executions,0),0,1,executions))/1000000 avg_etime, decode(state, ‘WAITING’, a.event, ‘ON CPU’) EVENT, sql_text
        from v$session a, v$sql b
        where status = ‘ACTIVE’
        and username is not null
        and a.sql_id = b.sql_id
        and a.sql_child_number = b.child_number
        –and sql_text not like ‘select sid, substr(program,1,19) prog, address, hash_value, b.sql_id, child_number child,%’ — don’t show this query
        and sid not in (select sid from v$mystat where rownum=1)
        order by sql_id, sql_child_number
        /

        SID SERIAL# PROG ADDRESS HASH_VALUE SQL_ID CHILD PLAN_HASH_VALUE EXECS AVG_ETIME EVENT SQL_TEXT
        —– ———- ———- —————- ———- ————- —— ————— ———- ———– ——————– —————————————–
        243 33925 JDBC-IRD 000000149AAE77F0 3002849791 2gh9c1qtgrqgz 0 0 129 499.14 securefile direct-wr INSERT INTO TCM_MSG_TSK(SOURCE_TSK_ID, TA
        871 59717 JDBC-CRD 000000149AAE77F0 3002849791 2gh9c1qtgrqgz 0 0 129 499.14 securefile direct-wr INSERT INTO TCM_MSG_TSK(SOURCE_TSK_ID, TA
        966 55 JDBC-ALM-F 000000149AAE77F0 3002849791 2gh9c1qtgrqgz 0 0 129 499.14 securefile direct-wr INSERT INTO TCM_MSG_TSK(SOURCE_TSK_ID, TA
        1207 2493 JDBC Thin 00000013546930C8 1680008013 bph2v8pk25sud 0 2422122865 705060 .00 ON CPU INSERT INTO RUL_OLD_VB_LOG_CALLS (DT, RUL

        I have no idea on what this event is telling us.

        • Tanel Poder says:

          Ok, had to be sure. Whenever this happens again – next steps would be to:

          1) run Snapper on the “stuck” sessions to see if any v$sesstat counters increase (especially the securefile related ones)
          2) run pstack a few times on the stuck process – the stack trace should give exact information about what the process is doing (is it calling some flush / disk writes)

          • Ahmed AANGOUR says:

            Hi tanel,

            The snapper is clearly showing that the SIDs 243, 871 and 966 are waiting 100% on the “securefile direct-write completion” wait event:
            @snapper ash=sid+event+wait_class 5 1 all
            ———————————————————…
            ————————————————————————
            Active% | SID | EVENT | WAIT_CLASS
            ————————————————————————
            100% | 243 | securefile direct-write completion | User I/O
            100% | 871 | securefile direct-write completion | User I/O
            97% | 966 | securefile direct-write completion | User I/O
            19% | 446 | ON CPU | ON CPU
            11% | 1443 | ON CPU | ON CPU
            3% | 1498 | ON CPU | ON CPU
            3% | 343 | ON CPU | ON CPU
            3% | 1259 | ON CPU | ON CPU
            3% | 635 | ON CPU | ON CPU
            3% | 157 | ON CPU | ON CPU

            But actually the processes are died. When we look at the trace files we can see the folowing information:
            ORA-00600: internal error code, arguments: [504], [0x14A191DA38], [64], [3], [tablespace key chain], [1], [0], [0x000000000], [], [], [], []

            We’re actually hitting the bug 13912642 on 11.2.0.3 and fixed on 11.2.0.4.

          • Tanel Poder says:

            Yep, it’s the process itself who keeps its V$SESSION slot up to date about what it’s doing – so when it crashes while still being “in a wait event”, it won’t have a chance to update it. PMON later should clean up after the process … but as a process crash dump may take a while, a process may be physically doing something (on CPU) due to crash dump writing, while it’s not doing any useful work anymore.

            Snapper’s STAT section is helpful there (or just running top on the process if it still exists) – as if a process is on CPU, but no metrics in V$SESSTAT (snapper STAT section) increase, then its one indication that the process isn’t doing any useful work anymore or is stuck in some loop. Pstack would have shown where it was stuck (if the process had physically still been there)

Leave a Reply

Your email address will not be published. Required fields are marked *