Generating lots of rows using connect by – safely!

Every now and then I need to generate a large number of rows into some table.

Not that I have any personal urge to do so, but you know, this is needed for some test cases and so on ;-)

It’s quite well known that it’s possible to generate lots of rows using the CONNECT BY LEVEL <= #rows syntax.
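For reference, the basic pattern looks like this (a minimal sketch; the output formatting is SQL*Plus default):

SQL> select level from dual connect by level <= 5;

     LEVEL
----------
         1
         2
         3
         4
         5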

However, as the CONNECT BY LEVEL syntax is recursive by nature, it consumes more memory the more rows you generate in one shot (I think it was Mark Bobak who once posted this observation to Oracle-L, but I didn’t manage to find that thread anymore).

So, here’s a test case:

SQL> select count(r)
  2  from (
  3     select rownum r from dual connect by rownum <= 100000000
  4  )
  5  /
        select rownum r from dual connect by rownum <= 100000000
                             *
ERROR at line 3:
ORA-04030: out of process memory when trying to allocate 44 bytes (kxs-heap-w,cursor work heap)

After running for a while, the server process ran out of private memory, used for the CONNECT BY cursor work heap.

Let’s investigate:

The execution plan is as follows; note the single recursive CONNECT BY operator:

SQL> @x

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  5n2f53tsa3bq3, child number 0
-------------------------------------
select count(r) from (  select rownum r from dual connect by rownum <= 100000000 )

Plan hash value: 2085675642

---------------------------------------------------------
| Id  | Operation                       | Name | E-Rows |
---------------------------------------------------------
|   1 |  SORT AGGREGATE                 |      |      1 |
|   2 |   VIEW                          |      |      1 |
|   3 |    COUNT                        |      |        |
|   4 |     CONNECT BY WITHOUT FILTERING|      |        |
|   5 |      FAST DUAL                  |      |      1 |
---------------------------------------------------------

I also ran Snapper on the session performing the CONNECT BY, and the large memory allocations are visible in its output as well:

SQL> @snapper out,gather=s,sinclude=memory 3 10 109

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     109, 20080608 04:48:47,        3, STAT, session uga memory                      ,      72534112,   24178037,     72.53M,     24.18M
DATA,     109, 20080608 04:48:47,        3, STAT, session pga memory                      ,      72810496,   24270165,     72.81M,     24.27M
DATA,     109, 20080608 04:48:47,        3, STAT, sorts (memory)                          ,             1,          0,          1,        .33
--  End of snap 1
DATA,     109, 20080608 04:48:50,        4, STAT, session uga memory                      ,      95642904,   23910726,     95.64M,     23.91M
DATA,     109, 20080608 04:48:50,        4, STAT, session pga memory                      ,      95748096,   23937024,     95.75M,     23.94M
--  End of snap 2
DATA,     109, 20080608 04:48:54,        3, STAT, session uga memory                      ,      96559400,   32186467,     96.56M,     32.19M
DATA,     109, 20080608 04:48:54,        3, STAT, session pga memory                      ,      96665600,   32221867,     96.67M,     32.22M
--  End of snap 3
[...some lines snipped...]
--  End of snap 8
DATA,     109, 20080608 04:49:14,        4, STAT, session uga memory                      ,     100945488,   25236372,    100.95M,     25.24M
DATA,     109, 20080608 04:49:14,        4, STAT, session uga memory max                  ,     100880024,   25220006,    100.88M,     25.22M
DATA,     109, 20080608 04:49:14,        4, STAT, session pga memory                      ,     101056512,   25264128,    101.06M,     25.26M
DATA,     109, 20080608 04:49:14,        4, STAT, session pga memory max                  ,     100990976,   25247744,    100.99M,     25.25M
--  End of snap 9
DATA,     109, 20080608 04:49:18,        3, STAT, session uga memory                      ,     100159920,   33386640,    100.16M,     33.39M
DATA,     109, 20080608 04:49:18,        3, STAT, session uga memory max                  ,     100159920,   33386640,    100.16M,     33.39M
DATA,     109, 20080608 04:49:18,        3, STAT, session pga memory                      ,     100270080,   33423360,    100.27M,     33.42M
DATA,     109, 20080608 04:49:18,        3, STAT, session pga memory max                  ,     100270080,   33423360,    100.27M,     33.42M
--  End of snap 10

PL/SQL procedure successfully completed.

As Snapper calculates session statistic counter deltas, these snapshots show that my process allocated around 30 MB of UGA memory per second during the CONNECT BY statement execution. In fact, by the time of the crash, my process’s UGA/PGA size had grown to almost 1.8 GB:

SQL> @ses 109 memory

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                               1825306316
session uga memory max                                           1825306316
session pga memory                                               1827832112
session pga memory max                                           1827832112
redo blocks read (memory)                                                 0
redo blocks read (memory) by LNS                                          0
workarea memory allocated                                                 0
sorts (memory)                                                          148

8 rows selected.

Note that this memory used for the recursive CONNECT BY is not reported as workarea memory in its traditional sense (like sort, hash and bitmap merge areas), so the normal PGA size constraints like PGA_AGGREGATE_TARGET don’t apply; this work heap lives outside that domain. In fact, if you query V$SQL_WORKAREA_ACTIVE during the SQL execution, nothing is returned for that SQL.
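You can verify this yourself from another session while the statement is running; a sketch (the SQL_ID here is the one from the plan output above, and the view’s columns may vary slightly between versions):

SQL> select operation_type, operation_id, actual_mem_used
  2  from   v$sql_workarea_active
  3  where  sql_id = '5n2f53tsa3bq3';

no rows selected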

So, how to generate lots of rows fast and safely?

Well, we can still use CONNECT BY with a much lower recursion depth and let a cartesian merge join do the multiplication for us! Three 1000-row sources can produce up to 1000 × 1000 × 1000 = 1 billion row combinations, from which the ROWNUM predicate cuts off the first 100 million:

SQL> select count(r)
  2  from (
  3             select rownum r
  4             from
  5                     (select rownum r from dual connect by rownum <= 1000) a,
  6                     (select rownum r from dual connect by rownum <= 1000) b,
  7                     (select rownum r from dual connect by rownum <= 1000) c
  8             where rownum <= 100000000
  9     )
 10  /

  COUNT(R)
----------
 100000000

100 million rows were generated this time, without running out of memory.

The execution plan is as follows:

SQL> @x

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  76prmvktb3r9a, child number 0
-------------------------------------
select count(r) from (   select rownum r   from    (select rownum r
from dual connect by rownum <= 1000) a,    (select rownum r from dual
connect by rownum <= 1000) b,    (select rownum r from dual connect by
rownum <= 1000) c   where rownum <= 100000000  )

Plan hash value: 2999494577

-----------------------------------------------------------------------------------------
| Id  | Operation                            | Name | E-Rows |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE                      |      |      1 |       |       |          |
|   2 |   VIEW                               |      |      1 |       |       |          |
|*  3 |    COUNT STOPKEY                     |      |        |       |       |          |
|   4 |     MERGE JOIN CARTESIAN             |      |      1 |       |       |          |
|   5 |      MERGE JOIN CARTESIAN            |      |      1 |       |       |          |
|   6 |       VIEW                           |      |      1 |       |       |          |
|   7 |        COUNT                         |      |        |       |       |          |
|   8 |         CONNECT BY WITHOUT FILTERING |      |        |       |       |          |
|   9 |          FAST DUAL                   |      |      1 |       |       |          |
|  10 |       BUFFER SORT                    |      |      1 | 15360 | 15360 |14336  (0)|
|  11 |        VIEW                          |      |      1 |       |       |          |
|  12 |         COUNT                        |      |        |       |       |          |
|  13 |          CONNECT BY WITHOUT FILTERING|      |        |       |       |          |
|  14 |           FAST DUAL                  |      |      1 |       |       |          |
|  15 |      BUFFER SORT                     |      |      1 | 15360 | 15360 |14336  (0)|
|  16 |       VIEW                           |      |      1 |       |       |          |
|  17 |        COUNT                         |      |        |       |       |          |
|  18 |         CONNECT BY WITHOUT FILTERING |      |        |       |       |          |
|  19 |          FAST DUAL                   |      |      1 |       |       |          |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(ROWNUM<=100000000)

During the query execution the PGA size never grew larger than about a megabyte:

SQL> @ses 110 memory

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session uga memory                                                   497316
session uga memory max                                               751696
session pga memory                                                   885040
session pga memory max                                              1147184
redo blocks read (memory)                                                 0
redo blocks read (memory) by LNS                                          0
workarea memory allocated                                                 0
sorts (memory)                                                           13

8 rows selected.
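If you actually need these rows persisted into a table (as in the test-case scenario mentioned in the beginning), the same generator works as the source of a CTAS or INSERT. A sketch, with a hypothetical table name T and a smaller row count:

SQL> create table t as
  2  select rownum r
  3  from
  4     (select rownum r from dual connect by rownum <= 1000) a,
  5     (select rownum r from dual connect by rownum <= 1000) b
  6  where rownum <= 1000000;

Table created.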

Btw, I’ve just updated my WaitProf script to use a merge join row generator, to save memory when profiling over longer periods of time (which you probably don’t want to do anyway, as WaitProf never sleeps; it is constantly on CPU, sampling as fast as it can).

If you want to read about the performance of different row generators, Adrian Billington has written a comprehensive article on the topic: http://www.oracle-developer.net/display.php?id=408

This entry was posted in Cool stuff, Oracle.

8 Responses to Generating lots of rows using connect by – safely!

  1. Thank you for this tip Tanel :)

    SQL> select count(r) from (
      2  select rownum r from
      3  (select rownum r from dual connect by rownum <= 1000) a,
      4  (select rownum r from dual connect by rownum <= 1000) b,
      5  (select rownum r from dual connect by rownum <= 1000) c
      6  where rownum <= 100000000 )
      7  /

    SQL> select count(r) from (
      2  (select rownum r from dual connect by rownum <= 100000000) )
      3  /

    COUNT(R)
    ----------
    100000000

    Elapsed: 00:01:50.69

    I tried a hash join and parallelism but neither made an important difference.

  2. WordPress cut the SQLs in my previous comment, so here are just my results:

    case 1 -
    Elapsed: 00:00:43.22

    case 2 -
    Elapsed: 00:01:50.69

  3. tanelp says:

    A hash join would probably be less efficient here than a cartesian merge join anyway (I haven’t tested it, but I have reason to believe so, as a cartesian merge join is just a multiplication loop; no data comparison is needed at all…)

    Also, I suspect this query runs in serial anyway – as far as I know, Oracle doesn’t parallelize CONNECT BY rowsources (a self-connect-by operation isn’t too suitable for automatic parallelization…)

  4. Mark J. Bobak says:

    Hey Tanel,

    Yes, that was my Oracle-L posting.

    But you’re right, it doesn’t seem to be in the archive anymore.

    Anyhow, cool idea/solution! Thanks for the tip!

    -Mark

  5. peter says:

    Hi Tanel,

    Can you please post your “ses” script?

    thanks
    - peter

  6. rc says:

    One can speed this up by using the WITH clause.

    with data1 as
    (
    select rownum r
    from dual
    connect by rownum <= 1000
    ),
    data2 as
    (
    select rownum r
    from data1 d1
    , data1 d2
    , data1 d3
    )
    select count(*)
    from data2
    /
