COUNT STOPKEY operation (the WHERE ROWNUM <= N predicate) doesn't process over ~4 billion rows and returns wrong results

Tanel Poder

2010-10-25

I was running tests on some very large tables on an Exadata cluster and found an interesting bug.

Instead of having to query 4-billion-row tables, I have reproduced this case with a cartesian join…

Check this. I’m generating 8 billion rows using a 3-way cartesian join of a set of 2000 rows, so this results in 2000 * 2000 * 2000 rows, which is 8 billion rows.

SQL>  with sq as (select null from dual connect by level <= 2000)
      select count(*)
      from sq a, sq b, sq c;

COUNT(*)
----------
8000000000

Everything worked well, as you see: all 8 billion rows were nicely counted. Let’s modify this query a bit by adding a WHERE rownum <= 8 000 000 000 predicate, which shouldn’t change the outcome of my query, as 8 billion rows is exactly what I have:

SQL> with sq as (select null from dual connect by level <= 2000)
     select count(*)
     from sq a, sq b, sq c
     where rownum <= 8000000000;

COUNT(*)
----------
4294967295

Ding! (as I’m unable to come up with a better word to present this problem :)

Seems like we have a problem! The COUNT operation thinks that we have only a bit over 4 billion rows returned from the query plan branches below it.

We clearly have an (easily reproducible) bug here!

What happens here is that whenever you use a ROWNUM <= N condition in a query predicate like I have done, you introduce an additional step into the query plan (COUNT STOPKEY).

Check this simple query:

SELECT * FROM t WHERE ROWNUM <= 1000

Plan hash value: 508354683

--------------------------------------------
| Id  | Operation          | Name | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |        |
|*  1 |  COUNT STOPKEY     |      |        |
|   2 |   TABLE ACCESS FULL| T    |     41 |
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1000)
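If you want to check whether your own query gets this extra step, a quick way (assuming you have some small test table t) is a plain EXPLAIN PLAN – the exact columns shown depend on which DBMS_XPLAN function and format options you use, but the COUNT STOPKEY line and its filter predicate show up just the same:

SQL> explain plan for select * from t where rownum <= 1000;

Explained.

SQL> select * from table(dbms_xplan.display);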

As the above plan explains, the “WHERE rownum <” predicate is applied in the execution plan line (row source) 1 – COUNT STOPKEY. You won’t see this line when you are just counting rows without any “ROWNUM <” predicate.

How this works is that the COUNT STOPKEY row source knows how many rows you want and will just keep calling its child function under it in the execution plan tree to get more and more rows, until the required number of rows has been returned. And then the COUNT STOPKEY function just stops calling its child row sources and declares the end-of-data condition.

And here’s the catch – due to a bug, the variable used to keep track of the number of rows processed by COUNT STOPKEY is only 4 bytes, even on 64-bit platforms. So, it can hold values only up to 4294967295 (the count returned above), which is 2^32 – 1. That wouldn’t be so much of a problem in practical, real-world applications, but what I don’t like is that the operation silently returns wrong data – it just stops fetching more rows. Even though we haven’t reached the ROWNUM <= 8 000 000 000 limit yet and there are still rows to fetch from the child row source, COUNT STOPKEY declares the end-of-data condition and returns ~4 billion as the result.
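To put that number in context, a quick arithmetic check in SQL*Plus (output formatting approximated) shows that 4294967295 is exactly the largest value a 4-byte unsigned counter can hold:

SQL> select power(2,32)-1 as max_4byte_unsigned from dual;

MAX_4BYTE_UNSIGNED
------------------
        4294967295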

This is (obviously) a bug and now there’s a bug number for it as well (thanks to Greg Rahn for letting me know) – Bug# 10214991 (unfortunately it’s not public in MOS).

Now, there are a few reasons why this bug has gone unnoticed for so long, despite the fact that 4+ billion-row tables have existed for a long time (I first worked with such an Oracle database in the year 2000 – on Windows NT :).

  1. A real-life business query with a construct of WHERE ROWNUM <= X makes sense only when the data is ordered by some meaningful business attribute (a column). Otherwise you’ll get the query results in quite random order. Note that I’m talking about real, business queries here.
  2. The only right way to order data in SQL is via an ORDER BY clause. There is no other right way to do it, even though some hacks sometimes work (and will stop working after the next upgrade).
  3. Nobody (or not many people) has written queries like: give me the sum of the 5 billion biggest purchases ordered by selling price, or give me the sum of the last 5 billion purchases up to this date. If you replace the billion with just ten, a hundred, or a thousand, then hell yeah, such queries are executed every day (or should I say every minute or second). A sketch of such a top-N query is shown right after this list.

Whether using ROWNUM in your business queries (instead of an actual date range or purchase price in $$$) is a good or bad design is a completely different question – the fact is that ROWNUM is used in such SQLs.
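For illustration, here’s a minimal sketch of the classic ROWNUM top-N pattern (the purchases table and column names are made up for this example). The ORDER BY has to sit inside the inline view, because ROWNUM is assigned to rows before the outer query’s ordering would take effect:

select sum(sale_price)
from  (select sale_price
       from   purchases
       order  by sale_price desc)
where  rownum <= 1000;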

Now we’ll just need to wait until the data volumes (and reports) get so big that someone actually hits this problem in real life!

So, better patch your database before the end of this decade! :-)

