A: The most fundamental difference between hash and nested loop joins

Ok guys, thanks for waiting!

I ended up expanding the article quite a lot compared to what I had originally planned. In fact I only wrote 50% of what I plan to write, I’ll update the rest… um… later… Instead of just stating the difference between the joins I took a step back and elaborated something what I often see people doing (and talking about in newsgroups and lists too).

Basically the most fundamental (or biggest or most important) difference between nested loop and hash joins is that:

  • Hash joins can not look up rows from the inner (probed) row source based on values retrieved from the outer (driving) row source, nested loops can.

In other words, when joining table A and B (A is driving table, B is the probed table), then a nested loop join can take 1st row from A and perform a lookup to B using that value (of the column(s) you join by). Then nested loop takes the next row from A and performs another lookup to table B using the new value. And so on and so on and so on.

This opens up additional access paths to the table B, for example when joining ORDERS and ORDER_ITEMS by ORDER_ID (and ORDER_ID is leading column of PK in ORDER_ITEMS table), then for whatever orders are taken from ORDERS table, we can perform a focused, narrow index range scan on ORDER_ITEMS for every ORDER_ID retrieved from the driving ORDERS table. A hash join can’t do that.

Of course this doesn’t mean that hash joins can’t use any indexes for tables they read – index range scans and unique lookups can still be used under a hash join, but only if there are constant values in the query text (in form of literal or bind variables). If there are no such constant (filter) conditions under a hash join, then the other options to use that index would be to do an INDEX FULL SCAN (which is a range scan from end to end of the index) or INDEX FAST FULL SCAN (which is like a full table scan through the entire index segment). However none of these opportunities give the same benefits as nested loops looking up rows from row source B dynamically based on what was retrieved from A during runtime.

Note that this nested loops benefit isn’t limited to indexes only on table B, the difference is more fundamental than just a specific access path. For example, if table B happens to be a single table hash cluster or indexed X$ table, then the nested loop is also able to do “optimized” lookups from these row-sources, based on the values retrieved from table A.

So, my article with a lot of (loosely) related details is here:

In the comments section of my question, Tom, Bernard Polarski, Christian Antognini and Marc Musette got the closest to what I had in my mind when I asked the question. However, of course your mileage may vary somewhat depending on what kind of problems you have experienced the most over all the years. Also, Jonathan Lewis had a valid comment regarding that the answer depends on what exactly does one mean by “fundamental” and yeah this was open to interpretation.

Nevertheless, I wanted to emphasize that there’s a more important difference between NL and hash joins, than the usual stuff you see in training material which talk about implementation details like hash tables and memory allocation…

Some day I will complete that article, I plan to add some design advice in there, like denormalization opportunities for getting the best of the both worlds etc. But now I’m gonna get a beer instead.

Thanks for reading and answering my blog, I was quite impressed by the volume of comments & answers to my question. I must do this more often!

NULL is not zero!

Some time ago I wrote a post about how COUNT(*) and COUNT(column) are semantically different things (link). Such queries may return different results if the column counted has NULLs in it. And the difference comes from that NULL is not a value, it’s rather a state which says “value unknown” or “no value entered”.

So, you better understand how NULLs interact with your SQL constructs if you call yourself a DBA or a database developer ;-)

Here’s another example about how misunderstanding NULLs may cause your application to return different results than what was intended.

I will create a little table with TWO rows in it:

SQL> create table t(a int);
Table created.

SQL> insert into t values(1);

1 row created.

SQL> insert into t values(null);

1 row created.

SQL> select avg(a) from t;

 AVG(A)
----------
         1

When I take an average of the 2 values in these rows I get average of 1.

Now lets update the NULL (no value) to 0 (an actual value of zero).

SQL> update t set a=0 where a is null;

1 row updated.

SQL> select avg(a) from t;

 AVG(A)
----------
        .5

As you see, as we now have an actual value in the other row (as opposed to “no value”), the AVG function takes that zero into account.

Hopefully this illustrates once more that NULL does not mean zero or any other value, it means NO value. If you do aggregation functions (count,avg) over NULLs then you must understand that Oracle treats NULLs as no value and doesn’t account these “no values”, thus your queries may behave differently than what your intuition might say (and yes its always good to read documentation about what exactly a given SQL construction/function does in the given database engine instead of relying on “common sense”).

Measuring what matters

Cary Millsap’s recent post prompted me to write down some of the related thoughts in my head.

Here are few of my mantras for systematic troubleshooting and performance tuning, which have materialized in my head over the years of work:

  • Picking the right starting point to troubleshooting and performance tuning is the most important decision in that process.
  • Pick the wrong starting point and you end up going in circles.
  • The scope of your performance data needs to match the scope of your problem, otherwise you end up going in circles.
  • If you don’t measure what matters, you may end up fixing what doesn’t matter.
  • If you’re not systematic in your troubleshooting, you may get lucky, but you don’t want to be dependent on luck! Moreover, you wont’t need to be lucky if you are systematic in your work!
  • Performance tuning is overrated. Fixing fundamental design and coding flaws via changing a magic configuration parameter is a dream just like is getting slim and healthy via eating magic diet pills bought from TV shop.
  • Your response times are too long for only two reasons:
  1. You are doing too much work
  2. You are waiting for too much

…both of the above things can be measured in Oracle…

  • There’s no such thing as slow database or slow system. How can it be slow independently, without anyone experiencing this slowness?
    • If users say that a database is slow, they must be experiencing that somehow! The only way to experience database slowness is via a connection to it, in which case you’ll have a session (to measure).
    • If a monitoring system says that a database is slow, then it must be running and measuring response time of some task just like users do, otherwise it can not reliably say something is slow.
  • Performance is about one thing and one thing only – time. And time is measured in seconds, not in CPU utilization, number of physical IOs or looks of an execution plan.

Here’s a link to a Cary Millsap’s awesome post, read it!

VLDB 2008 proceedings, Oracle optimizer plan stability, adaptive cursor sharing and SecureFiles

If you’re interested in leading edge database research (as of 2008 :), the VLDB 2008 proceedings are publicly available now.

Here are direct links to some Oracle-specific ones:

Enjoy! :)

A gotcha with parallel index builds, parallel degree and query plans

Reading the following article about PARALLEL hint by Jonathan Lewis made me remember a somewhat related gotcha with parallelism.

Often when creating (or rebuilding) an index on a large table, doing it with PARALLEL x option makes it go faster – usually in case when your IO subsystem is not the bottleneck and you have enough spare CPU capacity to throw in.

A small example below:


Tanel@Sol01> create table t1 as select * from all_objects;

Table created.

Tanel@Sol01> create index i1 on t1(object_id);

Index created.

Tanel@Sol01> exec dbms_stats.gather_table_stats(user, 'T1', cascade=>true, no_invalidate=>false);

PL/SQL procedure successfully completed.

Ok, for whatever reason I need to rebuild my index, and for speed I do it in parallel:


Tanel@Sol01> alter index i1 rebuild parallel 4;

Index altered.

Tanel@Sol01>
Tanel@Sol01>
Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 60000
  7  /

SUM(OBJECT_ID)
--------------
      13233374

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
Plan hash value: 3900446664

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |      |     1 |     5 |            |          |
|*  2 |   INDEX RANGE SCAN| I1   |  2923 | 14615 |     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">60000)

14 rows selected.

The execution plan shows a nice serial range scan for above query. Let’s run the same query with a different value for object_id:


Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 10000
  7  /

SUM(OBJECT_ID)
--------------
    1294174783

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------
Plan hash value: 2596547647

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |     5 |    19   (0)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE           |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |          | 42937 |   209K|    19   (0)| 00:00:02 |  Q1,00 | PCWC |            |
|*  6 |       INDEX FAST FULL SCAN| I1       | 42937 |   209K|    19   (0)| 00:00:02 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("OBJECT_ID">10000)

18 rows selected.

What? Suddenly my query turned parallel !

I haven’t enabled parallelism for my table! How can Oracle go parallel without my consent?

Tanel@Sol01> select table_name, degree from user_tables  where table_name = 'T1';

TABLE_NAME                     DEGREE
------------------------------ ----------------------------------------
T1                                      1

The answer lies in the result of next query:

Tanel@Sol01> select index_name, degree from user_indexes where table_name = 'T1';

INDEX_NAME                     DEGREE
------------------------------ ----------------------------------------
I1                             4

Parallel index (re)build will persistently set the index parallel degree in data dictionary to the value used during build!

This enables the CBO to be free to consider also parallel query plans and in our second select case a parallel plan seemed to be the best.

Whether this parallel plan actually is the most efficient way to go is a separate question, however this kind of unplanned parallelism may destabilize your system performance, especially as it can kick in only for certain instantiations of your SQL statement. Note that even one parallel-enabled object in your execution plan can parallelize the whole query joining many tables (just as even one table with statistics in a join turns on CBO for the whole cursor).

Combined with bind variable peeking side-effects and way too high parallel_max_servers value (hey it’s just a max value, let’s set it to 500), this can bring your OLTP system to knees at very unexpected times.

So, as my database does not have parallelism planned into it, I will eliminate the troublemaker:

Tanel@Sol01> alter index i1 noparallel;

Index altered.

Tanel@Sol01>
Tanel@Sol01> select
  2     sum(object_id)
  3  from
  4     t1
  5  where
  6     object_id > 10000
  7  /

SUM(OBJECT_ID)
--------------
    1294174783

Tanel@Sol01>
Tanel@Sol01> @x

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 129980005

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     5 |    19   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE       |      |     1 |     5 |            |          |
|*  2 |   INDEX FAST FULL SCAN| I1   | 42937 |   209K|    19   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID">10000)

14 rows selected.

So the key point is that unless your databases have planned and managed parallelism used in them, it’s worth to run the following query to identify potential troublemakers and disable their parallelism:

SELECT
	'INDEX' OBJECT_TYPE, OWNER, INDEX_NAME, TRIM(DEGREE)
FROM
	DBA_INDEXES
WHERE
	TRIM(DEGREE) > TO_CHAR(1)
UNION ALL
SELECT
	'TABLE', OWNER, TABLE_NAME, TRIM(DEGREE)
FROM
	DBA_TABLES
WHERE
	TRIM(DEGREE) > TO_CHAR(1)
/

On my test environment it returned the following rows:

OBJEC OWNER                          INDEX_NAME                     TRIM(DEGREE)
----- ------------------------------ ------------------------------ -------------
INDEX SYS                            UTL_RECOMP_SORT_IDX1           DEFAULT
TABLE TANEL                          T                              4

From here we see two addtional things:

  • Parallel operations also persist their degree to tables (using alter table move parallel x or CTAS for example)
  • There’s a parallel degree DEFAULT – which is used when you let the appropriate degree to be decided by optimizer

On the other hand, if you have planned for parallelism, then you probably want to keep the parallelism for tables and indexes consistent, e.g. enable it for all tables requiring parallelism and their indexes, not just for couple of indexes by accident.

Update:
Thanks to Adrian Billington for reminding me that also NOLOGGING flag will stick in data dictionary should you perform nologging operations on an object. You should review those too after rebuilds and reorgs.