Q: The most fundamental difference between HASH and NESTED LOOP joins?

So, what do you think is the most fundamental difference between NESTED LOOPS and HASH JOINS?

This is not a trick question. You’re welcome to write your opinion in the comments section – and I’ll follow up with an article about it (my opinion) later today…

Update: The answer article is here:

http://blog.tanelpoder.com/2010/10/06/a-the-most-fundamental-difference-between-hash-and-nested-loop-joins/

This entry was posted in Oracle and tagged , . Bookmark the permalink.

47 Responses to Q: The most fundamental difference between HASH and NESTED LOOP joins?

  1. HASHes are bad as they involve full scans ;)

    Hehe…better i would wait for your article only :)

  2. Hash joins use Oracle’s provided hash algorithm for probing and under wide range of conditions, it’s most efficient approach.

    NL, on that other hand, do full table scan or index scan to scan the tables.

    It’s accessing technique which is the basic difference … hashing or indexing.

    P.S. Based on my limited knowledge about these join techniques, would better wait for your article.

  3. In a plain explanation, NL act according to the rules defined, never uses facts and figure to act smart. Whereas, HJ takes some onus to act optimally.

    Jaffar

  4. “the most fundamental difference”?

    The smallest item being processed in a nested loop join is a row.

    The smallest item being processed in a hash join is a data block.

    So, a nested loop may be less efficient, especially if all the rows to retrieved are to be extracted from several blocks, since these same blocks will re-read several times when generating the resultset.

    Is this along the lines you were looking for?

    Cheers,

    Mat.

  5. ST says:

    Hash Joins are basically Nested loop joins in a smaller scale (just rows in the same hash bucket will be compared), …

  6. ST says:

    Hash Joins are basically multiple Nested loop joins in a smaller scale (just rows in the same hash bucket will be compared), …

  7. Sokrates says:

    there is not so much of the difference between them …
    they are all mentioned on
    http://jonathanlewis.wordpress.com/2010/08/02/joins

  8. Tom says:

    With both you are joining two things, it could be 2 result sets, 2 tables, or looking through an index and accessing a row by rowid and then joining to something else. So I will explain the fundamental difference at a very basic level. Without getting into the join part per se.

    Nested Loop.

    You have 2 things (A and B) and you want to join them. Say I have 100 rows in each. I go to A, get the first row, then loop through every row in B looking for a match, then I go to row 2 in A and repeat going through B looking for a match. This works well if the amount of rows in A is small, if it is very big, it can be very painful.

    Hash Join

    We build a hash table, we go through A and build the table, then we go through B once and find the matches. It has more overhead but will finish much faster on larger data sets that the NL option.

  9. bernard polarski says:

    Nested loop : The current inner rows fetch is determined by the progression of the outer table row.
    Hash table : all the inner and outer rows are gathered before the first comparison between inner and outer table is performed.

  10. Hi Tanel

    In the last days I gave two presentations about join techniques (one at OOW and the other at SIOUG)… so, I feel compelled to share my 2 cents…

    Honestly, I’m not able to say what is the *most fundamental difference*. In fact, I see four main differences:
    1) HJ processes the right input at most one time; NL might process the right input many times
    2) HJ are not able to apply join conditions through indexes; NL are able to do so
    3) HJ does not support cross joins, theta joins, and partitioned outer joins; NL supports all types of joins
    4) HJ joins supports all types of trees; NL do not support right-deep and zig-zag trees

    I’m looking forward for your opinion on that topic.

    Cheers,
    Chris

  11. Tom says:

    NL works well with bonsai trees.

    HJ works well with redwood trees.

  12. Hmmm… I think that the previous discussion about access paths is really unrelated to this question, since any join method can use any access path.

    Lewis’ posts are helpful (linked by Sokrates). Both joins of course have a nested loop at the heart. The fundamental difference is that in a hashing algorithm, the inner loop operates on a significantly smaller dataset (only the rows that match a particular hash – ideally one)… but in exchange, we have some startup cost (building a hash table).

  13. I believe Nested Loops are how you make bird nests or quilts and Hash Joins were something they did in the sixties which resulted in Woodstock!

  14. Enrique Aviles says:

    The most fundamental difference is the amount of data processed. Tom’s bonsai/redwood analogy is pretty nice.

    Of course, Tanel will blow all of us out of the water with his answer! Can’t wait to see it. :)

  15. It probably depends on what you mean by fundamental – but I think the most fundamental (as in “fundamental particles”) is latching.

  16. krish says:

    Also hash joins work with the equality predicate only.

  17. mpb says:

    I would say the most fundamental difference is in how many times the “inner” data set gets iterated. In NL it’s length(outer data set), in HJ it’s 1.

    There are other interesting things like hash table setup time, relation to predicates etc., but those are more about implementation and approach by the optimizer, not the actual algorithms themselves.

  18. Where’s that follow up article Tanel?

    It really depends what you mean fundamental. Conceptually, practically, usability or limitations.

    But I had to pick one, is that Hash Joins have variable memory usage based on the number of rows (and withd) of the left side of the join, while nested loops have a fixed memory usage regardless of either join size.

  19. Tanel Poder says:

    Patience, the blog entry is coming ;-) I’ll post it here once done.

    By most fundamental I mean the most important… that can be open depending on the reader, so I mean in your opinion… (and I’ll post my opinion and reasoning very soon now).

  20. BunditJ says:

    The hash join is only applicable to the equijoin when joining two tables by creating the key of join in the session memory.

    The join order of tables acting as outer (driving) and inner on the nested loop is significant, and the access path of inner should be dependent on the outer to avoid Cartesian product.

  21. Hans Henrik Krohn says:

    The most fundamental difference is, for me, that in a hash join I can fathom from v$session_longops how far it has come and has left to go accessing the tables, which is very nice. The flip side is that ORA-04030 “out of process memory” occurs more often from hash joins.

  22. Bixapathi says:

    Hash Joins ——> We get two data sets into our session (PGA) by doing full scans/index full scans etc – no more ‘ACCESS’ to the tables/data set ..and from there you will get the joined row out -the ‘JOIN’ will be done in our session area

    Nested loops —–>it is a loop – for each outer iteration we will need to ‘ACCESS’ the inner table /data set – and output the joined row ….we will get the joined rows after each ‘ACCESS’ from the instance

  23. In my view the most fundamental difference between NL and HJ are that
    a) HJ is limited to equi-joins
    b) HJ has to read the entire outer rowset before being able to process and pass the first row from the inner rowset up the access path. An NL join can access the inner rowset as soon as it has the first row from the outer.

  24. Bixapathi says:

    Hash Joins ——> We get two data sets into our session (PGA) by doing full scans/index full scans etc – no more ‘ACCESS’ to the tables/data set ..and from there you will get the joined row out -the ‘JOIN’ will be done in our session area ,I mean – no more logical i/o ,physical i/o once we get the data into our session area etc .

    Nested loops —–>it is a loop – for each outer iteration we will need to ‘ACCESS’ the inner table /data set – and output the joined row ….we will get the joined rows after each ‘ACCESS’ from the instance – each ACCESS will lead to a physical or logical i/o

  25. Awaiting says:

    Hi Tanel,

    Can’t wait any more for your article ? come on :)

  26. Tanel Poder says:

    Ha, it’s coming… soon… too busy… I promised it “later today” on Wednesday I think and I’ll deliver it on that time…. but maybe on Wednesday 72 o’clock or something ;-)

  27. Tanel Poder says:

    And there are couple of answers similar to what I had in my mind in the comments section already!

  28. Awaiting says:

    Any clue about the comments- which are similar to your thoughts ?

  29. Tanel Poder says:

    That’s gonna be a secret until I post the article!!! ;-)

  30. Awaiting says:

    ok, thanks Tanel, Finally -Can we expect the article today before week-end else we can’t enjoy the week-end thinking what might be the fundamental difference between NL and HJ ?

  31. Hi Tanel,

    AFAIk, It’s an alternative to Sort Merge Join (SMJ) and Nested Loops (NL) Join.
    When comparing to NL – if the driving table is having high volume of data or
    high cardinality – then joining with other source tables will be highly expensive –
    when we opt for Nested loops

    HASH Joins – works well or suited for high volume of data processed. Before going for
    availability of indexes on Source tables we need to check some parameters which influence the
    join operation. Such as

    HASH_JOIN_ENABLED
    hash_area_size,
    db_block_size
    hash_multiblock_io_count

    Based on the Size of segments – tables joins and with above parameter values we can merely
    estimate or we will get an idea whether the hash join can be influenced or not.

    we will be waiting for your idea and valuable comments from your end.
    I think from linux Operating system perspective – from Os level we might to trace the
    semaphores – to check the size of sort Oracle Optimizer missing be using. Need to investigate
    from Linux tools.

  32. Amit says:

    Hi Tanel,

    When shall we have your blog on difference between HASH and NESTED LOOP ?I visit your blog daily.
    And I have one more query in my mind.I don’t know whether it is right place to post.I am fresher passed out from college and working as DBA. so can we have blog on how should junior DBA proceed.or if you already have this bolg then can i have link to that blog….If something wrong then please forgive me

  33. Nigel says:

    It could involve nanobots dressed in latex, and the wow factor will still have gone.

    Timing, Tanel, timing!

  34. Tanel Poder says:

    Well, I’m not writing my stuff because of the wow-factor :) Writing good stuff takes time, sometimes more than planned… so bear with me!

  35. srivenu says:

    I think the fundamental difference is as the name suggests. In contract to a NESTED Loops join, there is no looping involved in processing a HASH join (in most cases). The loop is only needed in a HASH Join (called a NESTED LOOPS hash join) when either of the row sources doesnt fit in the build memory. This difference is also what makes hash joins faster for large data sets.

  36. Tanel Poder says:

    To anyone waiting, I plan to publish this article today evening (which is some 8 hours from now in my timezone) unless something comes up… thanks for your patience ;-)

  37. @Tanel Poder
    Tanel,
    God provided true knowledge to an expert – perhaps its you are the ONE, I see in present world.
    Great Sir… !!

  38. Tanel Poder says:

    Ha! I just have spent way too much researching a narrow area in Oracle, but thanks :)

    Sorry to the readers waiting, something came up (yeah I actually have work to do in addition to hacking & blogging) and I’m still not done with the article. New deadline tomorrow :)

  39. Marc Musette says:

    My 2 cents : HJ can not use indexes to apply join conditions.

  40. Victor Wu says:

    My humble opinion:

    A. NESTED LOOP (works for: Equijoin & Nonequijoin)
    1. A driving table (Outer table) is scanned
    2. Each row returned drives a lookup to the inner table
    3. Joining rows are returned

    Notes:
    Most time Oracle Optimizer will favor nested loop when Inner table has index and total number of rows returned are small.

    B. HASH JOIN (works for Equijoin only)
    1. A smaller table is used to build a hash table in memory. hash join
    2. Nonequijoin ==> nested loop
    3. Equijoin , medium rows returned , one table has index ==> nested loop
    4. Equijoin , medium rows returned , both tables do not have index ==> hash join

    Fundamental differences:
    A. Hash join need to finish building the hash table before second row source can probe – Nested loop can lookup right away.
    B. Equijoin or Nonequijoin.
    C. Total number of rows returned.
    D. No indexes & a lot of memory in PGA_AGGREGATE_TARGET ===> Hash join.

  41. Victor Wu says:

    1. NESTED LOOP (works for: Equijoin & Nonequijoin)
    1. A driving table (Outer table) is scanned
    2. Each row returned drives a lookup to the inner table
    3. Joining rows are returned

    Notes:
    Most time Oracle Optimizer will favor nested loop when Inner table has index and total number of rows returned are small.

    2. HASH JOIN (works for Equijoin only)
    1. A smaller table is used to build a hash table in memory.
    2. The second table is hashed (or probing) the hash table in memory
    3. The matching rows are returned

    Notes:
    Most time Oracle Optimizer will favor hash join when a lot of rows returned and no indexes in both tables.

    Summary:
    Oracle Optimizer will decide:
    1. Equijoin & a lot of rows returned ==> hash join
    2. Nonequijoin ==> nested loop
    3. Equijoin , medium rows returned , one table has index ==> nested loop
    4. Equijoin , medium rows returned , both tables do not have index ==> hash join

    Fundamental:
    Fundamental differences:
    A. Hash join need to finish building the hash table before second row source can probe – Nested loop can lookup right away.
    B. Equijoin or Nonequijoin.
    C. Total number of rows returned.
    D. No indexes & a lot of memory in PGA_AGGREGATE_TARGET ===> Hash join.

  42. Victor Wu says:

    Victor Wu :My humble opinion:

  43. bernard polarski says:

    We are looking for THE most fundamental, so one response only : it force you to commit to one single choice.

  44. gonzalo rapido says:

    I am asking me, if you are so busy, why do you open a thread like this?
    Or is the reason just to play around with people and push them to think about something.

  45. Tanel Poder says:

    @gonzalo rapido
    Do your homework – read my blog, it’s in another article.

  46. Ahsan says:

    hi,
    can any one tell me how to calculate IOs using nested loop join, hash joins and sort merge joins. Im a new bee here. :)
    thanks

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>