A few more announcements…

Today’s hacking session was a success – 144 people signed up and around 100 attended – thank you!
  • I will do the next hacking session in November! Don’t know exact date yet, I’ll just need to find a free enough day… I think I’ll go deeper into internals next time to justify the “hacking” part in the name :-)
Last days for getting the early-bird rate at the first Virtual Conference about Systematic Oracle SQL Optimization with Cary Millsap, Jonathan Lewis, Kerry Osborne and me!
Additional Q & A sessions added into the package of my AOT2.0 seminars:
  • NB! As a fresh addition – the AOT2.0 seminar attendees can attend two additional 2 hour Q & A sessions after the seminars!
    • 2 hours in mid-December 2010 (TBD)
    • 2 hours in mid-January 2011 (TBD)
    • The idea is that when you put the new tools and knowledge into use in real life, there will likely be additional questions that arise. The extra Q&A sessions, where Tanel answers questions related to the seminar material (and do even more demos) should help you to get the most out of these seminars!
    • Existing and new attendees of AOT20 seminars, no action is needed from you. I’ll just send you a mail with proposed dates after the seminars…
  • Last chance to sign up to the first part of the Advanced Oracle Troubleshooting v2.0 Online Deep Dives (1-5 Nov)!

Another (secret) hacking session with me – using Oracle Session Snapper for flexible troubleshooting (and fun)

And this time we have audio !!! (Wow!)

Following the huge success of my last hacking session, planned while drinking beer at Graham Woods OOW pre-party and delivered from Miracle’s massive Oracle Closed World event in Thirsty Bear (between drinking beers), I’m announcing another hacking session:

What: Using Session Snapper for flexible Oracle Performance Troubleshooting

When: Wednesday 27th Oct 9:00-10:00 AM PDT (US West coast / California time). Check what’s this in your time zone here

Where: Internet! -> Sign up here: http://tech.e2sn.com/secret

You’ll need to register fast and be “there” on time as my current GotoWebinar account only allow 100 attendees to log on… last time over 100 people signed up, but “luckily” less actually showed up, so nobody got left outside!

BTW, I have figured out what went wrong with audio last time and caused my voice in the end of presentation disappear). A program, which I accidentally launched via a keyboard shortcut, grabbed my Mic input to itself, so gotowebinar’s app couldn’t access it anymore.

See you soon!

COUNT STOPKEY operation (the where ROWNUM <= N predicate) doesn’t process over ~4 Billion rows and returns wrong results

I was running tests on some very large tables on an Exadata cluster and found an interesting bug.

Instead of having to query 4Billion row tables, I have reproduced this case with a cartesian join…

Check this. I’m generating 8 Billion rows using a 3-way cartesian join of set of 2000 rows. So, this results in 2000 * 2000 * 2000 rows, which is 8 billion rows.

SQL>  with sq as (select null from dual connect by level <= 2000)
      select count(*)
      from sq a, sq b, sq c;

COUNT(*)
----------
8000000000

Everything worked well as you see. All 8 billion rows were nicely counted. Let’s modify this query a bit, by adding a WHERE rownum <= 8 000 000 000 predicate, which shouldn’t modify the outcome of my query as 8 billion rows is exactly what I have:

SQL> with sq as (select null from dual connect by level <= 2000)
     select count(*)
     from sq a, sq b, sq c
     where rownum <= 8000000000;

COUNT(*)
----------
4294967295

Ding! (as I’m unable to come up with a better word to present this problem :)

Seems like we have a problem! The COUNT operation thinks that we have only a bit over 4 billion rows returned from the query plan branches below it.

We clearly have a(n easily reproducible ) bug here!

What happens here is that whenever you use ROWNUM < N operator in a query predicate like I have done, you will introduce an additional step to the query plan (COUNT STOPKEY).

Check this simple query:

SELECT * FROM t WHERE ROWNUM <= 1000

Plan hash value: 508354683

--------------------------------------------
| Id  | Operation          | Name | E-Rows |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |        |
|*  1 |  COUNT STOPKEY     |      |        |
|   2 |   TABLE ACCESS FULL| T    |     41 |
--------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1000)

As the above plan explains, the “WHERE rownum <” predicate is applied in the execution plan line (row source) 1 – COUNT STOPKEY. You won’t see this line when you are just counting rows without any “ROWNUM <” predicate.

How this works is that the COUNT STOPKEY knows how many rows you want and will just keep calling its child function under it in the execution plan tree to get more and more rows, until the required amount of rows have been returned. And then the COUNT STOPKEY function just stops calling its child row sources and declares the end-of-data condition.

And here’s the catch – due to a bug, the variable used to keep track of number of processed rows by COUNT STOPKEY is only 4 bytes, even on 64bit platforms. So, it can hold values up to 4294967295 in it (the count returned above), which is 2 ^ 32 – 1. That wouldn’t be so much of a problem in practical real world applications, but what I don’t like is that the operation will silently return wrong data – it will just stop fetching more rows, even though we haven’t reached the ROWNUM <= 8 000 000 000 limit yet and there are rows to fetch from the child row-source, but COUNT STOPKEY declares end-of-data condition and returns ~4B as a result.

This is (obviously a bug) and now there’s a bug number for that as well (thanks to Greg Rahn for letting me know) – Bug# 10214991 (unfortunately it’s not public in MOS).

Now, there’s a reason why this bug has gone unnoticed for so long despite that 4+ billion-row tables have existed for long time (I worked first with such Oracle database in year 2000 – on Windows NT :).

  1. A real-life business query with a construct of WHERE ROWNUM <= X makes sense only when the data is ordered by some meaningful business attribute (a column). Otherwise you’ll get the query results in quite random order. Note that I’m talking about real, business queries here.
  2. The only right way to order data in SQL is via an ORDER BY clause. There is no other right way to do it, even though some hacks sometimes work (and will stop working after the next upgrade)
  3. Nobody (or not many people) have written queries like: give me the sum of 5 billion biggest purchases ordered by the selling price or give me the sum of last 5 billion purchases up to this date. If you replace the billion by just ten, hundred, or just a thousand, then hell yeah, such types of queries are being executed every day (or should I say minute, second).
    Whether using ROWNUM in your business queries (instead of actual date range or purchase price in $$$) is a good or bad design is a completely different question – the fact is that ROWNUM is used in such SQLs.

Now we’ll just need to wait until the data volumes (and reports) get so big that someone actually hits this problem in real life!

So, better patch your database before the end of this decade! :-)

Read currently running SQL statement’s bind variable values using V$SQL_MONITOR.BIND_XML in Oracle 11.2

The title pretty much says it. In Oracle 11.2, if you have the Diag+Tuning Pack licenses and the SQL monitoring kicks in for your SQL statement, then instead of the old fashioned ERRORSTACK dump reading you can just query the V$SQL_MONITOR.BIND_XML to find the values and metadata of your SQL statement’s bind variables.

I’ve written an example here:

And a related comment – V$SQL_BIND_CAPTURE is not a reliable way for identifying the current bind variable values in use. Oracle’s bind capture mechanism does not capture every single bind variable into SGA (it would slow down apps which run lots of short statements with bind variables). The bind capture only selectively samples bind values, during the first execution of a new cursor and then every 15 minutes from there (controlled by _cursor_bind_capture_interval parameter), assuming that new executions of that same cursor are still being started (the capture happens only when execution starts, not later during the execution).

Few new blogs worth checking out

Here are few blogs which I’ve found lately and think that you should also check out if you’re interested in Oracle stuff (and take time to read blogs :)

Oracle performance & Grid Control
http://juliandontcheff.wordpress.com/

Oracle Exadata
http://ferhatsengonul.wordpress.com/

Oracle, Unix, storage (for DW)
http://dsstos.blogspot.com/

Oracle, Storage & Virtualization (VMWare)
http://oraclestorageguy.typepad.com/

Julian Dyke’s new blog (RAC, Data Guard so far – probably internals later :-)
http://juliandyke.wordpress.com/

Announcing the first E2SN Virtual Conference with Jonathan Lewis, Cary Millsap, Kerry Osborne and me – 18-19 Nov!

Yes, it’s official. I’m organizing a virtual conference with some of THE top speakers in the world. The topic is Systematic Oracle SQL Optimization (in real world)

The dates are 18-19 November, the conference lasts for 4 hours on both days, so you’ll be able to still get some work done as well (and immediately apply the knowledge acquired!).

Well, none of the speakers need introduction, but just in case you’ve lived in space for last 20 years, here are the links to their blogs :)

I can tell you, (at least the first 3) people in the above list ROCK!!!

And all of them are OakTable members too :)

This conference will have 4 x 1.5 hour sessions, each delivered by a separate speaker. We aim to systematically cover the path of:

  1. Finding out where is the performance problem (and which SQLs cause it)
  2. Finding out what is the problem SQL execution plan doing and which part of it is slow
  3. How to write and fix your code so that the optimizer wouldn’t hate your SQL
  4. How to fix the SQL execution plan performance problem when you can’t touch the application code!

And as this is the first (pilot) virtual conference, then the price is awesome, especially if you get the early bird rate by signing up before 1. November!

So, check out the abstracts, details, agenda and sign up here!

http://tech.e2sn.com/virtual-conferences

P.S. I expect this event to be awesome!

A: The most fundamental difference between hash and nested loop joins

Ok guys, thanks for waiting!

I ended up expanding the article quite a lot compared to what I had originally planned. In fact I only wrote 50% of what I plan to write, I’ll update the rest… um… later… Instead of just stating the difference between the joins I took a step back and elaborated something what I often see people doing (and talking about in newsgroups and lists too).

Basically the most fundamental (or biggest or most important) difference between nested loop and hash joins is that:

  • Hash joins can not look up rows from the inner (probed) row source based on values retrieved from the outer (driving) row source, nested loops can.

In other words, when joining table A and B (A is driving table, B is the probed table), then a nested loop join can take 1st row from A and perform a lookup to B using that value (of the column(s) you join by). Then nested loop takes the next row from A and performs another lookup to table B using the new value. And so on and so on and so on.

This opens up additional access paths to the table B, for example when joining ORDERS and ORDER_ITEMS by ORDER_ID (and ORDER_ID is leading column of PK in ORDER_ITEMS table), then for whatever orders are taken from ORDERS table, we can perform a focused, narrow index range scan on ORDER_ITEMS for every ORDER_ID retrieved from the driving ORDERS table. A hash join can’t do that.

Of course this doesn’t mean that hash joins can’t use any indexes for tables they read – index range scans and unique lookups can still be used under a hash join, but only if there are constant values in the query text (in form of literal or bind variables). If there are no such constant (filter) conditions under a hash join, then the other options to use that index would be to do an INDEX FULL SCAN (which is a range scan from end to end of the index) or INDEX FAST FULL SCAN (which is like a full table scan through the entire index segment). However none of these opportunities give the same benefits as nested loops looking up rows from row source B dynamically based on what was retrieved from A during runtime.

Note that this nested loops benefit isn’t limited to indexes only on table B, the difference is more fundamental than just a specific access path. For example, if table B happens to be a single table hash cluster or indexed X$ table, then the nested loop is also able to do “optimized” lookups from these row-sources, based on the values retrieved from table A.

So, my article with a lot of (loosely) related details is here:

In the comments section of my question, Tom, Bernard Polarski, Christian Antognini and Marc Musette got the closest to what I had in my mind when I asked the question. However, of course your mileage may vary somewhat depending on what kind of problems you have experienced the most over all the years. Also, Jonathan Lewis had a valid comment regarding that the answer depends on what exactly does one mean by “fundamental” and yeah this was open to interpretation.

Nevertheless, I wanted to emphasize that there’s a more important difference between NL and hash joins, than the usual stuff you see in training material which talk about implementation details like hash tables and memory allocation…

Some day I will complete that article, I plan to add some design advice in there, like denormalization opportunities for getting the best of the both worlds etc. But now I’m gonna get a beer instead.

Thanks for reading and answering my blog, I was quite impressed by the volume of comments & answers to my question. I must do this more often!

New online seminars – Advanced Oracle Troubleshooting v2.0 Deep Dives

As I mentioned in a previous post, I won’t be doing much flying anymore and so am changing all my seminar offering to online seminars.

So, I’ve changed and re-arranged my seminar content into self-contained 4-hour deep dives and thanks to the online nature (no travel needed), people can choose which days they want to attend. If you’re interested in latch contention only, you can attend the Latch Contention deep dive for example etc. Or you can still attend all the deep dives. The cool thing is that these deep dive sessions take only half a day, too (and are priced accordingly). That way you don’t have to skip work for the whole day (or week) and still can get some of your daily work done too. Hopefully it makes your life a bit easier when getting approval to attend the sessions.

As the main feedback from my seminars has been that “there’s too much to learn” within the short 2-3 days I used to do my seminars in, I have arranged the material so that there will be more time to go deep into the subject area. Also, I have planned plenty of time for questions & answers (1 hour out of the 4 hours is planned Q&A sessions and attendees can also ask questions any time during the lecture & demos).

It looks like I will only offer my Advanced Oracle Troubleshooting v2.0 class online this year. I will probably schedule my Advanced SQL Tuning deep dives in January/February 2011 and the Advanced Troubleshooting class again in March/April and so on (until I go public with my other business, when I won’t have time for full length training anymore).

You can check the current schedule and pricing out here:

Here’s a brief outline of individual half-day Deep Dives I offer:

  1. AOT deep dive 1: Systematic approach for Advanced Oracle Troubleshooting
  2. AOT deep dive 2: Troubleshooting physical IO and buffer cache issues
  3. AOT deep dive 3: Troubleshooting commit, redo, undo and transaction issues
  4. AOT deep dive 4: Troubleshooting Oracle SGA/PGA/UGA and OS memory issues
  5. AOT deep dive 5: Troubleshooting shared pool and library cache issues
  6. AOT deep dive 6: Troubleshooting enqueue lock waits and deadlocks
  7. AOT deep dive 7: Troubleshooting latch contention
  8. AOT deep dive 8: Troubleshooting Mutex and “cursor: pin” contention
  9. AOT deep dive 9: Troubleshooting complex hangs and spins
  10. AOT deep dive 10: Troubleshooting crashes, bugs and ORA-600/ORA-7445 errors

So, sign up now, seats are limited ;-)