Why does even a small difference in SQL text cause a hard parse?

I just replied to an Oracle Forum Thread about why does even a small difference in SQL statement text cause it to be hard parsed and loaded as a different cursor. The reason is actually very simple – and I’m posting it into my blog too:

—

The reason why a statement with even a minor difference in text is parsed as a separate cursor is due how Oracle looks up statements from library cache.

First, a little background:

Library cache can contain thousands, tens of thousands or even hundreds of thousands of cursors (yuck!) The latter can happen especially if SGA_TARGET is in use in conjuction with an application which doesn’t use bind variables – this can cause shared pool to grow and grow at the expense of other SGA components, usually buffer cache.

Anyway, the key issue is – library cache can contain lots of cursors and thanks how shared pool memory allocation works, these cursors’ memory structures can lay anywhere in shared pool heaps.

So, now when this new session with a “select * from eMp” query comes in, how do you make sure this statement is not already cached in library cache?

One option would be to always scan through whole shared pool in search of such cursor text. This would be disastrous to performance.

Another option would be keeping some kind of sorted array or index of SQL texts somewhere in shared pool, with pointers to corresponding cursors heaps where execution plans and other required things are kept. This would also require some kind of index maintenance when new cursors are parsed and old are aged out… And also, it would mean quite a lot of string comparison, especially for long SQL statements. You could end up with having to do thousands of string comparisons for finding the matching SQL – if it exists in the cache at all. If it doesn’t exist, this effort is wasted.

Oracle (and many other vendors) has taken a different approach. Whenever a new parse request comes in, Oracle calculates hash value from the cursor text and uses this hash value for looking up corresponding library cache “hash chain” which is a linked list of all cursors which text hashes to that bucket.

For example if the library cache is organized to 131072 hash buckets and a SQL statement’s hash value happens to be 123456789 then a MOD(123456789, 131072) function determines that if the cursor is loaded into library cache, it *must* exist in library cache hash chain number 118037. Thanks to this rule, the Oracle cursor lookup function (kglget/kksfbc) knows that if this cursor didn’t exist under given hash chain, then it is not loaded into library cache at all.

So, instead of scanning through whole library cache or doing loads of string comparisons, we can find our cursor with a simple hash calculation function, go to the appropriate hash chain and need to traverse through that only.

The problem here is, that even if the SQL text is only a little different from other, it’s hash value is very likely going to be different as well (that’s just how the hashing works), thus Oracle will look through a different chain of cursors and won’t see anything outside it. Therefore Oracle concludes this cursor doesn’t exist in cache and needs to be loaded via hard parse.

Some tools like Forms and even PL/SQL do try to generate SQL in canonical form, by stripping out comments, whitespace and changing all text to upper (or lower) case, in hope for eliminating some of these differences.

In practice, when a classic evil application running a tight loop of generated queries with literal values, virtually all of your hash values will be different. This is why Oracle has come up with CURSOR_SHARING parameter, which will detect literal values, strip them out and replace them with bind variables.

There’s a single reason for that – this way there will be a single SQL text string for a statement, therefore it’s hash value will be the same, therefore the cursor always maps to the same library cache hash bucket and can be found for reuse.

Why does even a small difference in SQL text cause a hard parse?

Tanel Poder

2008-08-14