KGL simulator, shared pool simulator and buffer cache simulator – what are these?

If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:

SQL> select * from v$sgastat where lower(name) like '%sim%' order by name;

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  kglsim alloc latch area          1700
shared pool  kglsim alloc latches               68
shared pool  kglsim count of pinned he        9248
shared pool  kglsim free heap list             204
shared pool  kglsim free obj list              204
shared pool  kglsim hash table                4104
shared pool  kglsim hash table bkts        2097152
shared pool  kglsim heap                    635536
shared pool  kglsim latch area                1700
shared pool  kglsim latches                     68
shared pool  kglsim main lru count           87040
shared pool  kglsim main lru size           174080
shared pool  kglsim object batch            909440
shared pool  kglsim pin list arr               816
shared pool  kglsim recovery area             2112
shared pool  kglsim sga                      22092
shared pool  kglsim size of pinned mem       18496
shared pool  ksim client list                   84
shared pool  log_simultaneous_copies           480
shared pool  sim cache nbufs                   640
shared pool  sim cache sizes                   640
shared pool  sim kghx free lists                 4
shared pool  sim lru segments                  640
shared pool  sim segment hits                 1280
shared pool  sim segment num bufs              640
shared pool  sim state object                   48
shared pool  sim trace buf                    5140
shared pool  sim trace buf context             120
shared pool  sim_knlasg                       1200
shared pool  simulator hash buckets          16512
shared pool  simulator latch/bucket st        4608

31 rows selected.

See, a bunch of “kgl sim” and then just “sim” allocations.

… or sometimes you can see latch contention on following latches:

SQL> select name from v$latch where name like '%sim%';

NAME
-------------------------------------------------------
ksim membership request latch
simulator lru latch
simulator hash latch
sim partition latch
shared pool simulator
shared pool sim alloc

6 rows selected.

Again, there seems to be some “simulation” work going on in Oracle instance.

So what are these simulators about?

These simulators help Oracle to determine how much time and effort would have been saved if buffer cache or shared pool size was bigger than current. And this info is summarized and externalized in various advisor views such V$SHARED_POOL_ADVICE, V$MTTR_TARGET_ADVICE, V$DB_CACHE_ADVICE and so on. These simulation engines were introduced in Oracle 9i along the mentioned V$ views.

Also this stuff is used by MMAN for making SGA resizing decisions if SGA_TARGET (10g+) or MEMORY_TARGET (11g+) is enabled.

I decided to write an article about simulators as quite a few people have been asking me about it lately. So, I first googled around to see if someone has already written about it already and found one here. It turned out it was written by me, I just didn’t know about it anymore :)

So, I will write an edited version of my original explanation here. Note that I don’t know the exact algorithm of the simulator, just explaining the fundamentals here.

KGL simulator is the shared pool memory allocation tracking and simulation engine.

KGL=Kernel General Library cache manager, as the name says it deals with library objects such cursors, cached stored object definitions (PL/SQL stored procs, table definitions etc).

KGL simulator is used for estimating the benefit of caching if the cache was larger than currently. The general idea is that when flushing out a library cache object, it’s hash value (and few other bits of info) are still kept in the KGL simulator hash table. This stores a history of objects which were in memory in past but got flushed out. That’s the key part: we keep a history of objects flushed out (this is done by selective sampling though, not at every object flush as it would have too much overhead).

When loading a library cache object into library cache (which means that no existing such object is in there already), Oracle goes and checks the KGL simulator hash table to see whether an object with matching hash value is in there. If a matching object is found, that means that the required object had been in cache in past, but flushed out due space pressure. That’s the second important part: we know when we would have been able to (probably) just reuse an object if the shared pool had been bigger.

Using that information of how many library cache object (re)loads could have been been avoided if cache had been bigger (thanks to KGL simulator history) and also thanks to knowing how much time the (current) object reloads took, Oracle can predict how much response time would have been saved instancewide if shared pool was bigger. So, we don’t need to increase shared pool itself to measure the benefit, we simulate a bigger shared pool by just keeping a history of aged out library cache objects hash values in a special array – and that’s why there are various kglsim and simulator allocations seen in V$SGASTAT.

The simulation results are seen from v$shared_pool_advice and are used by other advisors such ADDM.

Apparently Oracle keeps track also about how much memory is pinned in the shared pool (thus can’t be flushed out) over time. This info is probably used also for estimating shared pool resize requirements as pinned memory chunks are not usable for anything else (as long as they are pinned) in the shared pool. These measurements are also done on sampling basis.

I did a little test with my lotshparses.sql script (which just runs a lot of hard parses inside a PL/SQL loop) and ran latchprofx to report which latches and why was my session using. Note the “sim” instead of usual % in the latchprofx syntax. I am interested in only these latches which have “sim” in their name in this experiment:

SQL> @latchprofx sid,name,func,hmode 146 sim 100000

-- LatchProfX 1.12 by Tanel Poder ( http://www.tanelpoder.com )

       SID NAME                                FUNC                                     HMODE              Held       Gets  Held %
---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- -------
 146 shared pool simulator               kglsim_unpin_simhp                       exclusive           514         49     .51
 146 shared pool simulator               kglsim_upd_newhp                         exclusive           206        198     .21
 146 shared pool simulator               kglsim_scan_lru: scan                    exclusive            95         93     .10
 146 shared pool simulator               kglsim_chg_simhp_free                    exclusive            32         32     .03
 146 shared pool simulator               kglsim_chg_simhp_noob                    exclusive            32         32     .03
 146 shared pool sim alloc               kglsim_chk_heaplist: alloc               exclusive             4          3     .00
 146 shared pool sim alloc               kglsim_chk_objlist: alloc                exclusive             3          2     .00
 146 shared pool simulator               kglsim_chk_objlist: extend               exclusive             1          1     .00
 146 shared pool simulator               kglsim_scan_lru: rebalance               exclusive             1          1     .00

This simulation engine is pretty cool because its the first time someone has tried to measure the time what could be saved by increasing shared pool size. (Response) Time is the ultimate measure of performance, so this simulator measures the right thing. Do you remember what people used before that? Library cache hit ratio! A lot of people back then (and some still today) have been misleaded by measuring the wrong things like hit ratios.

Anyway, while this is a cool feature – it can cause trouble in big systems. This is because the extra CPU usage for simulation and also the potential concurrency issues like simulator latch contention (like the shared pool simulator latch above, these latches are used even on 11g where most library cache latches have been replaced with KGX mutexes for efficiency and avoiding false contention issues). Also, some people are sensitive about the potential large memory allocations the shared pool simulator can take from shared pool (this is controlled by _kglsim_max_percent parameter which defaults to max 5% of total shared pool size). As these allocations are dynamic, then when your application suddenly starts making lots of hard parses due newly released code, causing memory shortage, then the simulator can make things even worse as it can make relatively big allocations for itself too! And as far as I know the simulator arrays don’t shrink back.

Note that I’m talking about big systems here, in most other cases the simulation shouldn’t cause trouble (on recent db versions anyway).

In some databases I have disabled all of the advisors and simulators with setting statistics_level = basic. All features what you see from V$STATISTICS_LEVEL would be disabled when setting statistics_level to basic:

SQL> select statistics_name,activation_level from v$statistics_level;

STATISTICS_NAME                          ACTIVAT
---------------------------------------- -------
Buffer Cache Advice                      TYPICAL
MTTR Advice                              TYPICAL
Timed Statistics                         TYPICAL
Timed OS Statistics                      ALL
Segment Level Statistics                 TYPICAL
PGA Advice                               TYPICAL
Plan Execution Statistics                ALL
Shared Pool Advice                       TYPICAL
Modification Monitoring                  TYPICAL
Longops Statistics                       TYPICAL
Bind Data Capture                        TYPICAL
Ultrafast Latch Statistics               TYPICAL
Threshold-based Alerts                   TYPICAL
Global Cache Statistics                  TYPICAL
Active Session History                   TYPICAL
Undo Advisor, Alerts and Fast Ramp up    TYPICAL
Streams Pool Advice                      TYPICAL
Time Model Events                        TYPICAL
Plan Execution Sampling                  TYPICAL
Automated Maintenance Tasks              TYPICAL
SQL Monitoring                           TYPICAL
Adaptive Thresholds Enabled              TYPICAL
V$IOSTAT_* statistics                    TYPICAL
.
23 rows selected.

Sometimes you don’t want to disable absolutely everything, but only the troublemakers. The good news is that there’s an (undocumented) parameter for disabling every feature separately. As always, make sure that you know what you’re doing before changing these and get blessing from Oracle support in order to not get yourself in trouble.

So, for example the library cache memory simulation (shared pool advice) can be disabled by setting _library_cache_advice = false .

I covered shared pool and library cache above. Oracle has more cache advisors, the underlying principle in them is the same. They save some small metadata/info about flushed out objects in a simulator array and just check that array when someone is allocating memory for new object. If matching object’s metadata still happens to be in the simulator array, then Oracle knows that we could have avoided the time and effort loading (generating) that object back into memory if the corresponding cache had been bigger. Of course the algorithm must be more sophisticated than just checking for object’s existence, there are probably object’s flush timestamp and size stored in the simulator arrays as well.

While the library cache simulator array stores library cache objects hash values for looking up historical memory allocations metadata, then the buffer cache advisor stores just data block addresses (DBAs) of blocks previously in cache. When Oracle spots that a block was previously in cache while intiating physical IO for re-reading it back again, it will measure the time it took to complete the current physical IO and add this info into the simulator array. Again, that’s how Oracle can measure how much response time would have been saved if the block would still have been in the buffer cache.

The buffer cache advisor needs some special attention, as I have had most issues with that (on large systems again), mostly with buffer cache’s simulator lru latch contention in systems with lots of CPUs and physical IOs going on. The problem here is that even though you can have hundreds of thousands of cache buffers chains latches in a system, the number of simulator lru latches remains relatively low by default, causing contention.

Oracle provides a documented parameter for disabling the buffer cache simulation – you need to set db_cache_advice = off (or ready).

So, hopefully this sheds some light into how the advisors work and what the heck the simulators simulate in Oracle. Even though I provided a list of parameters which allow you to control this stuff – don’t change any of them unless you actually have a problem which they should fix (and you are sure that these parameters would fix your problem).


Update:

Jonathan Lewis has written a good article about buffer cache advisor details on his website – I recommend to read that one too if you’re interested in this stuff:

http://www.jlcomp.demon.co.uk/Cache_advice.html

This entry was posted in Oracle and tagged , , . Bookmark the permalink.

16 Responses to KGL simulator, shared pool simulator and buffer cache simulator – what are these?

  1. Tanel Poder says:

    Note: I accidentially published this note while I was writing it, so you may have seen a partial version of it. It’s complete now.

  2. Bert says:

    Tanel…

    Would there be a negative effect on Automatic Shared Memory Mgmt (ASMM) performance if, as you say, set db_cache_advice = off, but leave the SGA_TARGET > 0 ? In other words does ASMM rely on db_cache_advice to function properly?

  3. Tanel Poder says:

    Yes, most of these automatic things (including ASMM) require that the simulators/advisors are enabled.

  4. Connor says:

    re: “_kglsim_max_percent parameter which defaults to max 5%”

    I wonder how many times the advisories come up with a suggestion that an increase of 5% is required…somehow I doubt the suggestion is phrased: “turn me off” :-)

  5. Oh man. That’s fantastic explanation. Thanks for that!

  6. Tanel Poder says:

    Connor, yeah I hope that in 12g there would be a new feature which makes these advisors to turn themselves off if they detect they’re useless! ;-)

  7. Coskan says:

    Thank you for the explanation Tanel.

    Question:

    Once we turn of the advisor how does Oracle decide with resizing?

  8. Tanel Poder says:

    When you turn off advisors (via undoc parameters or via statistics_level=basic) then the auotmatic resizing won’t be done at all.

  9. Coskan says:

    What about if we disable only db_cache_advise ?

    Will it be a problem only for db_cache size and if it is a problem then setting db_cache size as a minimum will be the permanent size or not ?

  10. Tanel,

    I wrote up a few conjectures about the implememtation of the buffer cache advisory some time ago – mainly in response to silly comments that someone was making about the advisory was proof that the buffer cache hit ratio was a useful performance metric.

    Apart from the limited number of latches, I do wonder if the contention on the buffer cache simulator latches is a side effect of (a) an unlucky sample and (b) multiple simulator copies.

    The note is at http://www.jlcomp.demon.co.uk/Cache_advice.html

  11. Tanel Poder says:

    Thanks Jonathan, good details in your note, especially the separate LRU management parts.

    In early versions the buffer cache simulator (which was actually introduced in 8i internally) only had one simulator LRU list per buffer cache working set – and it was plain LRU, not the touch-count based one. This allowed using a single simulator LRU list for simulating multiple different cache sizes (as with plain LRU you can just use only a part of the LRU list to simulate a smaller cache size). It looks like this is still the same in 10g as well (I ran a little group by query against x$kcbsh in my test instance and it didn’t return duplicates, so either I got “lucky” or there’s still a single LRU list).

    The KGL/shared pool simulator has always made time-based measurements, it measures how much *response time* would have been saved if a cached KGL object had been left in cache instead of a flush.

    The buffer cache simulator didn’t have “time saved” info until 10g, so it just reported the estimated *counts* of physical reads and the DBA needed to make their decisions based on that.

    But with Oracle 10g ASMM / sga_target Oracle measures the estimated read *time* (both for local file and global cache IO).

    So, now as both KGL and buffer cahce simulators measure the estimated benefit in same units (time), ASMM can make informed decisions about benefit of resizing memory pools. So, I think since 10g these decisions are time-based as they should be (how would you compare number of physical IO vs number of library cache loads anyway :)

    I definitely agree that the buffer cache advisor, even on 9i, is not based on buffer cache hit ratio, Oracle kernel developers aren’t that stupid ;)

  12. Tanel Poder says:

    And I think the KCB simulator lru latch contention comes from the problem that buffer cache simulator LRU list is a plain LRU list – requiring a LRU list operation every buffer access (when the simulator sample happens), but the real touch-count based LRU list operation does not need to move buffers around at every access.

  13. Tanel Poder says:

    And of course the simulator hash latch contention comes because too small number of hash latches and false contention.

  14. Prem says:

    Never knew how it worked.Now I know it. Thanks for sharing it … Tanel.

  15. Ulan Yisaev says:

    Tanel,
    We have some troubles with the latches. Our DB is 9.2.0.8.0 64bit on AIX 5.3, shared_pool_size is 20G and db_cache_size is 50G.
    Using your excellent tool LatchProfX I found top latches:
    NAME SID SQLHASH LADDR KSLLWNAM Held Held % Held ms
    ——————– ——— ———- —————- —————————————- ———- ——- ———–
    sim partition latch 2 0 0700000CA8CF1190 kcbm_simulate 16228 16.23 21537.802
    sim partition latch 3 0 0700000CA8CF1190 kcbm_simulate 13837 13.84 18364.466
    sim partition latch 2 0 0700000CA8CF12A8 kcbm_simulate 13135 13.14 17432.772
    sim partition latch 3 0 0700000CA8CF12A8 kcbm_simulate 12479 12.48 16562.129

    Here, SID 2 is DBW0 and SID 3 is DBW1. But what does mean the latch “sim partition latch”? I can’t find information on this latch on support.oracle.com. May it be a type of some simulator latch?

    • Tanel Poder says:

      Before looking into latch holders (with latchprofx) you need to determine which latches are waited for the most. So, what does the wait interface say – which latches are waited for the most? (snapper would show you that if you include P1 – it’s the latch address)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>