If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:
SQL> select * from v$sgastat where lower(name) like '%sim%' order by name; POOL NAME BYTES ------------ -------------------------- ---------- shared pool kglsim alloc latch area 1700 shared pool kglsim alloc latches 68 shared pool kglsim count of pinned he 9248 shared pool kglsim free heap list 204 shared pool kglsim free obj list 204 shared pool kglsim hash table 4104 shared pool kglsim hash table bkts 2097152 shared pool kglsim heap 635536 shared pool kglsim latch area 1700 shared pool kglsim latches 68 shared pool kglsim main lru count 87040 shared pool kglsim main lru size 174080 shared pool kglsim object batch 909440 shared pool kglsim pin list arr 816 shared pool kglsim recovery area 2112 shared pool kglsim sga 22092 shared pool kglsim size of pinned mem 18496 shared pool ksim client list 84 shared pool log_simultaneous_copies 480 shared pool sim cache nbufs 640 shared pool sim cache sizes 640 shared pool sim kghx free lists 4 shared pool sim lru segments 640 shared pool sim segment hits 1280 shared pool sim segment num bufs 640 shared pool sim state object 48 shared pool sim trace buf 5140 shared pool sim trace buf context 120 shared pool sim_knlasg 1200 shared pool simulator hash buckets 16512 shared pool simulator latch/bucket st 4608 31 rows selected.
See, a bunch of “kgl sim” and then just “sim” allocations.
… or sometimes you can see latch contention on following latches:
SQL> select name from v$latch where name like '%sim%'; NAME ------------------------------------------------------- ksim membership request latch simulator lru latch simulator hash latch sim partition latch shared pool simulator shared pool sim alloc 6 rows selected.
Again, there seems to be some “simulation” work going on in Oracle instance.
So what are these simulators about?
These simulators help Oracle to determine how much time and effort would have been saved if buffer cache or shared pool size was bigger than current. And this info is summarized and externalized in various advisor views such V$SHARED_POOL_ADVICE, V$MTTR_TARGET_ADVICE, V$DB_CACHE_ADVICE and so on. These simulation engines were introduced in Oracle 9i along the mentioned V$ views.
Also this stuff is used by MMAN for making SGA resizing decisions if SGA_TARGET (10g+) or MEMORY_TARGET (11g+) is enabled.
I decided to write an article about simulators as quite a few people have been asking me about it lately. So, I first googled around to see if someone has already written about it already and found one here. It turned out it was written by me, I just didn’t know about it anymore :)
So, I will write an edited version of my original explanation here. Note that I don’t know the exact algorithm of the simulator, just explaining the fundamentals here.
KGL simulator is the shared pool memory allocation tracking and simulation engine.
KGL=Kernel General Library cache manager, as the name says it deals with library objects such cursors, cached stored object definitions (PL/SQL stored procs, table definitions etc).
KGL simulator is used for estimating the benefit of caching if the cache was larger than currently. The general idea is that when flushing out a library cache object, it’s hash value (and few other bits of info) are still kept in the KGL simulator hash table. This stores a history of objects which were in memory in past but got flushed out. That’s the key part: we keep a history of objects flushed out (this is done by selective sampling though, not at every object flush as it would have too much overhead).
When loading a library cache object into library cache (which means that no existing such object is in there already), Oracle goes and checks the KGL simulator hash table to see whether an object with matching hash value is in there. If a matching object is found, that means that the required object had been in cache in past, but flushed out due space pressure. That’s the second important part: we know when we would have been able to (probably) just reuse an object if the shared pool had been bigger.
Using that information of how many library cache object (re)loads could have been been avoided if cache had been bigger (thanks to KGL simulator history) and also thanks to knowing how much time the (current) object reloads took, Oracle can predict how much response time would have been saved instancewide if shared pool was bigger. So, we don’t need to increase shared pool itself to measure the benefit, we simulate a bigger shared pool by just keeping a history of aged out library cache objects hash values in a special array – and that’s why there are various kglsim and simulator allocations seen in V$SGASTAT.
The simulation results are seen from v$shared_pool_advice and are used by other advisors such ADDM.
Apparently Oracle keeps track also about how much memory is pinned in the shared pool (thus can’t be flushed out) over time. This info is probably used also for estimating shared pool resize requirements as pinned memory chunks are not usable for anything else (as long as they are pinned) in the shared pool. These measurements are also done on sampling basis.
I did a little test with my lotshparses.sql script (which just runs a lot of hard parses inside a PL/SQL loop) and ran latchprofx to report which latches and why was my session using. Note the “sim” instead of usual % in the latchprofx syntax. I am interested in only these latches which have “sim” in their name in this experiment:
SQL> @latchprofx sid,name,func,hmode 146 sim 100000 -- LatchProfX 1.12 by Tanel Poder ( http://www.tanelpoder.com ) SID NAME FUNC HMODE Held Gets Held % ---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- ------- 146 shared pool simulator kglsim_unpin_simhp exclusive 514 49 .51 146 shared pool simulator kglsim_upd_newhp exclusive 206 198 .21 146 shared pool simulator kglsim_scan_lru: scan exclusive 95 93 .10 146 shared pool simulator kglsim_chg_simhp_free exclusive 32 32 .03 146 shared pool simulator kglsim_chg_simhp_noob exclusive 32 32 .03 146 shared pool sim alloc kglsim_chk_heaplist: alloc exclusive 4 3 .00 146 shared pool sim alloc kglsim_chk_objlist: alloc exclusive 3 2 .00 146 shared pool simulator kglsim_chk_objlist: extend exclusive 1 1 .00 146 shared pool simulator kglsim_scan_lru: rebalance exclusive 1 1 .00
This simulation engine is pretty cool because its the first time someone has tried to measure the time what could be saved by increasing shared pool size. (Response) Time is the ultimate measure of performance, so this simulator measures the right thing. Do you remember what people used before that? Library cache hit ratio! A lot of people back then (and some still today) have been misleaded by measuring the wrong things like hit ratios.
Anyway, while this is a cool feature – it can cause trouble in big systems. This is because the extra CPU usage for simulation and also the potential concurrency issues like simulator latch contention (like the shared pool simulator latch above, these latches are used even on 11g where most library cache latches have been replaced with KGX mutexes for efficiency and avoiding false contention issues). Also, some people are sensitive about the potential large memory allocations the shared pool simulator can take from shared pool (this is controlled by _kglsim_max_percent parameter which defaults to max 5% of total shared pool size). As these allocations are dynamic, then when your application suddenly starts making lots of hard parses due newly released code, causing memory shortage, then the simulator can make things even worse as it can make relatively big allocations for itself too! And as far as I know the simulator arrays don’t shrink back.
Note that I’m talking about big systems here, in most other cases the simulation shouldn’t cause trouble (on recent db versions anyway).
In some databases I have disabled all of the advisors and simulators with setting statistics_level = basic. All features what you see from V$STATISTICS_LEVEL would be disabled when setting statistics_level to basic:
SQL> select statistics_name,activation_level from v$statistics_level; STATISTICS_NAME ACTIVAT ---------------------------------------- ------- Buffer Cache Advice TYPICAL MTTR Advice TYPICAL Timed Statistics TYPICAL Timed OS Statistics ALL Segment Level Statistics TYPICAL PGA Advice TYPICAL Plan Execution Statistics ALL Shared Pool Advice TYPICAL Modification Monitoring TYPICAL Longops Statistics TYPICAL Bind Data Capture TYPICAL Ultrafast Latch Statistics TYPICAL Threshold-based Alerts TYPICAL Global Cache Statistics TYPICAL Active Session History TYPICAL Undo Advisor, Alerts and Fast Ramp up TYPICAL Streams Pool Advice TYPICAL Time Model Events TYPICAL Plan Execution Sampling TYPICAL Automated Maintenance Tasks TYPICAL SQL Monitoring TYPICAL Adaptive Thresholds Enabled TYPICAL V$IOSTAT_* statistics TYPICAL . 23 rows selected.
Sometimes you don’t want to disable absolutely everything, but only the troublemakers. The good news is that there’s an (undocumented) parameter for disabling every feature separately. As always, make sure that you know what you’re doing before changing these and get blessing from Oracle support in order to not get yourself in trouble.
So, for example the library cache memory simulation (shared pool advice) can be disabled by setting _library_cache_advice = false .
I covered shared pool and library cache above. Oracle has more cache advisors, the underlying principle in them is the same. They save some small metadata/info about flushed out objects in a simulator array and just check that array when someone is allocating memory for new object. If matching object’s metadata still happens to be in the simulator array, then Oracle knows that we could have avoided the time and effort loading (generating) that object back into memory if the corresponding cache had been bigger. Of course the algorithm must be more sophisticated than just checking for object’s existence, there are probably object’s flush timestamp and size stored in the simulator arrays as well.
While the library cache simulator array stores library cache objects hash values for looking up historical memory allocations metadata, then the buffer cache advisor stores just data block addresses (DBAs) of blocks previously in cache. When Oracle spots that a block was previously in cache while intiating physical IO for re-reading it back again, it will measure the time it took to complete the current physical IO and add this info into the simulator array. Again, that’s how Oracle can measure how much response time would have been saved if the block would still have been in the buffer cache.
The buffer cache advisor needs some special attention, as I have had most issues with that (on large systems again), mostly with buffer cache’s simulator lru latch contention in systems with lots of CPUs and physical IOs going on. The problem here is that even though you can have hundreds of thousands of cache buffers chains latches in a system, the number of simulator lru latches remains relatively low by default, causing contention.
Oracle provides a documented parameter for disabling the buffer cache simulation – you need to set db_cache_advice = off (or ready).
So, hopefully this sheds some light into how the advisors work and what the heck the simulators simulate in Oracle. Even though I provided a list of parameters which allow you to control this stuff – don’t change any of them unless you actually have a problem which they should fix (and you are sure that these parameters would fix your problem).
Update:
Jonathan Lewis has written a good article about buffer cache advisor details on his website – I recommend to read that one too if you’re interested in this stuff:






Note: I accidentially published this note while I was writing it, so you may have seen a partial version of it. It’s complete now.
Tanel…
Would there be a negative effect on Automatic Shared Memory Mgmt (ASMM) performance if, as you say, set db_cache_advice = off, but leave the SGA_TARGET > 0 ? In other words does ASMM rely on db_cache_advice to function properly?
Yes, most of these automatic things (including ASMM) require that the simulators/advisors are enabled.
re: “_kglsim_max_percent parameter which defaults to max 5%”
I wonder how many times the advisories come up with a suggestion that an increase of 5% is required…somehow I doubt the suggestion is phrased: “turn me off” :-)
Oh man. That’s fantastic explanation. Thanks for that!
Connor, yeah I hope that in 12g there would be a new feature which makes these advisors to turn themselves off if they detect they’re useless! ;-)
Thank you for the explanation Tanel.
Question:
Once we turn of the advisor how does Oracle decide with resizing?
When you turn off advisors (via undoc parameters or via statistics_level=basic) then the auotmatic resizing won’t be done at all.
What about if we disable only db_cache_advise ?
Will it be a problem only for db_cache size and if it is a problem then setting db_cache size as a minimum will be the permanent size or not ?
Tanel,
I wrote up a few conjectures about the implememtation of the buffer cache advisory some time ago – mainly in response to silly comments that someone was making about the advisory was proof that the buffer cache hit ratio was a useful performance metric.
Apart from the limited number of latches, I do wonder if the contention on the buffer cache simulator latches is a side effect of (a) an unlucky sample and (b) multiple simulator copies.
The note is at http://www.jlcomp.demon.co.uk/Cache_advice.html
Thanks Jonathan, good details in your note, especially the separate LRU management parts.
In early versions the buffer cache simulator (which was actually introduced in 8i internally) only had one simulator LRU list per buffer cache working set – and it was plain LRU, not the touch-count based one. This allowed using a single simulator LRU list for simulating multiple different cache sizes (as with plain LRU you can just use only a part of the LRU list to simulate a smaller cache size). It looks like this is still the same in 10g as well (I ran a little group by query against x$kcbsh in my test instance and it didn’t return duplicates, so either I got “lucky” or there’s still a single LRU list).
The KGL/shared pool simulator has always made time-based measurements, it measures how much *response time* would have been saved if a cached KGL object had been left in cache instead of a flush.
The buffer cache simulator didn’t have “time saved” info until 10g, so it just reported the estimated *counts* of physical reads and the DBA needed to make their decisions based on that.
But with Oracle 10g ASMM / sga_target Oracle measures the estimated read *time* (both for local file and global cache IO).
So, now as both KGL and buffer cahce simulators measure the estimated benefit in same units (time), ASMM can make informed decisions about benefit of resizing memory pools. So, I think since 10g these decisions are time-based as they should be (how would you compare number of physical IO vs number of library cache loads anyway :)
I definitely agree that the buffer cache advisor, even on 9i, is not based on buffer cache hit ratio, Oracle kernel developers aren’t that stupid ;)
And I think the KCB simulator lru latch contention comes from the problem that buffer cache simulator LRU list is a plain LRU list – requiring a LRU list operation every buffer access (when the simulator sample happens), but the real touch-count based LRU list operation does not need to move buffers around at every access.
And of course the simulator hash latch contention comes because too small number of hash latches and false contention.
Never knew how it worked.Now I know it. Thanks for sharing it … Tanel.