Operating systems are lazy allocating memory

There was a discussion about whether Oracle really allocates all memory for the SGA immediately on instance startup or not, and further, whether Oracle allocates memory beyond SGA_TARGET if SGA_MAX_SIZE is larger than it.
It’s worth reading this thread first: http://forums.oracle.com/forums/thread.jspa?threadID=535400&tstart=0

I will paste an edited version of my reply here as well:

Don’t confuse address space set-up with allocating physical memory pages from RAM!

Even if ipcs -m shows x GB as the SGA shm segment length, it doesn’t mean this memory has actually been initialized and taken from RAM.

Decent OSes only initialize pageable memory pages when they are touched for the first time, so a shm segment showing 10GB in ipcs -m output may really be only 10% “used”, as some pages have never been touched. There are many things which affect when and if the memory is actually *allocated*; the ones I remember right now are:

1) using Solaris ISM – means Oracle will be using non-pageable large pages – the shm segment size you see in ipcs is fully allocated from RAM and locked in RAM.

2) using Solaris DISM – the SGA shm segment is pageable (small pages in Solaris 8, large pages from Solaris 9) and may not necessarily be allocated from RAM

3) using lock_sga=true -> the SGA shm segment is allocated from RAM and locked in RAM

4) using _lock_sga_areas -> some ranges of pages in SGA shm segment are locked to memory, some pages of SGA shm segment may still be uninitialized

5) using pre_page_sga=true -> all pages of the SGA shm segment are touched on startup

6) a few others, like _db_cache_pre_warm, which affect memory page touching on startup…

7) using memory_target on Oracle 11g

So, there are *many* things which affect physical memory allocation, but generally, unless you’re using non-pageable pages, not the whole SGA’s worth of memory is allocated from the OS during instance startup.
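
To make this lazy behaviour visible outside of Oracle, here is a minimal Linux sketch (my own illustration, not anything Oracle-specific) that maps 1GB of anonymous memory and prints the process’s VmRSS from /proc/self/status before and after touching the pages – the address space shows up immediately, but resident memory only grows once the pages are touched:

/*
 * Minimal demo of lazy page allocation on Linux: map 1GB of anonymous
 * memory and watch VmRSS -- it only grows after the pages are touched.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void print_vmrss(const char *label)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return;
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "VmRSS:", 6) == 0)
            printf("%-14s %s", label, line);
    fclose(f);
}

int main(void)
{
    size_t size = 1UL << 30;                 /* 1GB of address space */

    /* Set up the address space -- no physical pages are used yet. */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    print_vmrss("after mmap:");

    /* Touch every page -- only now does the kernel allocate RAM. */
    memset(p, 1, size);
    print_vmrss("after memset:");

    munmap(p, size);
    return 0;
}

The same principle applies to a SysV shm segment created with shmget() (without huge/locked pages): ipcs -m shows the full segment size as soon as it is created, but physical pages are allocated only as they are touched.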

Normally these artificial instance startup errors after setting sga_max_size to xxxGB come from hitting the max shm segment size or the max RAM + swap size (on Unixes). On Linux, on the other hand, you can overallocate memory, as Linux doesn’t back anonymous memory mappings with swap space reservations (Linux starts killing “random” processes instead when running out of memory. Nice, huh?)

This means that if your SGA_TARGET is lower than SGA_MAX_SIZE during startup then the pages “above” SGA_TARGET will never be touched, thus not allocated!

And if you ramp down SGA_TARGET during your instance’s lifetime, then the pages “above” the new SGA_TARGET won’t be touched anymore (after MMAN completes the downsizing), which means these pages will be paged out from physical memory if there’s a shortage of free physical memory.

Note that this “lazy” allocation behaviour comes from how modern operating systems work; it’s not a feature of Oracle. Oracle just has an option to request some specific behaviour from the OS on some platforms (like requesting ISM using the SHM_SHARE_MMU flag on Solaris when setting up the SGA shm segment).
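
For illustration, on Solaris requesting ISM is just a flag on the shmat() call, roughly as in this sketch (mine, not Oracle’s code; the 256MB size is just an example):

/* Sketch: attach a SysV shm segment as ISM on Solaris via SHM_SHARE_MMU. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    size_t size = 256UL * 1024 * 1024;    /* 256MB, just an example */

    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return 1; }

    /* SHM_SHARE_MMU asks for ISM: memory locked in RAM, large pages
     * where possible, and address translation structures shared
     * between the attaching processes. */
    void *addr = shmat(shmid, NULL, SHM_SHARE_MMU);
    if (addr == (void *) -1) { perror("shmat(SHM_SHARE_MMU)"); return 1; }

    printf("ISM segment %d attached at %p\n", shmid, addr);

    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);        /* clean up the segment */
    return 0;
}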

Thanks to this heavy “virtualization” of virtual memory pages and the short-codepath requirements of VM handling, it’s often hard to get a complete and accurate picture of individual processes’ and shm segments’ physical memory usage.


16 Responses to Operating systems are lazy allocating memory

  1. Matt says:

    I’m the mattyb poster in the Oracle forums. I’d really like to thank Tanel for taking the time to explain this – really cleared up what was happening for me.

  2. tanelp says:

    You’re welcome, Matt. I will probably blog more about (Unix) memory management stuff in the future…

  3. coskan says:

    excellent post

    I have not used Solaris, but I learned how it allocates the SGA without even using it.

    Keep blogging – the Oracle blogosphere needs you.

  4. Tom says:

    This is about the only place I have found understandable information about how the Oracle SGA interacts with the working set and virtual memory. A question for Windows DBAs…

    Windows has a 2GB process limit (unless you use the /3GB switch). Which of the various memory values we can see using pslist (from Sysinternals) or Task Manager does this limit apply to? I.e. is it that we are allowed 2GB of VM, or a 2GB working set, or 2GB of private memory…

    I ask because my VM size on Windows is consistently much larger than my working set.

  5. Tanel Poder says:

    An update regarding Linux memory allocation (and not backing memory allocations with swap space reservations):

    There’s a Linux kernel parameter which controls whether overcommit can happen:

    $ cat /proc/sys/vm/overcommit_memory
    0

    You should keep this at two (2) for your database servers to avoid any trouble!

    “man -s 5 proc” says:

           /proc/sys/vm/overcommit_memory
                  This file contains the kernel virtual memory accounting mode. Values are:
                  0: heuristic overcommit (this is the default)
                  1: always overcommit, never check
                  2: always check, never overcommit
                  In  mode 0, calls of mmap(2) with MAP_NORESERVE set are not checked, and the default check is very weak, leading to
                  the risk of getting a process "OOM-killed".  Under Linux 2.4 any non-zero value implies mode 1.  In mode 2  (avail-
                  able  since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is
                  the size of the swap space, and RAM is the size of the  physical  memory,  and  r  is  the  contents  of  the  file
                  /proc/sys/vm/overcommit_ratio.
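
    To put numbers on that mode-2 formula (my own example values, not from the man page): with 16GB of swap, 16GB of RAM and the default overcommit_ratio of 50, committed virtual memory is limited to 16 + 16 * 0.50 = 24GB. The kernel reports essentially the same figure as "CommitLimit:" in /proc/meminfo. A trivial sketch of the arithmetic:

    /* Mode-2 commit limit: SS + RAM * (r/100), with example values. */
    #include <stdio.h>

    int main(void)
    {
        double swap_gb = 16.0;   /* SS  -- example value */
        double ram_gb  = 16.0;   /* RAM -- example value */
        double ratio   = 50.0;   /* default /proc/sys/vm/overcommit_ratio */

        printf("CommitLimit = %.1f GB\n", swap_gb + ram_gb * (ratio / 100.0));
        return 0;
    }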
    
    

  6. Tanel Poder says:

    Of course, changing this parameter shouldn’t just be done without T&T – think & test!

    If your existing systems work without problems, it probably makes sense not to touch them. But for new systems I would set the parameter AND the swap space size accordingly.

    Man pages and /usr/share/doc/kernel-doc*/Documentation/vm/overcommit-accounting files give some additional info.

    Also, the value 2 is supported from 2.6 (production) kernels only. On 2.4 you should keep the parameter at 0.

  7. Tanel Poder says:

    And also, better test how this thing works on Oracle 11g with MEMORY_TARGET, as with any other new feature.
    The overcommit-accounting doc states that shmfs pages are accounted against the overcommit limit as well, but better be sure :)

  8. Of course, on Linux one would be tempted to use hugepages and avoid paging the SGA altogether. Linux has a partial equivalent of ISM but fortunately doesn’t have anything like DISM. Of course, that means losing the flexibility of dynamic memory handling, which is a loss I can live with.

  9. Tanel Poder says:

    I’m not sure if Linux has the shared page table support that Solaris has, but yep, as you said, with hugepages you’d get large pages for your SGA locked into memory.

    This doesn’t solve the overcommit problem for private memory though…
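
    For reference, requesting a hugepage-backed SysV shm segment on Linux looks roughly like the sketch below (my own illustration, not Oracle’s code). It assumes hugepages have already been reserved via /proc/sys/vm/nr_hugepages and that the memlock limit allows it; the 512MB size is just an example and must be a multiple of the hugepage size.

    /*
     * Sketch: create and attach a hugepage-backed SysV shm segment.
     * Hugetlb pages come from a pre-allocated pool and are never swapped out.
     */
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000   /* from the Linux kernel headers, if libc lacks it */
    #endif

    int main(void)
    {
        size_t size = 512UL * 1024 * 1024;   /* multiple of the hugepage size */

        /* Create the segment backed by hugepages. */
        int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
        if (shmid == -1) { perror("shmget(SHM_HUGETLB)"); return 1; }

        /* Attach it to this process. */
        void *addr = shmat(shmid, NULL, 0);
        if (addr == (void *) -1) { perror("shmat"); return 1; }

        printf("hugepage shm segment %d attached at %p\n", shmid, addr);

        shmdt(addr);
        shmctl(shmid, IPC_RMID, NULL);       /* clean up the segment */
        return 0;
    }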

  10. Tanel Poder says:

    Btw, if anyone’s interested in an overview of Solaris memory management internals, look at this presentation:

    http://www.solarisinternals.com/si/reading/t2-solaris-slides.pdf

  11. JC Dauchy says:

    Is the behaviour of SGA shm segment allocation the same for AIX as for Solaris or other flavours of Unix?

    I mean, the pages “above” SGA_TARGET would not be touched?
    I am a bit confused about all these memory-related problems and behaviours concerning memory allocation and used memory. Would you have an article on AIX memory management internals?

    Thanks

  12. Suresh says:

    Hi Tanel,

    Thanks for your work, your site has become an all-time favourite of mine :)

    BTW,

    I have a doubt: one of our servers has around 15 databases on Solaris 9, for which all instances are using ISM on startup, whereas only 1 instance is using DISM.

    I have verified everything (SGA parameters, pre_page_sga, lock_sga etc.) and I am not sure why it’s using DISM.

    For all databases, SGA_MAX_SIZE= and sga_target = 0; if that is the case, this DISM-using database should also use ISM.

    The server has plenty of physical RAM; as of now 16GB is still free.

    Have I missed something?

    -Thanks
    Suresh

  13. Suresh says:

    I got my answer….
    When SGA_MAX_SIZE is configured larger than what the individual pools sum up to, then it uses DISM.

    In the Sun docs… Dynamic Reconfiguration and Oracle 9i Dynamically Resizable SGA

    -Thanks
    Suresh

  14. PD Malik says:

    Tanel,

    Is there any way of checking whether a system is using pageable or non-pageable pages? As it would certainly depend on the OS, I am particularly interested in AIX, like JC above.

    Thanks

  15. Robert says:

    Not to be a nit-pick, but… check the Solaris man page for mmap with MAP_NORESERVE: it doesn’t allocate backing store, and the process can get a SIGBUS if it tries to page in a page when there isn’t any available backing store to provide one.
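
    For illustration, a MAP_NORESERVE mapping looks roughly like this minimal sketch (my own, with an example size): no backing store is reserved at mmap() time, so the cost is deferred to the moment the pages are touched – on Solaris the process can get a SIGBUS then, on Linux it risks the OOM killer instead.

    /*
     * MAP_NORESERVE sketch: the mapping succeeds without reserving swap,
     * and the risk moves to the page faults caused by touching the pages.
     */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #ifndef MAP_ANONYMOUS
    #define MAP_ANONYMOUS MAP_ANON   /* Solaris spells the flag MAP_ANON */
    #endif

    int main(void)
    {
        size_t size = 1UL << 30;   /* 1GB, purely as an example */

        /* No swap/backing store is reserved for this mapping up front. */
        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touching the pages is what actually needs memory; if none is
         * available at this point, the failure happens here, not at mmap(). */
        memset(p, 1, size);

        printf("touched %zu bytes that had no backing store reserved\n", size);
        munmap(p, size);
        return 0;
    }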
