Identifying shared memory segment users using lsof

Lsof (list open files) is a really useful tool for troubleshooting open file descriptors that prevent a deleted file from being released or a shared memory segment from being removed.

Here’s a little situation on Linux where an Oracle shared memory segment was not released as someone was still using it.

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 393216     oracle    640        289406976  1          dest
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  22

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

The first segment listed (shmid 393216, status dest) should have disappeared after instance shutdown, but it didn’t. The “nattch” (number of attached processes) column shows that one process was still using the shared memory segment. So the segment was not released, and even the ipcrm command did not remove it (just as with normal files that someone still has open).
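
Spotting such a leftover segment by eye gets tedious on a busy box. Here is a minimal sketch that filters for it, assuming the Linux ipcs column layout shown above (key, shmid, owner, perms, bytes, nattch, status). The function name dest_attached_segments is my own invention for illustration, not part of any tool.

```shell
# Hypothetical helper: reads `ipcs -m` output on stdin and prints the shmid
# and nattch of every segment that is marked "dest" (destroy pending) but
# still has at least one attached process.
dest_attached_segments() {
    awk '$1 ~ /^0x/ && $6 + 0 > 0 && $7 == "dest" { print $2, $6 }'
}

# Usage (assumes the Linux ipcs layout; other platforms print other columns):
#   ipcs -m | dest_attached_segments
```

Header lines and segments without a status are skipped because they either do not start with a 0x key or have no seventh field.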

So I needed to identify which process was still using the memory segment. Had it been a normal existing file, I could have used the /sbin/fuser command to see which process still held it open, but this only works for existing files with existing directory entries.

However, for deleted files, sockets and shared memory segments, you can use the lsof command (it’s normally installed by default on Linux, but on other Unixes you need to download and install it separately).

The SHM ID of that segment was 393216, as ipcs -ma showed, so I simply ran lsof to show all open file descriptors and grepped for that SHM ID:

$ lsof | egrep "393216|COMMAND"
COMMAND     PID      USER   FD      TYPE     DEVICE       SIZE       NODE NAME
python    18811    oracle  DEL       REG        0,8                393216 /SYSVbfb94e30

See how the NODE column corresponds to the SHM ID in the ipcs output.
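
A plain grep for the number also matches any line where 393216 happens to appear as a PID, size or part of a file name. A slightly stricter filter, sketched here under the assumption that lsof prints the NODE immediately before a NAME starting with /SYSV (as in the output above), could look like this. The name shm_users is hypothetical.

```shell
# Hypothetical helper: reads lsof output on stdin and prints the PIDs of
# processes whose SysV shared memory entry (NAME beginning with /SYSV)
# has the given shmid in the NODE column, i.e. the field just before NAME.
shm_users() {
    awk -v id="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^\/SYSV/ && $(i - 1) == id) { print $2; next }
    }' | sort -un
}

# Usage:
#   lsof | shm_users 393216
```

Searching for the /SYSV field instead of counting columns keeps the filter working when the SIZE column is empty or when lsof appends a "(deleted)" marker after the name.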

So I killed PID 18811, which was still attached to the SHM segment:

$ kill 18811

$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xbfb94e30 425985     oracle    640        289406976  18
0x3cf13430 557058     oracle    660        423624704  25

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle    640        154
0x9df96b74 1671169    oracle    660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages


Now the shared memory segment is gone and its memory released.

Note that the lsof command is very useful for many other tasks as well. For example, it can list open sockets by network protocol, IP, port, etc., so you can determine from the OS level which client a server process is talking to:

$ lsof -i:1521
COMMAND   PID   USER   FD   TYPE DEVICE SIZE NODE NAME
tnslsnr  6212 oracle   11u  IPv4  49486       TCP *:1521 (LISTEN)
tnslsnr  6212 oracle   13u  IPv4 276708       TCP linux03:1521->linux03:37277 (ESTABLISHED)
tnslsnr  6212 oracle   14u  IPv4 264894       TCP linux03:1521->linux03:41122 (ESTABLISHED)
oracle  22687 oracle   20u  IPv4 264893       TCP linux03:41122->linux03:1521 (ESTABLISHED)
oracle  25250 oracle   15u  IPv4 276707       TCP linux03:37277->linux03:1521 (ESTABLISHED)
oracle  25530 oracle   15u  IPv4 279910       TCP linux03:1521->192.168.247.1:nimsh (ESTABLISHED)
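
To pull just the client endpoints out of a listing like this, something along these lines may help. It is only a sketch: the function name established_peers is made up, and it assumes the local port is printed numerically (lsof’s -n and -P options suppress host and port name lookups if you want that guaranteed).

```shell
# Hypothetical helper: reads `lsof -i` output on stdin and, for every
# ESTABLISHED connection whose local port equals the argument, prints
# the command, PID and remote endpoint.
established_peers() {
    awk -v port="$1" '$NF == "(ESTABLISHED)" {
        split($(NF - 1), ends, "->")      # ends[1] = local, ends[2] = remote
        n = split(ends[1], loc, ":")      # port is the last colon-separated part
        if (loc[n] == port) print $1, $2, ends[2]
    }'
}

# Usage:
#   lsof -i:1521 | established_peers 1521
```

Run against the output above, this keeps only the three rows where 1521 is the local side and drops the client-side rows of the same connections.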

Unfortunately lsof is not installed by default on classic Unixes, though in some shops the sysadmins have chosen to install it. Even then, it may not work for regular users, as lsof requires access to kernel memory structures through /dev/kmem or similar. If you can’t get access to lsof, there may be other tools available that can do some of the tricks lsof can. For example, on Solaris there’s a useful command, pfiles, which can show the open files of a process, and since Solaris 9 (I think) it can also report the TCP connection endpoints of network sockets…
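
On Linux specifically, a similar fallback exists when lsof is missing: each attached SysV segment shows up in /proc/PID/maps with a /SYSV… pathname and the shmid in the inode column. The sketch below is not from the case described above and the function names are hypothetical. It also only sees processes whose maps file you are allowed to read, so run it as root or as the owning user.

```shell
# Hypothetical helper: reads one process's maps file on stdin and succeeds
# if any mapping is a SysV segment (/SYSV... pathname) whose inode column
# equals the given shmid. Field layout: address perms offset dev inode path.
maps_has_shm() {
    awk -v id="$1" '$6 ~ /^\/SYSV/ && $5 == id { found = 1 }
                    END { exit !found }'
}

# Hypothetical scan over all readable processes; prints matching PIDs.
shm_users_proc() {
    for maps in /proc/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        if maps_has_shm "$1" < "$maps"; then
            pid=${maps#/proc/}
            echo "${pid%/maps}"
        fi
    done
}

# Usage:
#   shm_users_proc 393216
```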


9 Responses to Identifying shared memory segment users using lsof

  1. Chen Shapira says:

    Very cool demo. What was python doing in your SGA?

  2. Tanel Poder says:

    Thanks Chen :)

    Python was reading some interesting stuff out of there ;)

    But I’ve had cases where an Oracle server process fails to die during shutdown and keeps being attached to SGA…

  3. Nayyar Ahmad says:

    Hi,

    I am using Solaris 9. When I ran the ipcs -ma command there were several output lines, but Solaris doesn’t show a STATUS column. How can I investigate in this situation?

  4. jc nars says:

    Hi Tanel,
    How do u do…one of the background processes SMON ran up a HUUUUUUUUUUGE trace file that just left 5% free space. And, the rookie dba rm-ed the file…instead of doing something like echo ”> bigtracefile.

    This is in the first node of Exadata X2.
    Oracle Support says no other option other than to kill SMON to free up the filesystem !
    I will be hanged if I request a bounce…as we just moved from V1 to X2 2 months back.

    Any ‘unsupported’ tricks you guys at Enkitec know of?!!
    Thanks

    P.S.:
    lsof -p 16927 | egrep '^COMMAND|trc|trm'
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    oracle 16927 oracle 44w REG 253,2 48599035904 12880302 /u01/app/oracle/diag/rdbms/edaprd/edaprd1/trace/edaprd1_smon_16927.trc (deleted)
    oracle 16927 oracle 45w REG 253,2 3661829175 12880303 /u01/app/oracle/diag/rdbms/edaprd/edaprd1/trace/edaprd1_smon_16927.trm (deleted)

  5. Tanel Poder says:

    @jc nars

    What you can also do is to identify SMON’s spid (16927) and then:

    1) ORADEBUG SETOSPID 16927
    2) ORADEBUG CLOSE_TRACE

    As this is SMON, I’m not too comfortable sending oradebug commands to it (wouldn’t want to crash it by any chance! :) but you can use this as your last option instead of restarting the instance (But do this at your own risk! :)

  6. Tanel Poder says:

    @jc nars

    But yes, the next time I’d just truncate the file with “> filename.trc” … or make sure such traces aren’t dumped at all…

  7. jc nars says:

    Hi Tanel,
    Thanks much. Following the note “Retrieve deleted files on Unix / Linux using File Descriptors [ID 444749.1]” we did a:
    head -100 /proc/16927/fd/44 > /tmp/file1

    Basically we had wanted to upload to Support the first few lines of the huge file…but after a few mins, the space was back in the filesystem !

    I thank you again for taking time to respond to the blog comment.

  8. Qian Yu says:

    Hi Tanel,
    I have the same problem. Following your note, I found I can delete the shmid, but when I delete that one, another shmid appears. What can I do?
    Looking forward to your reply.

  9. Qian Yu says:

    Hi Tanel,
    I have the same problem. Following your note, I deleted the shmid, but when I delete that one, another shmid appears. What can I do?
    Looking forward to your reply.
