lsof (list open files) is a really useful tool for troubleshooting open file descriptors that prevent a deleted file from being released or a shared memory segment from being removed.
Here’s a little situation on Linux where an Oracle shared memory segment was not released as someone was still using it.
$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 393216     oracle     640        289406976  1          dest
0xbfb94e30 425985     oracle     640        289406976  18
0x3cf13430 557058     oracle     660        423624704  22

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle     640        154
0x9df96b74 1671169    oracle     660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
The first shared memory segment (shmid 393216, status “dest”) should have disappeared after instance shutdown, but it didn’t. From the “nattch” (number of attached processes) column I could see that one process was still attached to it. Thus the segment was not released, and even the ipcrm command did not remove it (just as with normal files that someone still has open).
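As a rough sketch of the check described above, the following shell function picks out segments that are marked for destruction but still have attached processes. The function name orphaned_shm is hypothetical, and it assumes the Linux ipcs column order shown in the listing (key, shmid, owner, perms, bytes, nattch, status); other platforms may lay the output out differently.

```shell
# Hypothetical helper: print "shmid nattch" for every shared memory segment
# whose status column is "dest" and whose nattch count is above zero.
# Reads ipcs-style output on stdin and assumes the Linux column order:
#   key shmid owner perms bytes nattch status
orphaned_shm() {
    awk '$7 == "dest" && $6 > 0 { print $2, $6 }'
}

# Possible usage:
#   ipcs -m | orphaned_shm
```

Header and separator lines fall through because their seventh field is never "dest".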
So, I needed to identify which process was still using the memory segment. If that had been a normal existing file, I could have used the /sbin/fuser command to see which process still held it open, but this only works for existing files with existing directory entries.
However, for deleted files, sockets and shared memory segments, you can use the lsof command (it’s normally installed by default on Linux, but on other Unixes you need to download and install it separately).
The SHM ID of that segment was 393216, as ipcs -ma showed, so I simply ran lsof to show all open file descriptors and grepped for that SHM ID:
$ lsof | egrep "393216|COMMAND"
COMMAND   PID   USER  FD  TYPE DEVICE SIZE   NODE NAME
python  18811 oracle DEL   REG    0,8      393216 /SYSVbfb94e30
See how the NODE column corresponds to SHM ID in ipcs output.
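A plain grep for the SHM ID matches it anywhere in the line, so a PID or file size that merely contains those digits would also show up. A minimal sketch of a stricter filter follows; the function name lsof_exact is hypothetical, and it simply keeps the header plus any row in which some whitespace-separated field equals the ID exactly.

```shell
# Hypothetical helper: keep the lsof header row plus every row that has a
# field exactly equal to the given ID (string comparison, no substrings).
# Reads lsof output on stdin.
lsof_exact() {
    awk -v id="$1" '
        NR == 1 { print; next }    # always keep the column header
        {
            for (i = 1; i <= NF; i++)
                if (($i "") == (id "")) { print; next }
        }
    '
}

# Possible usage:
#   lsof | lsof_exact 393216
```

Because the SIZE column is empty for these entries, the NODE value does not sit at a fixed field position, which is why the sketch checks every field rather than one column.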
So I killed PID 18811, which was still attached to the SHM segment:
$ kill 18811
$ ipcs -ma

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xbfb94e30 425985     oracle     640        289406976  18
0x3cf13430 557058     oracle     660        423624704  25

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xe2260ff0 1409024    oracle     640        154
0x9df96b74 1671169    oracle     660        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
Now the shared memory segment is gone and its memory released.
Note that the lsof command is very useful for many other tasks as well. It can list open sockets by network protocol, IP address, port and so on. For example, you can determine at the OS level which client a server process is talking to:
$ lsof -i:1521
COMMAND   PID   USER   FD   TYPE DEVICE SIZE NODE NAME
tnslsnr  6212 oracle  11u   IPv4  49486       TCP *:1521 (LISTEN)
tnslsnr  6212 oracle  13u   IPv4 276708       TCP linux03:1521->linux03:37277 (ESTABLISHED)
tnslsnr  6212 oracle  14u   IPv4 264894       TCP linux03:1521->linux03:41122 (ESTABLISHED)
oracle  22687 oracle  20u   IPv4 264893       TCP linux03:41122->linux03:1521 (ESTABLISHED)
oracle  25250 oracle  15u   IPv4 276707       TCP linux03:37277->linux03:1521 (ESTABLISHED)
oracle  25530 oracle  15u   IPv4 279910       TCP linux03:1521->192.168.247.1:nimsh (ESTABLISHED)
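To pull just the connection endpoints out of a listing like this, something along the following lines may help. It is only a sketch: the function name established_peers is hypothetical, and it assumes the NAME column has the form local->remote followed by (ESTABLISHED), as in the output above.

```shell
# Hypothetical helper: for each established connection in lsof -i output,
# print "command pid local_endpoint remote_endpoint".
# Reads lsof output on stdin; listening sockets and the header are skipped.
established_peers() {
    awk '$NF == "(ESTABLISHED)" {
        n = split($(NF - 1), ends, "->")   # NAME looks like local->remote
        if (n == 2) print $1, $2, ends[1], ends[2]
    }'
}

# Possible usage (-n and -P ask lsof not to resolve host and port names):
#   lsof -nP -i:1521 | established_peers
```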
Unfortunately, lsof is not installed by default on classic Unixes, although in some shops the sysadmins have chosen to install it. Even then, it may not work for regular users, as lsof requires access to kernel memory structures through /dev/kmem or similar. If you can’t get access to lsof, there may be other tools available that can do some of the tricks lsof can do. For example, on Solaris there’s a useful command, pfiles, which shows the open files of a process, and since Solaris 9 (I think) it can also report the TCP connection endpoints of network sockets…