I’m sure you are all familiar with this situation:
SQL> alter system kill session '152,33'; alter system kill session '152,33' * ERROR at line 1: ORA-00031: session marked for kill
The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.
So why is that?
The issue is in what this alter system kill command is doing. It’s not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it!
So, intead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.
All the kill session command is doing is ASK the target session to clean up and exit – via setting that bit.
Now, normally the target sessions are nice and check that bit often enough in their code, act on it and die.
But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits.
This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.
So, the “ORA-00031: session marked for kill” message you see after 60 seconds just means that:
1) Your session sets the “please die” bit in target sessions state object
2) Your session waits the target session to die for 60 seconds and times out after it doesn’t happen
3) Your session returns “session marked for kill” error – which means exactly what I wrote in step 1 above (“please die” bit is set)
By the way, if your session waits for the target session to die – what is the wait event you are waiting on then? Every wait, except few bugs, should be instrumented in Oracle.
This is easy to test, run this for example:
SQL> exec dbms_lock.sleep(100)
And immediately after try to kill that sleeping session from your session (in my case my session which issued the alter system kill command was 146). Lets see what it waits for:
SQL> @sw 146 SID STATE EVENT SEQ# SEC_IN_WAIT P1 ------- ------- ---------------------------------------- ---------- ----------- ------------------ 146 WAITING inactive session 140 3 152
See, the killer waits for inactive session event. The parameter 1 (P1) is 152. What does that mean?
V$EVENT_NAME (or V$SESSION_WAIT itself) gives you the answer. I will run my show event descriptions script to display that events parameter meaning:
SQL> @sed "inactive session" EVENT# EVENT_NAME PARAMETER1 PARAMETER2 ------ ------------------------------------------------------- -------------------- --------------- 296 inactive session session# waited SQL>
So, the killer waits for inactive session wait for 60 seconds and then times out. If the target session to be killed ever checks that “please die” bit, it will clean up and exit, otherwise it could be stuck forever!
This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here.
My preferred approach for killing sessions usually is:
1) of course – verify whether you’re killing the right session first
2) ALTER SYSTEM KILL SESSION
3) If that doesn’t work immediately then check whether the target session has acknowledged the kill and is rolling back its (large) transaction. You can do this by checking V$TRANSACTION.USED_UREC for that session’s transaction (if it has any) and see if its decreasing. If yes, the transaction is rolling back and we just need to wait it to finish. There’s also a bit in V$TRANSACTION which states whether the transaction is rolling back, but I don’t have my notes available right now (I’m on vacation actually :)
4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.
5) If couple of minutes after killing the process from OS level that sessions and its locks & resources are still not released (remember, we have verified that there was no long transaction rollback happening) then I would attach to my own process with oradebug and run “ORADEBUG WAKEUP 2” couple of times (and checking if the session has gone + waiting few seconds between each invocation). The “2” here means Oracle PID of PMON process which is usually 2, but you should check it from your V$PROCESS view.
If the session is still not gone, it’s time to open a support request.
Note that I didn’t post all the little details here – for example, before killing a problem process which is stuck it makes sense to gather some diagnostic data first (like run pstack on the problem process few times to get its current stack trace).
NB! If you want to move to the "New World" - and benefit from the awesomeness of Hadoop, without having to re-engineer your existing applications - check out Gluent, my new startup that will make history! ;-)