Alter system kill session and ORA-00031: session marked for kill

I’m sure you are all familiar with this situation:

SQL> alter system kill session '152,33';
alter system kill session '152,33'
*
ERROR at line 1:
ORA-00031: session marked for kill

The session trying to issue the kill will hang for 60 seconds and then return this “session marked for kill” message. And the target session does not get killed at all.

So why is that?

The issue is in what this alter system kill command is doing. It’s not actually killing the target session (like kill -9 would do for OS processes). It just sets a bit in the target sessions state object, which marks that the target session should end. But its entirely up the target session to check this bit and act on it!

So, intead of ALTER SYSTEM KILL SESSION, the command should look something like ALTER SYSTEM ASK SESSION TO COMMIT SUICIDE.

All the kill session command is doing is ASK the target session to clean up and exit – via setting that bit.

Now, normally the target sessions are nice and check that bit often enough in their code, act on it and die.

But sometimes when the target session happens to be busy looping in some tight loop (due a bug perhaps) or is hung, then it never gets to check that “please die” bit and never exits.

This is why DBAs often need to kill the OS process or thread via OS tools to get rid of that session (and its locks, transactions) as when you kill the OS process, PMON will detect it (if not fast enough then it can be woken up via ORADEBUG WAKEUP call few times) and clean up after that session.

So, the “ORA-00031: session marked for kill” message you see after 60 seconds just means that:

1) Your session sets the “please die” bit in target sessions state object

2) Your session waits the target session to die for 60 seconds and times out after it doesn’t happen

3) Your session returns “session marked for kill” error – which means exactly what I wrote in step 1 above (“please die” bit is set)

By the way, if your session waits for the target session to die – what is the wait event you are waiting on then? Every wait, except few bugs, should be instrumented in Oracle.

This is easy to test, run this for example:

SQL> exec dbms_lock.sleep(100)

And immediately after try to kill that sleeping session from your session (in my case my session which issued the alter system kill command was 146). Lets see what it waits for:

SQL> @sw 146

 SID STATE   EVENT                                          SEQ# SEC_IN_WAIT                 P1
------- ------- ---------------------------------------- ---------- ----------- ------------------
 146 WAITING inactive session                                140           3                152

See, the killer waits for inactive session event. The parameter 1 (P1) is 152. What does that mean?

V$EVENT_NAME (or V$SESSION_WAIT itself) gives you the answer. I will run my show event descriptions script to display that events parameter meaning:

SQL> @sed "inactive session"
EVENT# EVENT_NAME                                              PARAMETER1           PARAMETER2
------ ------------------------------------------------------- -------------------- ---------------
296 inactive session                                        session# waited

SQL>

So, the killer waits for inactive session wait for 60 seconds and then times out. If the target session to be killed ever checks that “please die” bit, it will clean up and exit, otherwise it could be stuck forever!

This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here.

My preferred approach for killing sessions usually is:

1) of course – verify whether you’re killing the right session first

2) ALTER SYSTEM KILL SESSION

3) If that doesn’t work immediately then check whether the target session has acknowledged the kill and is rolling back its (large) transaction. You can do this by checking V$TRANSACTION.USED_UREC for that session’s transaction (if it has any) and see if its decreasing. If yes, the transaction is rolling back and we just need to wait it to finish. There’s also a bit in V$TRANSACTION which states whether the transaction is rolling back, but I don’t have my notes available right now (I’m on vacation actually :)

4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.

5) If couple of minutes after killing the process from OS level that sessions and its locks & resources are still not released (remember, we have verified that there was no long transaction rollback happening) then I would attach to my own process with oradebug and run “ORADEBUG WAKEUP 2″ couple of times (and checking if the session has gone + waiting few seconds between each invocation). The “2″ here means Oracle PID of PMON process which is usually 2, but you should check it from your V$PROCESS view.

If the session is still not gone, it’s time to open a support request.

Note that I didn’t post all the little details here – for example, before killing a problem process which is stuck it makes sense to gather some diagnostic data first (like run pstack on the problem process few times to get its current stack trace).

This entry was posted in Oracle and tagged , , . Bookmark the permalink.

16 Responses to Alter system kill session and ORA-00031: session marked for kill

  1. Edgar Chupit says:

    That’s probably one of the main reasons, why in 11g Oracle has introduced ALTER SYSTEM DISCONNECT SESSION … IMMEDIATE command. According to documentation: The IMMEDIATE setting disconnects the session and recovers the entire session state immediately, without waiting for ongoing transactions to complete.
    And it works quite nice in real world, we don’t have to login to DB host to kill sessions anymore.

  2. Kumar says:

    Tanel:
    Very informative article. Is 152 the session from which you issued the kill? When you say ” Your session waits the target session to die for 60 seconds and times out after it doesn’t happen”. So it waits for 60 seconds for the bad session to get killed otherwise it just sets the bit and we hope that that the bit is acknownledged by the bad session? How is the limit of 60 seconds determined?

  3. Anurag says:

    Why to verify step 4) If there’s no rollback happening and session just seems to be stuck, then its time to kill that session’s process from OS level.

    We can directly go and kill session at OS level isn’t ? Killed process anyway going to clear by PMON.

  4. Dani Rey says:

    Thank you Tanel for this great overview about “alter system kill session”. This also refreshed some more in detail stuff (session state object, oradebug tricks), which I learned in you course!

    Keep up the good work!

    Thanks Edgar for the 11g hint. I will definitely have to test this.

  5. Coskan says:

    alter system kill session ” immediate and alter system disconnect session ” immediate are also available in 10.2 and 10.1

    http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_2013.htm#BABEBJFA

    http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_2013.htm#i2282145

    Tanel do you think it is better to use one of them instead of just alter system kill session

  6. Paresh says:

    Thanks Tanel for this great article. Do you know where Oracle is setting “SUICIDE” bit on for to-be-killed session? You are so productive during vacation time too :)

  7. Dave Herring says:

    Great info as always, Tanel!

    Correct me if I’m wrong, but under 9i it seems that after you killed an Oracle session and it gets “marked for kill”, the process goes away but not the session. So at that point you can’t get the SPID, as it’s not found in V$PROCESS.

    I believe this ISN’T the case in 10g. I need to fit 9i on my laptop and test …

  8. Tanel Poder says:

    I will write a separate article about the disconnect session and kill session immediate. In my previous experience the immediate options haven’t worked reliably. I tested on 11.1.0.7 on Linux and still it succeeds only in some cases.

    Regarding why I would not want to kill -9 a session when its already rolling back a long transaction – the reason is that I would prefer the session to clean up after itself, instead of disturbing it and letting smon to take over.

    There can be value in killing the session and letting smon to take over though, if you’re on EE, have lots of CPUs and have fast_start_parallel_rollback enabled, this can make smon spawn multiple slaves for recovery. However it doesn’t always necessarily speed up the rollback and I’ve had bad experiences with completely hung parallel rollbacks in 9.2. I’m sure this issue is fixed by now but I’m just being careful – if a session is already rolling back after itself and if the estimated rollback time is acceptable, then don’t touch it!

  9. Tanel Poder says:

    Stay tuned for another article on this topic!

  10. Raj says:

    very good article and informative. Even in 10g kill immediate is available. As you said some times it works .. some times it doesn’t.

  11. Online Car Insurance >> http://onlinecarinsuranceclaims.com/ says:

    [... - blog.tanelpoder.com is another nice website of information. Car insurance claims [... -

  12. ronald says:

    It worked for me!
    to test if you can kill a session

    1. create the session by logging into the database as user x and execute exec dbms_lock.sleep(100)
    2. create another session and select sid,serial#,event from v$session where username=’userx’;
    3. on the same session, ‘alter system kill session ‘sid,serial#’;
    BIG NOTE: Wait for 60 seconds and don’t be impatient. It’ll be killed after sometime.

  13. Tanel Poder says:

    @ronald
    Ronald, you’ll probably find that the session got finally “killed” after 100 seconds, the time you requested it to sleep. But if the sleep is longer, or the session is hopelessly stuck in some loop (due a bug) then the session won’t be killed until the sleep/operation completes.

  14. Girish Sharma says:

    Sir,

    As you promised “I will write a separate article about the disconnect session and kill session immediate.”, i am just requesting you to, if you please get time to write it or if you have posted it, kindly share the link, so that I may know what is mystery behind simple kill and disconnect session.

    Best Regards
    Girish Sharma

  15. Luis Santos says:

    I´m also looking for “Kill vs disconnect session” article. If this exists please let´s us know.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>