Killing an Oracle process from inside Oracle

I had a following situation few days ago – I was running a CREATE TABLE AS SELECT over a heterogenous services dblink. However I cancelled this command via pressing CTRL+C twice in Windows sqlplus (this actually just kills the client sqlplus and not the call).

Anyway, when I wanted to drop that table involved, this happened:

SQL> drop table MYTABLE;
drop table MYTABLE
           *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified

I can’t drop a table as someone is holding a lock on it. Fair enough, this was a dev environment used only by me, so I used DBA_OBJECTS.OBJECT_ID to find out the object ID of that table:

SQL> @o MYTABLE

owner                     object_name                    object_type        CREATED           LAST_DDL_TIME     status           OID      D_OID
------------------------- ------------------------------ ------------------ ----------------- ----------------- --------- ---------- ----------
XYZ_DEV01_OWNER           MYTABLE                        TABLE              20080616 11:08:44 20080616 11:08:44 VALID          63764      63764

…and then I queried what enqueue locks were held on that object:

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2662          0

Ok, I see session 130 holding a TM lock on that table. I queried the corresponding SERIAL# from v$session as well and killed the session:

SQL> alter system kill session '130,8764';
alter system kill session '130,8764'
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2668          0

After hanging for 60 seconds, my kill command gave up (and marked my session for kill), but my lock was still not released… Now what?

This happens when the victim session is so stuck somewhere in an Oracle call that it never has a chance to receive the message it has been killed. And apparently some resources used can in that call can’t be released (for whatever reason, it may be by design, it may just be a bug).

The below queries against V$SESSION and V$LOCK show that even though the session has been marked to be in killed status, it’s still holding a lock:

SQL> @usid 130

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID          HASH_VALUE   LASTCALL STATUS
----------------------- -------------- ----------- ---------------- ------------------ -------------------- ------------ ----------- ---------- --------
XYZ_DEV01_OWNER          '130,8764'          33533 1288249          \XYZHOST001        sqlplus.exe          3872          3564023715       4032 KILLED

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2695          0

Ok, I tried various other options, like kill immediate and disconnect, which should have higher chance to clean up my session properly:

SQL> alter system kill session '130,8764' immediate;
alter system kill session '130,8764' immediate
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> alter system disconnect session '130,8764' immediate;
alter system disconnect session '130,8764' immediate
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2710          0

Still no luck, lock is there.

So I found the OS PID of my server process (or actually OS Thread ID inside oracle.exe process as I was on Windows) and used oradebug short_stack to check where that process was stuck (the output is slightly formatted):

SQL> oradebug setospid 3872;
Oracle pid: 18, Windows thread id: 3872, image: ORACLE.EXE (SHAD)

SQL> oradebug short_stack;
_ksdxfstk+14<-_ksdxcb+1481<-_ssthreadsrgruncallback+428<-_OracleOradebugThreadStart@4+819
<-7C80B680<-00000000<-71A557C4<-71AB4376<-6298540C<-6298325E<-60A0D931<-609D005F<-609B073D<-609AF9
endExitAgent+202<-_hoxexit+188<-_hogmdis+890<-_hogdisc+8<-_xupidhs+137<-_upidhs+20<-_kpudtch+305
<-_OCIServerDetach+12<-_ncodncf+268<-_npidr0+2300<-_npifcc+46<-_qerrmFBu+457<-_qerrmFetch+1
+1291<-_opiodr+1099<-_rpidrus+178<-_rpidru+88<-_rpiswu2+426<-_rpidrv+1461<-_psddr0+449
<-_psdnal+283<-_pevm_EXIM+153<-_pfrinstr_EXIM+34<-_pfrrun_no_tool+56<-_pfrrun+781<-_plsql_run+738
<-_pr+1099<-_opidrv+819<-_sou2o+45<-_opimai_real+112<-_opimai+92<-_OracleThreadStart@4+726<-7C80B680

This terse stack shows (start reading from bottom right to left) this process is stuck somewhere “above” qerrmFetch (Remote Fetch). I guess the functions starting with “h” above that are heterogenous services functions. By the way, V$SESSION_WAIT didn’t show any wait state changes either and the session itself was still constantly waiting for “HS message to agent” event. So, being stuck in a HS call was probably the reason why that session could not clean itself up.

Now, in such situations one normally would proceed with ORAKILL on Windows or just killing that server process at Unix level (after carefully thinking what you’re about to do). Killing the process makes PMON to perform the cleanup and PMON usually does clean all resources up ok. However I didn’t have access to that Windows server box, so OS kill was out of question.

So, I used another trick. While being connected to the victim process using oradebug, I issued this:

SQL> oradebug event immediate crash;
ORA-00072: process "Windows thread id: 3872, image: ORACLE.EXE (SHAD)" is not active
SQL>

After waiting for a while (crashdump occurred), sqlplus reported that the target process doesn’t exist anymore. Thanks to the crash, PMON had woken up and performed the cleanup.

Let’s see if it helped:

SQL> oradebug short_stack;
ORA-00072: process "Windows thread id: 3872, image: ORACLE.EXE (SHAD)" is not active

SQL> select * from v$lock where id1=63764;

no rows selected

SQL> @usid 130

no rows selected

SQL> drop table MYTABLE;

Table dropped.

Yippee, I finally got rid of that session, lock and could drop my table!

Note that I would still prefer killing the processes with Unix kill or Windows orakill, however there are some bugs with orakill (search in Metalink) that it doesn’t always succeed killing the thread properly. And in my case I didn’t have OS access anyway.

Ideally, the ALTER SYSTEM KILL session command should do all needed cleanup for us, but there are some corner cases involving loops, hangs and bugs where this approach won’t work. There’s a Metalink note 1020720.102 with a list of ways for killing Oracle processes/threads from OS level.

Comments

  1. Nice sum up of the stuff…

    Thanks :)

  2. Amit says:

    Very useful extension for oradebug..:)

    Cheers
    Amit

  3. Polarski bernard says:

    Mike Ault describes a similar process to free from a lock using ‘oradebug wakeup ‘

    Just wondering what is the difference.
    I did some research on this ‘oradebug events immediate crash’ but got no response from google or metalink

  4. tanelp says:

    Hi Bernard, I removed this link as it looks it pointed to pirate book download site. Please don’t post such links here..

  5. tanelp says:

    Regarding the commands, “oradebug events immediate crash” crashes the process, but “oradebug wakeup” should wake up a process if it’s sleeping.

  6. lou says:

    Tanel,

    Will this o script or the o2 script work if I am trying to compile a package if another session is using it. I am talking about if the package is locked?

  7. Good Stuff… Good way + Hard Core… to kill… no no! I should to say “Destroy”.

    Thank you

  8. Fernando Urea says:

    Thanks for your help!!!! I was looking for this for months and i haven´t found any help in the Web

  9. Tanel Poder says:

    Cheers! Of course use this only when you can’t use other more standard methods (like killing the process from OS level with kill (on Unix) or orakill (on Windows))

  10. Haris says:

    Hi Tanel

    Very clear step by step explanations. Good Stuff!!!

    However, I’d like to get your feedback on the same topic (correct me if i am wrong) which provided by Miladin Modrakovic, but he did the investigation on 11g. Here is the link for reference.
    http://oraclue.com/2009/07/28/identifying-pidspid-for-killed-session-in-11g/

    Cheers

  11. omair says:

    great stuff!! helped me much..

  12. Cheers! Of course use this only when you can’t use other more standard methods (like killing the process from OS level with kill (on Unix) or orakill (on Windows))
    ——————————–
    sujatha

  13. Caesar says:

    Wow, it works perfectly, in my case the Orakill commando would not kill the process, it just hangs :P, this did the job, thanks a lot

  14. KG says:

    Tanel
    we had a similar issue on windows 9i database .
    we killed the session at oracle level with alter system kill session then session remained in killed status for hours.
    Then i tried oradebug event immediate crash, but that seems to be stuck for hours
    finally we had to bounce the database as locked object as key for application to work

  15. Tanel Poder says:

    @KG
    On windows you could also use the orakill command (if it works for you). Another way would be to use process explorer to end a single thread in a process, but I’ve never tested this on Oracle, so can’t guarantee that it doesn’t crash your whole oracle.exe process!

  16. KG says:

    @Tanel Poder
    Thanks tanel for Response .

  17. @Tanel Poder You can use process explorer to kill the offending thread on Windows without crashing oracle.exe. I can’t guarantee that it will never crash the instance but I have relatively often used this technique (due to bad experiences with orakill).

  18. simone says:

    Hi there!
    Great hands-on that you put together. However I think this is not covering my situation.
    session status is KILLED but there is no OSpid process (Linux) – waking up POMON is not working and attaching to the process to issue an immediate crash wouldn’t work either because this is no process. No rollback is either affecting the sessions.
    Those sessions they just don’t disappear.
    Can you think of any possible path to follow?
    Cheers
    Simone

  19. Tanel Poder says:

    @simone
    PMON should detect such missing processes and eventually clean them up. But in practice I recall seeing such issue too… how many times did you wake up the PMON proc? Sometimes multiple wakeups are needed.

    If multiple wakeups don’t help, it looks very much like some bug… perhaps with special types of sessions – like are these sessions of some background process or shared server sessions perhaps etc?

Trackbacks

  1. [...] Killing an Oracle process from inside Oracle [...]

  2. [...] This is the case when you need to log on to the server and kill the target process from OS level. If you can’t log on to target OS for whatever reason (but still have SYSDBA access) then you could try attaching to target process with oradebug and running ORADEBUG EVENT IMMEDIATE CRASH as I’ve explained here. [...]

Speak Your Mind

*