Killing an Oracle process from inside Oracle

I had a following situation few days ago – I was running a CREATE TABLE AS SELECT over a heterogenous services dblink. However I cancelled this command via pressing CTRL+C twice in Windows sqlplus (this actually just kills the client sqlplus and not the call).

Anyway, when I wanted to drop that table involved, this happened:

SQL> drop table MYTABLE;
drop table MYTABLE
           *
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified

I can’t drop a table as someone is holding a lock on it. Fair enough, this was a dev environment used only by me, so I used DBA_OBJECTS.OBJECT_ID to find out the object ID of that table:

SQL> @o MYTABLE

owner                     object_name                    object_type        CREATED           LAST_DDL_TIME     status           OID      D_OID
------------------------- ------------------------------ ------------------ ----------------- ----------------- --------- ---------- ----------
XYZ_DEV01_OWNER           MYTABLE                        TABLE              20080616 11:08:44 20080616 11:08:44 VALID          63764      63764

…and then I queried what enqueue locks were held on that object:

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2662          0

Ok, I see session 130 holding a TM lock on that table. I queried the corresponding SERIAL# from v$session as well and killed the session:

SQL> alter system kill session '130,8764';
alter system kill session '130,8764'
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2668          0

After hanging for 60 seconds, my kill command gave up (and marked my session for kill), but my lock was still not released… Now what?

This happens when the victim session is so stuck somewhere in an Oracle call that it never has a chance to receive the message it has been killed. And apparently some resources used can in that call can’t be released (for whatever reason, it may be by design, it may just be a bug).

The below queries against V$SESSION and V$LOCK show that even though the session has been marked to be in killed status, it’s still holding a lock:

SQL> @usid 130

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID          HASH_VALUE   LASTCALL STATUS
----------------------- -------------- ----------- ---------------- ------------------ -------------------- ------------ ----------- ---------- --------
XYZ_DEV01_OWNER          '130,8764'          33533 1288249          \XYZHOST001        sqlplus.exe          3872          3564023715       4032 KILLED

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2695          0

Ok, I tried various other options, like kill immediate and disconnect, which should have higher chance to clean up my session properly:

SQL> alter system kill session '130,8764' immediate;
alter system kill session '130,8764' immediate
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> alter system disconnect session '130,8764' immediate;
alter system disconnect session '130,8764' immediate
*
ERROR at line 1:
ORA-00031: session marked for kill

SQL> select * from v$lock where id1=63764;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
40034278 40034290        130 TM      63764          0          6          0       2710          0

Still no luck, lock is there.

So I found the OS PID of my server process (or actually OS Thread ID inside oracle.exe process as I was on Windows) and used oradebug short_stack to check where that process was stuck (the output is slightly formatted):

SQL> oradebug setospid 3872;
Oracle pid: 18, Windows thread id: 3872, image: ORACLE.EXE (SHAD)

SQL> oradebug short_stack;
_ksdxfstk+14<-_ksdxcb+1481<-_ssthreadsrgruncallback+428<-_OracleOradebugThreadStart@4+819
<-7C80B680<-00000000<-71A557C4<-71AB4376<-6298540C<-6298325E<-60A0D931<-609D005F<-609B073D<-609AF9
endExitAgent+202<-_hoxexit+188<-_hogmdis+890<-_hogdisc+8<-_xupidhs+137<-_upidhs+20<-_kpudtch+305
<-_OCIServerDetach+12<-_ncodncf+268<-_npidr0+2300<-_npifcc+46<-_qerrmFBu+457<-_qerrmFetch+1
+1291<-_opiodr+1099<-_rpidrus+178<-_rpidru+88<-_rpiswu2+426<-_rpidrv+1461<-_psddr0+449
<-_psdnal+283<-_pevm_EXIM+153<-_pfrinstr_EXIM+34<-_pfrrun_no_tool+56<-_pfrrun+781<-_plsql_run+738
<-_pr+1099<-_opidrv+819<-_sou2o+45<-_opimai_real+112<-_opimai+92<-_OracleThreadStart@4+726<-7C80B680

This terse stack shows (start reading from bottom right to left) this process is stuck somewhere “above” qerrmFetch (Remote Fetch). I guess the functions starting with “h” above that are heterogenous services functions. By the way, V$SESSION_WAIT didn’t show any wait state changes either and the session itself was still constantly waiting for “HS message to agent” event. So, being stuck in a HS call was probably the reason why that session could not clean itself up.

Now, in such situations one normally would proceed with ORAKILL on Windows or just killing that server process at Unix level (after carefully thinking what you’re about to do). Killing the process makes PMON to perform the cleanup and PMON usually does clean all resources up ok. However I didn’t have access to that Windows server box, so OS kill was out of question.

So, I used another trick. While being connected to the victim process using oradebug, I issued this:

SQL> oradebug event immediate crash;
ORA-00072: process "Windows thread id: 3872, image: ORACLE.EXE (SHAD)" is not active
SQL>

After waiting for a while (crashdump occurred), sqlplus reported that the target process doesn’t exist anymore. Thanks to the crash, PMON had woken up and performed the cleanup.

Let’s see if it helped:

SQL> oradebug short_stack;
ORA-00072: process "Windows thread id: 3872, image: ORACLE.EXE (SHAD)" is not active

SQL> select * from v$lock where id1=63764;

no rows selected

SQL> @usid 130

no rows selected

SQL> drop table MYTABLE;

Table dropped.

Yippee, I finally got rid of that session, lock and could drop my table!

Note that I would still prefer killing the processes with Unix kill or Windows orakill, however there are some bugs with orakill (search in Metalink) that it doesn’t always succeed killing the thread properly. And in my case I didn’t have OS access anyway.

Ideally, the ALTER SYSTEM KILL session command should do all needed cleanup for us, but there are some corner cases involving loops, hangs and bugs where this approach won’t work. There’s a Metalink note 1020720.102 with a list of ways for killing Oracle processes/threads from OS level.

Note that this year’s only Advanced Oracle Troubleshooting class takes place in the end of April/May 2014, so sign up now if you plan to attend this year!

This entry was posted in Oracle and tagged , . Bookmark the permalink.

50 Responses to Killing an Oracle process from inside Oracle

  1. Nice sum up of the stuff…

    Thanks :)

  2. Amit says:

    Very useful extension for oradebug..:)

    Cheers
    Amit

  3. Polarski bernard says:

    Mike Ault describes a similar process to free from a lock using ‘oradebug wakeup ‘

    Just wondering what is the difference.
    I did some research on this ‘oradebug events immediate crash’ but got no response from google or metalink

  4. tanelp says:

    Hi Bernard, I removed this link as it looks it pointed to pirate book download site. Please don’t post such links here..

  5. tanelp says:

    Regarding the commands, “oradebug events immediate crash” crashes the process, but “oradebug wakeup” should wake up a process if it’s sleeping.

  6. Pingback: Killing an Oracle process without OS access « Oracle Logbook

  7. lou says:

    Tanel,

    Will this o script or the o2 script work if I am trying to compile a package if another session is using it. I am talking about if the package is locked?

  8. Good Stuff… Good way + Hard Core… to kill… no no! I should to say “Destroy”.

    Thank you

  9. Fernando Urea says:

    Thanks for your help!!!! I was looking for this for months and i haven´t found any help in the Web

  10. Tanel Poder says:

    Cheers! Of course use this only when you can’t use other more standard methods (like killing the process from OS level with kill (on Unix) or orakill (on Windows))

  11. Pingback: Alter system kill session and ORA-00031: session marked for kill | Tanel Poder's blog: Core IT for Geeks and Pros

  12. Haris says:

    Hi Tanel

    Very clear step by step explanations. Good Stuff!!!

    However, I’d like to get your feedback on the same topic (correct me if i am wrong) which provided by Miladin Modrakovic, but he did the investigation on 11g. Here is the link for reference.
    http://oraclue.com/2009/07/28/identifying-pidspid-for-killed-session-in-11g/

    Cheers

  13. omair says:

    great stuff!! helped me much..

  14. Cheers! Of course use this only when you can’t use other more standard methods (like killing the process from OS level with kill (on Unix) or orakill (on Windows))
    ——————————–
    sujatha

  15. Caesar says:

    Wow, it works perfectly, in my case the Orakill commando would not kill the process, it just hangs :P, this did the job, thanks a lot

  16. KG says:

    Tanel
    we had a similar issue on windows 9i database .
    we killed the session at oracle level with alter system kill session then session remained in killed status for hours.
    Then i tried oradebug event immediate crash, but that seems to be stuck for hours
    finally we had to bounce the database as locked object as key for application to work

  17. Tanel Poder says:

    @KG
    On windows you could also use the orakill command (if it works for you). Another way would be to use process explorer to end a single thread in a process, but I’ve never tested this on Oracle, so can’t guarantee that it doesn’t crash your whole oracle.exe process!

  18. KG says:

    @Tanel Poder
    Thanks tanel for Response .

  19. @Tanel Poder You can use process explorer to kill the offending thread on Windows without crashing oracle.exe. I can’t guarantee that it will never crash the instance but I have relatively often used this technique (due to bad experiences with orakill).

  20. simone says:

    Hi there!
    Great hands-on that you put together. However I think this is not covering my situation.
    session status is KILLED but there is no OSpid process (Linux) – waking up POMON is not working and attaching to the process to issue an immediate crash wouldn’t work either because this is no process. No rollback is either affecting the sessions.
    Those sessions they just don’t disappear.
    Can you think of any possible path to follow?
    Cheers
    Simone

  21. Tanel Poder says:

    @simone
    PMON should detect such missing processes and eventually clean them up. But in practice I recall seeing such issue too… how many times did you wake up the PMON proc? Sometimes multiple wakeups are needed.

    If multiple wakeups don’t help, it looks very much like some bug… perhaps with special types of sessions – like are these sessions of some background process or shared server sessions perhaps etc?

  22. Eduardo Claro says:

    Tanel,

    Today one of our databases is in a situation that:
    - Long-running Session that was locking others was killed with ‘ALTER SYSTEM KILL SESSION’.
    - After some time, session was still marked as ‘KILLED’, but locks were not release. Issued KILL -9 in Unix.
    - Event after some hours (much more time than the session was running), the lock situation is still there.
    - Tried to user ORADEBUG, but when we try to use SETOSPID we receive ORA-00072:
    SQL> conn /as sysdba
    Connected.
    SQL> oradebug setospid 16318;
    ORA-00072: process “Unix process pid: 16318, image: ” is not active
    - Tried also to use SETORAPID FORCE, but:
    SQL> oradebug setorapid 587 force;
    Statement processed.
    SQL> oradebug event immediate crash;
    ORA-27150: attempt to notify process of pending oradebug call failed
    ORA-27300: OS system dependent operation:kill failed with status: 3
    ORA-27301: OS failure message: No such process
    ORA-27302: failure occurred at: skgpwcall1
    ORA-27303: additional information: target pid = 16318
    - Tried also to wakeup PMON with ORADEBUG WAKEUP 2.

    Nothing worked. Any other idea before bouncing the instance?

    • Selva says:

      I had a similar situation where server process crashed, database session still active – trying to cleanup session – appreciate if you can share any additional info on this thread on how you handled the situation without bouncing database.

  23. Jinwen Zou says:

    Very good trick, good way to terminate a session in logon trigger.
    alter session set events ‘immediate track’

  24. There was very long, but good discussion with Tom Kyte about killing sessions and why is may not (and should not) work.
    http://asktom.oracle.com/pls/apex/f?p=100:11:0::NO::P11_QUESTION_ID:4974573906087

  25. Bingo Bongo says:

    Hi tanel,
    What are the implications of the execution of the command “oradebug event immediate crash” on the java connection pool?
    The WebSphere server could crash?

    Regards

    • Tanel Poder says:

      That particular connection in the connection pool should receive an ORA-3113 error or similar, resulting in an exception of your java code using that connection. The webserver shouldn’t crash :)

      • Bingo Bongo says:

        Thanks tanel for your answer.
        I have a problem in my production enviroment because randomly a transaction remains locked for “latch free” event and oracle kill or unix -9 do not kill the transaction.
        This transaction locks more than 200 objects and during on line service other transactions are blocked for “enqueue” event.
        After shutdown the WebSphere the transactions in wait “enqueue” are released, but “mother” transaction in wait for “latch free” isn’t released and db restart is required.
        I’ll try with the trick.

        Regards.

  26. Bingo Bongo says:

    The value of status in v$session is “active”. The values USED_UBLK & USED_UREC in v$transaction never change. I tried to kill first Oracle (also IMMEDIATE) and after Unix. I tried to kill first Unix and after Oracle, but USED_UBLK & USED_UREC not changes and the transaction is not resolved. The transaction is in wait for “latch free” (cache buffer chain) event.

    • Tanel Poder says:

      That’s not what I asked for :) I asked for the column STATE (not STATUS, different things! :)

      • Bingo Bongo says:

        Sorry!
        In this moment I don’t have my report, but I seem to remember that the value is always WAITED UNKNOWN TIME.
        Tomorrow I will write the exact value.

        Thanks for your time.

        • Tanel Poder says:

          Anything other than WAITING means ON CPU!
          So, WAITED UNKNOWN TIME == ON CPU!

          Check with top from the OS, whether the process is using CPU? If not, there may be a wait interface bug involved.

          Run snapper on the SID: @snapper4 all 10 1 &sid
          (it samples for 10 seconds)

          • Bingo Bongo says:

            Thank you,
            tomorrow will execute the script.
            But is possible that I must wait a lot of time before the objects locked by transaction be released after kill -9 on unix?

            At this time all my hope is in your trick :-)

          • Tanel Poder says:

            Yes its possible that even when doing a kill -9, the PMON (or SMON or PX slaves) who happen to be doing the state object recovery (and possible rollback operation) somehow get stuck. So then you’d want to troubleshoot (measure) what these processes are doing.

            Nevertheless, I would focus on why does the process get stuck (and has to be killed) first, instead of why doesn’t the cleanup happen as fast…

  27. Bingo Bongo says:

    Your considerations are correct.
    Unfortunately, there is no evidence of any activity to rollback the transaction.

    The PL / SQL That Remains in wait for “latch free” is always the same, but do not lock all the time, but randomly.
    That is possible PL / SQL can block the transaction in this way?
    This issue is very strange …

    Okey …
    Here are 23.00 pm … Thanks and good nigth :-)

  28. Merry says:

    Tanel,

    We have an Oracle 9.2 on openvms 7.3 and we have some session that are marked for kill but still visible on v$session via toad. I saw the note Metalink note 1020720.102
    and We execute the following command but this command doesn’t kill those session it keep showing others sessions with a negative value of serial# . We thought this command kill zombies that exists on v$process but doesn’t existe on v$session but it’s not could you help me please ?
    SELECT spid
    FROM v$process
    WHERE NOT EXISTS ( SELECT 1
    FROM v$session
    WHERE paddr = addr);

    • Tanel Poder says:

      So what does your query return – a bunch of SPIDs? And once you check in OS, do these processes still exist there? If the processes are already gone, the PMON needs to clean up – if it doesn’t do it automatically, you’ll need to wake it up using “ORADEBUG WAKEUP “, usually 2. This might not work for you, YMMV. Better get some blessing from support before running ORADEBUG against background procs…

      • Merry says:

        select * from v$process p where not exists ( select * from v$session s where s.paddr = p.addr ) ;

        ADDR PID SPID USERNAME SERIAL# TERMINAL PROGRAM TRACEID BACKGROUND LATCHWAIT LATCHSPIN PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM PGA_MAX_MEM
        FFFFFEBBB3378B90 1 0 PSEUDO 0 0 0 0
        FFFFFEBBB3388270 51 216d81b0 ORACLE92 -86 UNKNOWN ORACLE (P002) 191545 123677833 122880000 123677833
        FFFFFEBBB3388760 52 216e41b1 ORACLE92 50 UNKNOWN ORACLE (P003) 191545 123735329 122814464 123735329
        FFFFFEBBB3389140 54 216d21b2 ORACLE92 -22 UNKNOWN ORACLE (P004) 191545 123546761 122748928 123546761
        FFFFFEBBB3389B20 56 2171f1b3 ORACLE92 -124 UNKNOWN ORACLE (P005) 191545 125250697 124452864 125250697
        FFFFFEBBB338A500 58 216e35b5 ORACLE92 120 UNKNOWN ORACLE (P006) 191609 158739593 157679616 158739593
        FFFFFEBBB338A9F0 59 216e75b7 ORACLE92 51 UNKNOWN ORACLE (P008) 191521 158739593 157745152 158739593
        FFFFFEBBB338B8C0 62 216e31b6 ORACLE92 38 UNKNOWN ORACLE (P007) 191545 158674057 157679616 158674057
        FFFFFEBBB338BDB0 63 216e61b8 ORACLE92 -55 UNKNOWN ORACLE (P009) 191233 158674057 157679616 158674057

        $ show proc/id=216d81b0

        3-OCT-2013 07:54:21.67 User: ORACLE92 Process ID: 216D81B0
        Node: GS01 Process name: “ORA_DGSN1_P002″

        Terminal:
        User Identifier: [ORACLE92]
        Base priority: 4
        Default file spec: Not available
        Number of Kthreads: 1

        Devices allocated: BG7413:
        BG7414:
        BG7440:
        BG7923:
        BG7924:
        BG7925:
        BG7927:
        BG7928:
        BG7929:

        Soft CPU Affinity: off

        • Merry says:

          I was suspicious about serial# of v$process that have values negative but when i google they said it couldn’t be backgroundprocess.

        • Tanel Poder says:

          It’s fine to have P00x processes there without a corresponding session. These are the parallel slave processes who don’t have anything to do, so no corresponding session associated with them. Once you run a parallel query, some of these processes will have an associated session. So you do not need to kill these processes. If you are not using PX at all, you can set parallel_max_servers / parallel_min_servers = 0.

          • Merry says:

            Thanks Tanel for your help .
            I had one more question please is it normal to have negative troughput rate transfer in spotlight (Toad Quest) of temp file ? that ‘s mean -1 600 000MB/s ?

          • Tanel Poder says:

            Whenever a 3rd party tool shows a “problem” I want to see where the numbers come from, how they’re calculated. In other words, if a correcly written query against V$ views executed in sqlplus also shows the problem, it would be an issue in Oracle, otherwise it’s the 3rd party tool that has a bug or calculates something incorrectly… This one looks like Toad/Spotlight problem

  29. Merry says:

    Tanel,

    We have a RAC Oracle 9.2.0.4 on openvms 7.3

  30. Bingo Bongo says:

    Hi Tanel,
    after one month without transaction in wait for “Latch Free” this day we have 2 transanction in latch free.
    I tried to kill one of the two transactions with the crash of ‘oradebug’ but the effect was the same as the “kill” command: the transaction is hanging and they are formed in other latch free.
    I had to restart the database.

    Now:
    The kill Oracle does not kill the transaction
    The kil Unix does not kill the transaction
    The trick “oradebug crash” does not kill the transaciont.

    1. There are others tricks or commands to kill the sessions?
    2. In v$lock I have 213 locked objects and the TX has BLOCK = 1. Can I delete/unlock the objects without kill the transaction?
    3. Is possible to force the change of the hash_value of a store procedure? Because the transaction hash_value is always the same. I want to verify if the problem is in shared pool.

    Regards.

    • Tanel Poder says:

      First, how did you determine the wait for latch free? Did you check that the V$SESSION.STATE was actually WAITING. If it’s ANYTHING other than WAITING, including WAITED…. it’s NOT waiting, but just burning CPU on some busy loop.

      If you kill/crash a process, it will die without cleaning up any locks, latches etc. It’s the PMON who must detect the deatch of a process and would clean up after it. The cleanup may take a long time if there’s a lot to clean up (large transactions). So you’d need to see if the session has a transaction which is being rolled back and how long this would take (v$transaction.used_ublk).

      If there’s no long-running rollback already going on, then maybe this is happening: your processes parameter value is large and if you’re running some old oracle version, it may be possible that PMON hasn’t realized yet that your process is dead – you either wait until PMON detects the dead process or you wake PMON up manually a number of times until it detects it (with large processes value, the PMON doesn’t check for all processes every time it wakes up). This used to be done with the ORADEBUG WAKEUP call – however in the modern DB versions (10g+) this shouldn’t be much of an issue. And using ORADEBUG against critical background processes would really be a last resort technique anyway, so I advise against it.

      Much more likely it’s a long running transaction that gets rolled back – or you’re hitting some bug.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>