Gwen Shapira has written a nice summary of a problem case where the classic wait interface based troubleshooting method is not always enough for troubleshooting low-level issues.
The top SQL + top wait approach should usually be used as the starting point of troubleshooting a session, workload etc, but often the troubleshooting does not stop there. So, when the wait interface and related tools don’t explain the problem well enough, then you either start guessing from there or dig deeper into performance data. And Gwen used the V$SESSTAT metrics (using my Snapper tool) to understand why was a select statement generating redo this time (there are multiple reasons for that – and V$SESSTAT tells you why).
By the way, I’ve repeated this many times, here’s the data and the order of it what I normally use when I want to understand what’s going on in a session or workload:
- TOP Waits / TOP SQL
- V$SESSTAT counters
- Process stack samples
Sometimes looking into #1 is enough, often I need to look into #1 and #2 and rarely, but regularly I need to look into #3 too. But I always start from #1, then #2 and then, if needed, #3. I call this the “Advanced Oracle Troubleshooting” method ;-)
NB! If you want to move to the "New World" - and benefit from the awesomeness of Hadoop, without having to re-engineer your existing applications - check out Gluent, my new startup that will make history! ;-)