Tanel Poder's blog: IT & Mobile for Geeks and Pros
Oracle, Exadata, Linux, Performance, Troubleshooting - Mobile Life and Productivity.
My contact email for business enquiries is email@example.com
I have been following your blog, and you must have seen some of my comments here as well as on the OTN forums under the same signature. I have always been interested in internals and performance troubleshooting, and for some time now I have been pursuing it more seriously. I would like your guidance and support in learning all this. How do you research these topics? Since you mentioned that you didn’t have formal IT training, it’s just amazing what you are doing. I would appreciate it if you could help guide me along the research path. A good mentor is a must, as there should be no wrong learning. And I believe one can’t do performance troubleshooting without knowing how Oracle works. So can you please help me with it?
Awaiting your reply.
I am still planning to write a blog entry about exactly this stuff! So stay tuned!
I was in your recent Advanced Troubleshooting class in DC. It was really great to be in the class; I learned a lot and hope to make practical use of the systematic troubleshooting approach.
I am trying to download your latest set of scripts, where can I find them on your web site?
Hi Sriram, thanks for attending! I enjoyed these three days too!
I have uploaded the seminar logs to my blog and also have put the latest scripts in a zip file there.
Logs are downloadable from here:
The TPT scripts are here:
I have been using your tools a lot recently… For some reason, when I use usql I get a disconnect:
SQL*Plus: Release 10.2.0.3.0 – Production on Wed Dec 9 12:20:35 2009
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Enter user-name: / as sysdba
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID CPID SADDR PADDR
——————– ———— ————————- —– ——– ———- ——– ————— ————— —————- —————-
SYS UK1 lscxw01p1.ukdatapoint.com 198 4875 10.2.0.3.0 20091208 6222 6154 000000021F578F08 000000021F40E918
SQL> select * from v$version;
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bi
PL/SQL Release 10.2.0.3.0 – Production
CORE 10.2.0.3.0 Production
TNS for Linux: Version 10.2.0.3.0 – Production
NLSRTL Version 10.2.0.3.0 – Production
SQL> !uname -a
Linux lscxw01p1.ukdatapoint.com 2.6.9-67.0.4.ELsmp #1 SMP Fri Jan 18 05:00:00 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
SQL> @usql 243
HASH_VALUE CH# PLAN_HASH SQL_TEXT
———- —– ———- ————————————————————————————————————–
(XMLAGG889115592 /* Formatted on 2009/09/04 12:11 (Formatter Plus v4.8.7) */
(ea.venue_id AS “venue_id”,
AND activity_id = 301
),te_name = ‘venue_name’) AS “venue_name”
CH# PARENT_HANDLE OBJECT_HANDLE PARSES H_PARSES EXECUTIONS FETCHES ROWS_PROCESSED LIOS PIOS SORTS CPU_MS ELA_MS USERS_EXECUTING
—– —————- —————- ———- ———- ———- ———- ————– ———- ———- ———- ———- ———- —————
0 0000000206CEF2A0 0000000206AD0200 1 1 5 4 4 32488739 16856 68775 867697.956 1240898.07 1
SQL hash value: 586338503 Cursor address: 0000000206CEF2A0
| Statement first parsed at: 2009-12-09/11:19:26 | 3690 seconds ago
select –+ ordered use_nl(p ps)
ERROR at line 1:
ORA-03113: end-of-file on communication channel
ORA-03114: not connected to ORACLE
Have you seen this before?
All other scripts (snapper, sw, etc.) work fine.
Hmm, interesting – can you check for a tracefile in udump and send me the errorstack dump which should have been written there?
If no tracefile is generated, check core_dump_dest, run pstack on the corefile if one was generated, and send the output to me.
Good morning Tanel,
Have emailed you @ firstname.lastname@example.org
I was playing with the “*ss.awk” systemstate decoder script from LTOM, but frankly I would prefer to first get more knowledge about this kind of dump. I’ve read in some section here that you previously taught ways to work with them in a seminar, and I wonder if that particular seminar is still available online?
Best regards, and great thanks for your site and effort!
Hello, I have been following your blog for some time and I believe it contributes greatly to professionals dealing with performance issues in Oracle.
I’ve been developing for some time a tool named Mandela to diagnose performance problems. It does not need to be installed or to create any objects in the analyzed database. It only needs read access to the STATSPACK or AWR tables.
The site is still under development, but if you want, you can download Mandela and follow the release of the latest versions.
What makes Mandela nice? Well, it can show all system statistics on a timeline, along with redo log file switches, I/O statistics (by tablespace, datafile and filesystem) and much more…
But what it does really well is decompose the wait events (two clicks, three at most) and allow drilldown into the top SQL by metric (buffer gets, disk reads, CPU time, elapsed time, etc.) and now by wait event. That’s right… from a chart of wait events you right-click and see the SQL statements that waited for “db file sequential read” or “CPU”, for example.
If you’re interested (and available) to see how Mandela can help you gain productivity, visit http://www.trevis.com.br/projetos/mandela/download
If you want to chat or ask questions about Mandela, my contacts are as follows:
EMail / MSN / GTalk: email@example.com
Hey Tanel, can you share some of the books that you refer to, or that could be helpful to Oracle DBAs in general? Also some specific books on Unix etc. I would be really grateful for that.
I was referring to one of your posts: http://www.freelists.org/post/oracle-l/RE-Calculating-LIOs,11. Yours was the only post on the entire internet that had some information on index prefetching, and I feel I am facing an issue related to it. We are on 10gR2 (10.2.0.4.0).
I did set _db_file_noncontig_mblock_read_count=1, but prefetching did not get disabled. In our case there are 2 queries – one doing an index unique scan and the other an index range scan – and I see “db file scattered read” waits for both. Initially I was surprised to see scattered reads for a unique/range scan of an index, but later understood (not sure if my understanding is 100% correct, because you mentioned that “db file parallel read” should be seen) that it was index prefetching. I want to disable it because it seems that on the RHEL ext3 filesystem it is not occurring, but it is occurring on Solaris (both SPARC and x86_64) UFS, and that is probably slowing down the system during the first few minutes. Once it stops (probably by then the cache is warmed up), the performance is fine.
Can you please comment on my question on OTN, as per the link below?
Mr Hans Forbrich on OTN has answered my question and advised that you may have ideas based on your experience.
If you can spare some time, please advise me on OTN as per the link:
Hi Tanel ,
I am facing a problem of latch contention. Will you guide me on how to solve it?
Thanks in advance
Attended the Training Day at the ’10 Hotsos Symposium. Enjoyed your presentation. Thought I’d try out your PerfSheet v3.0 program. It might have a good use here at the NYS DOH Wadsworth Research Center.
When trying to run PerfSheet after creating a valid ODBC system DSN using the 10.2.0.1 Oracle driver (against a 10.2.3 database server), I get an ODBC error. Microsoft Query connects fine and the ODBC test connection succeeds.
Unfortunately, I get an error: “[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified”
Any suggestions, much appreciated.
You don’t even need to create a DSN manually (PerfSheet uses dynamic DSNs).
Do you have Oracle client libraries installed there?
And what did you put into the database name (data source) field?
You can either put the tnsnames.ora TNS alias there or if you’re using 10g+ Oracle client drivers, then you can use server:port/DB format -> linux03:1521/ORCL for example…
If it still doesn’t work, please send me an email to firstname.lastname@example.org and let’s get it working.
Hi, the link is not working (http://tech.e2sn.com/oracle-scripts-and-tools/session-snapper)
Can you help? Thanks.
It works for me, so it’s probably something on your side. Try refreshing the page, going to tech.e2sn.com directly, etc.
Hello Tod or Anyone,
Does anyone have an idea how to make an existing user read-only, so that GG (GoldenGate) or another user can update the data, but the user itself can only select the data?
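There is no single “read-only user” switch in these releases, but one hedged sketch is to keep the data in a schema that only GoldenGate writes to, and give the human user SELECT-only access via a role (the APP_OWNER / APP_READER names below are hypothetical):

```sql
-- Read-only access for the end user, granted per table through a role:
create role app_reader_role;
grant select on app_owner.orders to app_reader_role;  -- repeat per table
grant app_reader_role to app_reader;

-- Optionally stop the owner itself from allocating new space:
alter user app_owner quota 0 on users;
```

GoldenGate would then connect as its own user with explicit DML grants, while APP_READER can only query.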
Hi Tanel, I have been looking around for your script called usql.sql, but I can’t seem to find it. I have downloaded your zip file, and a Google search didn’t reveal anything either. So I was wondering if you could point me to your usql.sql script that will give me the SQL text and all the other good stuff. Thanks.
1. Can you please explain why physical & logical corruption would occur in oracle?
2. Is there any way we can reproduce Oracle’s physical & logical corruption? When I say logical, I am not talking about a bad transaction corrupting the data in a table, but rather a rowid in an index pointing to a wrong block in the table.
3. What methods do you suggest to fix these?
This is Surya, and I am new to Oracle 10g DBA work. My main problem is performance tuning – I don’t understand how to become an expert in it. Please give me some techniques and also some material.
Hi Mr. Poder,
I attended one of your workshops in Hong Kong last year – very useful. On your website I found that you provide a SQL script to translate a SQL_ID into a hash value, which is good for 10g.
In 9i each SQL has a unique HASH_VALUE; in 10g each SQL has a unique SQL_ID (your script can translate this into a HASH_VALUE).
The problem is that the translated HASH_VALUE in 10g is not the same as the HASH_VALUE in 9i; that is why in 10g Oracle still keeps a column called “OLD_HASH_VALUE”. I would like to know the SQL statement to generate such an OLD_HASH_VALUE from the SQL_ID.
Why: we have SQL performance reports for an application running on 9i. During the migration from 9i to 10g or higher, we would like to compare the TOP SQL from these 2 environments; if we had the right function to convert a SQL_ID to a 9i-style hash value, we could easily identify which statements are Top SQL in 9i as well as in 10g.
2) Another question: when using Real Application Testing, the RAT report gives the TOP SQL with SQL_IDs that are not the same as the SQL_IDs found in v$sqlarea. It looks like RAT has added some prefix to the SQL, and as a result we cannot match the Top SQL from the AWR report against the RAT report.
Thank you very much
1) Unfortunately it’s not possible to convert from SQL_ID to OLD_HASH_VALUE, as both the 10g SQL_ID and the 10g “new” hash value are computed using the MD5 hash function, but the OLD_HASH_VALUE in 9i is computed using the classic Oracle kgghash() function. So you have to match the SQL statement texts (or add a column to your old 9i tables and populate it with the new hash value / SQL_ID computed from the text).
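On the 10g side the 9i-style hash is still exposed directly – as the commenter noted, V$SQL keeps an OLD_HASH_VALUE column – so one hedged way to build the mapping (the OLD9I_TOPSQL staging table below is hypothetical) is:

```sql
-- Capture the 10g-side mapping of sql_id to the 9i-style hash:
select sql_id, hash_value, old_hash_value, sql_text
from   v$sql
where  sql_id = '&sqlid';

-- Then join your imported 9i top-SQL data (hypothetical staging table)
-- to the 10g cursors on the old hash:
select o.*, n.sql_id
from   old9i_topsql o, v$sql n
where  o.hash_value = n.old_hash_value;
```

This avoids any hash conversion: you let 10g compute OLD_HASH_VALUE itself and match on it.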
2) If RAT changes the text, then it’s impossible to compare the hash values – I would identify the pattern that RAT always adds (if it is a consistent pattern), use some regex to remove it, and then calculate my own hash values or compare SQL texts directly…
In the past you’ve published something called heapdump analyzer. Is that still publicly available? I’ve just been curious about its capabilities.
Please accept my apologies for contacting you as the owner of Tanel Poder’s Blog. As an Oracle database professional you will be familiar with the advantages and disadvantages of Oracle tracefile analysis, so there is no need to waste your time with long explanations.
The company I work for is developing a tool named QueryAdvisor, specialized in Oracle tracefile analysis. QueryAdvisor provides more information and allows easier access to tracefile data than TKPROF (alone or in combination with TRCSESS) or even TRCANLZR does.
We are now looking for DBAs and developers to assist as beta testers. We have chosen this approach to be sure that we cover a wide range of different Oracle database configurations (different hardware, releases and patch levels).
You will receive a free Administrator License without restrictions or further obligations for participating in our beta test and/or writing an article. To avoid any misunderstanding – this is not a new approach to sell you something. We are talking about a free license in exchange for some of your time and experience – nothing more, nothing less. We hope this is a win-win situation for both of us.
You will find additional information and the possibility to register as a beta tester at:
More information on how to obtain a free license for writing an article and/or review is available at: http://goo.gl/W2LpF
If you would like to do us a special favor – tell your colleagues about our beta test and share the link.
We will take care that you will not be contacted a second time. Sorry for any inconvenience caused and thanks for your time.
I have met you several times at Hotsos conferences (if you can recall someone bothering you with several questions during your lunch :) )
We have online redo log files of about 5G, with 12 groups, multiplexed (2 members in each group).
We changed it to single-member groups. Since then we have been seeing log file sequential read as the top wait event.
What causes this event to occur?
Thanks for your time reading my question.
Which sessions wait for this event the most? Archivers normally wait for it, but they shouldn’t make it to the top.
If you have (soft) corruption in your data blocks (and perhaps in some other cases, like bugs which make Oracle think a block is corrupt), then the Oracle session that accesses the block will dump some past redo for that buffer – and this means the session has to read from the redo log file.
So the first thing is to check which sessions (foreground or background) are waiting for this event and what they are doing (Snapper, ASH and SQL trace would be your friends).
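A quick ASH check along those lines could look like this (a sketch; it assumes the Diagnostics Pack is licensed):

```sql
-- Which sessions have waited on 'log file sequential read' in the last hour?
select session_id, session_type, program, count(*) samples
from   v$active_session_history
where  event = 'log file sequential read'
and    sample_time > sysdate - 1/24
group  by session_id, session_type, program
order  by samples desc;
```

If foreground sessions top this list (rather than ARCn or LGWR), the block-corruption / past-redo-dump scenario above becomes a likely suspect.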
I have been reading the Exadata book “Achieving Extreme Performance with Oracle Exadata”, and am waiting for your book, “Expert Oracle Exadata”, to come out.
There is a statement in it that confuses me. If you don’t mind being asked, I would like to ask you a question.
What I read in the book is the following (on page 108):
“Please do not confuse the Exadata Smart Flash Cache with the Database Flash Cache. Although both use flash technology, the Database Flash Cache is located in the database server node and essentially provides an extension to the standard operations of the SGA. The Exadata Smart Flash Cache, as you will see in the following pages, works in a different manner and is a part of the Exadata Storage Server. In addition, the Database Flash Cache is not supported in the Exadata environment.”
I know the Database Flash Cache is an Oracle 11g feature, but this book says that the Database Flash Cache is not supported in the Exadata environment.
I would like to know whether this means the Database Flash Cache can’t be used on Exadata at all, or that it can’t be used together with the Exadata Smart Flash Cache feature?
Thank you in advance.
Database Flash Cache means you would attach additional PCI flash cards to the database hosts (so that they’d be close to the memory). And if you buy an Oracle Database Machine, you may not add extra cards into it… yep, you can add peripherals over InfiniBand or Ethernet, but no additional PCI cards into the servers.
Thank you for your reply.
I was just wondering whether the Database Flash Cache could be configured on top of the PCI flash cards in the Exadata Storage Servers.
If I don’t add additional PCI cards to the server, there is no other way that the Database Flash Cache can be configured, right?
I have been following your blog for a long time, and I’m really a fan of your knowledge of Oracle internals. I have one doubt: how can we find out if a row in a table is locked, and which session is holding the row-exclusive lock on that row? And I am talking about a really huge table, accessed by around 100 to 200 sessions per minute.
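One hedged approach is to start from the waiters rather than scanning the huge table: V$SESSION exposes the ROW_WAIT_* columns for sessions blocked on a row lock, and BLOCKING_SESSION points at the holder. A sketch (the DBMS_ROWID call assumes ROW_WAIT_OBJ# matches the data object id, which may not hold for partitioned tables):

```sql
-- Sessions stuck on a TX row lock, the session blocking them,
-- and the rowid of the contended row:
select sid, blocking_session, event,
       row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row#,
       dbms_rowid.rowid_create(1, row_wait_obj#, row_wait_file#,
                               row_wait_block#, row_wait_row#) locked_rowid
from   v$session
where  event like 'enq: TX - row lock contention%';
```

This is cheap even on a busy system, because it only reads session state, not the table itself.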
Thanks for great tips on your blog :-)
I’m not sure if it’s a general WordPress thing or something specific to your host/provider, but my ZoneAlarm antivirus pops up and claims a virus called Exploit.JS.Retkid.a gets into my Chrome cache when I visit your blog today.
It may be a false alarm – perhaps ZoneAlarm mistakes your formatting JavaScript for something similar to that exploit?
Anyway, just thought I’d let you know if you didn’t already ;-)
Let me check… I sure haven’t installed any exploits into my blog, maybe it’s a false positive. But I’ll check.
Googled and found this, others seem to have an issue as well: http://themeforest.net/item/karma-clean-and-modern-wordpress-theme/discussion/168737
Funny, when I google it, I do not get that themeforest hit. You must have great Google-fu :-D
First of all thanks a ton for your wonderful work here.
I was wondering if it is possible to add category-wise posts to the sidebar – that would be very helpful. I think this blog is a WordPress CMS, so you could use the Collapsing Categories List plugin. Actually, I wanted to read your old posts, and with the current setup I have to go page by page in reverse order.
Good idea! Let me check the plugin out!
On Exadata X2-2 I have noticed a query with multiple executions using the same plan_hash_value. One child cursor execution with that plan hash value has IO_CELL_OFFLOAD_ELIGIBLE_BYTES > 0 but a very long elapsed time (200 seconds), while the other execution takes milliseconds with IO_CELL_OFFLOAD_ELIGIBLE_BYTES = 0. Have you come across this type of situation, and what could be the possible reason?
I can’t explain in words how much I have learned from you. I appreciate that you share everything you discover. I have one issue.
I was working on an issue where UNDO space was not being released – there was a huge amount of unexpired UNDO. I understand the concepts of UNDO_RETENTION, TUNED_UNDORETENTION and MAXQUERYLEN that Oracle uses to shrink the undo segments. My questions are:
1. When does the shrinking start?
2. Is there a way to find out when exactly an extent will expire? I tried dumping the undo header but did not find what I wanted.
3. Is there any view or X$ table that externalizes the expiry time?
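To my knowledge there is no per-extent expiry timestamp externalized, but V$UNDOSTAT and DBA_UNDO_EXTENTS let you watch the inputs to the expiry decision:

```sql
-- The retention Oracle is actually honoring, per 10-minute interval:
select begin_time, end_time, tuned_undoretention, maxquerylen
from   v$undostat
order  by begin_time desc;

-- Current extent states (ACTIVE / UNEXPIRED / EXPIRED) in the undo tablespace:
select status, count(*) extents, sum(blocks) blocks
from   dba_undo_extents
group  by status;
```

Watching TUNED_UNDORETENTION and MAXQUERYLEN over time usually explains why extents stay UNEXPIRED longer than UNDO_RETENTION alone would suggest.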
I am getting my first real-world experience with RAC on Cisco UCS, and I am not having a good week: get past the Cisco marketing message, and what I find are lost cluster interconnect blocks… and gc resends. I suspect it is due to IO Mux, their way of handling network and storage traffic.
If you have any personal experience, or have contact with others that do, I am looking for someone with whom I can compare notes. I have a feeling I know exactly what I am seeing but I don’t want to draw any conclusions from a single customer and their experience. Add to that the inability to see cluster interconnect traffic inside IO Mux packets and I find I am also flying blind.
I am very new to Oracle 11g, and would like to know the best approach for the requirement below.
My application ties a Java web interface to an Oracle DB.
How do I save the query execution plan (for all repeated queries, without going through the optimizer phase)?
How do I reduce the time spent on network round-trips (from Java to Oracle)?
Gopi, these are pretty wide topics – you might want to read an Oracle performance optimization book to get a foundation (Pro Oracle SQL by Karen Morton is a good one for example).
But for a quick overview about plan stability, check out this presentation:
And regarding network wait reduction (again, this is a longer topic and I don’t know what problem exactly you have), perhaps these articles help:
3) Make sure that you are fetching multiple rows (100+) at a time if your Java app retrieves many rows (SQL*Plus arraysize / JDBC statement fetch size).
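To see the effect of the batch size on round-trips, here is a quick SQL*Plus experiment (the JDBC equivalent would be Statement.setFetchSize):

```sql
-- Fetch 10000 rows with a small and then a large array size,
-- and compare the "SQL*Net roundtrips to/from client" statistic:
set autotrace traceonly statistics

set arraysize 2
select * from all_objects where rownum <= 10000;

set arraysize 100
select * from all_objects where rownum <= 10000;
```

With arraysize 100 the round-trip count should drop to roughly 1/50th of the arraysize 2 run, which is exactly the saving a larger JDBC fetch size buys you.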
I need your expertise to understand a problem I am facing.
The situation is as follows:
1) A SQL Server Analysis Services (cube) instance is accessing one of the tables in the Oracle database (the query usually takes 5-8 minutes to complete).
After some time,
2) another Oracle session truncates the same table and repopulates it.
The query from point 1 keeps running for a long time, i.e. more than an hour, with high consistent gets if I look at the stats – though sometimes it runs fine.
I was under the impression that TRUNCATE doesn’t generate UNDO, so what could be the reason for such high consistent gets?
At other times the query runs perfectly fine even in the scenario mentioned above.
Kindly let me know if you need any other info.
Thanks in advance.
The first thing to check is whether the SQL execution plan used (plan_hash_value) is different when the SQL runs slowly.
When the table is truncated, are the stats regathered (by an automatic background job) by any chance? If the stats say there’s 0 rows in a table (after a truncate) and you load millions in there, the execution plan may be generated for the estimated very small number of rows (nested loops usually), while actually there are many rows to process thanks to the recent inserts.
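A hedged way to check both points from the answer above (substitute your own sql_id and table name for the placeholders):

```sql
-- 1) Did the plan, or the work per execution, change?
select sql_id, plan_hash_value, executions,
       round(buffer_gets / nullif(executions, 0)) lio_per_exec
from   v$sql
where  sql_id = '&sqlid';

-- 2) What do the optimizer stats say about the table right now,
--    and when were they last gathered (e.g. right after the truncate)?
select num_rows, blocks, last_analyzed
from   user_tab_statistics
where  table_name = 'YOUR_TABLE';
```

If NUM_ROWS is near zero while the table actually holds millions of rows, the stale-stats scenario described above is the likely cause.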
Your quick response is much appreciated.
I did check the points you mentioned.
My observations are
1) The plan hash value doesn’t change when the query runs slowly, but the consistent gets are much higher than usual according to the AWR report.
2) After truncating, no stats gathering is done. It just inserts a million records using the APPEND hint, i.e. direct load.
I just wanted to check how a truncate and a select on the same table behave when both are fired simultaneously from different sessions.
I am a regular follower of your blog, and must say that you are doing a great service to the Oracle community. I have a question for you about the optimizer’s #Blks estimate in the case of a subpartitioned table, which affects the serial direct read decision and thus offloading on Exadata. Here is the problem:
Oracle Database 11g Enterprise Edition Release 11.2.0.x – 64bit Production
PL/SQL Release 11.2.0.x – Production
CORE 11.2.0.x Production
TNS for Linux: Version 11.2.0.x – Production
NLSRTL Version 11.2.0.x – Production
I have a range partitioned table on date which is further subpartitioned based on two columns. Please find the table definition below:
CREATE TABLE TEST_TABLE
( SALE_DATE     DATE,
  ITEM_ID_TYPE  VARCHAR2(30),
  ITEM_ID       VARCHAR2(30)
  -- ... remaining column definitions omitted in the original comment ...
)
PARTITION BY RANGE (SALE_DATE)
SUBPARTITION BY HASH (ITEM_ID_TYPE, ITEM_ID)
( PARTITION P20130729 VALUES LESS THAN (TO_DATE(' 2013-07-30 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
  PARTITION P20130730 VALUES LESS THAN (TO_DATE(' 2013-07-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),
  PARTITION P20130731 VALUES LESS THAN (TO_DATE(' 2013-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
);
Once the table is created, I load around 35 million records into it and then collect statistics like below:
begin
  dbms_stats.gather_table_stats(
     OwnName          => USER
    ,TabName          => 'TEST_TABLE'
    ,Estimate_Percent => NULL
    ,Method_Opt       => 'FOR ALL COLUMNS SIZE 1'
    ,Degree           => 8
    ,Cascade          => TRUE
    ,No_Invalidate    => TRUE
    ,Granularity      => 'GLOBAL AND PARTITION'
    ,Force            => TRUE);
end;
/
Let’s have a quick look at the stats generated (user_tables and user_tab_partitions):
TABLE_NAME PARTITION_NAME AVG_ROW_LEN NUM_ROWS BLOCKS
TEST_TABLE P20130729 692 11883448 585047
TEST_TABLE P20130730 692 11895688 586324
TEST_TABLE P20130731 692 11942129 590160
TEST_TABLE 728 35721265 1761531
Now, let’s run a query against this table:
select /*+ MONITOR HARDPARSE */ * from incremental_explain_test
where sale_date = to_date(20130701, 'YYYYMMDD')
and item_id_type = 'stationery'
and item_id = 'pen';
Let’s look at the “BASE STATISTICAL INFORMATION” section of the 10053 trace for this SQL:
BASE STATISTICAL INFORMATION
Table: TEST_TABLE Alias: TEST_TABLE (making adjustments for partition skews)
ORIGINAL VALUES:: #Rows: 11940686 #Blks: 591438 AvgRowLen: 692.00 ChainCnt: 0.00
ANALYZED: 1 UNANALYZED: 0
#Rows: 11940686 #Blks: 591438 AvgRowLen: 692.00 ChainCnt: 0.00
#Rows: 11940686 #Blks: 197146 AvgRowLen: 692.00 ChainCnt: 0.00
My question is: why is the optimizer estimating #Blks as 197146, which seems to be the number of blocks in the partition (from optimizer stats) divided by the number of partitions? Strangely, #Rows is unaffected. Since the number of blocks drops, serial direct read does not kick in, and as such there is no smart scan. Now, if I set “_serial_direct_read” to TRUE or “_direct_read_decision_statistics_driven” to FALSE at the session level, then I rightly get the smart scan. Please advise why the optimizer is doing this.
Subpartition stats usage has been a mess for quite a long time (especially before 10.2.0.4, but also in later releases – http://perfexpert2.wordpress.com/2012/08/19/cardinality-estimates-on-list-subpartitioned-tables-with-binds-on-partition-literals-on-subpartition/ ).
So your options would be to raise an SR for the optimizer issue, hack the NUM_BLKS values to bigger ones, or use one of those parameters. I do use _serial_direct_read = ALWAYS quite frequently on a per-user/session basis (using logon triggers) to get stable behavior.
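A minimal sketch of such a logon trigger (the REPORTING_USER name is hypothetical, and as with any underscore parameter, check with Oracle Support before relying on it in production):

```sql
-- Force serial direct path reads for one reporting user at logon:
create or replace trigger set_serial_direct_read
after logon on database
begin
  if sys_context('userenv', 'session_user') = 'REPORTING_USER' then
    execute immediate
      'alter session set "_serial_direct_read" = always';
  end if;
end;
/
```

Scoping the setting to a user this way keeps the statistics-driven direct read decision intact for everyone else.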
Thanks for posting this script (trim_database.sql)… albeit some time ago!
Might I suggest a small adjustment to the SQL to give the DBA the name of the tablespace and its current size, i.e.:
with query as (
  select /*+ NO_MERGE MATERIALIZE */
         file_id, tablespace_name,
         max(block_id + blocks) highblock
  from   dba_extents
  group  by file_id, tablespace_name
)
select '-- '||lpad(t.tablespace_name||': currently ', 29 + length(q.file_id))||
       'alter database datafile '|| q.file_id || ' resize ' ||
       ceil((q.highblock * t.block_size + t.block_size)/1024) || 'K;' cmd
from   query q, dba_tablespaces t, dba_data_files f
where  q.tablespace_name = t.tablespace_name
and    f.tablespace_name = t.tablespace_name;
“it works for me, your experience may vary”.
oh and, of course, keep up the good work.
Thanks, Tanel, for your writings.
They are beneficial for me.
You are an awesome blogger on Oracle topics.
[…] Update: there’s an interesting comment in another page about how the incorrectly handled subpartitions statistics cause Oracle to not choose a direct path read for some subpartitions. […]
Copyright © 2013 · Minimum Theme on Genesis Framework · WordPress · Log in