-
The foll. sql has a performance problem. How do I tune this?
SELECT a.dt_tm, b.data_blob from audit_event a, audit_data_blob b WHERE a.evnt_id = b.evnt_id and serv_typ_id =5399 and dt_tm > sysdate - 1 ORDER BY a.dt_tm
The explain plan is as follows:
Full table scan on audit_data_blob and
index(range scan) on audit_event_ndx_4(dt_tm)
Even though there is an index on evnt_id column in audit_data_blob , it does not seem to use it.
Why? Please let em know.
thanks.
-
You neglected to add the table alias to all the columns.
Give us this info and maybe we can help
- Chris
-
Hi,
Sorry for the delay. If there is no alias, does it make a difference? Please let me know.
Anyway, the sql is as follows:
SELECT a.dt_tm, b.data_blob from audit_event a, audit_data_blob b WHERE a.evnt_id = b.evnt_id and a.serv_typ_id =5399 and a.dt_tm > sysdate - 1 ORDER BY a.dt_tm
Thanks.
-
If you eventually had added a dt_tm column to table AUDIT_DATA_BLOB, then this statement would suddenly have broken.
As a basic rule, always add a table name or alias prefix to every column in every multi-table statement.
Now, as to why I asked for them - without knowing where the columns came from, how would I possibly have guessed what the statement was truly doing? My mind-reading powers seem to have been on the blink recently ;)
Okay, so now that I have at least some clue of what the statement is doing....
It does indeed seem odd that the statement will not use an index on that column. Reasons for this might include:
- The table is rather small and a full scan will be more efficient
- There are actually few distinct values for that column in the table and every value has many records, meaning that a significant percentage of the table will be read (anything over 5%?)
- The record size for B is very short (remember that a LOB is just a pointer). Therefore, *lots* of records fit in a given block. This will make the optimizer more likely to choose a table scan for the table because it will be very efficient.
So those are the thoughts off the top of my head.
Any of those sound like they might be the cause?
- Chris
-
The total no of records in audit_data_blob are: 2172614
select count(distinct evnt_id) from audit_data_blob results in
2171268 records.
So, the table is huge.
What seems to be the problem?
-
How about the last one? Is the record size really small?
Another thought - do you have current statistics on that table?
Can you post the exact output of the explain plan for the statement? Can you then add a hint to force the index and give us that plan as well?
- Chris
-
sorry for the delay.
The exact explain plan is as follows:
select statement cost = 92552
2.1 sort (order by)
3.1 hash join
4.1 table access(by index rowid) - audit_event
5.1 index(range scan) - audit_event_ndx_4(non-unique)
4.2 table access(full) - audit_data_blob
total no of records in audit_data_blob = 2210359
total no of records in audit_event = 2209077
The stats on the tables are taken every week. This is a production sql.
Thanks.
-
Again, have you tried hinting the index? Is it faster using the index?
Is the record size really small? - that is still my best bet
How many records does this query return?
- Chris
-
The quety returns 105 records. IfI use the index hint on evnt_id, it's not faster. But, it should use the index, isn't it?
Thanks.
-
I am so sorry. I used the wrong index. The query returns 105 records. If I use the index, it is faster, but buffer_gets in v$sqlarea is not below 2000.
Thanks.
-
Lee, can you *please* tell me the record size, or at least the fields and types?
Also, how are you measuring the time? Is the index faster simply because you ran it second and the data was cached? You need to use a tool that can measure to less than a second, make sure nobody else is running anything, and run each test multiple times. That is the only way to accurately measure time in queries. Once you do that, what are the times?
As far as whether it *should* use the index or not - the optimizer will use the plan that it *believe's* is the most efficient. That does not mean *always* doing an indexed Nested Loop by a long shot. When > 5% of the table data is retrieved, as a rule of thumb, the table scan is faster. There are, as I alluded in an earlier response, many other reasons why a table scan might be faster as well. There are many more than what I listed as well.
Finally, quick lesson on LRs. While I *fully* advocate the use of LRs for measuring the improvement in performance, they are not equivalent between table scan plans and indexed plans - the table scan plans may have fewer LRs, but they should generally be weighted heavier because the LR/PR ratio is much higher for table scans and they therefore also usually take up more cache. Regardless, however, what does 2000 have to do with anything?
- Chris
-
our two cents.....
Try this out....
Original sql scrip.
SELECT a.dt_tm, b.data_blob from audit_event a, audit_data_blob b WHERE a.evnt_id = b.evnt_id and a.serv_typ_id =5399 and a.dt_tm > sysdate - 1 ORDER BY a.dt_tm
Suggested
SELECT a.dt_tm, b.data_blob
FROM audit_event a, t_data_blob b
WHERE a.serv_typ_id = 5399
AND a.dt_tm > sysdate - 1
AND b.evnt_id = a.evnt_id
ORDER BY a.dt_tm
-
2000 is the limit which should be the buffer_gets in v$sqlarea a decided by our Lead DBA.
In audit_evnt table-
evnt_id - number(10) not null
serv_typ_id - number(10)
dt_tm - date
Do u need the other columns too? Let em know.
In audit_data_blob table-
evnt_id - number(10) not null
Thanks.