-
tune the sql
Hi Friends,
Can you please help me in tuning the below sql..
I tired lot of option but no sucess.
I am runing oracle 8.1.7.4, sun solars 5.6
SELECT COUNT(*) FROM LISTINGS_NEW WHERE STATUS = 'A' and IMAGE_ID > 0 and END_TIME > sysdate AND
user_ID IN ( SELECT user_ID FROM USERS WHERE STATUS = 'A' )
here user_id is indexed
regards
anandkl
anandkl
-
First, re-write the statement
Code:
SELECT
COUNT(*)
FROM
USERS U ,
LISTINGS_NEW LN
WHERE
U.STATUS = 'A' AND
LN.USER_ID = U.USER_ID AND
LN.STATUS = U.STATUS AND
LN.STATUS = 'A' AND
LN.IMAGE_ID > 0 ND
LN.END_TIME > SYSDATE
Then, notice that LN is restricted by 4 fields. An index on one is not going to be as fast as an index on all 4. Since 2 are single restrictions, the fastest index would likeley be USER_ID, STATUS, IMAGE_ID, END_TIME, but that's a guess.
Anyway, always avoid sub-selects when you can, join everything that can be joined and restrict everything that can be restricted. These simple rules will solve most of your performance issues. Follow these by indexing properly and you've got 80% covered.
- Chris
-
Hi
Try posying the exaplian plan at least
Give this a try
SELECT COUNT(*)
FROM LISTINGS_NEW ln ,users u
WHERE STATUS = 'A' and
IMAGE_ID > 0 and
END_TIME > sysdate AND
ln.user_id=u.user_id
regards
Hrishy
-
Hi Chris
:-)..we posted the solution with almost the same table aliases ;-)
regards
Hrishy
-
Random thoughts:
If user_id is the PK of the table users, then I think you could write it as a simple inner join:
... FROM listings_new l, users u
WHERE l.user_id=u.user_id
AND . . . . . . .
alternatively recode the IN as an EXISTS (often faster).
Is "END_TIME > sysdate" very selective, if so an index on this field might help a lot.
(I'm assuming STATUS='A' is not very selective - i.e. most users are active?)
"The power of instruction is seldom of much efficacy except in those happy dispositions where it is almost superfluous" - Gibbon, quoted by R.P.Feynman
-
Oh, I should also probably mention things like dats, range comparisons and histograms. Generally, ranged queries operate best on data that is evenly distributed. That is how the optimizer guesses the data is distributed. Now, to do this on a range query, it must have a real value. Right now, it is looking at that SYSDATE call and seeing it as a bind variable. That means it sees WHERE END_DT > :B1. Not knowing the value, it can't even guess at what percentage of records might come back so it guesses 50% and obviously doesn't then use an index.
Now, if you provided the actual value, the optimizer would have a better shot. Of course, you would now have more than one copy in the statement cache, but that's the trade-off. With an actual value, it would say: Hmmm, where does that value fall between the minimum and maximum values in that field? Now, here's the usual problems with dates:
1 - The data is usually not clean. All that has to happen is some legacy record has a bad data from last century in it that can't be fixed and your distribution is screwed.
2 - The data is always ahead of the stats. You are always getting new dates on the leading edge ahead of what the optimizer last recorded. With other fields, any new data is usually peppered throughout existing ranges - new statuses are not created every day, for example. But new dates *are*.
3 - Dates are rarely evenly distributed. Some years are better than others. Same with Months. There are usually only a few old legacy records and probably a few *bad* records. This means the optimizer will always be wrong with it's calculated distribution.
So, the answer with dates that are range-selected is usually to hard-code the ranges, add histograms to the fields, and re-compute the stats often.
Again, this has inherent overhead in the stats-processing, copies of records in the SGA, etc. But if the field is used a lot this can make a lot of difference.
Anyway, just things to think about,
- Chris
-
Hi Friends,
I justed wanted to tell u one thing that values in status column in USERS table in not the same in status column in listings_new table.
The explain plan is as below
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6491 Card=1 Bytes=37
)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=6491 Card=24126 Bytes=892662)
3 2 REMOTE* (Cost=25 Card=53 Bytes=795) LINK_D
B
4 2 TABLE ACCESS (BY INDEX ROWID) OF LIST_NEW
' (Cost=122 Card=45520 Bytes=1001440)
5 4 INDEX (RANGE SCAN) OF 'LIST_NEW' (NON
-UNIQUE) (Cost=10 Card=45520)
3 SERIAL_FROM_REMOTE SELECT "USER_ID","STATUS" FROM "U
SERS" "USERS" WHERE "STATUS"
anandkl
-
Originally posted by DaPi
alternatively recode the IN as an EXISTS (often faster).
Actually, EXISTS is faster when you are looking in a table that has a M:n relationship with your base table.
SELECT * FROM B1 WHERE EXISTS (...L2) works best with B1 n:M L2
In this case, we are starting with the LN table and looking to the U table. Their relationship is probably LN M:1 U, so an EXISTS would, in all likelihood, not be any better.
Just an FYI,
- Chris
-
Originally posted by chrisrlong
Code:
U.STATUS = 'A' AND
LN.STATUS = U.STATUS AND
LN.STATUS = 'A' AND
Hey Chris, are you preparing for non-Euclidean SQL? (Euclid, Book 1, Chapter 1, Page 1: "Things equal to the same thing are equal to each other").
"The power of instruction is seldom of much efficacy except in those happy dispositions where it is almost superfluous" - Gibbon, quoted by R.P.Feynman
-
Ahhh, so both tables are remote? If they are on the same remote link, then add a DRIVING_SITE(LN) hint
- Chris
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|