Query performance
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 6 of 6

Thread: Query performance

  1. #1
    Join Date
    Jan 2001
    Posts
    318
    I wrote a query with join and same one with the subquery.
    When I did explain plan it surprised me that the subquery was faster than the join. I was expecting it other way.
    Does any one know why the subquery ran faster than the join query.
    I had index on AuthAttr_Res_ID ,AuthAttr_Work_ID

    e.g

    SQL> Select a.Res_ID
    2 from mwebRes a, MwebAuthAttrib b, mwebwork c
    3 where a.Res_Status <> 20
    AND a.Res_Entity_Type = 3
    4 AND a.Res_ID =b.AuthAttr_Res_ID
    5 AND b.AuthAttr_Work_ID =c.Work_ID
    6 AND c.Work_Entity_Type = 4 AND c.Work_Status <> 40 group by a.res_id ;

    ----------------------
    SQL> Select Res_ID
    2 from mwebRes
    3 where Res_Status <> 20 AND Res_Entity_Type = 3
    4 AND Res_ID IN (Select AuthAttr_Res_ID
    5 From MwebAuthAttrib
    6 where AuthAttr_Work_ID IN
    7 (Select Work_ID
    8 From MwebWork
    9 where Work_Entity_Type = 4 AND Work_Status <> 40));
    Sonali

  2. #2
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    It would help if you could provide the execution plans for both statements.

    The main benefit for a subquery is to 'flatten' M:1 relations. For example, if I want some information about employees that worked on a given project, and this information is in a bunch of tables, it will probably be better to join all those tables in the main query and just do a WHERE EMP_ID IN ( SELECT DISTINCT EMP_ID FROM TIMECARD WHERE PROJECT_CD = :PROJECT_CD). This is because the timecard table is going to have *tons* of entries for each employee that used that project code. If I join that table through the whole query, then my entire resultset size will be hundreds of times larger than it has to be up until the point of doing some sum or distinct or group by at the outer-most level. If I move the DISTINCT (or sum or whatever) into the inner-most level, then I may save a lot of time building the rest of the resultset.

    With an IN, even without the DISTINCT, you will end up with a smaller resultset when you're done. Since you are doing a group by at the end of the first query, I will assume that this is the case. You should probably add the DISTINCT keyword in whichever of the INs is actually returning duplicate IDs (or both). This will make the functionality more obvious and also improve performance, since it will not need to search on the same ID twice for inclusion.

    HTH,

    - Chris

  3. #3
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,136
    How did you find out the 2nd query run faster than the first one?

    The best way is: Run Explain plan and check the total cost.
    The query that has least cost is better.

  4. #4
    Join Date
    Jan 2001
    Posts
    318
    This has confused me, what I observed is the query with IN subquery (without group by, distinct ) was faster than the join with group by or distinct when I was expecting reverse.
    As you said I should have distinct inside IN for subquery for making it more fast ( thanks for this tip).
    But that was not the point the subquery without the distinct is also faster than the join. I have always read that one should use join and not the subquery whenever possible so I was thinking that join would be faster !!!


    BTW Tamilselvan> I am writing this on the basis of EXPLAIN PLAN.
    Sonali

  5. #5
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,136
    Could you post the # of rows in each table and also the output of Explain Plans?

    [Edited by tamilselvan on 01-24-2001 at 01:02 PM]

  6. #6
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Okay, a couple of points here

    - Re-read my reply. I explained why the sub-select was faster. It has to do with the flattening of the M:1 relationship. When this is a factor that can be isolated to a sub-select, then the sub-select will be faster. When this is not a factor, then a join will give the optimizer more options.

    - I mentioned the DISTINCT as a matter of clarification and optimization of the IN query, *not* in the context of comparison with the first query, since I had already explained that one.

    - Please provide the explain plans when asking to compare statements - makes life much easier :)

    I will try, however, to more clearly explain the benefits of a sub-select versus a join. Relize that in *most* situations, the join is better. However, when you can flatten M:1 relationships in the sub-select, then the sub-select is better.

    So I have:
    EMP_T with 10,000 records
    TIMECARD_T with 1,000,000 records
    EMPDEMO_T with 10,000 records. This is a 1:1 with EMP_T holding demographic info (Sorry, but can't think of a better example at the moment)

    Now, if I want all the employees (along with their, uh, Hair Color) that have ever worked on a given project, I have 2 (for purpose of the example) ways to do this:

    SELECT
    -E.EMP_ID,
    -MAX(ED.HAIRCOLOR_CD)
    FROM
    -TIMECARD_T T,
    -EMP_T E,
    -EMPDEMO_T ED
    WHERE
    -T.PROJECT_CD = :PROJECT_CD AND
    -E.EMP_ID = T.EMP_ID AND
    -ED.EMP_ID = T.EMP_ID AND
    -ED.EMP_ID = E.EMP_ID
    GROUP BY
    -E.EMP_ID

    Now, the optimizer is going to start with T because that has the initial restriction. Let's look at the records that satisfy the query in the various tables:
    T: 10,000 records for emps that woked on that project
    E: 1000 emps worked on that project
    ED: 1000 emps with hair color info :)

    Let's look at what the optimizer does with this query:
    T: Pulls 10,000 records
    Joins in E (or ED)'s 1000 records, by going into table 10,000 times and ends up with a resultset size of 10,000
    Joins in ED (or E)'s 1000 records by going into table 10,000 times and ends up with resultset size of 10,000
    Does the GROUP BY to cut 10,000 records back down to 1,000

    Now, what is the other choice:

    SELECT
    -E.EMP_ID,
    -ED.HAIRCOLOR_CD
    FROM
    -EMP_T E,
    -EMPDEMO_T ED
    WHERE
    -E.EMP_ID IN (
    ---SELECT DISTINCT
    -----T.EMP_ID
    ---FROM
    -----TIMECARD_T T
    ---WHERE
    -----T.PROJECT_CD = :PROJECT_CD ) AND
    -ED.EMP_ID = E.EMP_ID

    Now, what does the optimizer do:

    T: Pulls 10,000 records
    Does a DISTINCT to cuts that down to 1,000 EMP_IDs
    Goes into E 1,000 times and ends up with a resultset size of 1,000
    Joins in ED by going into the table 1,000 times and ending with a resultset size of 1,000

    Hopefully, you can see why the sub-select query is faster in this case. The huge table, the one with the M:1 relationship - TIMECARD, had the M:1 relationship isolated and 'flattened' in the sub-select, saving us a lot of extra work in the rest of the query.

    Again, remember that this is not a common occurence. You generally want to avoid sub-selects because they 'hand-cuff' the optimizer by reducing the choices it has for solving the query. However, keep an eye out for this type of query because this is when the sub-select should be used and really shines.

    Hope this all made sense,

    - Chris


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width