DBAsupport.com Forums - Powered by vBulletin
Page 1 of 2 12 LastLast
Results 1 to 10 of 19

Thread: Stats better but SQL is slower

  1. #1
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152

    Stats better but SQL is slower

    Running Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production.

    I had some SQL for detecting duplicates, which I rewrote to avoid having to do a self-join. The old SQL is:

    Code:
    SELECT /*+ USE_HASH(t d) */ 
             t.BUSINESS_DATE, t.SCHEDULE_ID,
            RANK() OVER 
            (PARTITION BY NVL(t.DEAL_ID, '~NVL'),
            NVL(t.LEG_TYPE_CD, '~NVL') 
    ORDER BY t.SRC_SYS_FEED_ROW_ID DESC, 
                 t.SCHEDULE_ID  DESC) ERROR_BIT_
    FROM OPD_SCHEDULE t, 
            (SELECT /*+ USE_HASH(t d) */ BUSINESS_DATE, 
                       SRC_SYS_FEED_ID,
                       NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
                       NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD
              FROM SCHEDULE t
             WHERE BUSINESS_DATE = :p_business_date 
                and SRC_SYS_FEED_ID = :p_src_sys_feed_id
             GROUP BY BUSINESS_DATE, SRC_SYS_FEED_ID, 
                           DEAL_ID, LEG_TYPE_CD HAVING COUNT(*) > 1) d
    WHERE t.BUSINESS_DATE = :p_business_date 
        and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id 
        and BITAND(t.ROW_ERROR_CD, 3) = 0 
        and NVL(t.DEAL_ID, '~NVL') = d.DEAL_ID 
        and NVL(t.LEG_TYPE_CD, '~NVL') = d.LEG_TYPE_CD
    I rewrote this to;

    Code:
    SELECT t.rowid,  RANK() OVER (PARTITION BY DEAL_ID
    ,LEG_TYPE_CD
     ORDER BY t.SRC_SYS_FEED_ROW_ID DESC, t.SCHEDULE_ID  DESC) AS ERROR_BIT_ID
    FROM
    (SELECT ROWID, SRC_SYS_FEED_ROW_ID, SCHEDULE_ID,
    NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
    NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD,
    COUNT(1) OVER (PARTITION BY NVL(t.DEAL_ID, '~NVL')
    ,NVL(t.LEG_TYPE_CD, '~NVL')
     ) DUPS FROM SCHEDULE t WHERE t.BUSINESS_DATE = :p_business_date
    and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id
    and BITAND(t.ROW_ERROR_CD, 3) = 0  ) t
     WHERE DUPS > 1
    Using autotrace to look at the stats, things definitely seem to have improved.

    Old:
    0 recursive calls
    29 db block gets
    43550 consistent gets
    58023 physical reads
    0 redo size
    210 bytes sent via SQL*Net to client
    234 bytes received via SQL*Net from client
    1 SQL*Net roundtrips to/from client
    1 sorts (memory)
    1 sorts (disk)
    0 rows processed

    New:
    0 recursive calls
    27 db block gets
    21775 consistent gets
    32073 physical reads
    0 redo size
    170 bytes sent via SQL*Net to client
    234 bytes received via SQL*Net from client
    1 SQL*Net roundtrips to/from client
    1 sorts (memory)
    1 sorts (disk)
    0 rows processed


    And yet...the old is actually quicker than the new one ! Admittedly, it's only a few seconds but it's a fairly consistent result across numerous runs. This is very annoying because I do think the new SQL is better and want to use it, but it's hard to justify when it seems to be slowing things down. Can anyone suggest an explanation why it is slower when all the stats point the other way ?
    Last edited by tamilselvan; 12-06-2005 at 12:36 PM.

  2. #2
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Do you have tkprof output for both SQL statements?

    If yes, post them here.

    Tamil

  3. #3
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152
    No, unfortunately I don't have access to the machine where the Oracle server runs, so I can't get hold of the trace file. I'm trying to get the DBA to send me it, but not having any response so far.

  4. #4
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152
    Investigating further, it looks like the main reason is just that the select ..group by.. having.. is just a lot faster than the partitioned version. The basic queries to get the duplicates are:

    Code:
    SELECT BUSINESS_DATE, SRC_SYS_FEED_ID,
    NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD
    FROM SCHEDULE t
    WHERE BUSINESS_DATE = :p_business_date and SRC_SYS_FEED_ID = :p_src_sys_feed_id
    GROUP BY BUSINESS_DATE, SRC_SYS_FEED_ID, DEAL_ID, LEG_TYPE_CD HAVING COUNT(*) > 1
    This runs in about 30 seconds and the plan is:

    SELECT STATEMENT Hint=CHOOSE
    FILTER
    SORT GROUP BY
    PARTITION RANGE SINGLE
    TABLE ACCESS FULL SCHEDULE

    The partiioning version is:

    Code:
    select * from
    (SELECT ROWID, SRC_SYS_FEED_ROW_ID, SCHEDULE_ID,
    NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
    NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD,
    COUNT(1) OVER (PARTITION BY NVL(t.DEAL_ID, '~NVL')
    ,NVL(t.LEG_TYPE_CD, '~NVL')
     ) DUPS FROM SCHEDULE t WHERE t.BUSINESS_DATE = :p_business_date
    and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id)
    where dups > 1
    This takes twice as long and has plan:

    SELECT STATEMENT Hint=CHOOSE
    VIEW
    WINDOW SORT
    PARTITION RANGE SINGLE
    TABLE ACCESS FULL SCHEDULE

  5. #5
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Can you post the exact tkprof output after you get it from the DBA?

    Tamil
    Last edited by tamilselvan; 12-06-2005 at 12:44 PM.

  6. #6
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152
    I don't have the tkprof output. All I can get at the moment is the explalin plan from TOAD, which is:

    Old:

    Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

    SELECT STATEMENT Hint=CHOOSE 9 K 4885
    FILTER
    SORT GROUP BY 9 K 245 K 4885
    PARTITION RANGE SINGLE KEY KEY
    TABLE ACCESS FULL OPD_SCHEDULE 547 K 13 M 3319 KEY KEY

    New:

    Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

    SELECT STATEMENT Hint=CHOOSE 547 K 7629
    VIEW 547 K 34 M 7629
    WINDOW SORT 547 K 22 M 7629
    PARTITION RANGE SINGLE KEY KEY
    TABLE ACCESS FULL OPD_SCHEDULE 547 K 22 M 3319 KEY KEY

    If this has any meaning at all, the key difference seems to be that the group by..having is able to do a sort on only 9K rows, whereas the other one is sorting the whole table. 547K is correct for the number of rows in the table (I've done an analyze compute statistics for all columns), but I'm at a complete loss to explain where 9K comes from. There aren't 9000 distinct combinations of business_date, src_sys_feed_id, deal_id, leg_type_cd (there are only 12 duplicates in all). All the rows in the table have the same business_date and src_sys_feed_id, so it's not produced by the where conditions.

  7. #7
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Explain plan may not tell you the correct picture unless you cross verify with each line by running individual statement.
    Tkprof is the right place to investigate. Even in tkprof output, it will hide the info if a SQL is used at the column level.

    Tamil

  8. #8
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    159
    It's easier to read your explain plan output if you format it with [code] tags.

    The explain plan output from SQL*Plus can sometimes give you more clues than the TOAD version.

    The old version seems to join two tables but the new version does not, so I'm not sure I follow the business rule for duplicates. Also you refer to the new version as "partitioned", but the table appears to be partitioned and both versions seem to use partition elimination. Are you referring to the analytic functions?

    This is a bit of a guess but using a standard dedupe approach I get this:

    Code:
    SELECT * FROM schedule
    WHERE  rowid IN
    ( SELECT LEAD(rowid) OVER
             ( PARTITION BY deal_id, leg_type_cd 
               ORDER BY schedule_id DESC )
      FROM   schedule
      WHERE  business_date = :p_business_date
      AND    src_sys_feed_row_id = :p_src_sys_feed_id );
    Last edited by WilliamR; 12-06-2005 at 09:04 PM.

  9. #9
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152
    The SQL has to detect all duplicates, one of which will be marked as the "real" one and the rest as the "error" duplicate. I think your SQL probably only detects the error duplicates.

  10. #10
    Join Date
    Nov 2001
    Location
    UK
    Posts
    152
    I rewrote it using lead and lag but the response times and autotrace stats were almost identical:

    Code:
    select rowid, (case when leader is not null and lagger is null then 1 else 2 end) as error_bit_id
    from
    (select rowid, lead(rowid) over (PARTITION BY NVL(DEAL_ID, '~NVL'),NVL(LEG_TYPE_CD, '~NVL')
    ORDER BY SRC_SYS_FEED_ROW_ID DESC, SCHEDULE_ID     DESC) as leader,
    lag(rowid) over
    (PARTITION BY NVL(DEAL_ID, '~NVL'),NVL(LEG_TYPE_CD, '~NVL')
    ORDER BY SRC_SYS_FEED_ROW_ID DESC, SCHEDULE_ID     DESC) as lagger from schedule
    WHERE BUSINESS_DATE = :p_business_date
    and SRC_SYS_FEED_ID = :p_src_sys_feed_id
    and BITAND(ROW_ERROR_CD, 3) = 0)
    where leader is not null or lagger is not null
    Plan is similar:

    Code:
    Operation	Object Name	Rows	Bytes	Cost	Object Node	In/Out	PStart	PStop
    
    SELECT STATEMENT Hint=CHOOSE		5 K	 	3373  	 	      	             	 
      VIEW		5 K	112 K	3373  	 	      	             	 
        WINDOW SORT		5 K	304 K	3373  	 	      	             	 
          PARTITION RANGE SINGLE		  	 	 	 	      	            KEY	KEY
            TABLE ACCESS FULL	SCHEDULE	5 K	304 K	3319  	 	      	            KEY	KEY

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width