Stats better but SQL is slower

**Scorby** · 12-06-2005, 10:21 AM

Running Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production.

I had some SQL for detecting duplicates, which I rewrote to avoid having to do a self-join. The old SQL is:

Code:

SELECT /*+ USE_HASH(t d) */ 
         t.BUSINESS_DATE, t.SCHEDULE_ID,
        RANK() OVER 
        (PARTITION BY NVL(t.DEAL_ID, '~NVL'),
        NVL(t.LEG_TYPE_CD, '~NVL') 
ORDER BY t.SRC_SYS_FEED_ROW_ID DESC, 
             t.SCHEDULE_ID  DESC) ERROR_BIT_
FROM OPD_SCHEDULE t, 
        (SELECT /*+ USE_HASH(t d) */ BUSINESS_DATE, 
                   SRC_SYS_FEED_ID,
                   NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
                   NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD
          FROM SCHEDULE t
         WHERE BUSINESS_DATE = :p_business_date 
            and SRC_SYS_FEED_ID = :p_src_sys_feed_id
         GROUP BY BUSINESS_DATE, SRC_SYS_FEED_ID, 
                       DEAL_ID, LEG_TYPE_CD HAVING COUNT(*) > 1) d
WHERE t.BUSINESS_DATE = :p_business_date 
    and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id 
    and BITAND(t.ROW_ERROR_CD, 3) = 0 
    and NVL(t.DEAL_ID, '~NVL') = d.DEAL_ID 
    and NVL(t.LEG_TYPE_CD, '~NVL') = d.LEG_TYPE_CD

I rewrote this to;

Code:

SELECT t.rowid,  RANK() OVER (PARTITION BY DEAL_ID
,LEG_TYPE_CD
 ORDER BY t.SRC_SYS_FEED_ROW_ID DESC, t.SCHEDULE_ID  DESC) AS ERROR_BIT_ID
FROM
(SELECT ROWID, SRC_SYS_FEED_ROW_ID, SCHEDULE_ID,
NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD,
COUNT(1) OVER (PARTITION BY NVL(t.DEAL_ID, '~NVL')
,NVL(t.LEG_TYPE_CD, '~NVL')
 ) DUPS FROM SCHEDULE t WHERE t.BUSINESS_DATE = :p_business_date
and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id
and BITAND(t.ROW_ERROR_CD, 3) = 0  ) t
 WHERE DUPS > 1

Using autotrace to look at the stats, things definitely seem to have improved.

Old:
0 recursive calls
29 db block gets
43550 consistent gets
58023 physical reads
0 redo size
210 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
1 sorts (disk)
0 rows processed
New:
0 recursive calls
27 db block gets
21775 consistent gets
32073 physical reads
0 redo size
170 bytes sent via SQL*Net to client
234 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
1 sorts (memory)
1 sorts (disk)
0 rows processed

And yet...the old is actually quicker than the new one ! Admittedly, it's only a few seconds but it's a fairly consistent result across numerous runs. This is very annoying because I do think the new SQL is better and want to use it, but it's hard to justify when it seems to be slowing things down. Can anyone suggest an explanation why it is slower when all the stats point the other way ?

**tamilselvan** · 12-06-2005, 10:31 AM

Do you have tkprof output for both SQL statements?

If yes, post them here.

Tamil

**Scorby** · 12-06-2005, 10:48 AM

No, unfortunately I don't have access to the machine where the Oracle server runs, so I can't get hold of the trace file. I'm trying to get the DBA to send me it, but not having any response so far.

**Scorby** · 12-06-2005, 11:51 AM

Investigating further, it looks like the main reason is just that the select ..group by.. having.. is just a lot faster than the partitioned version. The basic queries to get the duplicates are:

Code:

SELECT BUSINESS_DATE, SRC_SYS_FEED_ID,
NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD
FROM SCHEDULE t
WHERE BUSINESS_DATE = :p_business_date and SRC_SYS_FEED_ID = :p_src_sys_feed_id
GROUP BY BUSINESS_DATE, SRC_SYS_FEED_ID, DEAL_ID, LEG_TYPE_CD HAVING COUNT(*) > 1

This runs in about 30 seconds and the plan is:

SELECT STATEMENT Hint=CHOOSE
FILTER
SORT GROUP BY
PARTITION RANGE SINGLE
TABLE ACCESS FULL SCHEDULE

The partiioning version is:

Code:

select * from
(SELECT ROWID, SRC_SYS_FEED_ROW_ID, SCHEDULE_ID,
NVL(t.DEAL_ID, '~NVL') AS DEAL_ID,
NVL(t.LEG_TYPE_CD, '~NVL') AS LEG_TYPE_CD,
COUNT(1) OVER (PARTITION BY NVL(t.DEAL_ID, '~NVL')
,NVL(t.LEG_TYPE_CD, '~NVL')
 ) DUPS FROM SCHEDULE t WHERE t.BUSINESS_DATE = :p_business_date
and t.SRC_SYS_FEED_ID = :p_src_sys_feed_id)
where dups > 1

This takes twice as long and has plan:

SELECT STATEMENT Hint=CHOOSE
VIEW
WINDOW SORT
PARTITION RANGE SINGLE
TABLE ACCESS FULL SCHEDULE

**tamilselvan** · 12-06-2005, 12:33 PM

Can you post the exact tkprof output after you get it from the DBA?

Tamil

**Scorby** · 12-06-2005, 01:03 PM

I don't have the tkprof output. All I can get at the moment is the explalin plan from TOAD, which is:

Old:

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Hint=CHOOSE 9 K 4885
FILTER
SORT GROUP BY 9 K 245 K 4885
PARTITION RANGE SINGLE KEY KEY
TABLE ACCESS FULL OPD_SCHEDULE 547 K 13 M 3319 KEY KEY

New:

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Hint=CHOOSE 547 K 7629
VIEW 547 K 34 M 7629
WINDOW SORT 547 K 22 M 7629
PARTITION RANGE SINGLE KEY KEY
TABLE ACCESS FULL OPD_SCHEDULE 547 K 22 M 3319 KEY KEY

If this has any meaning at all, the key difference seems to be that the group by..having is able to do a sort on only 9K rows, whereas the other one is sorting the whole table. 547K is correct for the number of rows in the table (I've done an analyze compute statistics for all columns), but I'm at a complete loss to explain where 9K comes from. There aren't 9000 distinct combinations of business_date, src_sys_feed_id, deal_id, leg_type_cd (there are only 12 duplicates in all). All the rows in the table have the same business_date and src_sys_feed_id, so it's not produced by the where conditions.

**tamilselvan** · 12-06-2005, 05:49 PM

Explain plan may not tell you the correct picture unless you cross verify with each line by running individual statement.
Tkprof is the right place to investigate. Even in tkprof output, it will hide the info if a SQL is used at the column level.

Tamil

**WilliamR** · 12-06-2005, 08:58 PM

It's easier to read your explain plan output if you format it with [code] tags.

The explain plan output from SQL*Plus can sometimes give you more clues than the TOAD version.

The old version seems to join two tables but the new version does not, so I'm not sure I follow the business rule for duplicates. Also you refer to the new version as "partitioned", but the table appears to be partitioned and both versions seem to use partition elimination. Are you referring to the analytic functions?

This is a bit of a guess but using a standard dedupe approach I get this:

Code:

SELECT * FROM schedule
WHERE  rowid IN
( SELECT LEAD(rowid) OVER
         ( PARTITION BY deal_id, leg_type_cd 
           ORDER BY schedule_id DESC )
  FROM   schedule
  WHERE  business_date = :p_business_date
  AND    src_sys_feed_row_id = :p_src_sys_feed_id );

**Scorby** · 12-07-2005, 05:23 AM

The SQL has to detect all duplicates, one of which will be marked as the "real" one and the rest as the "error" duplicate. I think your SQL probably only detects the error duplicates.

**Scorby** · 12-07-2005, 06:21 AM

I rewrote it using lead and lag but the response times and autotrace stats were almost identical:

Code:

select rowid, (case when leader is not null and lagger is null then 1 else 2 end) as error_bit_id
from
(select rowid, lead(rowid) over (PARTITION BY NVL(DEAL_ID, '~NVL'),NVL(LEG_TYPE_CD, '~NVL')
ORDER BY SRC_SYS_FEED_ROW_ID DESC, SCHEDULE_ID     DESC) as leader,
lag(rowid) over
(PARTITION BY NVL(DEAL_ID, '~NVL'),NVL(LEG_TYPE_CD, '~NVL')
ORDER BY SRC_SYS_FEED_ROW_ID DESC, SCHEDULE_ID     DESC) as lagger from schedule
WHERE BUSINESS_DATE = :p_business_date
and SRC_SYS_FEED_ID = :p_src_sys_feed_id
and BITAND(ROW_ERROR_CD, 3) = 0)
where leader is not null or lagger is not null

Plan is similar:

Code:

Operation	Object Name	Rows	Bytes	Cost	Object Node	In/Out	PStart	PStop

SELECT STATEMENT Hint=CHOOSE		5 K	 	3373  	 	      	             	 
  VIEW		5 K	112 K	3373  	 	      	             	 
    WINDOW SORT		5 K	304 K	3373  	 	      	             	 
      PARTITION RANGE SINGLE		  	 	 	 	      	            KEY	KEY
        TABLE ACCESS FULL	SCHEDULE	5 K	304 K	3319  	 	      	            KEY	KEY

Thread: Stats better but SQL is slower

Thread Tools

Display

Stats better but SQL is slower

Posting Permissions