Click to See Complete Forum and Search --> : Very slow INTERSECT under 9i - optimizer mistake?


anatoli
09-23-2003, 12:25 PM
Hello,

I have a weird performance issue with a relatively
simple query with INTERSECT and need advice.
The query in question looks like this

(subquery A)
intersect
(subquery B)

independently subquery A runs to completion in 2 seconds and returns 487 rows
independently subquery B runs to completion in 6 seconds and returns 2440 rows

the combined query (with INTERSECT) returns 15 matches
(correct), but runs for a whooping 1 minute 6 seconds

Considering the relatively small size of INTERSECT,
I cannot understand why the performance degrades so much.

Either subquery returns a result set of strings
20-30 characters long. Both subqueries contain nothing
exceptional - very conventional SQL. The size of
intersected results is relatively small, so no memory
issues should be responsible (I could intersect
this results with Perl in less then a second).
The only possible catch is that the strings
returned generally differ only in the last few
characters - could this affect sorting/hashing?

For all
means and purposes Oracle cannot run INTERSECT for
almost 1 minute (we are talking 2 MHz/512MB machine
running Oracle 9i)

So I presume it must be a query optimizer fault.
Nevertheless, playing with QUERY_REWRITE_ENABLED does not solve
anything, likewise COST hint.

Could anyone advice to me how to tackle this problem
or at least explain what is happening?

slimdave
09-23-2003, 01:32 PM
Have you looked at the explain plan of ...
* subquery A on it's own
* subquery B on it's own
* the two subquerys when intersected
... to see if there is a difference in how they are being executed when intersected?

Shestakov
09-24-2003, 01:38 PM
Originally posted by anatoli
Hello,

I have a weird performance issue with a relatively
simple query with INTERSECT and need advice.
The query in question looks like this

(subquery A)
intersect
(subquery B)

independently subquery A runs to completion in 2 seconds and returns 487 rows
independently subquery B runs to completion in 6 seconds and returns 2440 rows

the combined query (with INTERSECT) returns 15 matches
(correct), but runs for a whooping 1 minute 6 seconds


Both subquery independently don't use any addition sorting.
Bot if u try to get intersection of the queries, Oracle has to execute addition step for each of them:
SORT UNIQUE
This is reason, why intersection more slowly, then two subqueries
separately.

slimdave
09-24-2003, 02:10 PM
Originally posted by Shestakov
Both subquery independently don't use any addition sorting.
Bot if u try to get intersection of the queries, Oracle has to execute addition step for each of them:
SORT UNIQUE
This is reason, why intersection more slowly, then two subqueries
separately.

Given the number of rows involved, i don't believe that this explains the differencein execution time.

Shestakov
09-24-2003, 06:36 PM
2 slimdave

This is real query (or near real, because view based on this query).
This query use MINUS (it's like INTERSECT).
U can see execution plan if u don't belive that this explains the difference.


select a.period, center, ent, 1 as source_id
from dtax_model.TM_CC_ENT_LKUP a
, dtax_mlt.MLT_ENT b
WHERE a.period = b.period
AND a.ent_id = b.ENT_ID
AND a.active ='Y'
union all
(
select a.period, center, ent, 2
from dtax_mlt.MLT_CC_ENT_LKUP a
, dtax_mlt.MLT_ENT b
WHERE a.period = b.period
AND a.ent_id = b.ENT_ID
minus
select a.period, center, ent, 2
from dtax_model.TM_CC_ENT_LKUP a
, dtax_mlt.MLT_ENT b
WHERE a.period = b.period
AND a.ent_id = b.ENT_ID
AND a.active = 'Y'
)
;

Plan :

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=52 Bytes=884)
1 0 VIEW OF 'V_TM_CC_ENT_LKUP_NEW2' (Cost=9 Card=52 Bytes=884)
2 1 UNION-ALL
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'TM_CC_ENT_LKUP' (Cost=2 Card=10334 Bytes=175678)
4 3 NESTED LOOPS (Cost=3 Card=13 Bytes=390)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'MLT_ENT' (Cost=2 Card=1 Bytes=13)
6 5 INDEX (RANGE SCAN) OF 'IDX_ENT_PERIOD' (NON-UNIQUE) (Cost=1 Card=1)
7 4 INDEX (RANGE SCAN) OF 'IDX_1_ENT_CC' (NON-UNIQUE) (Cost=1 Card=10334)
8 2 MINUS
9 8 SORT (UNIQUE)
10 9 TABLE ACCESS (BY INDEX ROWID) OF 'MLT_CC_ENT_LKUP' (Cost=1 Card=40900 Bytes=613500)
11 10 NESTED LOOPS (Cost=3 Card=26 Bytes=728)
12 11 TABLE ACCESS (BY INDEX ROWID) OF 'MLT_ENT' (Cost=2 Card=1 Bytes=13)
13 12 INDEX (RANGE SCAN) OF 'IDX_ENT_PERIOD' (NON-UNIQUE) (Cost=1 Card=1)
14 11 INDEX (RANGE SCAN) OF 'IDX_ENT' (NON-UNIQUE)
15 8 SORT (UNIQUE)
16 15 TABLE ACCESS (BY INDEX ROWID) OF 'TM_CC_ENT_LKUP'(Cost=2 Card=10334 Bytes=175678)
17 16 NESTED LOOPS (Cost=3 Card=13 Bytes=390)
18 17 TABLE ACCESS (BY INDEX ROWID) OF 'MLT_ENT' (Cost=2 Card=1 Bytes=13)
19 18 INDEX (RANGE SCAN) OF 'IDX_ENT_PERIOD' (NON-UNIQUE) (Cost=1 Card=1)
20 17 INDEX (RANGE SCAN) OF 'IDX_1_ENT_CC' (NON-UNIQUE) (Cost=1 Card=10334)

Steps :
9 SORT (UNIQUE)
15 SORT (UNIQUE)
show extra sort operations.

chrisrlong
09-24-2003, 08:05 PM
He was not disputing the fact that an extra sort operation is required.

He is simply stating that it is unlikely that sorting a grand total of 2927 rows costs an additional 58 seconds, and I quite agree.

- Chris

slimdave
09-24-2003, 08:07 PM
I could sort 'em by hand faster than that! Well, not really. But nearly as fast.

anatoli
09-25-2003, 10:01 AM
Thank you for your responses. I looked at the execution plan. I would have expected the combined query to largely inherit original subquery plans and combine them using UNIQUE SORT, etc. Instead, the first subquery subplan is completely rewritten (the second subquery subplan is identical).

From looking at the new plan I could understand why it runs slow - but why Oracle decides to behave so differently with effectively identical SQL code? What causes it and how could I prevent it?

The plans for independent subqueries and a combined query are attached. I realize that without seeing the original SQL they are not great use, nevertheless they illustrate the point. In a big combined query the optimizer suddenly decides to go for a very simple recursive condition check on several large tables.

Cheers,
Toly

Anatoli Krassavine
Intellidos Limited

chrisrlong
09-25-2003, 11:19 PM
The reasons belong to the optimizer alone. Based on the 5 dozen different variables it looked at, it decided that out of the 5000 ways it could solve a query of that size, it *thought* the best way was actually *not* to wait until the very last step to do an intgersect. It re-wrote the intersect out of the query (which it always does) and simply came up with a different plan than you expected. All I can tell you is that the optimizer makes mistakes.

How do you stop it? Well, if you mean how do I force the intersect to be the last step when I do exactly this type of statement, the answer is easy, add AND ROWNUM > 0 to both sub-queries. This will force the optimizer to solve the queries independently, thus forcing the intersect to be done last. You could also use a lot of hints to do the same thing, and it would probably be 'more correct', but it would also be more difficult. The rownum thing is easier.

If you mean 'How do I stop the optimizer from making mistakes and/or doing things that I don't expect? Well, all I can say is can you please let the rest of us know when you figure that one out, eh? :D

Lesson of the day? Never assume you know what the optimizer is or *should* be doing. *Always* check the plan.

- Chris

anatoli
09-26-2003, 09:50 AM
Hello Chris,

Please accept my deepest appreciation for your advice - it works and solves the problem at hand. The query now runs in under 4 seconds.

I have some related questions to clarify the situation for me, but they are not urgent any more.

I was not naive enough to ask how to make Oracle always optimize correctly, :-). My biggest frustration was to see Oracle optimize/run independent subqueries without a glitch and then suddenly go complete bonkers when I start applying trivial set operations to their results.

1) What are the practical drawbacks of using ROWNUM > 0 in terms of performance, etc compared with a properly optimized query?

2) Is there a global way to force Oracle to solve the queries involved in an INTERSECT independently? I.e. is there a way to treat INTERSECT as kind of a "hard break" for query optimization?

Thank you again,
Toly

chrisrlong
09-26-2003, 10:46 AM
1) As you saw, there is precious little overhead to adding a ROWNUM predicate. I wouldn't hesitate to use it at all.

2) Not that I know of, but as people love to tell me at every opportunity: I don't know everything ;). Actually, I know of nobody who ever uses INTERSECT, so I've never had the need to look into it. It would also seem then to be unlikely that they would have added special flags to affect the functionality of such a little-used feature, but who knows. Let me know if you find anything.

- Chris

DaPi
09-26-2003, 11:18 AM
Originally posted by chrisrlong
Actually, I know of nobody who ever uses INTERSECT Observation & supposition:

An INTERSECT can be coded as as single query using AND's or AND EXISTS. In general the CBO does OK on these and most people find the logic quite natural.

Where the CBO sometimes comes unstuck is with OR's that refer to columns in different tables (WHERE a.col1='x' OR b.col2='y' etc) under these conditions a UNION can work better, tho' it may not be so intuitive. Your ROWNUM trick (or some other) may be useful here . . . ?

chrisrlong
09-26-2003, 12:16 PM
Originally posted by DaPi
An INTERSECT can be coded as as single query using AND's or AND EXISTS.
Which is why I said the optimizer always re-writes them. Once re-written, it becomes like any other join where the optimizer is free to merge logic in and out of levels and come up with whatever plan makes sense.
Originally posted by DaPi
In general the CBO does OK on these and most people find the logic quite natural.
Considering every single join is an intersect, I personally find the use for them non-existent, but maybe that's just me.

Just pondering here, though. I wonder if the NO_MERGE hint would help with an INTERSECT. The problem is that the in-line views are hidden and not named, so it may or may not work, but it's worth a try.

- Chris

DaPi
09-26-2003, 01:33 PM
I guess I didn't express myself very well. What I meant was that people don't write INTERSECT's because "An INTERSECT can be coded as as single query using AND's or AND EXISTS . . . . and most people find the logic (i.e. the logic of AND's) quite natural."

pando
09-26-2003, 02:34 PM
Originally posted by DaPi
I guess I didn't express myself very well. What I meant was that people don't write INTERSECT's because "An INTERSECT can be coded as as single query using AND's or AND EXISTS . . . . and most people find the logic (i.e. the logic of AND's) quite natural."

And if I understand Chris correctly he is saying Optimizer would rewrite your *INTERSECT* into AND, AND EXISTS join

DaPi
09-26-2003, 02:56 PM
Originally posted by pando
And if I understand Chris correctly he is saying Optimizer would rewrite your *INTERSECT* into AND, AND EXISTS join I wasn't disageeing! just saying why no one actually codes them!

chrisrlong
09-26-2003, 03:07 PM
Originally posted by DaPi
I wasn't disageeing! just saying why no one actually codes them!
...which is exactly what I was saying in my first post, and round and round we go :D. So yes, we all agree, so let's stop arguing :)

- Chris

(I'm getting dizzy here)

slimdave
09-26-2003, 03:22 PM
Originally posted by DaPi
I guess I didn't express myself very well. What I meant was that people don't write INTERSECT's because "An INTERSECT can be coded as as single query using AND's or AND EXISTS . . . . and most people find the logic (i.e. the logic of AND's) quite natural."

New Argument Begins Here

I've never used INTERSECT, as far as i can recall, but I'm always MINUS'ing and UNION'ing stuff -- it seems much easier and more natural than writing a join.

Flame away.

abhaysk
09-27-2003, 09:02 AM
Originally posted by slimdave
UNION'ing stuff -- it seems much easier and more natural than writing a join.


Some times sucks...CBO's A** out.

slimdave
09-27-2003, 01:42 PM
Originally posted by abhaysk
Some times sucks...CBO's A** out.

I was hoping for a "warmer" reply than that -- pehaps I wasn't inflammatory enough in my last post, so what about ...

"What kind of idiot would write complex joins -- difficult to follow and more bug-prone -- when they could be writing an intuitive, simple little set operation? For a single column comparison, perhaps (perhaps!) the ease of writing, understanding, and maintaining a join would approach that of a set operation, but take it up to three, four, five, six etc. columns and you'd have to be crazy."

Don't make me get personal in order to get a response! (You know who you are)

abhaysk
09-29-2003, 03:42 AM
Originally posted by slimdave
Don't make me get personal in order to get a response! (You know who you are)

What are you trying to hint?:confused: :confused:

anatoli
09-29-2003, 10:11 AM
Gentlemen, please…

I appreciate that in ideal world I should be able to write a completely optimised query and the best way to do it is to explicitly specify joins and avoid high-level set operations. This is what I would do in ideal world.

In real world my problems are much closer to what slimdave described. In a nutshell, our system collates domain knowledge, which is in part based on local scope queries written by different people (our developers, our consultants, our client’s IT staff and our client’s scientists). The system ends up with with a managed repository of hundreds of “small” queries written by different people (“small” is relative here – it includes some 200-liners). When a specific cross-domain question is asked (in non-SQL manner), our system automatically interprets it, selects the relevant queries, converts them into subqueries and combines them into a bigger cross-domain query.

In the example I submitted originally, subquery A was written by one person and subquery B by another, none of them aware that the two pieces of SQL will ever be run together. This is a small example - the big query could potentially invoke tens of subqueries.

For reliability sake I am loath to tinker with existing SQL queries written by other people without damn good reason. I would prefer as much as possible to treat them as a black box – one reason I find sets so useful.
1) Although I could isolate individual WHERE clauses and intelligently combine them to avoid INTERSECT, there is a high risk I will break something in the process.
2) If the query is not working, ability to step through individual set operations and see what is actually happening inside is a big help.

I am prepared to accept small performance penalty for using sets, but not a ridiculous one. If the performance is abysmal despite everything, then I am prepared to modify original SQL – but only after everything else failed.

Hence my interest in being able to specify hints to force subqueries compile independently rather then edit SQL.

I do not want to be misunderstood here saying that I do not care about performance. I do care and my post in this forum is a proof of it. It is just that I am not buying performance at the expense of reliability.

chrisrlong
09-29-2003, 10:35 AM
A brilliant reply.

Your problem seems very well thought out and your conclusions seem dead-on.

The ROWNUM trick should work fine for you with practically no overhead, but if you want to, you can try the NO_MERGE or similar hint to see what you can get out of that.

As for the other digressions in the thread: UNION is a critical part of the language that cannot be broken down without introducing 'fake' values, such as doing a cartesian join against a fake 2-row table or something similar, so it doesn't really fall into the same category as INTERSECT and MINUS. As for MINUS, I've seen it used many times and know of several people who like it. I personally do not use it because I can almost always get better performance in other ways. I have come up with test cases where it wins, although no real-life ones yet. Further, I generally don't end up with 2 sets with exactly the same data such that a MINUS would apply. I find that people often 'massage' their queries to be able to use the MINUS operation. It is rare that anything other than a single-table MINUS will fall into place without some fudging. So, IF I actually had two resultsets with the exact same columns and IF it actually performed better in a given situation, then sure, I'd use the MINUS. But as I said, it hasn't happened yet.

- Chris

slimdave
09-29-2003, 11:14 AM
Maybe it's a data warehousing thing -- I use minus op's extensively, mostly for validating referential integrity between fact and dimension tables. The execution plan has always included the MINUS operation, but maybe it's because the query is always of the form ...

select
my_column
from
very_big_table
MINUS
select
my_column
from
very_small_table