-
Originally posted by slimdave
Don't make me get personal in order to get a response! (You know who you are)
What are you trying to hint?
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
Gentlemen, please…
I appreciate that in ideal world I should be able to write a completely optimised query and the best way to do it is to explicitly specify joins and avoid high-level set operations. This is what I would do in ideal world.
In real world my problems are much closer to what slimdave described. In a nutshell, our system collates domain knowledge, which is in part based on local scope queries written by different people (our developers, our consultants, our client’s IT staff and our client’s scientists). The system ends up with with a managed repository of hundreds of “small” queries written by different people (“small” is relative here – it includes some 200-liners). When a specific cross-domain question is asked (in non-SQL manner), our system automatically interprets it, selects the relevant queries, converts them into subqueries and combines them into a bigger cross-domain query.
In the example I submitted originally, subquery A was written by one person and subquery B by another, none of them aware that the two pieces of SQL will ever be run together. This is a small example - the big query could potentially invoke tens of subqueries.
For reliability sake I am loath to tinker with existing SQL queries written by other people without damn good reason. I would prefer as much as possible to treat them as a black box – one reason I find sets so useful.
1) Although I could isolate individual WHERE clauses and intelligently combine them to avoid INTERSECT, there is a high risk I will break something in the process.
2) If the query is not working, ability to step through individual set operations and see what is actually happening inside is a big help.
I am prepared to accept small performance penalty for using sets, but not a ridiculous one. If the performance is abysmal despite everything, then I am prepared to modify original SQL – but only after everything else failed.
Hence my interest in being able to specify hints to force subqueries compile independently rather then edit SQL.
I do not want to be misunderstood here saying that I do not care about performance. I do care and my post in this forum is a proof of it. It is just that I am not buying performance at the expense of reliability.
Anatoli Krassavine
toly@intellidos.com
Intellidos Limited
-
A brilliant reply.
Your problem seems very well thought out and your conclusions seem dead-on.
The ROWNUM trick should work fine for you with practically no overhead, but if you want to, you can try the NO_MERGE or similar hint to see what you can get out of that.
As for the other digressions in the thread: UNION is a critical part of the language that cannot be broken down without introducing 'fake' values, such as doing a cartesian join against a fake 2-row table or something similar, so it doesn't really fall into the same category as INTERSECT and MINUS. As for MINUS, I've seen it used many times and know of several people who like it. I personally do not use it because I can almost always get better performance in other ways. I have come up with test cases where it wins, although no real-life ones yet. Further, I generally don't end up with 2 sets with exactly the same data such that a MINUS would apply. I find that people often 'massage' their queries to be able to use the MINUS operation. It is rare that anything other than a single-table MINUS will fall into place without some fudging. So, IF I actually had two resultsets with the exact same columns and IF it actually performed better in a given situation, then sure, I'd use the MINUS. But as I said, it hasn't happened yet.
- Chris
-
Maybe it's a data warehousing thing -- I use minus op's extensively, mostly for validating referential integrity between fact and dimension tables. The execution plan has always included the MINUS operation, but maybe it's because the query is always of the form ...
Code:
select
my_column
from
very_big_table
MINUS
select
my_column
from
very_small_table
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|