-
Exists Or In
Hi All,
When is it advisable to use EXISTS and when is it advisable to use IN in a SQL query.
Thanking you in advance,
Pradeep
-
Just a matter of thinking;
When U use exists it is almost always like this:
select column_1
from table_a a
where exists
( select 1
from table_b b
where a.column_x = b.column_y)
In this case the inner query has to be performend for
every record.
select column_1
from table_a a
where a.column_x in
(select b.column_a
from table_b b)
Here the inner query has to be performed only once.
I know you can mix up those queries, but it's not logical.
Regards
Ben de Boer
-
it depends, if outer query returns tons of rows then normally it´s better IN
There are quite a few factors to consider, indexes, number of rows would be returned by inner and outer queries
-
Pando,
Pardon, but, EXISTS operator finishes comparing when the first row in outer query satisfies the join inner.col = outer.col, right? And then the next row from inner query is fetched and compared again to the outer query all again. With IN, the outer query is processed first and then inner query is compared with the rows returned by the outer query. And if the first row in outer query matches the inner query, the inner query does not fetch the next row, it keeps comparing till the outer query end, right?
So, if the outer query has 1 billion rows and let's suppose generally the first rows satisfies the join inner.col = outer.col, then EXISTS will return true and will fetch the next row without comparing the rest of the 1 billion rows, while IN will have to process outer query first (1 billion rows) and then start comparing it with inner query, isn't EXISTS faster in this case (outer query has lots of rows in this case) ?
I may be deeply wrong in my explanation.
F
-
I am talking from experience, I have create two tables with following results
Code:
SQL> create table x as select * from dba_objects where rownum <= 200;
Table created.
SQL> create table y as select * from dba_objects;
Table created.
SQL> insert into y select * from y
27699 rows created.
SQL> insert into x select * from x;
200 rows created.
SQL> commit;
Commit complete.
SQL> create index x_idx1 on x(object_id);
Index created.
SQL> create index y_idx1 on y(object_id);
Index created.
SQL> analyze table x compute statistics;
Table analyzed.
SQL> analyze table y compute statistics;
Table analyzed.
SQL> select count(*) from x;
COUNT(*)
----------
400
SQL> select count(*) from y;
COUNT(*)
----------
55398
SQL> set autotrace traceonly exp stat
SQL> set timing on
====================================================================
here the outer query returns large number of rows (table Y is much larger), look statistics by using IN & EXISTS, IN is more effective
SQL> select *
2 from y
3 where object_id in (select object_id from x);
400 rows selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=387 Card=400 Bytes=39200)
1 0 HASH JOIN (Cost=387 Card=400 Bytes=39200)
2 1 VIEW OF 'VW_NSO_1' (Cost=6 Card=200 Bytes=2600)
3 2 SORT (UNIQUE) (Cost=6 Card=200 Bytes=800)
4 3 TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=1600)
5 1 TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=4708830)
Statistics
----------------------------------------------------------
0 recursive calls
12 db block gets
1576 consistent gets
0 physical reads
0 redo size
53152 bytes sent via SQL*Net to client
3311 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
400 rows processed
SQL> select *
2 from y
3 where exists (select null from x
4 where x.object_id = y.object_id);
400 rows selected.
Elapsed: 00:00:00.21
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=233 Card=2770 Bytes=235450)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=2770 Bytes=235450)
3 1 INDEX (RANGE SCAN) OF 'X_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)
Statistics
----------------------------------------------------------
0 recursive calls
8 db block gets
111981 consistent gets
0 physical reads
0 redo size
53152 bytes sent via SQL*Net to client
3311 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
400 rows processed
====================================================================
here the outer query returns smaller number of rows (table x is much smaller), look statistics by using IN & EXISTS, EXISTS is more effective in this case
SQL> select *
2 from x
3 where object_id in (select object_id from y);
400 rows selected.
Elapsed: 00:00:00.71
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=763 Card=400 Bytes=40800)
1 0 MERGE JOIN (Cost=763 Card=400 Bytes=40800)
2 1 VIEW OF 'VW_NSO_1' (Cost=747 Card=27699 Bytes=360087)
3 2 SORT (UNIQUE) (Cost=747 Card=27699 Bytes=110796)
4 3 TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=221592)
5 1 SORT (JOIN) (Cost=16 Card=400 Bytes=35600)
6 5 TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=35600)
Statistics
----------------------------------------------------------
0 recursive calls
57 db block gets
1549 consistent gets
529 physical reads
0 redo size
30623 bytes sent via SQL*Net to client
3311 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
1 sorts (memory)
1 sorts (disk)
400 rows processed
SQL> select *
2 from x
3 where exists (select null from y
4 where y.object_id = x.object_id);
400 rows selected.
Elapsed: 00:00:00.60
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=20 Bytes=1780)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=20 Bytes=1780)
3 1 INDEX (RANGE SCAN) OF 'Y_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)
Statistics
----------------------------------------------------------
0 recursive calls
4 db block gets
575 consistent gets
0 physical reads
0 redo size
53152 bytes sent via SQL*Net to client
3311 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
400 rows processed
IMHO exists is slower if outer query returns more rows because for each row returned by outer query it has to compare with the results return by inner query. With exists we are correlating the query while as with IN we are not, simple sub-query (which is may be converted to join by CBO)
If you search asktom.oracle.com I think he posted the pseudo code of how Oracle interprets EXISTS and you can see why
But as I said it depends on quite a few factors, if we use IN and CBO converts subquery to a join if table Y has 30 million of rows and table X 10 millions then probably IN is better depending what kind of JOIN CBO is using (if it's sort-merge then bad! imagine sort merge two huge tables)
Last edited by pando; 11-15-2002 at 03:31 PM.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|