Exists Or In

**paddy04** · 11-15-2002, 08:15 AM

Hi All,

When is it advisable to use EXISTS and when is it advisable to use IN in a SQL query.

Thanking you in advance,

Pradeep

**bensr** · 11-15-2002, 09:29 AM

Just a matter of thinking;
When U use exists it is almost always like this:

select column_1
from table_a a
where exists
( select 1
from table_b b
where a.column_x = b.column_y)

In this case the inner query has to be performend for
every record.

select column_1
from table_a a
where a.column_x in
(select b.column_a
from table_b b)

Here the inner query has to be performed only once.

I know you can mix up those queries, but it's not logical.

**pando** · 11-15-2002, 09:33 AM

it depends, if outer query returns tons of rows then normally it´s better IN

There are quite a few factors to consider, indexes, number of rows would be returned by inner and outer queries

**Mnemonical** · 11-15-2002, 02:17 PM

Pando,

Pardon, but, EXISTS operator finishes comparing when the first row in outer query satisfies the join inner.col = outer.col, right? And then the next row from inner query is fetched and compared again to the outer query all again. With IN, the outer query is processed first and then inner query is compared with the rows returned by the outer query. And if the first row in outer query matches the inner query, the inner query does not fetch the next row, it keeps comparing till the outer query end, right?

So, if the outer query has 1 billion rows and let's suppose generally the first rows satisfies the join inner.col = outer.col, then EXISTS will return true and will fetch the next row without comparing the rest of the 1 billion rows, while IN will have to process outer query first (1 billion rows) and then start comparing it with inner query, isn't EXISTS faster in this case (outer query has lots of rows in this case) ?

I may be deeply wrong in my explanation.

F

**pando** · 11-15-2002, 03:28 PM

I am talking from experience, I have create two tables with following results

Code:

SQL> create table x as select * from dba_objects where rownum <= 200;

Table created.

SQL> create table y as select * from dba_objects;

Table created.

SQL> insert into y select * from y

27699 rows created.

SQL> insert into x select * from x;

200 rows created.

SQL> commit;

Commit complete.

SQL> create index x_idx1 on x(object_id);

Index created.

SQL> create index y_idx1 on y(object_id);

Index created.

SQL> analyze table x compute statistics;

Table analyzed.

SQL> analyze table y compute statistics;

Table analyzed.

SQL> select count(*) from x;

  COUNT(*)
----------
       400

SQL> select count(*) from y;

  COUNT(*)
----------
     55398
     
SQL> set autotrace traceonly exp stat

SQL> set timing on


====================================================================
here the outer query returns large number of rows (table Y is much larger), look statistics by using IN & EXISTS, IN is more effective

SQL> select *
  2    from y
  3   where object_id in (select object_id from x);

400 rows selected.

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=387 Card=400 Bytes=39200)
   1    0   HASH JOIN (Cost=387 Card=400 Bytes=39200)
   2    1     VIEW OF 'VW_NSO_1' (Cost=6 Card=200 Bytes=2600)
   3    2       SORT (UNIQUE) (Cost=6 Card=200 Bytes=800)
   4    3         TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=1600)

   5    1     TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=4708830)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
       1576  consistent gets
          0  physical reads
          0  redo size
      53152  bytes sent via SQL*Net to client
       3311  bytes received via SQL*Net from client
         28  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
        400  rows processed

SQL> select *
  2    from y
  3   where exists (select null from x
  4                  where x.object_id = y.object_id);

400 rows selected.

Elapsed: 00:00:00.21

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=233 Card=2770 Bytes=235450)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=2770 Bytes=235450)
   3    1     INDEX (RANGE SCAN) OF 'X_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)

Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
     111981  consistent gets
          0  physical reads
          0  redo size
      53152  bytes sent via SQL*Net to client
       3311  bytes received via SQL*Net from client
         28  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        400  rows processed


====================================================================
here the outer query returns smaller number of rows (table x is much smaller), look statistics by using IN & EXISTS, EXISTS is more effective in this case

SQL> select *
  2    from x
  3   where object_id in (select object_id from y);

400 rows selected.

Elapsed: 00:00:00.71

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=763 Card=400 Bytes=40800)
   1    0   MERGE JOIN (Cost=763 Card=400 Bytes=40800)
   2    1     VIEW OF 'VW_NSO_1' (Cost=747 Card=27699 Bytes=360087)
   3    2       SORT (UNIQUE) (Cost=747 Card=27699 Bytes=110796)
   4    3         TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=221592)
   5    1     SORT (JOIN) (Cost=16 Card=400 Bytes=35600)
   6    5       TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=35600)

Statistics
----------------------------------------------------------
          0  recursive calls
         57  db block gets
       1549  consistent gets
        529  physical reads
          0  redo size
      30623  bytes sent via SQL*Net to client
       3311  bytes received via SQL*Net from client
         28  SQL*Net roundtrips to/from client
          1  sorts (memory)
          1  sorts (disk)
        400  rows processed

SQL> select *
  2    from x
  3   where exists (select null from y
  4                  where y.object_id = x.object_id);

400 rows selected.

Elapsed: 00:00:00.60

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=20 Bytes=1780)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=20 Bytes=1780)
   3    1     INDEX (RANGE SCAN) OF 'Y_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)

Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
        575  consistent gets
          0  physical reads
          0  redo size
      53152  bytes sent via SQL*Net to client
       3311  bytes received via SQL*Net from client
         28  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        400  rows processed

IMHO exists is slower if outer query returns more rows because for each row returned by outer query it has to compare with the results return by inner query. With exists we are correlating the query while as with IN we are not, simple sub-query (which is may be converted to join by CBO)

If you search asktom.oracle.com I think he posted the pseudo code of how Oracle interprets EXISTS and you can see why

But as I said it depends on quite a few factors, if we use IN and CBO converts subquery to a join if table Y has 30 million of rows and table X 10 millions then probably IN is better depending what kind of JOIN CBO is using (if it's sort-merge then bad! imagine sort merge two huge tables)

Thread: Exists Or In

Thread Tools

Display

Exists Or In

Posting Permissions