DBAsupport.com Forums - Powered by vBulletin
Results 1 to 5 of 5

Thread: Exists Or In

  1. #1
    Join Date
    Nov 2002
    Location
    INDIA
    Posts
    38

    Exists Or In

    Hi All,

    When is it advisable to use EXISTS and when is it advisable to use IN in a SQL query.

    Thanking you in advance,

    Pradeep

  2. #2
    Join Date
    Feb 2001
    Posts
    180
    Just a matter of thinking;
    When U use exists it is almost always like this:

    select column_1
    from table_a a
    where exists
    ( select 1
    from table_b b
    where a.column_x = b.column_y)

    In this case the inner query has to be performend for
    every record.

    select column_1
    from table_a a
    where a.column_x in
    (select b.column_a
    from table_b b)

    Here the inner query has to be performed only once.

    I know you can mix up those queries, but it's not logical.
    Regards
    Ben de Boer

  3. #3
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    it depends, if outer query returns tons of rows then normally it´s better IN

    There are quite a few factors to consider, indexes, number of rows would be returned by inner and outer queries

  4. #4
    Join Date
    Dec 2001
    Location
    Brazil
    Posts
    282
    Pando,

    Pardon, but, EXISTS operator finishes comparing when the first row in outer query satisfies the join inner.col = outer.col, right? And then the next row from inner query is fetched and compared again to the outer query all again. With IN, the outer query is processed first and then inner query is compared with the rows returned by the outer query. And if the first row in outer query matches the inner query, the inner query does not fetch the next row, it keeps comparing till the outer query end, right?

    So, if the outer query has 1 billion rows and let's suppose generally the first rows satisfies the join inner.col = outer.col, then EXISTS will return true and will fetch the next row without comparing the rest of the 1 billion rows, while IN will have to process outer query first (1 billion rows) and then start comparing it with inner query, isn't EXISTS faster in this case (outer query has lots of rows in this case) ?

    I may be deeply wrong in my explanation.

    F

  5. #5
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    I am talking from experience, I have create two tables with following results

    Code:
    SQL> create table x as select * from dba_objects where rownum <= 200;
    
    Table created.
    
    SQL> create table y as select * from dba_objects;
    
    Table created.
    
    SQL> insert into y select * from y
    
    27699 rows created.
    
    SQL> insert into x select * from x;
    
    200 rows created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> create index x_idx1 on x(object_id);
    
    Index created.
    
    SQL> create index y_idx1 on y(object_id);
    
    Index created.
    
    SQL> analyze table x compute statistics;
    
    Table analyzed.
    
    SQL> analyze table y compute statistics;
    
    Table analyzed.
    
    SQL> select count(*) from x;
    
      COUNT(*)
    ----------
           400
    
    SQL> select count(*) from y;
    
      COUNT(*)
    ----------
         55398
         
    SQL> set autotrace traceonly exp stat
    
    SQL> set timing on
    
    
    ====================================================================
    here the outer query returns large number of rows (table Y is much larger), look statistics by using IN & EXISTS, IN is more effective
    
    SQL> select *
      2    from y
      3   where object_id in (select object_id from x);
    
    400 rows selected.
    
    Elapsed: 00:00:00.00
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=387 Card=400 Bytes=39200)
       1    0   HASH JOIN (Cost=387 Card=400 Bytes=39200)
       2    1     VIEW OF 'VW_NSO_1' (Cost=6 Card=200 Bytes=2600)
       3    2       SORT (UNIQUE) (Cost=6 Card=200 Bytes=800)
       4    3         TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=1600)
    
       5    1     TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=4708830)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             12  db block gets
           1576  consistent gets
              0  physical reads
              0  redo size
          53152  bytes sent via SQL*Net to client
           3311  bytes received via SQL*Net from client
             28  SQL*Net roundtrips to/from client
              4  sorts (memory)
              0  sorts (disk)
            400  rows processed
    
    SQL> select *
      2    from y
      3   where exists (select null from x
      4                  where x.object_id = y.object_id);
    
    400 rows selected.
    
    Elapsed: 00:00:00.21
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=233 Card=2770 Bytes=235450)
       1    0   FILTER
       2    1     TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=2770 Bytes=235450)
       3    1     INDEX (RANGE SCAN) OF 'X_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              8  db block gets
         111981  consistent gets
              0  physical reads
              0  redo size
          53152  bytes sent via SQL*Net to client
           3311  bytes received via SQL*Net from client
             28  SQL*Net roundtrips to/from client
              2  sorts (memory)
              0  sorts (disk)
            400  rows processed
    
    
    ====================================================================
    here the outer query returns smaller number of rows (table x is much smaller), look statistics by using IN & EXISTS, EXISTS is more effective in this case
    
    SQL> select *
      2    from x
      3   where object_id in (select object_id from y);
    
    400 rows selected.
    
    Elapsed: 00:00:00.71
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=763 Card=400 Bytes=40800)
       1    0   MERGE JOIN (Cost=763 Card=400 Bytes=40800)
       2    1     VIEW OF 'VW_NSO_1' (Cost=747 Card=27699 Bytes=360087)
       3    2       SORT (UNIQUE) (Cost=747 Card=27699 Bytes=110796)
       4    3         TABLE ACCESS (FULL) OF 'Y' (Cost=233 Card=55398 Bytes=221592)
       5    1     SORT (JOIN) (Cost=16 Card=400 Bytes=35600)
       6    5       TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=400 Bytes=35600)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             57  db block gets
           1549  consistent gets
            529  physical reads
              0  redo size
          30623  bytes sent via SQL*Net to client
           3311  bytes received via SQL*Net from client
             28  SQL*Net roundtrips to/from client
              1  sorts (memory)
              1  sorts (disk)
            400  rows processed
    
    SQL> select *
      2    from x
      3   where exists (select null from y
      4                  where y.object_id = x.object_id);
    
    400 rows selected.
    
    Elapsed: 00:00:00.60
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=20 Bytes=1780)
       1    0   FILTER
       2    1     TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=20 Bytes=1780)
       3    1     INDEX (RANGE SCAN) OF 'Y_IDX1' (NON-UNIQUE) (Cost=1 Card=2 Bytes=8)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              4  db block gets
            575  consistent gets
              0  physical reads
              0  redo size
          53152  bytes sent via SQL*Net to client
           3311  bytes received via SQL*Net from client
             28  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
            400  rows processed
    IMHO exists is slower if outer query returns more rows because for each row returned by outer query it has to compare with the results return by inner query. With exists we are correlating the query while as with IN we are not, simple sub-query (which is may be converted to join by CBO)

    If you search asktom.oracle.com I think he posted the pseudo code of how Oracle interprets EXISTS and you can see why


    But as I said it depends on quite a few factors, if we use IN and CBO converts subquery to a join if table Y has 30 million of rows and table X 10 millions then probably IN is better depending what kind of JOIN CBO is using (if it's sort-merge then bad! imagine sort merge two huge tables)
    Last edited by pando; 11-15-2002 at 03:31 PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width