DBAsupport.com Forums - Powered by vBulletin
Results 1 to 10 of 10

Thread: Why INDEX instead of SCAN?

  1. #1
    Join Date
    Oct 2002
    Posts
    6

    Why INDEX instead of SCAN?

    Hi,
    I have a simple question, but since it involves
    understanding query plans, I thought I'd post it
    on this board.

    I'm trying to understand why, when there are no
    indexes and no other constraints except primary
    keys, why Oracle does an INDEX UNIQUE SCAN on
    one of table involved in a simple equijoin.

    For example, if I have the following schema:

    CREATE TABLE d (
    d_did NUMBER PRIMARY KEY,
    dname VARCHAR2 (20));

    CREATE TABLE e (
    e_eid NUMBER PRIMARY KEY,
    e_did NUMBER,
    e_name VARCHAR2(10),
    CONSTRAINT e_fk1 FOREIGN KEY (e_did) REFERENCES d (d_did));

    and I do the following query:

    SELECT *
    FROM d, e
    WHERE e_did = d_did;

    I get the plan:

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 NESTED LOOPS
    2 1 TABLE ACCESS (FULL) OF 'E'
    3 1 TABLE ACCESS (BY INDEX ROWID) OF 'D'
    4 3 INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)

    What confuses me is why an INDEX (UNIQUE SCAN) is done.
    Why not a full SCAN (ie, TABLE ACCESS(FULL)), esp if
    a full scan is done on E. One would think that both
    tables would be accessed similarly. Also, it
    seems like the extra operation (INDEX) is more
    time consuming than just accessing D via FULL SCAN.

    In fact, I can't even see how the INDEX operator works
    here because there is no value to look for (since no
    literal was specified in the query). So, what does
    the INDEX actually index?


    Can anyone clarify this?

    Thanks,
    Sara

  2. #2
    Join Date
    Feb 2001
    Posts
    180
    For every record in 'E' Oracle searches the referenced record in 'D'.
    The foreign key in 'E' points to the PK in 'D'.

    The optimizer chooses to start with table 'E' what may be caused by the statistics.

    When you run the statistics again for both tables, it may be different. (Depending on the no of records in each table)
    Regards
    Ben de Boer

  3. #3
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Since I'm not sure I even understood the first answer given , I will try to elaborate for you.

    So, you have a query that joins 2 tables and has no other restrictions. Okay, so if the data were:

    T1
    --
    1
    2
    4
    5

    T2
    --
    1
    2
    3
    4

    ...you would expect the result to be:

    T1 T2
    -------
    1 1
    2 2
    4 4

    So, how does Oracle go about doing this? Well, what are our choices? Since there are no restriction other than the join, we must obviously start with a full tablescan of something (actually, we could have started with a fast full index scan, but those are rare). Since both table have extremely similar statistics, it is a toss-up as to which to start with, so let's just start with T1.

    Okay, so we can table scan through T1. Now what are our choices? Well, we need to join to another table, so how can we do that?
    The basic joins are:
    - HASH JOIN
    - MERGE JOIN
    - NESTED LOOP

    Hash joins are best for joining 2 large resultsets so that each resultset can take advantage of an index not related to the joining column. We don't fit any of those criteria here. However, you can use the USE_HASH hint to force the plan and see what happens.
    Merge joins are done on sorted data. You get the 2 resultsets seperately (using indexes likely not related to the join columns), sort both seperately, and then merge them together. Actually, this plan might be a good fit for this statement, but there are many reasons (including stats and init params) why this might not have been chosen. Again, the USE_MERGE hint would show you this plan.
    Nested loops are done by rolling through the inner result and finding the sorresponding record in the outer table. So, we would roll through each record in T1 and try to find the corresponding record (via the join column) in T2. So, within this loop, we have an ID with which to search T2. Hence, we can use the PK to find the single record that goes with the ID we have in hand.

    So, these are pretty much our only choices. The first 2 would have accessed both tables in similar fashions. However, the Nested Loop join uses the results in hand to find the corresponding records in the second table. Hence, it can use the index for T2. Make sense?

    This is your first small step on the very long journey of trying to understand the optimizer.

    Enjoy,

    - Chris
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  4. #4
    Join Date
    Oct 2002
    Posts
    6
    Chris,

    Thank you for your detailed reply.

    However, I am still confused.

    I understand the part about joining two streams of data
    via nested/hash/merge. That makes sense.

    What doesn't make sense to me is (1) the INDEX part of that
    plan and (2) why full tablescans are not done on both tables.

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=CHOOSE
    1 0 NESTED LOOPS
    2 1 TABLE ACCESS (FULL) OF 'E'
    3 1 TABLE ACCESS (BY INDEX ROWID) OF 'D'
    4 3 INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)

    Notice that INDEX is the _first_ operation done.

    If the query had been

    select * from e,d where e.e_did = d.d_did and d.d_did=10;

    then I could understand the INDEX. The "10" would be
    the basis for the INDEX lookup. That would get the
    corresponding ROWID associated with the PK of "10" in
    D and it would use that tuple in a nested loops join
    with a full scan of E. Makes total sense.

    But without the d.d_did=10, it seems to me that there
    is no basis for lookup. How can one index a table
    without something to lookup?

    Thanks for any and all help,
    Sara

  5. #5
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    I explained that.

    It essentially does this: (consider it pseudo-code internal to the statement executor)
    Code:
    OPEN l_cur FOR 'SELECT C1 FROM T1' -- Tablescan
    LOOP -- now, for each record in that resultset, we want to find the  
         -- corresponding record in T2 (T1.C1 = T2.C1, remember?)
       FETCH l_cur INTO l_T1C1;
       SELECT C1 INTO l_T2C1 FROM T2 WHERE C1 = l_T1C1; -- Indexed lookup
       WHEN SQL%NOTFOUND THEN EXIT;
    END LOOP;
    Make sense? It is looping through the first table via a tablescan and then for each record, finding the corresponding record in the second table. Therefore, it can use an index.


    select * from e,d where e.e_did = d.d_did and d.d_did=10;
    Now you added a predicate that can be used to avoid the first tablesan. If a nested loop join was chosen, the results would still look like:

    Code:
    OPEN l_cur FOR 'SELECT C1 FROM T1 WHERE C1=10' -- Indexed lookup
    LOOP
       FETCH l_cur INTO l_T1C1;
       SELECT C1 INTO l_T2C1 FROM T2 WHERE C1 = l_T1C1; -- Indexed lookup
       WHEN SQL%NOTFOUND THEN EXIT;
    END LOOP;
    Make sense?

    I just noticed your one point of confusion:
    Originally posted by saraypatchnick
    Notice that INDEX is the _first_ operation done.
    This is not true. You are simply reading the plan wrong. The tablescan is done first, then the indexed lookup

    - Chris
    Last edited by chrisrlong; 10-30-2002 at 01:38 PM.
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  6. #6
    Join Date
    Oct 2002
    Posts
    6
    Originally posted by chrisrlong
    This is not true. You are simply reading the plan wrong. The tablescan is done first, then the indexed lookup
    If the tablescan is really done first, then I agree with you.

    But then the direction of dataflow confuses me.

    According to this plan:

    Code:
    Execution Plan
    ----------------------------------------------------------
    0   SELECT STATEMENT Optimizer=CHOOSE
    1 0   NESTED LOOPS
    2 1     TABLE ACCESS (FULL) OF 'E'
    3 1     TABLE ACCESS (BY INDEX ROWID) OF 'D'
    4 3       INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)
    Statement ID 3 (the TABLE ACCESS of D) is the parent
    of statement ID 4 (the INDEX), meaning that the INDEX
    is needed for the TABLE ACCESS to occur. Makes sense.

    What you seem to be saying is that the full scan of
    E (statement 2) is what is driving the index
    (statement 4). But the plan above does not appear
    to be saying that, since the parent of statement 2 is
    the nested loops.

    In short, the plan to me looks like this:

    Code:
      INDEX-->TABLE ACCESS of D (via INDEX)
                                           \
                                            ---> NESTED LOOPS
                                           /
              TABLE ACCESS of E (via SCAN)
    But of course this cannot be right, thus my confusion.

    How do I read explain plan output to determine the
    direction of dataflow? I assumed that data flows from
    child to parent; if that is not right, I'd really like
    to understand why.

    Thanks for your patience in explaining this.
    Sara

  7. #7
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Well, I tried to quickly find you something understandable to explain it, but with no luck. Maybe somebody else has a primer at hand that they can link you to.

    Here is what I found, but it pretty much sucks

    http://tahiti.oracle.com/pls/tahiti/...ps.htm%2340660

    Basically, all 3 joins are presented the same, but interpreted differently:

    1 JOIN
    2 TABLE ACCESS
    3 INDEX ACCESS
    4 TABLE ACCESS
    5 INDEX ACCESS

    For:
    NESTED LOOP: 3,2,1,5,4 for each record
    MERGE JOIN: 3,2,(SORT - not shown here),5,4,(SORT - not shown here),1
    HASH JOIN: 3,2,5,4,1(with first table as 'inner' table in hash)

    - Chris

    (awaiting corrections and additions )
    Last edited by chrisrlong; 10-30-2002 at 03:56 PM.
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  8. #8
    Join Date
    Oct 2002
    Posts
    6
    Okay, it turns out that my EXACT question was already
    answered in the Oracle documentation.

    Unfortunately, the explanation given by the Oracle docs
    does not make 100% sense.

    You can find my exact example at:

    http://download-west.oracle.com/docs...mops.htm#38183

    Look about half way down the page, at Figure 4-4.

    There it is... my exact plan.

    Now, if you read the 4 bullet points that follow, it jives with
    what Chris had posted earlier.

    What STILL DOES NOT MAKE SENSE is bullet point #2:

    Code:
    For each row returned by step 2, step 4 uses the 
    emp.deptno value to perform a unique scan on the 
    pk_dept index
    How is this possible when the query plan tree does not show
    step 2 connected with step 4?! Specifically, step 4 does not recieve the output of step 2. What in the world is going on?!

    The rest of the bullet points make sense. The rest of the documentation makes sense. It's just this one point.

    Thanks for any and all help...
    Sara

  9. #9
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Originally posted by saraypatchnick
    What STILL DOES NOT MAKE SENSE is bullet point #2:

    Code:
    For each row returned by step 2, step 4 uses the 
    emp.deptno value to perform a unique scan on the 
    pk_dept index
    How is this possible when the query plan tree does not show
    step 2 connected with step 4?! Specifically, step 4 does not recieve the output of step 2. What in the world is going on?!
    Simple - Oracle completely sucks when it comes to the human interface. Period. Just look at SQL*Plus. It's a joke . The EXPLAIN PLAN table is a pathetic interface to use to explain to a human how the statement is being executed, but there it is. The stuff stuck in that table is not an actual part of the execution process. It is only put there for human consumption. So don't sweat too much how things 'look', because they simply needed to stuff the ever-changing capabilities of the optimizer into this fixed format used by EXPLAIN PLAN. You'll get used to it . Nested Loops simply work as I said, regardless of how that information is presented to you through the plan_table.

    - Chris
    Last edited by chrisrlong; 10-30-2002 at 03:58 PM.
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  10. #10
    Join Date
    Oct 2002
    Posts
    6
    Chris,

    Thanks for all your time and help. I appreciate it!

    Sara

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width