-
Why INDEX instead of SCAN?
Hi,
I have a simple question, but since it involves
understanding query plans, I thought I'd post it
on this board.
I'm trying to understand why, when there are no
indexes and no other constraints except primary
keys, why Oracle does an INDEX UNIQUE SCAN on
one of table involved in a simple equijoin.
For example, if I have the following schema:
CREATE TABLE d (
d_did NUMBER PRIMARY KEY,
dname VARCHAR2 (20));
CREATE TABLE e (
e_eid NUMBER PRIMARY KEY,
e_did NUMBER,
e_name VARCHAR2(10),
CONSTRAINT e_fk1 FOREIGN KEY (e_did) REFERENCES d (d_did));
and I do the following query:
SELECT *
FROM d, e
WHERE e_did = d_did;
I get the plan:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'E'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'D'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)
What confuses me is why an INDEX (UNIQUE SCAN) is done.
Why not a full SCAN (ie, TABLE ACCESS(FULL)), esp if
a full scan is done on E. One would think that both
tables would be accessed similarly. Also, it
seems like the extra operation (INDEX) is more
time consuming than just accessing D via FULL SCAN.
In fact, I can't even see how the INDEX operator works
here because there is no value to look for (since no
literal was specified in the query). So, what does
the INDEX actually index?
Can anyone clarify this?
Thanks,
Sara
-
For every record in 'E' Oracle searches the referenced record in 'D'.
The foreign key in 'E' points to the PK in 'D'.
The optimizer chooses to start with table 'E' what may be caused by the statistics.
When you run the statistics again for both tables, it may be different. (Depending on the no of records in each table)
Regards
Ben de Boer
-
Since I'm not sure I even understood the first answer given , I will try to elaborate for you.
So, you have a query that joins 2 tables and has no other restrictions. Okay, so if the data were:
T1
--
1
2
4
5
T2
--
1
2
3
4
...you would expect the result to be:
T1 T2
-------
1 1
2 2
4 4
So, how does Oracle go about doing this? Well, what are our choices? Since there are no restriction other than the join, we must obviously start with a full tablescan of something (actually, we could have started with a fast full index scan, but those are rare). Since both table have extremely similar statistics, it is a toss-up as to which to start with, so let's just start with T1.
Okay, so we can table scan through T1. Now what are our choices? Well, we need to join to another table, so how can we do that?
The basic joins are:
- HASH JOIN
- MERGE JOIN
- NESTED LOOP
Hash joins are best for joining 2 large resultsets so that each resultset can take advantage of an index not related to the joining column. We don't fit any of those criteria here. However, you can use the USE_HASH hint to force the plan and see what happens.
Merge joins are done on sorted data. You get the 2 resultsets seperately (using indexes likely not related to the join columns), sort both seperately, and then merge them together. Actually, this plan might be a good fit for this statement, but there are many reasons (including stats and init params) why this might not have been chosen. Again, the USE_MERGE hint would show you this plan.
Nested loops are done by rolling through the inner result and finding the sorresponding record in the outer table. So, we would roll through each record in T1 and try to find the corresponding record (via the join column) in T2. So, within this loop, we have an ID with which to search T2. Hence, we can use the PK to find the single record that goes with the ID we have in hand.
So, these are pretty much our only choices. The first 2 would have accessed both tables in similar fashions. However, the Nested Loop join uses the results in hand to find the corresponding records in the second table. Hence, it can use the index for T2. Make sense?
This is your first small step on the very long journey of trying to understand the optimizer.
Enjoy,
- Chris
-
Chris,
Thank you for your detailed reply.
However, I am still confused.
I understand the part about joining two streams of data
via nested/hash/merge. That makes sense.
What doesn't make sense to me is (1) the INDEX part of that
plan and (2) why full tablescans are not done on both tables.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'E'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'D'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)
Notice that INDEX is the _first_ operation done.
If the query had been
select * from e,d where e.e_did = d.d_did and d.d_did=10;
then I could understand the INDEX. The "10" would be
the basis for the INDEX lookup. That would get the
corresponding ROWID associated with the PK of "10" in
D and it would use that tuple in a nested loops join
with a full scan of E. Makes total sense.
But without the d.d_did=10, it seems to me that there
is no basis for lookup. How can one index a table
without something to lookup?
Thanks for any and all help,
Sara
-
I explained that.
It essentially does this: (consider it pseudo-code internal to the statement executor)
Code:
OPEN l_cur FOR 'SELECT C1 FROM T1' -- Tablescan
LOOP -- now, for each record in that resultset, we want to find the
-- corresponding record in T2 (T1.C1 = T2.C1, remember?)
FETCH l_cur INTO l_T1C1;
SELECT C1 INTO l_T2C1 FROM T2 WHERE C1 = l_T1C1; -- Indexed lookup
WHEN SQL%NOTFOUND THEN EXIT;
END LOOP;
Make sense? It is looping through the first table via a tablescan and then for each record, finding the corresponding record in the second table. Therefore, it can use an index.
select * from e,d where e.e_did = d.d_did and d.d_did=10;
Now you added a predicate that can be used to avoid the first tablesan. If a nested loop join was chosen, the results would still look like:
Code:
OPEN l_cur FOR 'SELECT C1 FROM T1 WHERE C1=10' -- Indexed lookup
LOOP
FETCH l_cur INTO l_T1C1;
SELECT C1 INTO l_T2C1 FROM T2 WHERE C1 = l_T1C1; -- Indexed lookup
WHEN SQL%NOTFOUND THEN EXIT;
END LOOP;
Make sense?
I just noticed your one point of confusion:
Originally posted by saraypatchnick
Notice that INDEX is the _first_ operation done.
This is not true. You are simply reading the plan wrong. The tablescan is done first, then the indexed lookup
- Chris
Last edited by chrisrlong; 10-30-2002 at 01:38 PM.
-
Originally posted by chrisrlong
This is not true. You are simply reading the plan wrong. The tablescan is done first, then the indexed lookup
If the tablescan is really done first, then I agree with you.
But then the direction of dataflow confuses me.
According to this plan:
Code:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'E'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'D'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C002462' (UNIQUE)
Statement ID 3 (the TABLE ACCESS of D) is the parent
of statement ID 4 (the INDEX), meaning that the INDEX
is needed for the TABLE ACCESS to occur. Makes sense.
What you seem to be saying is that the full scan of
E (statement 2) is what is driving the index
(statement 4). But the plan above does not appear
to be saying that, since the parent of statement 2 is
the nested loops.
In short, the plan to me looks like this:
Code:
INDEX-->TABLE ACCESS of D (via INDEX)
\
---> NESTED LOOPS
/
TABLE ACCESS of E (via SCAN)
But of course this cannot be right, thus my confusion.
How do I read explain plan output to determine the
direction of dataflow? I assumed that data flows from
child to parent; if that is not right, I'd really like
to understand why.
Thanks for your patience in explaining this.
Sara
-
Well, I tried to quickly find you something understandable to explain it, but with no luck. Maybe somebody else has a primer at hand that they can link you to.
Here is what I found, but it pretty much sucks
http://tahiti.oracle.com/pls/tahiti/...ps.htm%2340660
Basically, all 3 joins are presented the same, but interpreted differently:
1 JOIN
2 TABLE ACCESS
3 INDEX ACCESS
4 TABLE ACCESS
5 INDEX ACCESS
For:
NESTED LOOP: 3,2,1,5,4 for each record
MERGE JOIN: 3,2,(SORT - not shown here),5,4,(SORT - not shown here),1
HASH JOIN: 3,2,5,4,1(with first table as 'inner' table in hash)
- Chris
(awaiting corrections and additions )
Last edited by chrisrlong; 10-30-2002 at 03:56 PM.
-
Okay, it turns out that my EXACT question was already
answered in the Oracle documentation.
Unfortunately, the explanation given by the Oracle docs
does not make 100% sense.
You can find my exact example at:
http://download-west.oracle.com/docs...mops.htm#38183
Look about half way down the page, at Figure 4-4.
There it is... my exact plan.
Now, if you read the 4 bullet points that follow, it jives with
what Chris had posted earlier.
What STILL DOES NOT MAKE SENSE is bullet point #2:
Code:
For each row returned by step 2, step 4 uses the
emp.deptno value to perform a unique scan on the
pk_dept index
How is this possible when the query plan tree does not show
step 2 connected with step 4?! Specifically, step 4 does not recieve the output of step 2. What in the world is going on?!
The rest of the bullet points make sense. The rest of the documentation makes sense. It's just this one point.
Thanks for any and all help...
Sara
-
Originally posted by saraypatchnick
What STILL DOES NOT MAKE SENSE is bullet point #2:
Code:
For each row returned by step 2, step 4 uses the
emp.deptno value to perform a unique scan on the
pk_dept index
How is this possible when the query plan tree does not show
step 2 connected with step 4?! Specifically, step 4 does not recieve the output of step 2. What in the world is going on?!
Simple - Oracle completely sucks when it comes to the human interface. Period. Just look at SQL*Plus. It's a joke . The EXPLAIN PLAN table is a pathetic interface to use to explain to a human how the statement is being executed, but there it is. The stuff stuck in that table is not an actual part of the execution process. It is only put there for human consumption. So don't sweat too much how things 'look', because they simply needed to stuff the ever-changing capabilities of the optimizer into this fixed format used by EXPLAIN PLAN. You'll get used to it . Nested Loops simply work as I said, regardless of how that information is presented to you through the plan_table.
- Chris
Last edited by chrisrlong; 10-30-2002 at 03:58 PM.
-
Chris,
Thanks for all your time and help. I appreciate it!
Sara
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|