Partition-Wise Join in different cases

**Calvin_Qiu** · 11-01-2005, 02:26 AM

Hi, all

Following is an example of partition-wise join in different cases, and leads to several question need your advice:

drop table pn;
drop table ps;
drop table ps2;
create table pn (u_id int, u_name varchar2(10)) partition by hash(u_id) partitions 4;
create table ps (u_id int, u_sales int) partition by hash(u_id) partitions 4;
create table ps2 (u_id int, u_sales int) partition by hash(u_id) partitions 4;

insert into pn values(1,'Calvin');
insert into pn values(2,' Jessica Qiu');
insert into ps values(1, 1000);
insert into ps values(2, 1000);
insert into ps2 values(1, 1000);
insert into ps2 values(2, 1000);
commit;

exec dbms_stats.set_table_stats('CALVIN', 'PN', numrows => 1000000000);
exec dbms_stats.set_table_stats('CALVIN', 'PS', numrows => 100000000);
commit;

--Query 1
explain plan for select pn.u_name, ps.u_sales from pn, ps where pn.u_id = ps.u_id;
@?/rdbms/admin/utlxplp
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000M| 56G| | 3623K (3)| 12:04:40 | | |
| 1 | PARTITION HASH ALL | | 1000M| 56G| | 3623K (3)| 12:04:40 | 1 | 4 |
|* 2 | HASH JOIN | | 1000M| 56G| 905M| 3623K (3)| 12:04:40 | | |
| 3 | TABLE ACCESS FULL| PS | 100M| 2479M| | 2726 (100)| 00:00:33 | 1 | 4 |
| 4 | TABLE ACCESS FULL| PN | 1000M| 32G| | 27047 (100)| 00:05:25 | 1 | 4 |
----------------------------------------------------------------------------------------------------

NOTE: It's clear that PS joined with PN in partition-wise join mode.

--Query 2
explain plan for select pn.u_name, ps.sales from pn, (select u_id, sum(u_sales) sales from ps group by u_id) ps where pn.u_id = ps.u_id;
@?/rdbms/admin/utlxplp
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000M| 63G| | 4392K (9)| 14:38:28 | | |
| 1 | PARTITION HASH ALL | | 1000M| 63G| | 4392K (9)| 14:38:28 | 1 | 4 |
| 2 | SORT GROUP BY | | 1000M| 63G| | 4392K (9)| 14:38:28 | | |
|* 3 | HASH JOIN | | 1000M| 63G| 905M| 4112K (2)| 13:42:28 | | |
| 4 | TABLE ACCESS FULL| PS | 100M| 2479M| | 2726 (100)| 00:00:33 | 1 | 4 |
| 5 | TABLE ACCESS FULL| PN | 1000M| 39G| | 27047 (100)| 00:05:25 | 1 | 4 |
-----------------------------------------------------------------------------------------------------

NOTE: PS joined with PN in partition-wise join mode also. My question is that why did "SORT GROUP BY" (OP#2) appear above "HASH JOIN" instead of "TABLE ACCESS FULL| PS"

--Query 3
explain plan for select pn.u_name, ps.u_sales from pn, (select * from ps union all select * from ps2) ps where pn.u_id = ps.u_id;
@?/rdbms/admin/utlxplp
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 32G| 1817G| | 3623K (3)| 12:04:40 | | |
|* 1 | HASH JOIN | | 32G| 1817G| 3623M| 3623K (3)| 12:04:40 | | |
| 2 | VIEW | | 100M| 2479M| | 2726 (100)| 00:00:33 | | |
| 3 | UNION-ALL PARTITION| | | | | | | | |
| 4 | PARTITION HASH ALL| | 100M| 2479M| | 2726 (100)| 00:00:33 | 1 | 4 |
| 5 | TABLE ACCESS FULL| PS | 100M| 2479M| | 2726 (100)| 00:00:33 | 1 | 4 |
| 6 | PARTITION HASH ALL| | 2 | 52 | | 4 (0)| 00:00:01 | 1 | 4 |
| 7 | TABLE ACCESS FULL| PS2 | 2 | 52 | | 4 (0)| 00:00:01 | 1 | 4 |
| 8 | PARTITION HASH ALL | | 1000M| 32G| | 27047 (100)| 00:05:25 | 1 | 4 |
| 9 | TABLE ACCESS FULL | PN | 1000M| 32G| | 27047 (100)| 00:05:25 | 1 | 4 |
------------------------------------------------------------------------------------------------------

NOTE: PS didn't join with PN in partition-wise join mode. I though that the access path should be:
1, PS union all PS2 in partition level
2, the interim result set from Step1 joins PN in partition-wise join mode.
3, union all result from Step2.

Thanks for help!

Calvin

**davey23uk** · 11-01-2005, 04:38 AM

ok, i gotta ask, why the flowers in the insert statement?

**mike9** · 11-01-2005, 05:36 PM

please use the CODE or PHP tag when you show explain plans

which DB version are you using? I guess 10g or 10gR2.

2) use the no_merge hint in your subquery and compare the costs

3) it may be faster to do the 2 joins separatly because it would then do 2 partition wise joins:
select pn.u_name, ps.u_sales from pn, ps where pn.u_id = ps.u_id
union all
select pn.u_name, ps2.u_sales from pn, ps2 where pn.u_id = ps2.u_id

**tamilselvan** · 11-02-2005, 05:20 PM

1. Even though the table is parttioned, it does not have actual data. You just manipulated statistics by updating number of rows. What about other statistics? So the optimizer comes with worst plan as you said.

2. AUTOTRACE in sqlplus will not always tell the true access path.
You need to run the script and see tkprof output. Even tkprof output may omit sql statements if they are used at the column level.

Tamil

**mike9** · 11-02-2005, 06:44 PM

1. A partition wise join combined with a partition wise group by doesn't mean that it is the worst execution plan

2. Even if AUTOTRACE and/or EXPLAIN PLAN FOR doesn't always show the correct execution plan it doesnt it in most cases and with this kind of large queries I would say that it even does it more often.

**tamilselvan** · 11-02-2005, 09:01 PM

Code:

I simulated your query 2. 

SQL> explain plan for select pn.u_name, ps.sales from pn,
     (select u_id, sum(u_sales) sales from ps group by u_id) ps 
     where pn.u_id = ps.u_id;

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
| Id  | Operation             |  Name       | Rows  | Bytes |TempSpc| Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |  1000P|  40856|       |    10M|       |       |
|   1 |  MERGE JOIN           |             |  1000P|  40856|       |    10M|       |       |
|   2 |   VIEW                |             |  1000M|    24G|       |  2557K|       |       |
|   3 |    SORT GROUP BY      |             |  1000M|    24G|       |  2557K|       |       |
|   4 |     PARTITION HASH ALL|             |       |       |       |       |     1 |     4 |
|   5 |      TABLE ACCESS FULL| PS          |  1000M|    24G|       |     4 |     1 |     4 |
|*  6 |   SORT JOIN           |             |  1000M|    18G|    52G|  8255K|       |       |
|   7 |    PARTITION HASH ALL |             |       |       |       |       |     1 |     4 |
|   8 |     TABLE ACCESS FULL | PN          |  1000M|    18G|       |     4 |     1 |     4 |
---------------------------------------------------------------------------------------------

Let us see the tkprof output:

select pn.u_name, ps.sales
from  pn, 
       (select u_id, sum(u_sales) sales 
          from ps group by u_id) ps 
where pn.u_id  = ps.u_id


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          5          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0         24          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0         29          0           2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 31

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE JOIN
      2   VIEW
      2    SORT GROUP BY
      2     PARTITION HASH ALL PARTITION: 1 4
      2      TABLE ACCESS FULL PS PARTITION: 1 4
      2   SORT JOIN
      2    PARTITION HASH ALL PARTITION: 1 4
      2     TABLE ACCESS FULL PN PARTITION: 1 4

********************************************************************************

In general, Oracle will materialize the in-line view if it sees rownum , or any aggregate function in it as it happened in my test.
But you got a different plan in which the tables are joined , then the sum is calculated.

That's why I asked you to run tkprof.

Tamil

**pando** · 11-03-2005, 02:45 AM

Many times, autotrace desperates me.

**mike9** · 11-03-2005, 05:10 AM

A important information which is missing here is which Oracle version is used. An interessting new feature introduced in 10gR2 is the use of the hash algorithm for the group by.
http://www.oracle.com/technology/pro...10gr2_0705.pdf (page 9, Hash-Based Aggregation)

Thread: Partition-Wise Join in different cases

Thread Tools

Display

Partition-Wise Join in different cases

Posting Permissions