-
Hi all, I want to copy data from 1 table to another, after some processing. The source table has 4 million reocrds and after the processing, it gets reduced to 1 million and I want to put them in the destination table. I tried using BULK COLLECT - FORALL and also COPY command, they are all slow and I hence believe that the bottleneck is in the filtering clause. Here is the query and table structure:
select util.entry, port.eqp_id, port.port_id,
util.router, util.if_name, util.inbitssec, util.outbitssec
from bam_util util, tg_eqp_slot_port port, tg_eqp eqp
where ltrim(rtrim(upper(util.router))) = ltrim(rtrim(upper(eqp.router_name))) and
eqp.eqp_id = port.eqp_id and
ltrim(rtrim(upper(util.if_name))) = ltrim(rtrim(upper(port.interfacename)));
BAM_UTIL :
Name Null? Type
----------------------------------------- -------- ----------------------------
ENTRY NOT NULL DATE
ROUTER NOT NULL VARCHAR2(30)
IF_NAME NOT NULL VARCHAR2(30)
INBITSSEC NUMBER(38)
OUTBITSSEC NUMBER(38)
INOCTETS NUMBER
OUTOCTETS NUMBER
This is the source table, having 4 million entries. I have 2 indexes on this table, one on the following columns
ENTRY ASC,
ROUTER ASC,
IF_NAME ASC,
INBITSSEC ASC,
OUTBITSSEC ASC
and another primary constraint on
ENTRY , ROUTER and IF_NAME .
TG_EQP_SLOT_PORT has around 20,000 records with one unique index on port_id.
TG_EQP has 800 records with one unique index on eqp_id.
Also, the tables are in NOLOGGING option and I have disabled all the constraints/indexes on the destination table.
Do I have to use any of those PARALLEL / special big rollback segment features ?
What else do I need to add to make this query faster. Is filtering the real culprit here or am I overlooking something ? Any insight is greatly appreciated !!
I am using Oracle 8.1.7 on Solaris.
Shiva.
[Edited by Shiva on 05-09-2002 at 11:43 AM]
-
Try using CBO and send the Explain plan with CBO.
-
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3714 Card=8675042 Bytes=520502520)
1 0 HASH JOIN (Cost=3714 Card=8675042 Bytes=520502520)
2 1 HASH JOIN (Cost=32 Card=20673 Bytes=558171)
3 2 TABLE ACCESS (FULL) OF 'TG_EQP' (Cost=2 Card=774 Bytes=12384)
4 2 TABLE ACCESS (FULL) OF 'TG_EQP_SLOT_PORT' (Cost=29 Card=20673 Bytes=227403)
5 1 TABLE ACCESS (FULL) OF 'BAM_UTIL' (Cost=3621 Card=4196315 Bytes=138478395)
All the 3 tables have statistics computed.
Why is the index on BAM_UTIL not being used in the table scan ?
Thanks,
Shiva.
[Edited by Shiva on 05-09-2002 at 04:45 PM]
-
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3714 Card=8675042 Bytes=520502520)
1 0 HASH JOIN (Cost=3714 Card=8675042 Bytes=520502520)
2 1 HASH JOIN (Cost=32 Card=20673 Bytes=558171)
3 2 TABLE ACCESS [B](FULL)[\B] OF 'TG_EQP' (Cost=2 Card=774 Bytes=12384)
4 2 TABLE ACCESS [B](FULL)[\B]OF 'TG_EQP_SLOT_PORT' (Cost=29 Card=20673 Bytes=227403)
5 1 TABLE ACCESS [B](FULL)[\B] OF 'BAM_UTIL' (Cost=3621 Card=4196315 Bytes=138478395)
Shiva,
Cost is very high. Try using indexes by removing ltrim, rtrim, upper and then see the cost.
Thanks,
-
Sam, I did an Index rebuild and also tried your suggestion (without the string functions)
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3693 Card=10780 Bytes=646800)
1 0 HASH JOIN (Cost=3693 Card=10780 Bytes=646800)
2 1 TABLE ACCESS (FULL) OF 'TG_EQP' (Cost=2 Card=774 Bytes=12384)
3 1 HASH JOIN (Cost=3686 Card=8106669 Bytes=356693436)
4 3 TABLE ACCESS (FULL) OF 'TG_EQP_SLOT_PORT' (Cost=29 Card=20673 Bytes=227403)
5 3 TABLE ACCESS (FULL) OF 'BAM_UTIL' (Cost=3621 Card=4196315 Bytes=138478395)
The COST is the same, but CARD & BYTES have gone down.
I would really like to know why the index on BAM_UTIL is not being used..
Shiva.
[Edited by Shiva on 05-09-2002 at 05:10 PM]
-
Shiva,
Analyze all tables, indexes involved.
select
util.entry,
port.eqp_id,
port.port_id,
util.router,
util.if_name,
util.inbitssec,
util.outbitssec
from
bam_util util,
tg_eqp_slot_port port,
tg_eqp eqp
where
util.router = eqp.router_name
and util.if_name = port.interfacename;
and eqp.eqp_id = port.eqp_id
Check whether columns used in the where clause (util.router, util.if_name) are part of index columns and are they in the begining part?
Thanks,
-
Create these index on table.
UTIL
router
if_names
EQP
router_name
eqp_id
PORT
eqp_id
interfacename
Create one index for each of the above columns and do not use any function in where clause. Indexes will not be used if any function is used on the indexed column, in where clause.
If functions are required then try to use function based indexes.
Sanjay
-
Sanjay, I definitely need those functions. So I have created 2 function based indexes :
create index f_bam_util_router on
bam_util trim(upper(router));
create index f_bam_util_ifname on
bam_util trim(upper(if_name));
and used the query :
select util.entry, port.eqp_id, port.port_id,
trim(upper(util.router)), trim(upper(util.if_name)),
util.inbitssec, util.outbitssec
from bam_util util, tg_eqp_slot_port port,
tg_eqp eqp
where trim(upper(util.router)) = eqp.router_name and
eqp.eqp_id = port.eqp_id and
trim(upper(util.if_name)) = port.interfacename;
and here is the cost :
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3702 Card=88812 Bytes=5328720)
1 0 HASH JOIN (Cost=3702 Card=88812 Bytes=5328720)
2 1 TABLE ACCESS (FULL) OF 'TG_EQP_SLOT_PORT' (Cost=29 Card=20673 Bytes=227403)
3 1 HASH JOIN (Cost=3625 Card=4196315 Bytes=205619435)
4 3 TABLE ACCESS (FULL) OF 'TG_EQP' (Cost=2 Card=774 Bytes=12384)
5 3 TABLE ACCESS (FULL) OF 'BAM_UTIL' (Cost=3621 Card=4196315 Bytes=138478395)
It is still higher than the cost without the functions. Let me try running the updates seperately and then run the query without the functions !! What else am I missing here ?
Shiva.
-
Can u post to forum:
select
TABLE_NAME ,
NUM_ROWS ,
BLOCKS ,
EMPTY_BLOCKS
from all_tables where table_name in('BAM_UTIL', 'TG_EQP' );
select
INDEX_NAME ,
BLEVEL ,
LEAF_BLOCKS ,
DISTINCT_KEYS,
AVG_LEAF_BLOCKS_PER_KEY ,
AVG_DATA_BLOCKS_PER_KEY ,
CLUSTERING_FACTOR ,
NUM_ROWS
from all_indexes where index_name in ('f_bam_util_router','f_bam_util_ifname');
-
select
TABLE_NAME ,
NUM_ROWS ,
BLOCKS ,
EMPTY_BLOCKS
from all_tables where table_name in('BAM_UTIL', 'TG_EQP','TG_EQP_SLOT_PORT');
TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS
------------------------------ ---------- ---------- ------------
TG_EQP 774 12 3
TG_EQP_SLOT_PORT 20673 187 68
BAM_UTIL 4196315 23856 719
select
INDEX_NAME ,
BLEVEL ,
LEAF_BLOCKS ,
DISTINCT_KEYS,
AVG_LEAF_BLOCKS_PER_KEY ,
AVG_DATA_BLOCKS_PER_KEY ,
CLUSTERING_FACTOR ,
NUM_ROWS
from all_indexes where lower(index_name) in ('f_bam_util_router','f_bam_util_ifname');
INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS
------------------------------ ---------- ----------- -------------
AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR NUM_ROWS
----------------------- ----------------------- ----------------- ----------
F_BAM_UTIL_ROUTER 2 13923 218
63 8498 1852597 4196315
F_BAM_UTIL_IFNAME 2 11585 296
39 10214 3023360 4196315
And all the tables & indexes have been analyzed and statistics computed.
Shiva.
-
TABLE_NAME -- TG_EQP 774 12 3
NUM_ROWS 774
BLOCKS 12
EMPTY_BLOCKS 3
-----------------
TABLE_NAME -- TG_EQP_SLOT_PORT 20673 187 68
NUM_ROWS 20673
BLOCKS 187
EMPTY_BLOCKS 68
-----------------
TABLE_NAME -- BAM_UTIL 4196315 23856 719
NUM_ROWS 4196315
BLOCKS 23856
EMPTY_BLOCKS 719
---------------------------
INDEX_NAME -- F_BAM_UTIL_ROUTER
BLEVEL -- 2
LEAF_BLOCKS -- 13923
DISTINCT_KEYS -- 218
AVG_LEAF_BLOCKS_PER_KEY -- 63
AVG_DATA_BLOCKS_PER_KEY -- 10214
CLUSTERING_FACTOR -- 3023360
NUM_ROWS -- 4196315
---------------------------
INDEX_NAME -- F_BAM_UTIL_IFNAME
BLEVEL -- 2
LEAF_BLOCKS -- 11585
DISTINCT_KEYS -- 296
AVG_LEAF_BLOCKS_PER_KEY -- 39
AVG_DATA_BLOCKS_PER_KEY -- 10214
CLUSTERING_FACTOR -- 3023360
NUM_ROWS -- 4196315
------------------------------------
reason why oracle try to use HASH JOIN in
# of :
DISTINCT_KEYS -- 218 -- F_BAM_UTIL_ROUTER
DISTINCT_KEYS -- 296 -- F_BAM_UTIL_IFNAME
and AVG # of rows in 1 db block = 4196315 / (23856 - 719) ~= 181
probability of situation when each db_block keep row with index key value ~
-- F_BAM_UTIL_ROUTER 181 / 218 ~ 83% >> 3-5% selectivity
-- F_BAM_UTIL_IFNAME 181 / 296 ~ 61% >> ----"----
If we will see to parameter MULTY_BLOCK_READ_COUNT (usually = 8)
then CBO choose FULL SCAN TABLE against INDEX RANGE SCAN + ACCESS BY BOWID
as more less cost method.
If u want use indexes, then u have to:
delete statistics
or
use hints.
-
U may also increase (for session or for instance)
SORT_AREA_SIZE (HASH_AREA_SIZE)
if u going to impruve HASH JOIN operations for this query.
-
Thanks for the great suggestions Sheshtakov.
Deleting the statistics reduced the time to 15 minutes !! I perfectly understand all the extensive explanations you have given, but I just do not get how you decided to delete the statistics ? ( I was under the impression that computing the statistics is always good for huge tables..) But still, the indexes were not used, it is still a FULL TABLE SCAN.
If you could point me to any documents which might tell how to interpret the results of the optimizer and decide on the best data access to follow, it will be great.
Thanks,
Shiva.
-
First question: Did u delete statistics for tables and for indexes too.
When CBO try to build execution plan it use both statistics.
About : Why i had been recomended delete statistics:
How CBO was thinking -->
It read statistics about tables and ALL indexes for tables.
then it calculate read block operations and its costs.
In ur case :
if it deside to use indexes then # of "read block" operation ~=
INDEX RANGE SCAN + ACCESS BY BOWID
AVG_LEAF_BLOCKS_PER_KEY -- 39 +
AVG_DATA_BLOCKS_PER_KEY -- 10214 = 10253
and THIS TYPE OF OPERATIONS oracle must execute BLOCK BY BLOCK !!!
if CDO choose FULL SCAN TABLE then # of read operation ~=
(23856 - 719) / MULTI_BLOCK_READ_COUNT (for example 8)
(23856 - 719) / 8 ~= 2890 (<< then 10253)
this is reason why CBO has been choosen FULL SCAN TABLE against
INDEX RANGE SCAN + ACCESS BY BOWID
when we deleted statistics CBO doesn't know nothing about index keys selectivity.
and it try to build execution plan using RULES.
It was my gess when i recomended delete statistics. don't give any
information about tables and indexes to CBO..