-
Hi peeps
I am trying to tune this sql and came across this problem
Code:
SELECT
DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION,
DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_B,
to_date(lpad(to_char(DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.ID_FECHA),8,'1'),'YYYYMMDD'),
DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION,
sum(DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.HC_NUMERO_LLAMADAS_TAR),
sum(decode(DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.ID_SUBTIPO_SERVICIO,4,0,5,0,DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.HC_DURACION_TAR)),
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_NOMBRE,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_APELLIDO,
TO_DATE(LPAD(TO_CHAR(DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.FE_NACIMIENTO),8,'1'),'YYYYMMDD'),
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_01,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_02,
DW_NEG_OWN.DM_PERSONA_EMPRESA_BSC.ID_COD_POSTAL_CLE,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_VIA
FROM
DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC,
DW_NEG_OWN.DM_CONTRATO_GRP,
DW_NEG_OWN.DM_DESTINO_CONCRETO_SER_BSC,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC,
DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC,
DW_NEG_OWN.DM_PERSONA_EMPRESA_BSC
WHERE DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.ID_CONTRATO_V = DW_NEG_OWN.DM_CONTRATO_GRP.ID_CONTRATO_V
AND DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.ID_DESTINO_CONCRETO_SER=DW_NEG_OWN.DM_DESTINO_CONCRETO_SER_BSC.ID_DESTINO_CONCRETO_SER
AND DW_NEG_OWN.DM_CONTRATO_GRP.ID_PERSONA_EMPRESA_V = DW_NEG_OWN.DM_PERSONA_EMPRESA_BSC.ID_PERSONA_EMPRESA_V
AND DW_NEG_OWN.DM_CONTRATO_GRP.ID_PERSONA_EMPRESA_V = DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.ID_PERSONA_EMPRESA_V
AND DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC.ID_CONTRATO_V=DW_NEG_OWN.DM_CONTRATO_GRP.ID_CONTRATO_V
AND DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC.ID_SERVICIO = 1038
AND DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A NOT BETWEEN '656160000' AND '656169999'
AND DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A NOT BETWEEN '656300000' AND '656300099'
AND DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION != '11111111H'
AND DW_NEG_OWN.DM_DESTINO_CONCRETO_SER_BSC.ID_DESTINO_SERVICIO = 481
AND DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION >= 20010621
GROUP BY
DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION,
DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_B,
to_date(lpad(to_char(DW_NEG_OWN.HC_LLAMADA_TARIFICADA_BSC.ID_FECHA),8,'1'),'YYYYMMDD'),
DW_NEG_OWN.HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_NOMBRE,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_APELLIDO,
TO_DATE(LPAD(TO_CHAR(DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.FE_NACIMIENTO),8,'1'),'YYYYMMDD'),
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_01,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_02,
DW_NEG_OWN.DM_PERSONA_EMPRESA_BSC.ID_COD_POSTAL_CLE,
DW_NEG_OWN.HC_PERSONA_EMPRESA_BSC.DE_VIA
it´s simple join of several table but the explain plan is full scan of all tables
Code:
Query Plan
----------------------------------------------------------------------------------------------------
SELECT STATEMENT Cost= 150178
SORT GROUP BY
HASH JOIN
HASH JOIN
HASH JOIN
TABLE ACCESS FULL DM_DESTINO_CONCRETO_SER_BSC
HASH JOIN
HASH JOIN
PARTITION RANGE ALL
TABLE ACCESS FULL HC_SERVICIO_CONTRATADO_BSC
TABLE ACCESS FULL DM_CONTRATO_GRP
PARTITION RANGE ALL
TABLE ACCESS FULL HC_LLAMADA_TARIFICADA_BSC
TABLE ACCESS FULL DM_PERSONA_EMPRESA_BSC
TABLE ACCESS FULL HC_PERSONA_EMPRESA_BSC
I see that it is the NOT BETWEEN ... AND is forcing the FTS of all tables but I cant see how I can avoid it, does anyone got any clue :?
I have thouhgt of using > and < instead of between... and but it is not going to work because the logic would be totally wrong
Table HC_LLAMADA_TARIFICADA_BSC has around 100 million rows and it is the bottle neck since the cost of that FTS is about 80% of total cost
Optimizer is set at ALL_ROWS because it is a DWH
[Edited by pando on 09-04-2001 at 04:24 AM]
-
-
Did you try HINTing the select with "FIRST_ROWS" or "INDEX" or "INDEX_FFS" ? It may help to force the optimizer to choose the index.
-
yup tried all three hints
first_rows gives same cost
index_ffs same cost
index costs 30 times more i.e 3000000 instead of 150000
I guess it has to do the FTS because the number of rows that satisfies the range of not between is high
HC_LLAMADA_TARIFICADA_BSC has 400 million rows not 100 million I was mistaken
-
Pando,
Do not rely on the cost. It does not give you the correct picture. A cost which is 1000 may be better than the cost of 10 in some case, especially with the HINTS.
May be you could try various option with "Auto trace traceonly exp STAT" and look at the statistics, which may give some better idea.
Note. In most of the cases sql with HINTS have more COST (doesn't really mean it will take that much cost) than the one without it.
-
oh yes cost DOES have things, I used it to compare between explain plan of same statements, it is not to used to see how long and how much oracle has to spend to run the query but rather it is used to compare cost relativity between execution plan of same statements
for the same statement I managed to get a cartesian product the cost went up to
13000000000000 from 150000
btw the new toad explain plan is rather good, it tells how much data I have to gather from the database, for example I have to gather 16GB of that 400 million row table to do the FTS
-
SELECT /*+ ORDERED */
HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A,
HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION,
HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_B,
to_date(lpad(to_char( HC_LLAMADA_TARIFICADA_BSC.ID_FECHA),8,'1'),'YYYYMMDD'),
HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION,
sum( HC_LLAMADA_TARIFICADA_BSC.HC_NUMERO_LLAMADAS_TAR),
sum(decode( HC_LLAMADA_TARIFICADA_BSC.ID_SUBTIPO_SERVICIO,4,0,5,0,
HC_LLAMADA_TARIFICADA_BSC.HC_DURACION_TAR)),
HC_PERSONA_EMPRESA_BSC.DE_NOMBRE,
HC_PERSONA_EMPRESA_BSC.DE_APELLIDO,
TO_DATE(LPAD(TO_CHAR( HC_PERSONA_EMPRESA_BSC.FE_NACIMIENTO),8,'1'),'YYYYMMDD'),
HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_01,
HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_02,
DM_PERSONA_EMPRESA_BSC.ID_COD_POSTAL_CLE,
HC_PERSONA_EMPRESA_BSC.DE_VIA
FROM
HC_SERVICIO_CONTRATADO_BSC ,
DM_DESTINO_CONCRETO_SER_BSC,
HC_PERSONA_EMPRESA_BSC,
DM_CONTRATO_GRP,
DM_PERSONA_EMPRESA_BSC,
HC_LLAMADA_TARIFICADA_BSC,
WHERE HC_SERVICIO_CONTRATADO_BSC.ID_SERVICIO = 1038
AND HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION >= 20010621
AND DM_DESTINO_CONCRETO_SER_BSC.ID_DESTINO_SERVICIO = 481
AND HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION != '11111111H'
AND HC_SERVICIO_CONTRATADO_BSC.ID_CONTRATO_V= DM_CONTRATO_GRP.ID_CONTRATO_V
AND HC_PERSONA_EMPRESA_BSC.ID_PERSONA_EMPRESA_V = DM_CONTRATO_GRP.ID_PERSONA_EMPRESA_V
AND DM_CONTRATO_GRP.ID_PERSONA_EMPRESA_V = DM_PERSONA_EMPRESA_BSC.ID_PERSONA_EMPRESA_V
AND HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A NOT BETWEEN '656160000' AND '656169999'
AND HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A NOT BETWEEN '656300000' AND '656300099'
AND DM_CONTRATO_GRP.ID_CONTRATO_V = HC_LLAMADA_TARIFICADA_BSC.ID_CONTRATO_V
AND DM_DESTINO_CONCRETO_SER_BSC.ID_DESTINO_CONCRETO_SER = HC_LLAMADA_TARIFICADA_BSC.ID_DESTINO_CONCRETO_SER
GROUP BY
HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_A,
HC_PERSONA_EMPRESA_BSC.ID_DOCUMENTO_IDENTIFICACION,
HC_LLAMADA_TARIFICADA_BSC.DE_MSISDN_B,
to_date(lpad(to_char( HC_LLAMADA_TARIFICADA_BSC.ID_FECHA),8,'1'),'YYYYMMDD'),
HC_SERVICIO_CONTRATADO_BSC.FE_ACTIVACION,
HC_PERSONA_EMPRESA_BSC.DE_NOMBRE,
HC_PERSONA_EMPRESA_BSC.DE_APELLIDO,
TO_DATE(LPAD(TO_CHAR( HC_PERSONA_EMPRESA_BSC.FE_NACIMIENTO),8,'1'),'YYYYMMDD'),
HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_01,
HC_PERSONA_EMPRESA_BSC.DE_TELEFONO_02,
DM_PERSONA_EMPRESA_BSC.ID_COD_POSTAL_CLE,
HC_PERSONA_EMPRESA_BSC.DE_VIA
Suggestions:
Use Ordered Hint clause. Do not believe Optimizer plans. For tunig large query the DBA/Developer knows how the data is organized. Basically, what I did is filtering small tables with known values and then joining. This method always produce less number of result set, which in turn needs less memory (temp tablespace) to store intermediate results. I put the largest table at the end of the where clause. Since I do not know how many rows are in other tables, I had a guess.
Apart from using HASH join which is normally fast, you need to distribute the big tables into multiple disks under different controllers. Examine parallel Query options if you have more than one CPU in the box. Examine whether BTree or bit map indexes may improve performance.
Good Luck.
-
the sad thing is that I am not very familiar with the dtaa in this database neither. This is because I am in Process Support deparment with a group of people as first level DWH support line (not the DBA production) so basically I cannot add indexes or distribute I/O which are supposed to be tuned by the production people. The SQL are given to us to tune (we support the DWH and support development people) and that is all! I have to guess 80% of time how's the data distributed
I will try this tomorrow and post the results!
Cheers
-
after testing it it managed to do cartesian product and the costs increased from 150000 to 405000000
I wonder why this is doing cartesian product!
Researching now
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|