-
Incorrect and inconsistent result while grouping very large volume of data
Hello - here I am facing an interesting problem. The two aggregate queries below on the same table, written in slightly different way, gives me different results! More surprising is that, the second query gives different different aggregate result everytime I run it, and each time the result is different from what it actually should be - the value of first query. I have tested exactly similar kind of queries on smaller data set, say couple of millions, no problem at all. But the problem starts if the table has large number of records, which is true in my case - the f_rrn_gross table has 260M records, and it is partitioned by one of the date key fields (surrogate key). All the group by fields are actually surrogate key in the fact table, and they are all not null.
I have tried different test cases to find out the root cause of the problem, but no clue till today. Consulted multiple DBAs, even opended a TAR in metalink, no solution or problem reason could be offered. Finally decided to post into this forum, wondering if someone could help me out.
I would really appreciate if you can throw some light why this never-seen-before, unexplainable (to me) problem is happening.
Thanks so much.
-Titash.
-------------------------------------------
SELECT SUM(RR_USD_AMT)
FROM F_RRN_GROSS
42548085415.4457
SELECT SUM (tot_rr_usd_amt)
FROM (SELECT /*+ parallel(f,10) */
SUM (rr_usd_amt) tot_rr_usd_amt
FROM f_rrn_gross f
GROUP BY bkng_date_key,
bkng_loc_key,
cur_key,
PROMO_KEY,
prop_key,
rate_plan_key);
42545111219.4799
-
An expression generally assumes the datatype of its components.
So, what is the datatype of rr_usd_amt?
Is it possible that you have VERY large, compensating, values in the GROUP BY sub-query? Could these produce rounding errors?
"The power of instruction is seldom of much efficacy except in those happy dispositions where it is almost superfluous" - Gibbon, quoted by R.P.Feynman
-
Thanks for the response.
The datatype of rr_usd_amt is NUMBER (20,6).
I concure with that 6 decimal point precision could have created rouding error, but I have had the same discrepancies/problem even when I rounded off the rr_usd_amt field to 0 decimal point, and tehn summed up, like the one below.
I would really appreciate if you could throw some more light on this.
Thanks once again.
-Titash.
-----------------------------------------------
SELECT SUM(round(RR_USD_AMT))
FROM F_RRN_GROSS
SELECT SUM (tot_rr_usd_amt)
FROM (SELECT /*+ parallel(f,10) */
SUM (round(rr_usd_amt)) tot_rr_usd_amt
FROM f_rrn_gross f
GROUP BY bkng_date_key,
bkng_loc_key,
cur_key,
PROMO_KEY,
prop_key,
rate_plan_key);
-
I suspect I'm barking up the wrong tree on this. Before declaring it to be a bug, I'd look at:
SELECT MAX(tot_rr_usd_amt), MIN(tot_rr_usd_amt)
FROM (SELECT /*+ parallel(f,10) */
SUM (rr_usd_amt) tot_rr_usd_amt
FROM f_rrn_gross f
GROUP BY bkng_date_key,
bkng_loc_key,
cur_key,
PROMO_KEY,
prop_key,
rate_plan_key);
"The power of instruction is seldom of much efficacy except in those happy dispositions where it is almost superfluous" - Gibbon, quoted by R.P.Feynman
-
Thanks for the quick response, I really appreciate this.
Here is the result of the query:
-----------------------------------------------------------
MAX(TOT_RR_USD_AMT) MIN(TOT_RR_USD_AMT)
376001148.24 -77255
-----------------------------------------------------------
Thanks a lot for your help.
-Titash.
-
Any chance you have a query rewrite on a stale MV ?
How does it look if you just count(*), rather than summing $ values
SELECT count(*)
FROM F_RRN_GROSS;
SELECT SUM (cnt)
FROM (SELECT /*+ parallel(f,10) */
count(*) cnt
FROM f_rrn_gross f
GROUP BY bkng_date_key,
bkng_loc_key,
cur_key,
PROMO_KEY,
prop_key,
rate_plan_key);
-
Code:
MAX(TOT_RR_USD_AMT) MIN(TOT_RR_USD_AMT)
376001148.24 -77255
I think that kills my rounding error theory.
"The power of instruction is seldom of much efficacy except in those happy dispositions where it is almost superfluous" - Gibbon, quoted by R.P.Feynman
-
1. No, there is no MV with query rewrite.
2. Both the queries give same final count, no issue here:
Thanks so much for your help.
-----------------------------------------------------
SELECT count(*)
FROM F_RRN_GROSS;
297115024
SELECT SUM (cnt)
FROM (SELECT /*+ parallel(f,10) */
count(*) cnt
FROM f_rrn_gross f
GROUP BY bkng_date_key,
bkng_loc_key,
cur_key,
PROMO_KEY,
prop_key,
rate_plan_key);
297115024
-
Hi DaPi,
Do you think this can be a Oracle 10.2.0.1.0 (this is version of oracle I am using) bug? If you think so, by any chance, is there any patch or work-arround which you are aware of?
Thanks so much for your help, I really appreciate this.
-Titash.
-
Originally Posted by titash
Hi DaPi,
Do you think this can be a Oracle 10.2.0.1.0 (this is version of oracle I am using) bug? If you think so, by any chance, is there any patch or work-arround which you are aware of?
Thanks so much for your help, I really appreciate this.
-Titash.
Did you try w/o parallel threads?
Tamil
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|