-
I read ...
Histograms are limited to 250 buckets so if there are more than 250 distinct values and there is no single value that dominates the column's dataset then histograms may not provide helpful statistics
Code:
SQL> create table my_table (col1)
2 as
3 select rownum from dba_objects;
Table created.
SQL> insert into my_table
2 select rownum + (select max(rownum) from my_table)
3 from dba_objects
4 /
26597 rows created.
SQL> /
26597 rows created.
SQL> /
26597 rows created.
SQL> /
26597 rows created.
SQL> commit;
Commit complete.
SQL> select count(*) from my_table;
COUNT(*)
----------
132985
SQL> analyze table my_table compute statistics for table;
Table analyzed.
SQL> select * from my_table
2
SQL> set autotrace traceonly explain
SQL> /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=132985 Bytes
=1063880)
1 0 TABLE ACCESS (FULL) OF 'MY_TABLE' (Cost=32 Card=132985 Byt
es=1063880)
SQL> select * from my_table where col1 < 132985/2
2 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=6649 Bytes=5
3192)
1 0 TABLE ACCESS (FULL) OF 'MY_TABLE' (Cost=32 Card=6649 Bytes
=53192)
SQL> analyze table my_table compute statistics for all columns size 254
2 /
Table analyzed.
SQL> select * from my_table where col1 < 132985/2
2 /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=66437 Bytes=
265748)
1 0 TABLE ACCESS (FULL) OF 'MY_TABLE' (Cost=32 Card=66437 Byte
s=265748)
SQL>
There you go.
132,985 distinct values in a histogram -- completely unskewed data even -- and it helps Oracle know how many rows my query will retrieve. That's what I'm interested in for alison's query, in particular the "s.time_id <= '31-DEC-98';" clause.
Maybe I'll change my sig -- don't just read the documentation, think about it and test it as well.
-
Originally posted by slimdave
I read ...
There you go.
132,985 distinct values in a histogram -- completely unskewed data even -- and it helps Oracle know how many rows my query will retrieve. That's what I'm interested in for alison's query, in particular the "s.time_id <= '31-DEC-98';" clause.
Maybe I'll change my sig -- don't just read the documentation, think about it and test it as well.
Ohh my .. what r u trying to prove here??? -- that oracle use index when it does not have any ? .. or it use index when it has for heck sake ?
I agree with u, one gotta test test what oracle says and interpret accordingly..but KNOW what u TEST..and WHY u TEST..
Well lets see the test case here...(With actual production data in)
Code:
WW04_PRO_DBA> select count(*) from shipment_2002;
COUNT(*)
--------------------
4383441
WW04_PRO_DBA> select count(distinct part_number) from shipment_2002;
COUNT(DISTINCTPART_NUMBER)
--------------------------
6788
WW04_PRO_DBA> analyze table shipment_2002 delete statistics;
Table analyzed.
WW04_PRO_DBA> analyze table shipment_2002 estimate statistics sample 40 percent for table;
Table analyzed.
WW04_PRO_DBA> select * from shipment_2002 where part_number='Junk';
no rows selected
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=382 Card=487 Bytes=4
3830)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'SHIPMENT_2002' (Cost=382
Card=487 Bytes=43830)
2 1 INDEX (RANGE SCAN) OF 'PK_SHIPMENT_2002' (UNIQUE) (Cost=
6 Card=487)
WW04_PRO_DBA> analyze table shipment_2002 estimate statistics sample 40 percent for columns part_number size 75;
Table analyzed.
WW04_PRO_DBA> select * from shipment_2002 where part_number='Junk';
no rows selected
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1268 Card=8241 Bytes
=741690)
1 0 TABLE ACCESS (FULL) OF 'SHIPMENT_2002' (Cost=1268 Card=824
1 Bytes=741690)
Mind therez an index with leading column Part_Number..and you can see when thers no data for the value JUNK it goes for FTS huh???
And Jurij :
The chances of your skewed data to record will be less when CARD of the col grows..i didnt say it will not at all record..but since chances are very very less why go for it..(PS oracle clearly states that the changes of popular/unpopular values getting recorded is a compromise as the CARD of the col increases)...
More over with binds its really not that useful as well..
Rgds
Abhay.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
Originally posted by abhaysk
Ohh my .. what r u trying to prove here??? -- that oracle use index when it does not have any ? .. or it use index when it has for heck sake ?
I agree with u, one gotta test test what oracle says and interpret accordingly..but KNOW what u TEST..and WHY u TEST..
I'm saying that your contention that "...thumb rule will be to take Col stats only when its Card < 254..." is a big bunch of BS, especially when the documents you reference are read and understood.
In alison's query the histograms would be useful in telling Oracle what % of values in the sales table match the "s.time_id <= '31-DEC-98'" criteria, and (to a lesser extent) how many rows in the cust table are matched by "cust_last_name = 'Smith'". This can make a huge difference to the execution plan.
I'm just trying to help alison with the problem at hand, and i think that your rule of thumb was unhelpful to this particular situation.
-
FYI
without histograms:
cardinality = num_rows * selectivity
selectivity = 1 / NUM_DISTINCT
with histograms:
you get exact cardinality which equals number of rows to be scanned in a join or satisfying a condition
Last edited by pando; 02-23-2004 at 07:01 PM.
-
Originally posted by slimdave
I'm saying that your contention that "...thumb rule will be to take Col stats only when its Card < 254..." is a big bunch of BS, especially when the documents you reference are read and understood.
Can u plz explain me how u have understood..
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
I understand it to say that if there are more than 250 distinct values and there is no single value that dominates the column's dataset then histograms may not provide helpful statistics.
However i also inferred that it was referring to predicates like "my_col = 'A'", and that it might be useful still on predicates like "my_date > '01-jan-2003'" or "my_date between '01-jan-2002' and '01-feb-2002' -1". So I tested it, and found that the statistics were helpful to the optimizer even with over 100,000 rows on a column with unique, evenly distributed values.
-
Dave / Jurij :
Well i tested again and yes Histograms are useful no matter of CARD..
The mistake that i did previously was, i didnt notice the optimizer_index_cost_adj was way too high - 100..& after setting it to optimal value between 30 and 40..plan seems to show up pretty good..
My Bad.
PS : When you asserted "bunch of BS" .. Ok might be ( however i personnaly feel if the discussion was more
professional or rather taken in right spirit, be good )
Rgds
Abhay.
Last edited by abhaysk; 02-26-2004 at 04:51 AM.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
Originally posted by abhaysk
PS : When you asserted "bunch of BS" .. Ok might be ( however i personnaly feel if the discussion was more
professional or rather taken in right spirit, be good )
Yeah, I was all riled up at having someone shout at me and roll their eyes. I'm very sensitive
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|