Compute uses all rows in the table/index/cluster to determine the statistics.
Estimate uses only a portion of the rows (you specify either a row count or a percentage). If you sample, say, 50% of the rows, I don't know whether Oracle reads every other row, or the first or last half... does anyone know which portion of the rows Oracle uses?
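For reference, the two modes are invoked with the old `ANALYZE` command roughly like this (the `orders` table name is just an illustration):

```sql
-- Full scan: reads every row to build exact statistics
ANALYZE TABLE orders COMPUTE STATISTICS;

-- Sampled: reads only a portion of the rows, by percentage...
ANALYZE TABLE orders ESTIMATE STATISTICS SAMPLE 50 PERCENT;

-- ...or by row count
ANALYZE TABLE orders ESTIMATE STATISTICS SAMPLE 10000 ROWS;
```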
I'm not actually sure which records it reads. However, here's what I *have* noticed:
- The stats *will not* change if neither the number of rows nor the data has changed.
This is important: it means the sampling algorithm is deterministic, driven by the row count of the table together with the percentage you ask it to estimate.
- If you add a single row, even one amounting to less than 1/1000th of 1% of the table, the stats can change by up to 5%. How much they move depends significantly on how 'skewed' your data is.
- If you have any 'extreme' data values, do *not* use estimation! For example, say 99% of the values in an indexed date column fall within the past 5 years, but one rogue entry carries a year of 1897. With estimation, that value may or may not be sampled on any given run. As you can imagine, such a value dramatically changes the high and low values in your stats, and therefore how applicable the index looks for range predicates. So with estimation, you never know what kind of plans you will get.
Of course, the real solution is to fix the data, but that is sometimes not possible. Another option is to create a histogram, but that only helps if the date values are 'hard-coded' (literals) in the queries. The last resort is to always COMPUTE that table and tune against a consistent set of 'bad' statistics, hinting the index where necessary.
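If you go the histogram route, the old `ANALYZE` syntax looks roughly like this (the column name and bucket count are illustrative, not prescriptive):

```sql
-- Build a histogram on the skewed date column so the optimizer
-- can see the uneven value distribution (75 buckets as an example)
ANALYZE TABLE orders COMPUTE STATISTICS
  FOR COLUMNS order_date SIZE 75;
```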
- So, basically, estimation is best for tables with a relatively even distribution and no 'extreme' values. If you *do* use estimation, Oracle suggests sampling at least 30% of the rows. Further, any value of 60% or greater is the same as using 100%.
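For what it's worth, the same trade-off exists in the `DBMS_STATS` package, which Oracle recommends over `ANALYZE` for gathering optimizer statistics. A minimal sketch, assuming a hypothetical `APP.ORDERS` table:

```sql
-- Sampled statistics gather via DBMS_STATS
-- (schema and table names are illustrative)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP',
    tabname          => 'ORDERS',
    estimate_percent => 30,     -- the 30% minimum suggested above
    cascade          => TRUE    -- also gather stats on the table's indexes
  );
END;
/
```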