-
My prodn. databases have got tables ranging from a count of 30 million rows to 70 million rows, we have a analyze compute statistics command which is run as a cron job, which takes about 5hrs. to complete, i am looking to reduce this time by doing a analyze estimate statistics with a particular percentage, now here is my problem, what would be the best
percentage of rows to be used for estimating statistics on these very large tables?
reply soon
soren
-
Use sample % on which optimizer assumes the rest of the data statistics and that helps reducing the time.
say 30-35 % is fairly good sample to compute statistics. I don't know what kinda sample you are currently using.
[Edited by sreddy on 02-01-2001 at 02:25 PM]
-
Just remember, if you specify a sample over 50%, oracle will read the whole table anyway...
-
This optimal sample size depends on the distribution of your data in the tables. If the data is highly regular and evenly distributed throughout a table, like say a customer data table might be, then a small sample would be perfectly fine and would obviously consume fewer resources. If your data tends to be lumped in chunks, or if row size varies a great deal, then a much larger sample would be needed.
-
I thought when you sample 30% Oracle will do 100% anyway :-?
-
Nope pando, Jeff z Right. If its 50% or more then Oracle will do 100%. Otherwise whatever % that you specify.
-
thankx guys, that was pretty fast, i am now clear on a lot of issues, will check out the data distribution also, and thnks for the info. that oracle will go for a full sample in case the percentage exceeds 50%, will keep this in mind,gave me extra factors to look at, now will do some more study of my own and hopefully reduce the downtime for the analyze
thanks again,
soren
-
If a table has 70 M rows, and you know what data the table has, and you know the SQL statement executed against the table, then it would be better to add HINT clause (overriding CBO) in the SQL so that your query will run fast.
For large tables, CBO execution path will not be the right choice. For example, if the table has 3 bit map indexes on three columns, and all the 3 columns are used in the WHERE clause, CBO execution path will not include those indexes unless you use HINT clause (INDEX_COMBINE) in the SQL statement.
Do not think STATISTICS on table will always help to the query fast. CBO uses certain predefined algorithm to determine efficient path. That path need not be always efficient. Manual intervention might be needed in many cases.
-
Another point : normally is VERY RARE to someone have 70M of REAL hot data, check CAREFULLY for partitions (or better yet, a cleaning-routine what moves old data to another tab, or out of the db).
Of course, YMMV and IMHO, IG.
[]s
Chiappa