Introduction

The index clustering factor is a key statistic that informs both the Oracle optimizer's costing decisions and the technician's understanding of how useful an index is likely to be.

It is used by Oracle's optimizer to help determine the cost associated with index range scans in comparison to full table scans.

Calculating the Clustering Factor

To calculate the clustering factor of an index during the gathering of index statistics, Oracle does the following.

For each entry in the index, Oracle compares the table block identified by the entry's rowid with the block identified by the previous entry's rowid.

If the block is different, Oracle increments the clustering factor by 1.
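
As a rough illustration of that counting logic (not the code DBMS_STATS actually runs), the same walk can be simulated in SQL with the DBMS_ROWID package and the LAG analytic function. The table T1, its indexed column OWNER and the single-column index on OWNER are hypothetical names used only for this sketch.

    select sum(block_change) as simulated_clustering_factor
    from   (
            select case
                     when dbms_rowid.rowid_relative_fno(rowid) =
                          lag(dbms_rowid.rowid_relative_fno(rowid))
                            over (order by owner, rowid)
                      and dbms_rowid.rowid_block_number(rowid) =
                          lag(dbms_rowid.rowid_block_number(rowid))
                            over (order by owner, rowid)
                     then 0          -- same table block as the previous index entry
                     else 1          -- block change (or first entry): count it
                   end as block_change
            from   t1
            where  owner is not null -- rows with a wholly null key have no index entry
           );

The order by owner, rowid inside the LAG calls mimics the order of the index entries (key value, then rowid), which is the order in which Oracle walks the index when it gathers the statistic.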

The minimum possible clustering factor is equal to the number of blocks identified through the index's list of rowids -- for an index on a column or columns containing no nulls, this will equal the number of blocks in the table that contain data. The maximum clustering factor is the number of entries in the index.
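
Those two bounds can be sanity-checked against a specific index. Still assuming the hypothetical T1, OWNER and an index named T1_I1, the first query counts the distinct table blocks that hold indexed rows (the lower bound), and the second reads the number of index entries (the upper bound) together with the recorded clustering factor from the data dictionary.

    select count(distinct dbms_rowid.rowid_relative_fno(rowid) || '.' ||
                          dbms_rowid.rowid_block_number(rowid)) as min_possible_cf
    from   t1
    where  owner is not null;

    select num_rows          as max_possible_cf,
           clustering_factor
    from   user_indexes
    where  index_name = 'T1_I1';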

Interpreting the Clustering Factor

This gives Oracle a statistic with which to estimate how many table blocks an index range scan of a given size is likely to visit.

If the clustering factor is close to the number of entries in the index, then an index range scan of 1000 index entries may require nearly 1000 blocks to be read from the table.

If the clustering factor is close to the number of blocks in the table, then an index range scan of 1000 index entries may require only 50 or so blocks to be read from the table, because many of the required rows share each block.
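
One way to see both extremes is to build two copies of the same data, one inserted in key order and one deliberately scattered, index the same column in each, and compare the resulting statistics. The script below is a sketch using made-up table and index names; the exact figures will vary with block size and row length.

    create table t_clustered as
      select rownum as id, rpad('x',100) as padding
      from   dual
      connect by level <= 10000;

    create table t_scattered as
      select id, padding
      from   t_clustered
      order  by dbms_random.value;        -- same rows, arrival order randomised

    create index t_clustered_i1 on t_clustered(id);
    create index t_scattered_i1 on t_scattered(id);

    begin
      dbms_stats.gather_table_stats(user, 'T_CLUSTERED', cascade => true);
      dbms_stats.gather_table_stats(user, 'T_SCATTERED', cascade => true);
    end;
    /

    select i.index_name, i.clustering_factor, t.blocks, t.num_rows
    from   user_indexes i
           join user_tables t on t.table_name = i.table_name
    where  i.index_name in ('T_CLUSTERED_I1', 'T_SCATTERED_I1');

You would normally expect T_CLUSTERED_I1 to show a clustering factor close to the table's block count, and T_SCATTERED_I1 to show one close to the row count.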

This can be compared with the cost of reading the entire table up to the high water mark (using the more efficient multiblock I/O mechanism) to determine whether a full table scan or an index range scan is the more efficient access path.
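
Which way that comparison falls for a given predicate can be checked from the execution plan; a sketch continuing with the hypothetical T_SCATTERED table from above:

    explain plan for
    select *
    from   t_scattered
    where  id between 1 and 1000;

    select * from table(dbms_xplan.display);

With a clustering factor near the row count, the optimizer is likely to prefer the full table scan for all but the narrowest ranges.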

Note that where extensive deletes have occurred from a table there may be blocks that contain no rows at all. The clustering factor reflects this, because those blocks do not appear in the index's list of rowids; a full table scan, however, still reads every block up to the high water mark, whether it contains rows or not.

So in an extreme case Oracle could see from the index and table statistics that, although a table has 1,000,000 blocks below the high water mark, reading 100% of its rows might only require visiting 10 of those blocks. Provided that "Not Null" constraints tell Oracle that every table row is present in the index, a query such as "select * from big_table_with_few_rows" might be satisfied more efficiently by an index range scan than by a full table scan.
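
A sketch of that effect, again using the hypothetical T_SCATTERED table: delete most of the rows (without truncating or shrinking, so the high water mark stays where it is), re-gather the statistics, and compare the block count with the clustering factor.

    delete from t_scattered where id > 100;
    commit;

    begin
      dbms_stats.gather_table_stats(user, 'T_SCATTERED', cascade => true);
    end;
    /

    select t.blocks            as blocks_below_hwm,
           t.num_rows          as remaining_rows,
           i.clustering_factor
    from   user_tables  t
           join user_indexes i on i.table_name = t.table_name
    where  t.table_name = 'T_SCATTERED';

The block count still reflects the high water mark, while the clustering factor now counts only the blocks that still hold indexed rows.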

Note also that calculating the clustering factor is done without reference to the table at all -- it is based solely on information contained in the index.
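
Consistent with that, gathering statistics on the index alone is enough to recompute its clustering factor (the index name below is the hypothetical one from the sketches above):

    begin
      dbms_stats.gather_index_stats(user, 'T_SCATTERED_I1');
    end;
    /

    select index_name, clustering_factor, last_analyzed
    from   user_indexes
    where  index_name = 'T_SCATTERED_I1';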