`
yangsj19870829
  • 浏览: 40860 次
  • 性别: Icon_minigender_1
  • 来自: 深圳
社区版块
存档分类
最新评论

Oracle index clustering factor

阅读更多

    最近有看到clustering factor,记得之前有看过。现在居然忘了。   还是记录下来。

clustering factor实际是索引列的值在表数据块中的分布情况。相同索引值对应的行分布越紧凑clustering factor值越低,相反相同索引值对应的行分布越分散clustering factor的值越高。

 

    实际上这个因子会影响到TABLE ACCESS BY INDEX ROWID    IO的数量,因子值越小,说明只要少量的IO就可以读取出相应的行(因为因子值越小表示他们可能分布在同一个数据块中),因子值越大,说明需要较多的IO才能读取出相应的行(因为因子值越大表示他们可能散布在不同的数据块中)。    

 

 

Oracle does I/O by blocks. Therefore, the optimizer's decision to use full table scans is influenced by the percentage of blocks accessed, not rows. This is called the index clustering factor. If blocks contain single rows, then rows accessed and blocks accessed are the same.

However, most tables have multiple rows in each block. Consequently, the desired number of rows could be clustered together in a few blocks, or they could be spread out over a larger number of blocks.

Although the clustering factor is a property of the index, the clustering factor actually relates to the spread of similar indexed column values within data blocks in the table. A lower clustering factor indicates that the individual rows are concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering factor means that it costs more to use a range scan to fetch rows by rowid, because more blocks in the table need to be visited to return the data. Example 13-3 shows how the clustering factor can affect cost.

Example 13-3 Effects of Clustering Factor on Cost

Assume the following situation:

<!-- class="example" -->
  • There is a table with 9 rows.

  • There is a non-unique index on col1 for table.

  • The c1 column currently stores the values A, B, and C.

  • The table only has three Oracle blocks.

Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.

                 Block 1       Block 2        Block 3 
                 -------       -------        -------- 
                 A  A  A       B  B  B        C  C  C 

This is because the rows that have the same indexed column values for c1 are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A is low, because only one block in the table needs to be read.

Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than collocated), then the index clustering factor is higher.

                 Block 1       Block 2        Block 3 
                 -------       -------        --------
                 A  B  C       A  B  C        A  B  C

This is because all three blocks in the table must be read in order to retrieve all rows with the value A in col1.

 

 

 

分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics