最近有看到clustering factor,记得之前有看过。现在居然忘了。 还是记录下来。
clustering factor实际是索引列的值在表数据块中的分布情况。相同索引值对应的行分布越紧凑clustering factor值越低,相反相同索引值对应的行分布越分散clustering factor的值越高。
实际上这个因子会影响到TABLE ACCESS BY INDEX ROWID IO的数量,因子值越小,说明只要少量的IO就可以读取出相应的行(因为因子值越小表示他们可能分布在同一个数据块中),因子值越大,说明需要较多的IO才能读取出相应的行(因为因子值越大表示他们可能散布在不同的数据块中)。
Oracle does I/O by blocks. Therefore, the optimizer's decision to use full table scans is influenced by the percentage of blocks accessed, not rows. This is called the index clustering factor. If blocks contain single rows, then rows accessed and blocks accessed are the same.
However, most tables have multiple rows in each block. Consequently, the desired number of rows could be clustered together in a few blocks, or they could be spread out over a larger number of blocks.
Although the clustering factor is a property of the index, the clustering factor actually relates to the spread of similar indexed column values within data blocks in the table. A lower clustering factor indicates that the individual rows are concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering factor means that it costs more to use a range scan to fetch rows by rowid, because more blocks in the table need to be visited to return the data. Example 13-3 shows how the clustering factor can affect cost.
<!-- class="example" -->-
There is a table with 9 rows.
-
There is a non-unique index on
col1
for table. -
The
c1
column currently stores the valuesA
,B
, andC
. -
The table only has three Oracle blocks.
Case 1: The index clustering factor is low for the rows as they are arranged in the following diagram.
Block 1 Block 2 Block 3 ------- ------- -------- A A A B B B C C C
This is because the rows that have the same indexed column values for c1
are located within the same physical blocks in the table. The cost of using a range scan to return all of the rows that have the value A
is low, because only one block in the table needs to be read.
Case 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than collocated), then the index clustering factor is higher.
Block 1 Block 2 Block 3 ------- ------- -------- A B C A B C A B C
This is because all three blocks in the table must be read in order to retrieve all rows with the value A
in col1
.
相关推荐
The clustering_factor of an index is a primary indicator of how desirable the index appears to be to the optimizer. The drawback is that you need to compare the clustering_factor to the number of rows...
聚类算法评价指标:RandIndex,matlab代码
MYDBSCAN:基于密度的聚类DBSCAN(Density-Based Spatial Clustering of Applications with Noise)算法的底层实现 MYAP:基于划分的聚类AP(Affinity Propagation Clustering Algorithm )算法的底层实现--近邻传播...
Oracle 10g RAC Grid,Services & Clustering
包含QT聚类的matlab程序,以及qt聚类原理说明。
多篇关于层次聚类的论文,打包下载,具有一定的学习价值
聚类算法,密度聚类,高效聚类,无监督聚类,快速聚类
层次聚类算法描述
轨迹聚类算法,先划分,后聚类,聚类算法是改进的DB-Scan;linux 编译,里面有Makefile
oracle 10g rac grid,services & clustering
谱聚类spectral clustering,构图和切图,拉普拉斯矩阵
Clustering(聚类算法英文版)
Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191): 1492-1496.基于这篇文章实现的最基本的密度聚类的算法,具体请看我博客中的相关文章...
聚类分析是研究多要素事物分类问题的数量方法。基本原理是根据样本自身的属性,用数学方法按照某种相似性或差异性指标,定量地确定样本之间的亲疏关系,并按这种亲疏关系程度对样本进行聚类。 常见的聚类分析方法有...
对文本进行聚类
Oracle RAC的全称为Oracle Real Application clusters,在此前称之为OPS,是一个计算环境,通过它,用户可以充分利用多个使用集群技术互联的计算机的处理能力.Oracle Real Application Clusters通过简单轻松地使用单一...
可以快速将dmp文件从9i转换成10g当然这个仅是改变了版本表示可以倒入;但是入果在从高版本转换成底版本时候;有用到新版本特有的类型或约束时候就会出问题
常用的聚类算法指标计算Rand index, Adjusted Rand index, Silhouette, Calinski-Harabasz, Davies-Bouldin,里面有说明,matlab程序
层次聚类算法C++ VS2010 调试运行成功
最近在Science上的一篇基于密度的聚类算法《Clustering by fast search and find of density peaks》引起了大家的关注(在我的博文“论文中的机器学习算法——基于密度峰值的聚类算法”中也进行了中文的描述)。...