Adaptive K-medoids algorithm for optimizing initial class centers
Abstract:
The traditional K-medoids clustering algorithm selects its initial cluster centers at random, requires the number of clusters K to be specified in advance, and produces unstable clustering results. To address these problems, this paper proposes an adaptive K-medoids algorithm for optimizing initial class centers (CH-KD). The idea is to define a feature importance measure, use it to select the best representative feature in each feature cluster to form a feature subset, and focus on the adaptive optimization and improvement of the traditional partitioning algorithm. First, feature discrimination is defined using the standard deviation of each feature, and features with strong discrimination are selected. Second, the Pearson correlation coefficient is used to measure the redundancy of each feature within a feature cluster, and features with low redundancy are selected. Finally, the product of feature discrimination and feature redundancy is taken as the feature importance, which is used to select the best representative feature in each cluster and form the feature subset. In the experiments, the proposed algorithm is compared with other clustering algorithms on 14 UCI datasets, and the results verify the effectiveness and advantages of the CH-KD algorithm.
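The three feature-scoring steps in the abstract can be sketched in a few lines of Python. This is only an illustrative reading, not the paper's implementation: the abstract does not give the exact formulas, so the min-max scaling, the use of `1 - mean |r|` as the redundancy term (so that low redundancy yields a high score), and the function name `select_representatives` with its `feature_clusters` input are all assumptions made here.

```python
import numpy as np


def select_representatives(X, feature_clusters):
    """Pick one representative feature index per (non-empty) feature cluster.

    X: (n_samples, n_features) array.
    feature_clusters: list of lists of column indices. This grouping is a
    hypothetical input; the abstract does not say how it is obtained.
    """
    X = np.asarray(X, dtype=float)

    # Assumption: min-max scale each column so that standard deviations
    # are comparable across features measured in different units.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns stay at 0 after scaling
    Xs = (X - lo) / span

    # Feature discrimination: standard deviation of each scaled feature.
    discrimination = Xs.std(axis=0)

    # Absolute Pearson correlation between every pair of features.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.atleast_2d(np.abs(np.corrcoef(Xs, rowvar=False)))
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns give NaN

    representatives = []
    for cluster in feature_clusters:
        best, best_score = None, -np.inf
        for j in cluster:
            others = [k for k in cluster if k != j]
            # Redundancy: mean |r| with the other features in the cluster.
            redundancy = corr[j, others].mean() if others else 0.0
            # Assumed importance: discrimination times (1 - redundancy).
            score = discrimination[j] * (1.0 - redundancy)
            if score > best_score:
                best, best_score = j, score
        representatives.append(int(best))
    return representatives
```

Calling `select_representatives(X, [[0, 1, 2], [3]])` returns one column index per feature cluster. A feature that is an exact multiple of another in the same cluster is penalised by the redundancy term, while a high-variance, weakly correlated feature is favoured.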
Author:
Liu Jinjin
Affiliation:
School of Software, Henan Normal University
Cite this article:
Liu Jinjin. Adaptive K-medoids algorithm for optimizing initial class center[J]. Journal of Henan Normal University (Natural Science Edition), 2025, 53(1): 106-115. DOI: 10.16366/j.cnki.1000-2367.2023.08.22.0001.
Funding:
National Natural Science Foundation of China
Keywords:
unsupervised; feature differentiation; feature redundancy; CH function; feature selection
Classification code:
TP391