A Comparative Analysis of Embedding Techniques and Clustering Algorithms on Benchmark Datasets
- 한국공공가치학회
- Journal of Public Value
- Vol. 9
- 2025.06
- pp. 69 - 84 (16 pages)
- DOI : 10.53581/jojopv.2025.9.1.69

Purpose: To systematically evaluate and compare the effectiveness of various embedding techniques when combined with different clustering algorithms across diverse benchmark datasets, providing practical guidance for method selection based on dataset characteristics. Method: We conducted comprehensive experiments using 12 embedding techniques (including UMAP, t-SNE, PCA, Isomap, and others) combined with 12 clustering algorithms (including K-Means, Gaussian Mixture Models, GenieClust, and others) across multiple dataset collections from ClustBench. Performance was evaluated using Normalized Clustering Accuracy (NCA) and Adjusted Rand Index (AR) as primary metrics. Results: UMAP emerged as the top-performing embedding technique across all evaluation metrics, followed closely by t-SNE. GenieClust demonstrated superior performance among clustering algorithms, with Gaussian Mixture Models ranking second. The combination of Base embedding with GenieClust achieved the highest average performance, while computationally expensive embedding techniques generally outperformed simpler methods at the cost of scalability. Conclusion: No single embedding-clustering combination dominates universally across all datasets. The study reveals important tradeoffs between computational complexity and clustering performance, with UMAP and GenieClust showing consistently strong results. Method selection should be based on dataset characteristics, computational constraints, and performance requirements.
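For readers unfamiliar with the Adjusted Rand Index named in the abstract, the following is a minimal sketch of the standard Hubert and Arabie formulation, computed from the contingency counts of two label vectors. It is not the authors' evaluation code; the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two partitions (hypothetical helper, illustration only)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # Assumption: with fewer than two points there are no pairs to disagree on.
        return 1.0

    # Contingency counts: number of points sharing each (reference, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Expected pair agreement under random labelling with fixed cluster sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Assumption: degenerate partitions (e.g. one cluster on both sides) count as agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    print(adjusted_rand_index(reference, [1, 1, 0, 0]))  # same partition, labels permuted
    print(adjusted_rand_index(reference, [0, 0, 1, 2]))  # second cluster split in two
```

The index is invariant to how cluster labels are numbered, which is why it suits comparing the output of different clustering algorithms against reference labels; a value of 1 means identical partitions and values near 0 mean agreement no better than chance.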
1. Introduction
2. Related Work
3. Methodology
4. Experimental Setup
5. Results
6. Limitations
7. Conclusion
8. References