Uncovering Hidden Structures in Materials Data: A Study of Two Clustering Algorithms with Dimensionality Reduction
Yan Mei
FAU, WW8
6 May 2025, 17:00
WW8, Room 2.018-2, Dr.-Mack-Str. 77, Fürth
This study applies two clustering methods to high-dimensional materials datasets from NOMAD and Matminer. Principal component analysis (PCA) and t-SNE are used for dimensionality reduction and visualization, enabling direct comparison of cluster assignments. Outlier detection and the Jaccard index between clusters (including the overlap between outlier sets) are used to evaluate differences in how the algorithms group and label data. In addition, space group, atomic density, and bulk modulus descriptors are introduced to examine possible connections between material properties and cluster structure. The results indicate that while both algorithms can reveal structural patterns, they define clusters in different ways, which underscores the importance of algorithm choice and feature selection in materials data analysis.
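As a rough illustration of the Jaccard comparison mentioned in the abstract, the sketch below computes the index J(A, B) = |A ∩ B| / |A ∪ B| between every cluster of one labeling and every cluster of another. It is not the speaker's implementation: the function names, the toy label lists, and the convention that the label -1 marks outliers are all assumptions made for this example, and the abstract does not say which two clustering algorithms are compared.

```python
def jaccard_index(a, b):
    """Jaccard index |A & B| / |A | B| of two collections of sample indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def members_by_label(labels):
    """Group sample indices by their cluster label."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return groups


def pairwise_jaccard(labels_a, labels_b):
    """Jaccard index for every (cluster in A, cluster in B) pair."""
    groups_a = members_by_label(labels_a)
    groups_b = members_by_label(labels_b)
    return {
        (la, lb): jaccard_index(sa, sb)
        for la, sa in groups_a.items()
        for lb, sb in groups_b.items()
    }


# Hypothetical labels for eight samples from two algorithms.
# Assumption: -1 marks a sample flagged as an outlier.
labels_a = [0, 0, 0, 1, 1, -1, -1, 1]
labels_b = [0, 0, 1, 1, 1, -1, 0, 1]

scores = pairwise_jaccard(labels_a, labels_b)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

The entry for the pair (-1, -1) gives the overlap of the two outlier sets. Because cluster labels are arbitrary, a cluster from one algorithm would normally be matched to the cluster from the other with which it has the highest index, not to the one sharing its label number.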