EVENT DETAILS
How much does prior knowledge about the number of clusters, $k$, influence the statistical feasibility and algorithmic performance of clustering methods? In this thesis, we explore two complementary clustering paradigms to elucidate the central role played by cluster cardinality. We first investigate Gaussian Mixture Models, a widely-used framework for clustering high-dimensional data. Here, knowledge of $k$ proves critical: we demonstrate a fundamental statistical barrier wherein mixtures of spherical Gaussians with unknown number of components become indistinguishable from a single Gaussian distribution, unless their pairwise mean separation is on the order of $\min(\sqrt{\log k}, \sqrt{d})$. Without prior knowledge or stronger assumptions, even the detection of multiple clusters is impossible, highlighting that knowing the correct number of components is an inherent necessity in Gaussian Mixture Models. In contrast, we examine Correlation Clustering, which is explicitly formulated without reference to the number of clusters. Objects are clustered based solely on possibly inconsistent pairwise similarity or dissimilarity labels. We provide improved approximation algorithms for the local-error objective in both complete and incomplete information settings. Furthermore, we introduce a more general model which we call the Correlation Clustering with Asymmetric Classification Errors, and present novel approximation guarantees tailored to this richer scenario.
Together, these results reveal a fundamental dichotomy: the cluster cardinality is either a crucial piece of structural information that defines feasibility, as in Gaussian Mixture Models, or an intentionally omitted parameter whose absence motivates alternative local objective functions, as in Correlation Clustering. Leveraging this dichotomy is thus essential for effectively choosing and employing clustering methods in practice.
TIME Monday August 4, 2025 at 1:30 PM - 4:00 PM
LOCATION 3501, Mudd Hall ( formerly Seeley G. Mudd Library) map it
ADD TO CALENDAR&group= echo $value['group_name']; ?>&location= echo htmlentities($value['location']); ?>&pipurl= echo $value['ppurl']; ?>" class="button_outlook_export">
CONTACT Wynante R Charles wynante.charles@northwestern.edu
CALENDAR Department of Computer Science (CS)