Title: It's Hard to HAC with Average Linkage!

Abstract: Average linkage Hierarchical Agglomerative Clustering (HAC) is an extensively studied and applied method for hierarchical clustering. Recent applications to massive datasets have driven significant interest in near-linear-time and efficient parallel algorithms for average linkage HAC.
We provide hardness results that rule out such algorithms. On the sequential side, we establish a runtime lower bound of $n^{3/2-\epsilon}$ on $n$-node graphs for sequential combinatorial algorithms under standard fine-grained complexity assumptions. This essentially matches the best-known running time for average linkage HAC. On the parallel side, we prove that average linkage HAC likely cannot be parallelized even on simple graphs by showing that it is CC-hard on trees of diameter $4$. On the possibility side, we demonstrate that average linkage HAC can be efficiently parallelized (i.e., it is in NC) on paths and can be solved in near-linear time when the height of the output cluster hierarchy is small.
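
For readers unfamiliar with the problem, the following is a minimal, deliberately naive sketch of average linkage HAC on a weighted graph. It is not the algorithm from the paper. It assumes the common graph formulation in which edge weights are similarities and the similarity of two clusters A and B is the total weight of edges between them divided by |A|·|B|. The function name, the edge-dictionary input format, and the tie-breaking rule (first maximum found) are illustrative assumptions.

```python
def average_linkage_hac(n, edges):
    """Naive average linkage HAC on a weighted graph (illustrative sketch).

    n:     number of nodes, labelled 0..n-1
    edges: dict mapping (u, v) pairs to a similarity weight
    Returns the list of merges as (cluster_a, cluster_b, average_similarity),
    where clusters are frozensets of node labels.
    """
    clusters = {i: frozenset([i]) for i in range(n)}
    # w[a][b] holds the total edge weight between clusters a and b.
    w = {i: {} for i in range(n)}
    for (u, v), weight in edges.items():
        w[u][v] = w[u].get(v, 0.0) + weight
        w[v][u] = w[v].get(u, 0.0) + weight

    merges = []
    next_id = n
    while True:
        # Scan every adjacent cluster pair for the highest average similarity.
        best = None
        for a in w:
            for b, total in w[a].items():
                if a < b:
                    avg = total / (len(clusters[a]) * len(clusters[b]))
                    if best is None or avg > best[0]:
                        best = (avg, a, b)
        if best is None:
            break  # no edges remain between distinct clusters

        avg, a, b = best
        merges.append((clusters[a], clusters[b], avg))

        # Sum the edge weights from a and b to every other neighbour.
        new_adj = {}
        for old in (a, b):
            for c, total in w[old].items():
                if c != a and c != b:
                    new_adj[c] = new_adj.get(c, 0.0) + total
                    del w[c][old]

        merged = clusters[a] | clusters[b]
        for old in (a, b):
            del w[old]
            del clusters[old]

        # Register the merged cluster under a fresh identifier.
        clusters[next_id] = merged
        w[next_id] = new_adj
        for c, total in new_adj.items():
            w[c][next_id] = total
        next_id += 1

    return merges


if __name__ == "__main__":
    # Path 0 - 1 - 2 with similarities 3.0 and 1.0.
    for a, b, avg in average_linkage_hac(3, {(0, 1): 3.0, (1, 2): 1.0}):
        print(sorted(a), sorted(b), avg)
```

Each round rescans all remaining inter-cluster edges, so this sketch is far slower than the running times discussed in the abstract. Each merge also depends on the outcome of the previous ones, and this sequential dependence is the aspect of the problem that the parallel hardness result concerns.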
Comments: To appear at ICALP 2024
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2404.14730 [cs.DS]
  (or arXiv:2404.14730v1 [cs.DS] for this version)

Submission history

From: Kishen N Gowda
[v1] Tue, 23 Apr 2024 04:14:29 GMT (1318kb,D)