Low-Bandwidth Matrix Multiplication: Faster Algorithms and More General Forms of Sparsity

Gupta, Chetan; Korhonen, Janne H.; Studený, Jan; Suomela, Jukka; Vahidi, Hossein

(Submitted on 23 Apr 2024)

Abstract: In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for multiplying sparse $n \times n$ matrices, using $n$ computers. They assumed that the input matrices are uniformly sparse -- there are at most $d$ non-zeros in each row and column -- and the task is to compute a uniformly sparse part of the product matrix. Initially each computer knows one row of each input matrix, and eventually each computer needs to know one row of the product matrix. In each communication round each computer can send and receive one $O(\log n)$-bit message. Their algorithm solves this task in $O(d^{1.907})$ rounds, while the trivial bound is $O(d^2)$. We improve on the prior work in two dimensions: First, we show that we can solve the same task faster, in only $O(d^{1.832})$ rounds. Second, we explore what happens when matrices are not uniformly sparse. We consider the following alternative notions of sparsity: row-sparse matrices (at most $d$ non-zeros per row), column-sparse matrices, matrices with bounded degeneracy (we can recursively delete a row or column with at most $d$ non-zeros), average-sparse matrices (at most $dn$ non-zeros in total), and general matrices. We show that we can still compute $X = AB$ in $O(d^{1.832})$ rounds even if one of the three matrices ($A$, $B$, or $X$) is average-sparse instead of uniformly sparse. We present algorithms that handle a much broader range of sparsity in $O(d^2 + \log n)$ rounds, and present conditional hardness results that put limits on further improvements and generalizations.
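The sparsity notions listed in the abstract can be made concrete with a small sketch. The code below is an illustration only, not code from the paper: the function names are hypothetical, matrices are assumed to be dense lists of lists, and the degeneracy check assumes the natural greedy reading of "recursively delete a row or column with at most $d$ non-zeros", counting only the non-zeros that lie in rows and columns not yet deleted.

```python
from typing import List

Matrix = List[List[int]]  # assumption: dense n x n lists, 0 means "zero entry"


def row_counts(m: Matrix) -> List[int]:
    """Number of non-zeros in each row."""
    return [sum(1 for x in row if x != 0) for row in m]


def col_counts(m: Matrix) -> List[int]:
    """Number of non-zeros in each column."""
    return [sum(1 for row in m if row[j] != 0) for j in range(len(m[0]))]


def is_row_sparse(m: Matrix, d: int) -> bool:
    return max(row_counts(m)) <= d


def is_column_sparse(m: Matrix, d: int) -> bool:
    return max(col_counts(m)) <= d


def is_uniformly_sparse(m: Matrix, d: int) -> bool:
    # At most d non-zeros in each row and in each column.
    return is_row_sparse(m, d) and is_column_sparse(m, d)


def is_average_sparse(m: Matrix, d: int) -> bool:
    # At most d * n non-zeros in total.
    return sum(row_counts(m)) <= d * len(m)


def has_bounded_degeneracy(m: Matrix, d: int) -> bool:
    """Greedily delete rows/columns that have at most d remaining non-zeros."""
    n_rows, n_cols = len(m), len(m[0])
    rows, cols = row_counts(m), col_counts(m)
    row_alive, col_alive = [True] * n_rows, [True] * n_cols
    progress = True
    while progress:
        progress = False
        for i in range(n_rows):
            if row_alive[i] and rows[i] <= d:
                row_alive[i] = False
                progress = True
                for j in range(n_cols):
                    if col_alive[j] and m[i][j] != 0:
                        cols[j] -= 1  # this non-zero no longer counts
        for j in range(n_cols):
            if col_alive[j] and cols[j] <= d:
                col_alive[j] = False
                progress = True
                for i in range(n_rows):
                    if row_alive[i] and m[i][j] != 0:
                        rows[i] -= 1
    # Degeneracy is at most d exactly when everything could be deleted.
    return not any(row_alive) and not any(col_alive)


# A "star" matrix: one full row and one full column.
star = [[1, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
```

With $d = 1$, the `star` matrix is neither row-sparse, column-sparse, nor average-sparse (it has 7 non-zeros, more than $dn = 4$), yet the greedy deletion succeeds, so it has degeneracy at most 1 under the assumptions above. This shows how bounded degeneracy can cover inputs that the other notions exclude at the same $d$.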

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
ACM classes:	F.2.1; F.2.2
Cite as:	arXiv:2404.15559 [cs.DC]
	(or arXiv:2404.15559v1 [cs.DC] for this version)

Submission history

From: Hossein Vahidi
[v1] Tue, 23 Apr 2024 23:15:05 GMT (250kb,D)
