Data Structures and Algorithms

New submissions

Submissions received from Wed 8 May 24 to Thu 9 May 24, announced Fri, 10 May 24

New submissions
Cross-lists
Replacements

[ total of 11 entries: 1-11 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Fri, 10 May 24

[1] arXiv:2405.05343 [pdf, other]: Title: Distributed Least Squares in Small Space via Sketching and Bias Reduction

Authors: Sachin Garg, Kevin Tan, Michał Dereziński

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.
[2] arXiv:2405.05373 [pdf, other]: Title: Certifying Euclidean Sections and Finding Planted Sparse Vectors Beyond the $\sqrt{n}$ Dimension Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Prasad Raghavendra

Comments: 32 pages, 2 Figures

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Metric Geometry (math.MG)

We consider the task of certifying that a random $d$-dimensional subspace $X$ in $\mathbb{R}^n$ is well-spread - every vector $x \in X$ satisfies $c\sqrt{n} \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2$. In a seminal work, Barak et. al. showed a polynomial-time certification algorithm when $d \leq O(\sqrt{n})$. On the other hand, when $d \gg \sqrt{n}$, the certification task is information-theoretically possible but there is evidence that it is computationally hard [MW21,Cd22], a phenomenon known as the information-computation gap.
In this paper, we give subexponential-time certification algorithms in the $d \gg \sqrt{n}$ regime. Our algorithm runs in time $\exp(\widetilde{O}(n^{\varepsilon}))$ when $d \leq \widetilde{O}(n^{(1+\varepsilon)/2})$, establishing a smooth trade-off between runtime and the dimension.
Our techniques naturally extend to the related planted problem, where the task is to recover a sparse vector planted in a random subspace. Our algorithm achieves the same runtime and dimension trade-off for this task.
[3] arXiv:2405.05535 [pdf, ps, other]: Title: Reconfiguration of Multisets with Applications to Bin Packing

Authors: Jeffrey Kam, Shahin Kamali, Avery Miller, Naomi Nishimura

Comments: A preliminary version of this paper appeared in the proceedings of the 18th International Conference and Workshops on Algorithms and Computation (WALCOM 2024)

Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

We use the reconfiguration framework to analyze problems that involve the rearrangement of items among groups. In various applications, a group of items could correspond to the files or jobs assigned to a particular machine, and the goal of rearrangement could be improving efficiency or increasing locality.
To cover problems arising in a wide range of application areas, we define the general Repacking problem as the rearrangement of multisets of multisets. We present hardness results for the general case and algorithms for various classes of instances that arise in real-life scenarios. By limiting the total size of items in each multiset, our results can be viewed as an offline approach to Bin Packing, in which each bin is represented as a multiset.
In addition to providing the first results on reconfiguration of multisets, our contributions open up several research avenues: the interplay between reconfiguration and online algorithms and parallel algorithms; the use of the tools of linear programming in reconfiguration; and, in the longer term, a focus on resources in reconfiguration.
[4] arXiv:2405.05818 [pdf, ps, other]: Title: Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems

Authors: Michał Dereziński, Daniel LeJeune, Deanna Needell, Elizaveta Rebrova

Comments: 32 pages

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

While effective in practice, iterative methods for solving large systems of linear equations can be significantly affected by problem-dependent condition number quantities. This makes characterizing their time complexity challenging, particularly when we wish to make comparisons between deterministic and stochastic methods, that may or may not rely on preconditioning and/or fast matrix multiplication. In this work, we consider a fine-grained notion of complexity for iterative linear solvers which we call the spectral tail condition number, $\kappa_\ell$, defined as the ratio between the $\ell$th largest and the smallest singular value of the matrix representing the system.
Concretely, we prove the following main algorithmic result: Given an $n\times n$ matrix $A$ and a vector $b$, we can find $\tilde{x}$ such that $\|A\tilde{x}-b\|\leq\epsilon\|b\|$ in time $\tilde{O}(\kappa_\ell\cdot n^2\log 1/\epsilon)$ for any $\ell = O(n^{\frac1{\omega-1}})=O(n^{0.729})$, where $\omega \approx 2.372$ is the current fast matrix multiplication exponent. This guarantee is achieved by Sketch-and-Project with Nesterov's acceleration. Some of the implications of our result, and of the use of $\kappa_\ell$, include direct improvement over a fine-grained analysis of the Conjugate Gradient method, suggesting a stronger separation between deterministic and stochastic iterative solvers; and relating the complexity of iterative solvers to the ongoing algorithmic advances in fast matrix multiplication, since the bound on $\ell$ improves with $\omega$.
Our main technical contributions are new sharp characterizations for the first and second moments of the random projection matrix that commonly arises in sketching algorithms, building on a combination of techniques from combinatorial sampling via determinantal point processes and Gaussian universality results from random matrix theory.
[5] arXiv:2405.05865 [pdf, ps, other]: Title: Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning

Authors: Michał Dereziński, Christopher Musco, Jiaming Yang

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$. Our methods are based on constructing a low-rank Nystr\"om approximation to $A$ using sparse random sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nystr\"om approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems:
1. We show how to solve any $n\times n$ linear system that is well-conditioned except for $k$ outlying large singular values in $\tilde{O}(n^{2.065} + k^\omega)$ time, improving on a recent result of [Derezi\'nski, Yang, STOC 2024] for all $k \gtrsim n^{0.78}$.
2. We give the first $\tilde{O}(n^2 + {d_\lambda}^{\omega}$) time algorithm for solving a regularized linear system $(A + \lambda I)x = b$, where $A$ is positive semidefinite with effective dimension $d_\lambda$. This problem arises in applications like Gaussian process regression.
3. We give faster algorithms for approximating Schatten $p$-norms and other matrix norms. For example, for the Schatten 1 (nuclear) norm, we give an algorithm that runs in $\tilde{O}(n^{2.11})$ time, improving on an $\tilde{O}(n^{2.18})$ method of [Musco et al., ITCS 2018].
Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
[6] arXiv:2405.05942 [pdf, other]: Title: Improved Evolutionary Algorithms for Submodular Maximization with Cost Constraints

Authors: Yanhui Zhu, Samik Basu, A Pavan

Comments: IJCAI 2024

Subjects: Data Structures and Algorithms (cs.DS)

We present an evolutionary algorithm evo-SMC for the problem of Submodular Maximization under Cost constraints (SMC). Our algorithm achieves $1/2$-approximation with a high probability $1-1/n$ within $\mathcal{O}(n^2K_{\beta})$ iterations, where $K_{\beta}$ denotes the maximum size of a feasible solution set with cost constraint $\beta$. To the best of our knowledge, this is the best approximation guarantee offered by evolutionary algorithms for this problem. We further refine evo-SMC, and develop {\sc st-evo-SMC}. This stochastic version yields a significantly faster algorithm while maintaining the approximation ratio of $1/2$, with probability $1-\epsilon$. The required number of iterations reduces to $\mathcal{O}(nK_{\beta}\log{(1/\epsilon)}/p)$, where the user defined parameters $p \in (0,1]$ represents the stochasticity probability, and $\epsilon \in (0,1]$ denotes the error threshold. Finally, the empirical evaluations carried out through extensive experimentation substantiate the efficiency and effectiveness of our proposed algorithms. Our algorithms consistently outperform existing methods, producing higher-quality solutions.
[7] arXiv:2405.05952 [pdf, other]: Title: New Algorithms and Lower Bounds for Streaming Tournaments

Authors: Prantar Ghosh, Sahil Kuchlous

Subjects: Data Structures and Algorithms (cs.DS)

We study fundamental directed graph (digraph) problems in the streaming model. An initial investigation by Chakrabarti, Ghosh, McGregor, and Vorotnikova [SODA'20] on streaming digraphs showed that while most of these problems are provably hard in general, some of them become tractable when restricted to the well-studied class of tournament graphs where every pair of nodes shares exactly one directed edge. Thus, we focus on tournaments and improve the state of the art for multiple problems in terms of both upper and lower bounds.
Our primary upper bound is a deterministic single-pass semi-streaming algorithm (using $\tilde{O}(n)$ space for $n$-node graphs, where $\tilde{O}(.)$ hides polylog$(n)$ factors) for decomposing a tournament into strongly connected components (SCC). it improves upon the previously best-known algorithm by Baweja, Jia, and Woodruff [ITCS'22] in terms of both space and passes: for $p\geq 1$, they used $(p+1)$-passes and $\tilde{O}(n^{1+1/p})$-space. We further extend our algorithm to digraphs that are close to tournaments and establish tight bounds demonstrating that the problem's complexity grows smoothly with the "distance" from tournaments. Applying our framework, we obtain improved tournament algorithms for $s,t$-reachability, strong connectivity, Hamiltonian paths and cycles, and feedback arc set.
On the other hand, we prove the first $\Omega(n^2)$-space lower bounds for this class, exhibiting that some well-studied problems -- such as (exact) feedback arc set on tournaments (FAST) and $s,t$-distance -- remain hard here. We obtain a generalized lower bound on space-approximation tradeoffs for FAST: any single-pass $(1\pm \varepsilon)$-approximation algorithm requires $\Omega(n/\sqrt{\varepsilon})$ space. As a whole, our collection of results contributes significantly to the growing literature on streaming digraphs.

Cross-lists for Fri, 10 May 24

[8] arXiv:2405.05544 (cross-list from cs.DM) [pdf, ps, other]: Title: Partially Ordered Sets Corresponding to the Partition Problem

Authors: Susumu Kubo

Comments: 16 pages

Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

The partition problem is a well-known basic NP-complete problem. We mainly consider the optimization version of it in this paper. The problem has been investigated from various perspectives for a long time and can be solved efficiently in practice. Hence, we might say that the only remaining task is to decide whether the problem can be solved in polynomial time in the number $n$ of given integers.
We propose two partially ordered sets (posets) and present a novel methodology for solving the partition problem. The first poset is order-isomorphic to a well-known poset whose structures are related to the solutions of the subset sum problem, while the second is a subposet of the first and plays a crucial role in this paper. We first show several properties of the two posets, such as size, height and width (the largest size of a subset consisting of incomparable elements). Both widths are the same and $O(2^n / n^{3/2})$ for $n$ congruent to $0$ or $3$ (mod $4$). This fact indicates the hardness of the partition problem. We then prove that in general all the candidate solutions correspond to the elements of the second poset, whose size is $2^{n} - 2 \binom{n}{\lfloor n/2 \rfloor}$. Since a partition corresponds to two elements of the poset, the number of the candidate partitions is half of it, that is, $2^{n-1} - \binom{n}{\lfloor n/2 \rfloor}$. We finally prove that the candidate solutions can be reduced based on the partial order. In particular, we give several polynomially solvable cases by considering the minimal elements of the second poset.
Our approach offers a valuable tool for structural analysis of the partition problem and provides a new perspective on the P versus NP problem.

Replacements for Fri, 10 May 24

[9] arXiv:2310.12889 (replaced) [pdf, other]: Title: An Enumerative Perspective on Connectivity

Authors: Shyan Akmal

Comments: Fixed a mistake in the proof of Lemma 12 from the previous version, and corrected some typos

Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)
[10] arXiv:2211.06352 (replaced) [pdf, other]: Title: Spectral Triadic Decompositions of Real-World Networks

Authors: Sabyasachi Basu, Suman Kalyan Bera, C. Seshadhri

Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[11] arXiv:2310.04609 (replaced) [pdf, ps, other]: Title: Kawasaki dynamics beyond the uniqueness threshold

Authors: Roland Bauerschmidt, Thierry Bodineau, Benoit Dagallier

Comments: Improved presentation

Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph)

New submissions
Cross-lists
Replacements

[ total of 11 entries: 1-11 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2405, contact, help (Access key information)

> cs > cs.DS

Data Structures and Algorithms

New submissions

New submissions for Fri, 10 May 24

Cross-lists for Fri, 10 May 24

Replacements for Fri, 10 May 24