
Title: Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients

Abstract: We show that, for finite-sum minimization problems, incorporating partial second-order information about the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-order algorithm, called Mini-Batch Stochastic Variance-Reduced Newton ($\texttt{Mb-SVRN}$), which combines variance-reduced gradient estimates with access to an approximate Hessian oracle. In particular, we show that when the data size $n$ is sufficiently large, i.e., $n\gg \alpha^2\kappa$, where $\kappa$ is the condition number and $\alpha$ is the Hessian approximation factor, $\texttt{Mb-SVRN}$ achieves a fast linear convergence rate that is independent of the gradient mini-batch size $b$, as long as $b$ lies in the range between $1$ and $b_{\max}=O(n/(\alpha \log n))$. Only after the mini-batch size is increased past this critical point $b_{\max}$ does the method begin to transition into a standard Newton-type algorithm, which is much more sensitive to the Hessian approximation quality. We demonstrate this phenomenon empirically on benchmark optimization tasks, showing that, after tuning the step size, the convergence rate of $\texttt{Mb-SVRN}$ remains fast for a wide range of mini-batch sizes, and that the dependence of the phase transition point $b_{\max}$ on the Hessian approximation factor $\alpha$ aligns with our theoretical predictions.
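
To make the structure of the update concrete, the Python snippet below is a minimal, illustrative sketch of an Mb-SVRN-style loop, not the authors' implementation: an outer loop takes a snapshot with a full gradient and an approximate Hessian (here a subsampled Hessian, one possible oracle), and an inner loop applies Newton-type steps to an SVRG-style variance-reduced mini-batch gradient. The test problem (regularized logistic regression), the oracle construction, and all hyperparameters (`b`, `eta`, `sketch_size`, iteration counts) are assumptions chosen for illustration only.

```python
# Minimal sketch (not the paper's implementation) of an Mb-SVRN-style method:
# SVRG-style variance-reduced mini-batch gradients preconditioned by an
# approximate (here: subsampled) Hessian. All problem and parameter choices
# below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
A = rng.standard_normal((n, d))
y = np.sign(A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))
lam = 1e-3  # ridge regularization


def full_grad(w):
    """Gradient of the full finite-sum objective (logistic loss + ridge)."""
    s = 1.0 / (1.0 + np.exp(y * (A @ w)))      # sigmoid(-y * margin)
    return A.T @ (-y * s) / n + lam * w


def batch_grad(w, idx):
    """Mini-batch gradient over the index set idx."""
    Ab, yb = A[idx], y[idx]
    s = 1.0 / (1.0 + np.exp(yb * (Ab @ w)))
    return Ab.T @ (-yb * s) / len(idx) + lam * w


def hessian_oracle(w, sketch_size=200):
    """One possible approximate Hessian oracle: row subsampling."""
    idx = rng.choice(n, size=sketch_size, replace=False)
    Ab, yb = A[idx], y[idx]
    s = 1.0 / (1.0 + np.exp(yb * (Ab @ w)))
    D = s * (1.0 - s)
    return Ab.T @ (Ab * D[:, None]) / sketch_size + lam * np.eye(d)


def mb_svrn(w, outer_iters=20, inner_iters=50, b=64, eta=1.0):
    """Outer snapshots + inner preconditioned variance-reduced steps."""
    for _ in range(outer_iters):
        w_snap = w.copy()
        g_snap = full_grad(w_snap)       # full gradient at the snapshot
        H = hessian_oracle(w_snap)       # approximate Hessian at the snapshot
        for _ in range(inner_iters):
            idx = rng.choice(n, size=b, replace=False)
            # variance-reduced mini-batch gradient estimate
            g = batch_grad(w, idx) - batch_grad(w_snap, idx) + g_snap
            # Newton-type step preconditioned by the approximate Hessian
            w = w - eta * np.linalg.solve(H, g)
    return w


w_opt = mb_svrn(np.zeros(d))
print("final gradient norm:", np.linalg.norm(full_grad(w_opt)))
```

In this sketch, the gradient mini-batch size `b` only changes the variance of the inner gradient estimate; the robustness of the convergence rate to that choice, up to the threshold $b_{\max}$, is the phenomenon the abstract analyzes.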
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
MSC classes: 65K05, 90C06, 90C30
Cite as: arXiv:2404.14758 [math.OC]
  (or arXiv:2404.14758v1 [math.OC] for this version)

Submission history

From: Sachin Garg
[v1] Tue, 23 Apr 2024 05:45:52 GMT (2842kb,D)
