Computer Science > Machine Learning

Title: PETScML: Second-order solvers for training regression problems in Scientific Machine Learning

Abstract: In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods. However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable, Extensible Toolkit for Scientific Computation (PETSc) to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. We empirically demonstrate the superior efficacy of a trust-region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line search, compare favorably, in terms of either cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.
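
The key ingredient highlighted in the abstract is a trust-region method driven by the Gauss-Newton approximation J^T J of the Hessian of a least-squares training loss. The sketch below is a minimal, self-contained illustration of one such step in plain NumPy; it is not the PETScML code, and the function names, damping loop, and radius-update constants are illustrative assumptions only.

```python
# Minimal sketch (not the PETScML implementation): one Gauss-Newton trust-region
# step for a nonlinear least-squares regression loss
#   f(w) = 0.5 * ||r(w)||^2,  with residuals r_i(w) = model(x_i; w) - y_i.
# The Gauss-Newton matrix J^T J stands in for the exact Hessian, and the step
# is accepted or rejected by comparing actual versus predicted reduction.
import numpy as np

def gauss_newton_tr_step(residual, jacobian, w, delta, eta=0.125):
    """One Gauss-Newton step restricted to a trust region of radius delta.

    residual : callable w -> r(w), shape (m,)
    jacobian : callable w -> J(w), shape (m, n)
    Returns the updated parameters and trust-region radius.
    """
    r = residual(w)
    J = jacobian(w)
    g = J.T @ r                      # gradient of 0.5*||r||^2
    B = J.T @ J                      # Gauss-Newton Hessian approximation

    # Crude trust-region subproblem solver: increase the damping parameter
    # until the regularized Gauss-Newton step fits inside the region.
    lam = 0.0
    for _ in range(30):
        step = np.linalg.solve(B + lam * np.eye(B.shape[0]), -g)
        if np.linalg.norm(step) <= delta:
            break
        lam = max(2.0 * lam, 1e-8)

    # Actual vs. predicted reduction decides acceptance and the radius update.
    f_old = 0.5 * (r @ r)
    f_new = 0.5 * np.sum(residual(w + step) ** 2)
    pred = -(g @ step + 0.5 * step @ (B @ step))
    rho = (f_old - f_new) / max(pred, 1e-16)

    if rho > eta:                    # accept the step
        w = w + step
    delta = 2.0 * delta if rho > 0.75 else (0.5 * delta if rho < 0.25 else delta)
    return w, delta
```

In the framework described by the paper, steps of this kind are presumably not hand-rolled: the abstract indicates that the network training problem is exposed to PETSc, whose conventional solvers for unconstrained minimization (trust region, L-BFGS, inexact Newton with line search) provide the step computation and globalization.
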
Subjects: Machine Learning (cs.LG); Mathematical Software (cs.MS); Optimization and Control (math.OC)
MSC classes: 65K10, 68T07, 65M70, 65Y05
ACM classes: I.2.5; D.2.m; G.4; G.1.6; J.2
Cite as: arXiv:2403.12188 [cs.LG]
  (or arXiv:2403.12188v1 [cs.LG] for this version)

Submission history

From: Stefano Zampini
[v1] Mon, 18 Mar 2024 18:59:42 GMT (835kb,D)
