DSD$^2$: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?

Quétu, Victor; Tartaglione, Enzo

(Submitted on 2 Mar 2023 (v1), last revised 8 Feb 2024 (this version, v3))

Abstract: Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. As the sparsity of the model increases, the test performance first worsens because the model overfits the training data; then the overfitting reduces, leading to an improvement in performance; finally, the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the emergence of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at this https URL

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2303.01213 [cs.LG]
	(or arXiv:2303.01213v3 [cs.LG] for this version)

Submission history

From: Victor Quétu
[v1] Thu, 2 Mar 2023 12:54:12 GMT (16461kb,D)
[v2] Sun, 17 Dec 2023 10:04:11 GMT (2113kb,D)
[v3] Thu, 8 Feb 2024 08:26:47 GMT (2113kb,D)
