
Title: Sharp Asymptotics of Self-training with Linear Classifier

Abstract: Self-training (ST) is a straightforward and standard approach in semi-supervised learning, successfully applied to many machine learning problems. The performance of ST strongly depends on the supervised learning method used in the refinement step and on the nature of the given data; hence, a general performance guarantee from a concise theory may become loose in a concrete setup. However, theoretical methods that sharply predict how the performance of ST depends on the details of each learning scenario are limited. This study develops a novel theoretical framework for sharply characterizing the generalization abilities of models trained by ST, using the non-rigorous replica method of statistical physics. We consider the ST of a linear model that minimizes the ridge-regularized cross-entropy loss when the data are generated from a two-component Gaussian mixture. We show that the generalization performance of ST in each iteration is sharply characterized by a small finite number of variables, which satisfy a set of deterministic self-consistent equations. By numerically solving these self-consistent equations, we find that the generalization performance of ST approaches that of supervised learning with a very simple regularization schedule when the label bias is small and a moderately large number of iterations is used.
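
The abstract studies self-training of a linear classifier fit by ridge-regularized cross-entropy (L2-regularized logistic regression) on data from a two-component Gaussian mixture. The following is a minimal sketch of that setup, not the paper's analysis: it simulates the ST loop (fit on labeled data, pseudo-label the unlabeled data, refit) using scikit-learn's LogisticRegression. The dimensions, sample sizes, regularization strength, and iteration count are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the self-training (ST) loop in the abstract's setting:
# two-component Gaussian mixture data, L2-regularized logistic regression
# (ridge-regularized cross-entropy) as the base learner. All hyperparameters
# below are illustrative, not taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 50                              # input dimension (assumed)
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)            # cluster mean direction

def sample(n):
    """Draw n points from a symmetric two-component Gaussian mixture."""
    y = rng.choice([-1, 1], size=n)
    x = y[:, None] * mu + rng.normal(size=(n, d))
    return x, y

# Small labeled set, large unlabeled set, and a test set for evaluation.
X_lab, y_lab = sample(100)
X_unlab, _ = sample(5000)
X_test, y_test = sample(10000)

# Iteration 0: fit on labeled data only (C is the inverse ridge strength).
clf = LogisticRegression(C=1.0).fit(X_lab, y_lab)

# ST refinement: pseudo-label the unlabeled data, then refit on them.
for t in range(10):
    pseudo = clf.predict(X_unlab)
    clf = LogisticRegression(C=1.0).fit(X_unlab, pseudo)
    print(f"iteration {t + 1}: test accuracy = {clf.score(X_test, y_test):.3f}")
```

The paper characterizes the generalization error of this kind of iteration analytically via replica-method self-consistent equations; the simulation above only illustrates the algorithmic procedure being analyzed.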
Comments: 34 pages, 6 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as: arXiv:2205.07739 [stat.ML]
  (or arXiv:2205.07739v1 [stat.ML] for this version)

Submission history

From: Takashi Takahashi
[v1] Mon, 16 May 2022 15:02:44 GMT (428kb,D)
[v2] Thu, 4 Apr 2024 12:14:30 GMT (2858kb,D)
[v3] Tue, 7 May 2024 11:22:49 GMT (2840kb,D)
