Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Vishwakarma, Harit; Chen, Reid (Yi); Tay, Sui Jiet; Namburi, Satya Sai Srinath; Sala, Frederic; Vinayak, Ramya Korlakai

Authors: Harit Vishwakarma, Reid (Yi) Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

(Submitted on 24 Apr 2024)

Abstract: Auto-labeling is an important family of techniques that produce labeled training sets with minimal manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the optimal TBAL confidence function. We develop a tractable version of the framework to obtain Colander (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of Colander and compare it against methods designed for calibration. Colander achieves up to 60% improvement in coverage over the baselines while maintaining auto-labeling error below 5% and using the same amount of labeled data as the baselines.
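
To make the thresholding idea concrete, here is a minimal sketch of the selection step the abstract describes: pick the lowest confidence threshold whose validation error stays within a target, then measure coverage on unlabeled points. It is not the paper's algorithm and not Colander. The function names `find_threshold` and `coverage`, the 5% default, and the toy scores are assumptions made for illustration, and the sketch omits model training, calibration, and any statistical safeguards a real system would need.

```python
def find_threshold(confidences, correct, max_error=0.05):
    """Return the smallest threshold t such that validation points with
    confidence >= t have empirical error <= max_error, or None if no
    such threshold exists. (Hypothetical helper, for illustration only.)"""
    # Visit validation points from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    best = None
    errors = 0
    for rank, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        # Evaluate only after the last point of a group of tied scores,
        # so every point with confidence >= t is counted.
        last_of_tie = (rank == len(order)
                       or confidences[order[rank]] != confidences[i])
        if last_of_tie and errors / rank <= max_error:
            best = confidences[i]
    return best


def coverage(confidences, threshold):
    """Fraction of points whose confidence meets the threshold."""
    if threshold is None or not confidences:
        return 0.0
    return sum(c >= threshold for c in confidences) / len(confidences)


# Toy validation data: confidence scores and whether the prediction was right.
val_conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
val_correct = [True, True, True, True, False, True]
threshold = find_threshold(val_conf, val_correct, max_error=0.05)

# Toy unlabeled pool: only points at or above the threshold get auto-labeled.
unlabeled_conf = [0.97, 0.85, 0.70, 0.50]
auto_labeled_fraction = coverage(unlabeled_conf, threshold)
```

In this toy setup, an overconfident model would push wrong predictions into the high-score region, forcing the threshold up and the coverage down, which is the failure mode the abstract attributes to TBAL with poorly suited confidence functions.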

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2404.16188 [cs.LG]
	(or arXiv:2404.16188v1 [cs.LG] for this version)

Submission history

From: Harit Vishwakarma
[v1] Wed, 24 Apr 2024 20:22:48 GMT (7253kb,D)
