Computer Science > Computation and Language
Title: Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
(Submitted on 20 Feb 2024 (v1), last revised 26 Apr 2024 (this version, v2))
Abstract: When solving NLP tasks with limited labelled data, researchers can either use a general large language model without further updates, or use a small number of labelled examples to tune a specialised smaller model. In this work, we address the research gap of how many labelled samples are required for the specialised small models to outperform general large models, while taking the performance variance into consideration. By observing the behaviour of fine-tuning, instruction-tuning, prompting and in-context learning on 7 language models, we identify such performance break-even points across 8 representative text classification tasks of varying characteristics. We show that the specialised models often need only a few samples (on average $10 - 1000$) to be on par with or better than the general ones. At the same time, the number of required labels strongly depends on the dataset or task characteristics, with this number being significantly lower on multi-class datasets (up to $100$) than on binary datasets (up to $5000$). When performance variance is taken into consideration, the number of required labels increases on average by $100 - 200\%$ and even up to $1500\%$ in specific cases.
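The break-even point the abstract describes can be made concrete with a small sketch. This is a hypothetical illustration, not the authors' code: given a specialised model's performance at increasing numbers of labelled samples and a general model's fixed baseline score, it finds the smallest sample count at which the specialised model is on par with or better than the general one. Optionally subtracting one standard deviation before the comparison mirrors the paper's observation that accounting for variance pushes the break-even point higher. All names and numbers below are illustrative assumptions.

```python
def break_even_point(sample_sizes, specialised_scores, general_score,
                     specialised_stds=None):
    """Return the smallest sample size at which the specialised model is on
    par with or better than the general model; None if never reached.

    If per-point standard deviations are supplied, require the pessimistic
    estimate (mean - std) to clear the baseline instead of the mean alone.
    """
    for i, (n, score) in enumerate(zip(sample_sizes, specialised_scores)):
        margin = specialised_stds[i] if specialised_stds else 0.0
        if score - margin >= general_score:
            return n
    return None

# Illustrative numbers only (not results from the paper).
sizes = [10, 50, 100, 500, 1000]
scores = [0.55, 0.68, 0.74, 0.80, 0.83]
print(break_even_point(sizes, scores, general_score=0.72))   # mean-only comparison
print(break_even_point(sizes, scores, 0.72,
                       [0.06, 0.05, 0.04, 0.02, 0.01]))      # variance-aware comparison
```

With these made-up numbers the variance-aware break-even point lands at a larger sample count than the mean-only one, matching the abstract's qualitative finding.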
Submission history
From: Branislav Pecher
[v1] Tue, 20 Feb 2024 08:38:24 GMT (9705kb,D)
[v2] Fri, 26 Apr 2024 08:20:40 GMT (2745kb,D)