Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Zhang, Yunxiang; Khalifa, Muhammad; Logeswaran, Lajanugen; Kim, Jaekyeom; Lee, Moontae; Lee, Honglak; Wang, Lu

(Submitted on 26 Apr 2024)

Abstract: Self-correction has emerged as a promising way to boost the reasoning performance of large language models (LLMs): the models refine their solutions using self-generated critiques that pinpoint the errors. This work explores whether smaller (<= 13B) language models (LMs) can self-correct on reasoning tasks with minimal input from stronger LMs. We propose a novel pipeline that prompts smaller LMs to collect self-correction data that supports the training of self-refinement abilities. First, we leverage correct solutions to guide the model in critiquing its own incorrect responses. Second, the generated critiques, after filtering, are used for supervised fine-tuning of the self-correcting reasoner through solution refinement. Our experimental results show improved self-correction abilities of two models on five datasets spanning math and commonsense reasoning, with notable performance gains when paired with a strong GPT-4-based verifier; limitations appear, however, when a weak self-verifier is used to decide when to correct.
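
The role the abstract gives the verifier, deciding when a solution should be corrected, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `self_correct`, its parameters, and the toy stand-ins below are hypothetical names introduced only for illustration, and the single-round default is an assumption. In practice `generate` and `refine` would call the fine-tuned small LM and `verify` would call a verifier model.

```python
from typing import Callable


def self_correct(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    refine: Callable[[str, str], str],
    max_rounds: int = 1,  # assumption: one refinement round by default
) -> str:
    """Return a solution, refining it only when the verifier rejects it."""
    solution = generate(question)
    for _ in range(max_rounds):
        if verify(question, solution):
            break  # verifier accepts: leave the solution untouched
        # verifier rejects: ask the refiner for a critique-guided rewrite
        solution = refine(question, solution)
    return solution


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_generate(question: str) -> str:
    return "2 + 2 = 5"  # deliberately wrong first attempt


def toy_refine(question: str, solution: str) -> str:
    return "2 + 2 = 4"


def strong_verify(question: str, solution: str) -> bool:
    return solution.endswith("4")  # detects the wrong answer


def weak_verify(question: str, solution: str) -> bool:
    return True  # accepts everything, so refinement is never triggered


question = "What is 2 + 2?"
corrected = self_correct(question, toy_generate, strong_verify, toy_refine)
uncorrected = self_correct(question, toy_generate, weak_verify, toy_refine)
```

With the toy strong verifier the wrong first attempt is refined, while the always-accepting weak verifier never triggers refinement and the error survives. That mirrors, in a simplified way, why the quality of the verifier matters for the results described in the abstract.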

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.17140 [cs.CL]
	(or arXiv:2404.17140v1 [cs.CL] for this version)

Submission history

From: Yunxiang Zhang
[v1] Fri, 26 Apr 2024 03:41:28 GMT (440kb,D)
