Title: LITO: Learnable Intervention for Truthfulness Optimization

Abstract: Large language models (LLMs) can generate long-form, coherent text, but they still frequently hallucinate facts, limiting their reliability. To address this issue, inference-time methods have been proposed that elicit truthful responses by shifting LLM representations towards learned "truthful directions". However, applying a truthful direction with the same intensity in every context fails to generalize across different questions. We propose LITO, a Learnable Intervention method for Truthfulness Optimization that automatically identifies the intervention intensity best suited to a specific context. LITO explores a sequence of model generations produced under increasing intervention intensities, then selects the most accurate response or refuses to answer when the predictions are highly uncertain. Experiments on multiple LLMs and question-answering datasets demonstrate that LITO improves truthfulness while preserving task accuracy. Its adaptive nature counters the shortcomings of one-size-fits-all intervention methods, maximizing truthfulness by reflecting the model's internal knowledge only when the model is confident.
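As a rough illustration of the adaptive loop the abstract describes, the Python sketch below generates one candidate per intervention intensity and keeps the most confident one, refusing when even the best candidate is uncertain. Everything here is assumed for illustration: the names select_truthful_response, generate, and confidence, the intensity grid, and the fixed threshold are placeholders, not the paper's interface, and LITO's actual selection component is learned rather than a hand-set cutoff.

    from typing import Callable, Sequence

    REFUSAL = "I have no comment."

    def select_truthful_response(
        question: str,
        generate: Callable[[str, float], str],    # decode under intensity alpha
        confidence: Callable[[str, str], float],  # certainty of (question, answer)
        intensities: Sequence[float] = (0.0, 0.5, 1.0, 1.5, 2.0),
        threshold: float = 0.5,
    ) -> str:
        """Probe increasing intervention intensities and keep the most
        confident candidate; refuse if even the best one is too uncertain."""
        best_score, best_answer = float("-inf"), REFUSAL
        for alpha in intensities:
            # Hypothetical helper: decodes a response while steering the
            # model's hidden states along a learned "truthful direction",
            # scaled by alpha.
            answer = generate(question, alpha)
            # Hypothetical helper: scores how certain the model is about
            # this candidate, e.g. the mean token probability of the answer.
            score = confidence(question, answer)
            if score > best_score:
                best_score, best_answer = score, answer
        # Refuse rather than answer when all candidates are highly uncertain.
        return best_answer if best_score >= threshold else REFUSAL

    # Toy usage with stand-in callables (a real setup would wrap an LLM):
    answer = select_truthful_response(
        "What is the capital of France?",
        generate=lambda q, alpha: "Paris",
        confidence=lambda q, ans: 0.9,
    )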
Comments: 14 pages, 5 figures
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2405.00301 [cs.CL]
  (or arXiv:2405.00301v1 [cs.CL] for this version)

Submission history

From: Farima Fatahi Bayat [view email]
[v1] Wed, 1 May 2024 03:50:09 GMT (7,435 KB)
