A Framework for Real-time Safeguarding the Text Generation of Large Language Model

Dong, Ximing; Lin, Dayi; Wang, Shaowei; Hassan, Ahmed E.

Full-text links:

Download:

Current browse context:

cs.CL

< prev | next >

new | recent | 2404

Computer Science > Computation and Language

Title: A Framework for Real-time Safeguarding the Text Generation of Large Language Model

Authors: Ximing Dong, Dayi Lin, Shaowei Wang, Ahmed E. Hassan

(Submitted on 29 Apr 2024 (v1), last revised 1 May 2024 (this version, v2))

Abstract: Large Language Models (LLMs) have significantly advanced natural language processing (NLP) tasks but also pose ethical and societal risks due to their propensity to generate harmful content. To address this, various approaches have been developed to safeguard LLMs from producing unsafe content. However, existing methods have limitations, including the need for training specific control models and proactive intervention during text generation, that lead to quality degradation and increased computational overhead. To mitigate those limitations, we propose LLMSafeGuard, a lightweight framework to safeguard LLM text generation in real-time. LLMSafeGuard integrates an external validator into the beam search algorithm during decoding, rejecting candidates that violate safety constraints while allowing valid ones to proceed. We introduce a similarity based validation approach, simplifying constraint introduction and eliminating the need for control model training. Additionally, LLMSafeGuard employs a context-wise timing selection strategy, intervening LLMs only when necessary. We evaluate LLMSafeGuard on two tasks, detoxification and copyright safeguarding, and demonstrate its superior performance over SOTA baselines. For instance, LLMSafeGuard reduces the average toxic score of. LLM output by 29.7% compared to the best baseline meanwhile preserving similar linguistic quality as natural output in detoxification task. Similarly, in the copyright task, LLMSafeGuard decreases the Longest Common Subsequence (LCS) by 56.2% compared to baselines. Moreover, our context-wise timing selection strategy reduces inference time by at least 24% meanwhile maintaining comparable effectiveness as validating each time step. LLMSafeGuard also offers tunable parameters to balance its effectiveness and efficiency.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.19048 [cs.CL]
	(or arXiv:2404.19048v2 [cs.CL] for this version)

Submission history

From: Ximing Dong [view email]
[v1] Mon, 29 Apr 2024 18:40:01 GMT (337kb,D)
[v2] Wed, 1 May 2024 19:53:12 GMT (337kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2404.19048

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computation and Language

Title: A Framework for Real-time Safeguarding the Text Generation of Large Language Model

Submission history