Improving Dialog Safety using Socially Aware Contrastive Learning

Das, Souvik; Srihari, Rohini K.

(Submitted on 1 Feb 2024)

Abstract: State-of-the-art conversational AI systems raise concerns due to their potential risks of generating unsafe, toxic, unethical, or dangerous content. Previous works have developed datasets to teach conversational agents the appropriate social paradigms to respond effectively to specifically designed hazardous content. However, models trained on these adversarial datasets still struggle to recognize subtle unsafe situations that appear naturally in conversations or introduce an inappropriate response in a casual context. To understand the extent of this problem, we study prosociality in both adversarial and casual dialog contexts and audit the response quality of general-purpose language models in terms of propensity to produce unsafe content. We propose a dual-step fine-tuning process to address these issues using a socially aware n-pair contrastive loss. Subsequently, we train a base model that integrates prosocial behavior by leveraging datasets like Moral Integrity Corpus (MIC) and ProsocialDialog. Experimental results on several dialog datasets demonstrate the effectiveness of our approach in generating socially appropriate responses.
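The abstract names an n-pair contrastive loss but does not give its form. As a rough illustration only, the sketch below computes the generic N-pair loss, log(1 + Σ exp(a·nᵢ − a·p)), for one anchor embedding a, one positive p, and several negatives nᵢ. It is not the socially aware variant the paper proposes. The helper names `dot` and `n_pair_loss` and the toy vectors are assumptions made for this example. In a dialog-safety setting, one might take the anchor to be a context embedding, the positive a prosocial response, and the negatives unsafe responses, though that mapping is likewise an assumption.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def n_pair_loss(anchor, positive, negatives):
    """Generic N-pair loss for a single anchor (illustrative, not the paper's loss).

    Computes log(1 + sum_i exp(anchor.neg_i - anchor.positive)).
    The loss shrinks as the positive scores higher than every negative.
    """
    pos_score = dot(anchor, positive)
    diffs = [dot(anchor, neg) - pos_score for neg in negatives]
    # Stable log-sum-exp over the implicit 0 term (the "1 +") and all diffs.
    m = max([0.0] + diffs)
    return m + math.log(math.exp(-m) + sum(math.exp(d - m) for d in diffs))


# Toy 2-d embeddings (hypothetical values chosen only for illustration).
anchor = [1.0, 0.0]
good_positive = [1.0, 0.0]    # aligned with the anchor
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_easy = n_pair_loss(anchor, good_positive, negatives)
# Swap roles: the "positive" now points away from the anchor, so the loss grows.
loss_hard = n_pair_loss(anchor, [-1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(loss_easy, loss_hard)
```

Minimizing this quantity pushes the anchor's similarity to the positive above its similarity to each negative at once, which is what distinguishes the N-pair formulation from a triplet loss with a single negative.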

Comments:	SCI-CHAT@EACL2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.00446 [cs.CL]
	(or arXiv:2402.00446v1 [cs.CL] for this version)

Submission history

From: Souvik Das
[v1] Thu, 1 Feb 2024 09:24:33 GMT (8452kb,D)
