Computation and Language

New submissions

Submissions received from Tue 7 May 24 to Wed 8 May 24, announced Thu, 9 May 24

New submissions
Cross-lists
Replacements

[ total of 67 entries: 1-67 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 9 May 24

[1] arXiv:2405.04585 [pdf, other]: Title: PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models

Authors: Arpit Aggarwal

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

There are several improvements proposed over the baseline Absolute Positional Encoding (APE) method used in original transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in higher dimensions on crucial aspects of the attention mechanism, the model's capacity to learn relative positional information, and the convergence of models, all stemming from the choice of sinusoidal basis functions. Through a combination of theoretical insights and empirical analyses, we elucidate how these challenges extend beyond APEs and may adversely affect the performance of Relative Positional Encoding (RPE) methods, such as Rotatory Positional Encoding (RoPE).
Subsequently, we introduce an innovative solution termed Orthogonal Polynomial Based Positional Encoding (PoPE) to address some of the limitations associated with existing methods. The PoPE method encodes positional information by leveraging Orthogonal Legendre polynomials. Legendre polynomials as basis functions offers several desirable properties for positional encoding, including improved correlation structure, non-periodicity, orthogonality, and distinct functional forms among polynomials of varying orders. Our experimental findings demonstrate that transformer models incorporating PoPE outperform baseline transformer models on the $Multi30k$ English-to-German translation task, thus establishing a new performance benchmark. Furthermore, PoPE-based transformers exhibit significantly accelerated convergence rates.
Additionally, we will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.
[2] arXiv:2405.04590 [pdf, other]: Title: Language Modeling Using Tensor Trains

Authors: Zhan Su, Yuqin Zhou, Fengran Mo, Jakob Grue Simonsen

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

We propose a novel tensor network language model based on the simplest tensor network (i.e., tensor trains), called `Tensor Train Language Model' (TTLM). TTLM represents sentences in an exponential space constructed by the tensor product of words, but computing the probabilities of sentences in a low-dimensional fashion. We demonstrate that the architectures of Second-order RNNs, Recurrent Arithmetic Circuits (RACs), and Multiplicative Integration RNNs are, essentially, special cases of TTLM. Experimental evaluations on real language modeling tasks show that the proposed variants of TTLM (i.e., TTLM-Large and TTLM-Tiny) outperform the vanilla Recurrent Neural Networks (RNNs) with low-scale of hidden units. (The code is available at https://github.com/shuishen112/tensortrainlm.)
[3] arXiv:2405.04655 [pdf, other]: Title: Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

Authors: Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and cultural commonsense benchmarks, we find that (1) LLMs have a significant discrepancy in performance when tested on culture-specific commonsense knowledge for different cultures; (2) LLMs' general commonsense capability is affected by cultural context; and (3) The language used to query the LLMs can impact their performance on cultural-related tasks. Our study points to the inherent bias in the cultural understanding of LLMs and provides insights that can help develop culturally aware language models.
[4] arXiv:2405.04685 [pdf, other]: Title: Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking

Authors: Emre Can Acikgoz, Mete Erdogan, Deniz Yuret

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations, with a special focus on Turkish. We conduct an in-depth analysis to evaluate the impact of training strategies, model choices, and data availability on the performance of LLMs designed for underrepresented languages. Our approach includes two methodologies: (i) adapting existing LLMs originally pretrained in English to understand Turkish, and (ii) developing a model from the ground up using Turkish pretraining data, both supplemented with supervised fine-tuning on a novel Turkish instruction-tuning dataset aimed at enhancing reasoning capabilities. The relative performance of these methods is evaluated through the creation of a new leaderboard for Turkish LLMs, featuring benchmarks that assess different reasoning and knowledge skills. Furthermore, we conducted experiments on data and model scaling, both during pretraining and fine-tuning, simultaneously emphasizing the capacity for knowledge transfer across languages and addressing the challenges of catastrophic forgetting encountered during fine-tuning on a different language. Our goal is to offer a detailed guide for advancing the LLM framework in low-resource linguistic contexts, thereby making natural language processing (NLP) benefits more globally accessible.
[5] arXiv:2405.04726 [pdf, other]: Title: Learning Phonotactics from Linguistic Informants

Authors: Canaan Breiss, Alexis Ross, Amani Maina-Kilaas, Roger Levy, Jacob Andreas

Subjects: Computation and Language (cs.CL)

We propose an interactive approach to language learning that utilizes linguistic acceptability judgments from an informant (a competent language user) to learn a grammar. Given a grammar formalism and a framework for synthesizing data, our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies, asks the informant for a binary judgment, and updates its own parameters in preparation for the next query. We demonstrate the effectiveness of our model in the domain of phonotactics, the rules governing what kinds of sound-sequences are acceptable in a language, and carry out two experiments, one with typologically-natural linguistic data and another with a range of procedurally-generated languages. We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, and sometimes greater than, fully supervised approaches.
[6] arXiv:2405.04756 [pdf, other]: Title: BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Authors: Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective they are for eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge graph augmented generation. We refactor natural language stereotypes into a knowledge graph, and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety, and further work in this new adversarial space.
[7] arXiv:2405.04777 [pdf, other]: Title: Empathy Through Multimodality in Conversational Interfaces

Authors: Mahyar Abbasian, Iman Azimi, Mohammad Feli, Amir M. Rahmani, Ramesh Jain

Comments: 7 pages, 2 figures, 2 tables, conference paper

Subjects: Computation and Language (cs.CL)

Agents represent one of the most emerging applications of Large Language Models (LLMs) and Generative AI, with their effectiveness hinging on multimodal capabilities to navigate complex user environments. Conversational Health Agents (CHAs), a prime example of this, are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence. This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue-especially in the realm of mental health support. It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses. Our implementation leverages the versatile openCHA framework, and our comprehensive evaluation involves neutral prompts expressed in diverse emotional tones: sadness, anger, and joy. We evaluate the consistency and repeatability of the planning capability of the proposed CHA. Furthermore, human evaluators critique the CHA's empathic delivery, with findings revealing a striking concordance between the CHA's outputs and evaluators' assessments. These results affirm the indispensable role of vocal (soon multimodal) emotion recognition in strengthening the empathetic connection built by CHAs, cementing their place at the forefront of interactive, compassionate digital health solutions.
[8] arXiv:2405.04781 [pdf, other]: Title: CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization

Authors: Zheyan Qu, Lu Yin, Zitong Yu, Wenbo Wang, Xing zhang

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) have demonstrated astonishing capabilities in natural language processing (NLP) tasks, sparking interest in their application to professional domains with higher specialized requirements. However, restricted access to closed-source LLMs via APIs and the difficulty in collecting massive high-quality datasets pose obstacles to the development of large language models in education fields of various courses. Given these challenges, we propose CourseGPT-zh, a course-oriented education LLM that supports customization and low-cost deployment. To address the comprehensiveness and diversity requirements of course-specific corpora, we design a high-quality question-answering corpus distillation framework incorporating prompt optimization, which effectively mines textbook knowledge and enhances its diversity. Moreover, considering the alignment of LLM responses with user needs, a novel method for discrete prompt optimization based on LLM-as-Judge is introduced. During optimization, this framework leverages the LLM's ability to reflect on and exploit error feedback and patterns, allowing for prompts that meet user needs and preferences while saving response length. Lastly, we obtain CourseGPT-zh based on the open-source LLM using parameter-efficient fine-tuning. Experimental results show that our discrete prompt optimization framework effectively improves the response quality of ChatGPT, and CourseGPT-zh exhibits strong professional capabilities in specialized knowledge question-answering, significantly outperforming comparable open-source models.
[9] arXiv:2405.04793 [pdf, other]: Title: Zero-shot LLM-guided Counterfactual Generation for Text

Authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu

Comments: arXiv admin note: text overlap with arXiv:2309.13340

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Counterfactual examples are frequently used for model development and evaluation in many natural language processing (NLP) tasks. Although methods for automated counterfactual generation have been explored, such methods depend on models such as pre-trained language models that are then fine-tuned on auxiliary, often task-specific datasets. Collecting and annotating such datasets for counterfactual generation is labor intensive and therefore, infeasible in practice. Therefore, in this work, we focus on a novel problem setting: \textit{zero-shot counterfactual generation}. To this end, we propose a structured way to utilize large language models (LLMs) as general purpose counterfactual example generators. We hypothesize that the instruction-following and textual understanding capabilities of recent LLMs can be effectively leveraged for generating high quality counterfactuals in a zero-shot manner, without requiring any training or fine-tuning. Through comprehensive experiments on various downstream tasks in natural language processing (NLP), we demonstrate the efficacy of LLMs as zero-shot counterfactual generators in evaluating and explaining black-box NLP models.
[10] arXiv:2405.04818 [pdf, other]: Title: ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation

Authors: Ana Brassard, Benjamin Heinzerling, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui

Comments: 18 pages, 7 figures, under review. Data available here: this https URL

Subjects: Computation and Language (cs.CL)

Evaluating free-text explanations is a multifaceted, subjective, and labor-intensive task. Large language models (LLMs) present an appealing alternative due to their potential for consistency, scalability, and cost-efficiency. In this work, we present ACORN, a new dataset of 3,500 free-text explanations and aspect-wise quality ratings, and use it to gain insights into how LLMs evaluate explanations. We observed that replacing one of the human ratings sometimes maintained, but more often lowered the inter-annotator agreement across different settings and quality aspects, suggesting that their judgments are not always consistent with human raters. We further quantified this difference by comparing the correlation between LLM-generated ratings with majority-voted human ratings across different quality aspects. With the best system, Spearman's rank correlation ranged between 0.53 to 0.95, averaging 0.72 across aspects, indicating moderately high but imperfect alignment. Finally, we considered the alternative of using an LLM as an additional rater when human raters are scarce, and measured the correlation between majority-voted labels with a limited human pool and LLMs as an additional rater, compared to the original gold labels. While GPT-4 improved the outcome when there were only two human raters, in all other observed cases, LLMs were neutral to detrimental when there were three or more human raters. We publicly release the dataset to support future improvements in LLM-in-the-loop evaluation here: https://github.com/a-brassard/ACORN.
[11] arXiv:2405.04819 [pdf, other]: Title: DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sunkwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

Comments: Under Review

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on studying Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. With a synergized framework of LLM and KG mutually enhancing each other, we first leverage LLM to construct an evolving AD-specific knowledge graph (KG) sourced from AD-related scientific literature, and then we utilize a coarse-to-fine sampling method with a novel self-aware knowledge retrieval approach to select appropriate knowledge from the KG to augment LLM inference capabilities. The experimental results, conducted on our constructed AD question answering (ADQA) benchmark, underscore the efficacy of DALK. Additionally, we perform a series of detailed analyses that can offer valuable insights and guidelines for the emerging topic of mutually enhancing KG and LLM. We will release the code and data at https://github.com/David-Li0406/DALK.
[12] arXiv:2405.04820 [pdf, other]: Title: APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching

Authors: Yikuan Xia, Jiazun Chen, Xinchi Li, Jun Gao

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Generalized Entity Matching (GEM), which aims at judging whether two records represented in different formats refer to the same real-world entity, is an essential task in data management. The prompt tuning paradigm for pre-trained language models (PLMs), including the recent PromptEM model, effectively addresses the challenges of low-resource GEM in practical applications, offering a robust solution when labeled data is scarce. However, existing prompt tuning models for GEM face the challenges of prompt design and information gap. This paper introduces an augmented prompt tuning framework for the challenges, which consists of two main improvements. The first is an augmented contextualized soft token-based prompt tuning method that extracts a guiding soft token benefit for the PLMs' prompt tuning, and the second is a cost-effective information augmentation strategy leveraging large language models (LLMs). Our approach performs well on the low-resource GEM challenges. Extensive experiments show promising advancements of our basic model without information augmentation over existing methods based on moderate-size PLMs (average 5.24%+), and our model with information augmentation achieves comparable performance compared with fine-tuned LLMs, using less than 14% of the API fee.
[13] arXiv:2405.04828 [pdf, other]: Title: ChuXin: 1.6B Technical Report

Authors: Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu

Comments: Technical Report

Subjects: Computation and Language (cs.CL)

In this report, we present ChuXin, an entirely open-source language model with a size of 1.6 billion parameters. Unlike the majority of works that only open-sourced the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in the field of language modeling. Furthermore, we extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. The weights for both models are available at Hugging Face to download and use.
[14] arXiv:2405.04829 [pdf, other]: Title: Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages

Authors: Sankalp Bahad, Pruthwik Mishra, Karunesh Arora, Rakesh Chandra Balabantaray, Dipti Misra Sharma, Parameswari Krishnamurthy

Comments: 8 pages, accepted in NAACL-SRW, 2024

Subjects: Computation and Language (cs.CL)

Named Entity Recognition (NER) is a useful component in Natural Language Processing (NLP) applications. It is used in various tasks such as Machine Translation, Summarization, Information Retrieval, and Question-Answering systems. The research on NER is centered around English and some other major languages, whereas limited attention has been given to Indian languages. We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian Languages. We present a human annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families. Additionally,we present a multilingual model fine-tuned on our dataset, which achieves an F1 score of 0.80 on our dataset on average. We achieve comparable performance on completely unseen benchmark datasets for Indian languages which affirms the usability of our model.
[15] arXiv:2405.04872 [pdf, other]: Title: Logical Negation Augmenting and Debiasing for Prompt-based Methods

Authors: Yitian Li, Jidong Tian, Hao He, Yaohui Jin

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

Prompt-based methods have gained increasing attention on NLP and shown validity on many downstream tasks. Many works have focused on mining these methods' potential for knowledge extraction, but few explore their ability to make logical reasoning. In this work, we focus on the effectiveness of the prompt-based methods on first-order logical reasoning and find that the bottleneck lies in logical negation. Based on our analysis, logical negation tends to result in spurious correlations to negative answers, while propositions without logical negation correlate to positive answers. To solve the problem, we propose a simple but effective method, Negation Augmenting and Negation Debiasing (NAND), which introduces negative propositions to prompt-based methods without updating parameters. Specifically, these negative propositions can counteract spurious correlations by providing "not" for all instances so that models cannot make decisions only by whether expressions contain a logical negation. Experiments on three datasets show that NAND not only solves the problem of calibrating logical negation but also significantly enhances prompt-based methods of logical reasoning without model retraining.
[16] arXiv:2405.04897 [pdf, ps, other]: Title: Machine Learning-based NLP for Emotion Classification on a Cholera X Dataset

Authors: Paul Jideani, Aurona Gerber

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recent social media posts on the cholera outbreak in Hammanskraal have highlighted the diverse range of emotions people experienced in response to such an event. The extent of people's opinions varies greatly depending on their level of knowledge and information about the disease. The documented re-search about Cholera lacks investigations into the classification of emotions. This study aims to examine the emotions expressed in social media posts about Chol-era. A dataset of 23,000 posts was extracted and pre-processed. The Python Nat-ural Language Toolkit (NLTK) sentiment analyzer library was applied to deter-mine the emotional significance of each text. Additionally, Machine Learning (ML) models were applied for emotion classification, including Long short-term memory (LSTM), Logistic regression, Decision trees, and the Bidirectional En-coder Representations from Transformers (BERT) model. The results of this study demonstrated that LSTM achieved the highest accuracy of 75%. Emotion classification presents a promising tool for gaining a deeper understanding of the impact of Cholera on society. The findings of this study might contribute to the development of effective interventions in public health strategies.
[17] arXiv:2405.04955 [pdf, other]: Title: Improving Long Text Understanding with Knowledge Distilled from Summarization Model

Authors: Yan Liu, Yazheng Yang, Xiaokang Chen

Comments: arXiv admin note: text overlap with arXiv:2110.04741

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Long text understanding is important yet challenging for natural language processing. A long article or document usually contains many redundant words that are not pertinent to its gist and sometimes can be regarded as noise. With recent advances of abstractive summarization, we propose our \emph{Gist Detector} to leverage the gist detection ability of a summarization model and integrate the extracted gist into downstream models to enhance their long text understanding ability. Specifically, Gist Detector first learns the gist detection knowledge distilled from a summarization model, and then produces gist-aware representations to augment downstream models. We evaluate our method on three different tasks: long document classification, distantly supervised open-domain question answering, and non-parallel text style transfer. The experimental results show that our method can significantly improve the performance of baseline models on all tasks.
[18] arXiv:2405.04960 [pdf, other]: Title: P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models

Authors: Guochao Jiang, Zepeng Ding, Yuchen Shi, Deqing Yang

Subjects: Computation and Language (cs.CL)

In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we propose a new prompting framework P-ICL to better achieve NER with LLMs, in which some point entities are leveraged as the auxiliary information to recognize each entity type. With such significant information, the LLM can achieve entity classification more precisely. To obtain optimal point entities for prompting LLMs, we also proposed a point entity selection method based on K-Means clustering. Our extensive experiments on some representative NER benchmarks verify the effectiveness of our proposed strategies in P-ICL and point entity selection.
[19] arXiv:2405.05008 [pdf, other]: Title: ADELIE: Aligning Large Language Models on Information Extraction

Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

Subjects: Computation and Language (cs.CL)

Large language models (LLMs) usually fall short on information extraction (IE) tasks and struggle to follow the complex instructions of IE tasks. This primarily arises from LLMs not being aligned with humans, as mainstream alignment datasets typically do not include IE data. In this paper, we introduce ADELIE (Aligning large language moDELs on Information Extraction), an aligned LLM that effectively solves various IE tasks, including closed IE, open IE, and on-demand IE. We first collect and construct a high-quality alignment corpus IEInstruct for IE. Then we train ADELIE_SFT using instruction tuning on IEInstruct. We further train ADELIE_SFT with direct preference optimization (DPO) objective, resulting in ADELIE_DPO. Extensive experiments on various held-out IE datasets demonstrate that our models (ADELIE_SFT and ADELIE_DPO) achieve state-of-the-art (SoTA) performance among open-source models. We further explore the general capabilities of ADELIE, and experimental results reveal that their general capabilities do not exhibit a noticeable decline. We will release the code, data, and models to facilitate further research.
[20] arXiv:2405.05049 [pdf, ps, other]: Title: Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources

Authors: Lasse Hyldig Hansen, Nikolaj Andersen, Jack Gallifant, Liam G. McCoy, James K Stone, Nura Izath, Marcela Aguirre-Jerez, Danielle S Bitterman, Judy Gichoya, Leo Anthony Celi

Subjects: Computation and Language (cs.CL)

Background Advancements in Large Language Models (LLMs) hold transformative potential in healthcare, however, recent work has raised concern about the tendency of these models to produce outputs that display racial or gender biases. Although training data is a likely source of such biases, exploration of disease and demographic associations in text data at scale has been limited.
Methods We conducted a large-scale textual analysis using a dataset comprising diverse web sources, including Arxiv, Wikipedia, and Common Crawl. The study analyzed the context in which various diseases are discussed alongside markers of race and gender. Given that LLMs are pre-trained on similar datasets, this approach allowed us to examine the potential biases that LLMs may learn and internalize. We compared these findings with actual demographic disease prevalence as well as GPT-4 outputs in order to evaluate the extent of bias representation.
Results Our findings indicate that demographic terms are disproportionately associated with specific disease concepts in online texts. gender terms are prominently associated with disease concepts, while racial terms are much less frequently associated. We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed. Most prominently, we see an overall significant overrepresentation of Black race mentions in comparison to population proportions.
Conclusions Our results highlight the need for critical examination and transparent reporting of biases in LLM pretraining datasets. Our study suggests the need to develop mitigation strategies to counteract the influence of biased training data in LLMs, particularly in sensitive domains such as healthcare.
[21] arXiv:2405.05060 [pdf, other]: Title: Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models

Authors: Aylin Gunal, Baihan Lin, Djallel Bouneffouf

Comments: 5 pages excluding references, 3 figures; accepted at Clinical NLP Workshop @ NAACL 2024

Subjects: Computation and Language (cs.CL)

Given the increasing demand for mental health assistance, artificial intelligence (AI), particularly large language models (LLMs), may be valuable for integration into automated clinical support systems. In this work, we leverage a decision transformer architecture for topic recommendation in counseling conversations between patients and mental health professionals. The architecture is utilized for offline reinforcement learning, and we extract states (dialogue turn embeddings), actions (conversation topics), and rewards (scores measuring the alignment between patient and therapist) from previous turns within a conversation to train a decision transformer model. We demonstrate an improvement over baseline reinforcement learning methods, and propose a novel system of utilizing our model's output as synthetic labels for fine-tuning a large language model for the same task. Although our implementation based on LLaMA-2 7B has mixed results, future work can undoubtedly build on the design.
[22] arXiv:2405.05109 [pdf, other]: Title: QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs

Authors: Weijia Zhang, Vaishali Pal, Jia-Hong Huang, Evangelos Kanoulas, Maarten de Rijke

Comments: 16 pages, 3 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Table summarization is a crucial task aimed at condensing information from tabular data into concise and comprehensible textual summaries. However, existing approaches often fall short of adequately meeting users' information and quality requirements and tend to overlook the complexities of real-world queries. In this paper, we propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model (LLM), utilizes textual queries and multiple tables to generate query-dependent table summaries tailored to users' information needs. To facilitate research in this area, we present a comprehensive dataset specifically tailored for this task, consisting of 4909 query-summary pairs, each associated with multiple tables. Through extensive experiments using our curated dataset, we demonstrate the effectiveness of our proposed method compared to baseline approaches. Our findings offer insights into the challenges of complex table reasoning for precise summarization, contributing to the advancement of research in query-focused multi-table summarization.
[23] arXiv:2405.05116 [pdf, other]: Title: XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples

Authors: Peiqin Lin, André F. T. Martins, Hinrich Schütze

Subjects: Computation and Language (cs.CL)

Recent studies have shown that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving high-quality in-context examples, significantly improves in-context learning of English. However, adapting these methods to other languages, especially low-resource ones, presents challenges due to the scarcity of available cross-lingual retrievers and annotated data. In this paper, we introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning using only annotated English data. XAMPLER first trains a retriever with positive/negative English samples, which are constructed based on the predictions of the multilingual large language model for in-context learning. Then, the trained retriever is directly employed to retrieve English examples as few-shot examples for in-context learning of target languages. Experiments on the massively multilingual text classification benchmark of SIB200 with 176 languages demonstrate that XAMPLER substantially improves the in-context learning performance across languages. Our code is available at https://github.com/cisnlp/XAMPLER.
[24] arXiv:2405.05161 [pdf, ps, other]: Title: Motion Capture Analysis of Verb and Adjective Types in Austrian Sign Language

Authors: Julia Krebs, Evie Malaia, Ronnie B. Wilbur, Isabella Fessl, Hans-Peter Wiesinger, Hermann Schwameder, Dietmar Roehm

Comments: 10 pages, 7 figures

Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)

Across a number of sign languages, temporal and spatial characteristics of dominant hand articulation are used to express semantic and grammatical features. In this study of Austrian Sign Language (\"Osterreichische Geb\"ardensprache, or \"OGS), motion capture data of four Deaf signers is used to quantitatively characterize the kinematic parameters of sign production in verbs and adjectives. We investigate (1) the difference in production between verbs involving a natural endpoint (telic verbs; e.g. arrive) and verbs lacking an endpoint (atelic verbs; e.g. analyze), and (2) adjective signs in intensified vs. non-intensified (plain) forms. Motion capture data analysis using linear-mixed effects models (LME) indicates that both the endpoint marking in verbs, as well as marking of intensification in adjectives, are expressed by movement modulation in \"OGS. While the semantic distinction between verb types (telic/atelic) is marked by higher peak velocity and shorter duration for telic signs compared to atelic ones, the grammatical distinction (intensification) in adjectives is expressed by longer duration for intensified compared to non-intensified adjectives. The observed individual differences of signers might be interpreted as personal signing style.
[25] arXiv:2405.05176 [pdf, other]: Title: Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming

Authors: Tommaso Pasini, Alejo López-Ávila, Husam Quteineh, Gerasimos Lampouras, Jinhua Du, Yubing Wang, Ze Li, Yusen Sun

Comments: 18 pages, 1 figure

Subjects: Computation and Language (cs.CL)

Composing poetry or lyrics involves several creative factors, but a challenging aspect of generation is the adherence to a more or less strict metric and rhyming pattern. To address this challenge specifically, previous work on the task has mainly focused on reverse language modeling, which brings the critical selection of each rhyming word to the forefront of each verse. On the other hand, reversing the word order requires that models be trained from scratch with this task-specific goal and cannot take advantage of transfer learning from a Pretrained Language Model (PLM). We propose a novel fine-tuning approach that prepends the rhyming word at the start of each lyric, which allows the critical rhyming decision to be made before the model commits to the content of the lyric (as during reverse language modeling), but maintains compatibility with the word order of regular PLMs as the lyric itself is still generated in left-to-right order. We conducted extensive experiments to compare this fine-tuning against the current state-of-the-art strategies for rhyming, finding that our approach generates more readable text and better rhyming capabilities. Furthermore, we furnish a high-quality dataset in English and 12 other languages, analyse the approach's feasibility in a multilingual context, provide extensive experimental results shedding light on good and bad practices for lyrics generation, and propose metrics to compare methods in the future.
[26] arXiv:2405.05189 [pdf, other]: Title: MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning

Authors: Inderjeet Nair, Lu Wang

Comments: Under review at ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation due to the autoregressive nature and single-pass-based decoding, which lack error correction capability. Additionally, relying solely on a single sample may result in the omission of true nodes and edges. To counter this, we draw inspiration from self-consistency (SC), which involves sampling a diverse set of reasoning chains and taking the majority vote as the final answer. To tackle the substantial challenge of applying SC on generated graphs, we propose MIDGARD (MInimum Description length Guided Aggregation of Reasoning in Directed acyclic graph) that leverages Minimum Description Length (MDL)-based formulation to identify consistent properties among the different graph samples generated by an LLM. This formulation helps reject properties that appear in only a few samples, which are likely to be erroneous, while enabling the inclusion of missing elements without compromising precision. Our method demonstrates superior performance than comparisons across various structured reasoning tasks, including argument structure extraction, explanation graph generation, inferring dependency relations among actions for everyday tasks, and semantic graph generation from natural texts.
[27] arXiv:2405.05204 [pdf, ps, other]: Title: CARE-SD: Classifier-based analysis for recognizing and eliminating stigmatizing and doubt marker labels in electronic health records: model development and validation

Authors: Drew Walker, Annie Thorne, Sudeshna Das, Jennifer Love, Hannah LF Cooper, Melvin Livingston III, Abeed Sarker

Comments: 28 pages, 3 figures, 4 tables. 5 Appendices

Subjects: Computation and Language (cs.CL)

Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques. Materials and Methods: We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.5, and refined through human evaluation. These lexicons were used to search for matches across 18 million sentences from the de-identified Medical Information Mart for Intensive Care-III (MIMIC-III) dataset. For each linguistic bias feature, 1000 sentence matches were sampled, labeled by expert clinical and public health annotators, and used to supervised learning classifiers. Results: Lexicon development from expanded literature stem-word lists resulted in a doubt marker lexicon containing 58 expressions, and a stigmatizing labels lexicon containing 127 expressions. Classifiers for doubt markers and stigmatizing labels had the highest performance, with macro F1-scores of .84 and .79, positive-label recall and precision values ranging from .71 to .86, and accuracies aligning closely with human annotator agreement (.87). Discussion: This study demonstrated the feasibility of supervised classifiers in automatically identifying stigmatizing labels and doubt markers in medical text, and identified trends in stigmatizing language use in an EHR setting. Additional labeled data may help improve lower scare quote model performance. Conclusions: Classifiers developed in this study showed high model performance and can be applied to identify patterns and target interventions to reduce stigmatizing labels and doubt markers in healthcare systems.
[28] arXiv:2405.05248 [pdf, other]: Title: LLMs with Personalities in Multi-issue Negotiation Games

Authors: Sean Noh, Ho-Chun Herbert Chang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Powered by large language models (LLMs), AI agents have become capable of many human tasks. Using the most canonical definitions of the Big Five personality, we measure the ability of LLMs to negotiate within a game-theoretical framework, as well as methodological challenges to measuring notions of fairness and risk. Simulations (n=1,500) for both single-issue and multi-issue negotiation reveal increase in domain complexity with asymmetric issue valuations improve agreement rates but decrease surplus from aggressive negotiation. Through gradient-boosted regression and Shapley explainers, we find high openness, conscientiousness, and neuroticism are associated with fair tendencies; low agreeableness and low openness are associated with rational tendencies. Low conscientiousness is associated with high toxicity. These results indicate that LLMs may have built-in guardrails that default to fair behavior, but can be "jail broken" to exploit agreeable opponents. We also offer pragmatic insight in how negotiation bots can be designed, and a framework of assessing negotiation behavior based on game theory and computational social science.
[29] arXiv:2405.05253 [pdf, other]: Title: Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge

Authors: Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Paul Denny

Comments: 7 pages, 4 figures, 2 tables. Accepted for publication at the 29th annual ACM conference on Innovation and Technology in Computer Science Education (ITiCSE 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Large language models (LLMs) have shown great potential for the automatic generation of feedback in a wide range of computing contexts. However, concerns have been voiced around the privacy and ethical implications of sending student work to proprietary models. This has sparked considerable interest in the use of open source LLMs in education, but the quality of the feedback that such open models can produce remains understudied. This is a concern as providing flawed or misleading generated feedback could be detrimental to student learning. Inspired by recent work that has utilised very powerful LLMs, such as GPT-4, to evaluate the outputs produced by less powerful models, we conduct an automated analysis of the quality of the feedback produced by several open source models using a dataset from an introductory programming course. First, we investigate the viability of employing GPT-4 as an automated evaluator by comparing its evaluations with those of a human expert. We observe that GPT-4 demonstrates a bias toward positively rating feedback while exhibiting moderate agreement with human raters, showcasing its potential as a feedback evaluator. Second, we explore the quality of feedback generated by several leading open-source LLMs by using GPT-4 to evaluate the feedback. We find that some models offer competitive performance with popular proprietary LLMs, such as ChatGPT, indicating opportunities for their responsible use in educational settings.
[30] arXiv:2405.05254 [pdf, other]: Title: You Only Cache Once: Decoder-Decoder Architectures for Language Models

Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

Subjects: Computation and Language (cs.CL)

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO only caches once. The design substantially reduces GPU memory demands, yet retains global attention capability. Additionally, the computation flow enables prefilling to early exit without changing the final output, thereby significantly speeding up the prefill stage. Experimental results demonstrate that YOCO achieves favorable performance compared to Transformer in various settings of scaling up model size and number of training tokens. We also extend YOCO to 1M context length with near-perfect needle retrieval accuracy. The profiling results show that YOCO improves inference memory, prefill latency, and throughput by orders of magnitude across context lengths and model sizes. Code is available at https://aka.ms/YOCO.

Cross-lists for Thu, 9 May 24

[31] arXiv:2405.04620 (cross-list from hep-ph) [pdf, ps, other]: Title: Folded context condensation in Path Integral formalism for infinite context transformers

Authors: Won-Gi Paeng, Daesuk Kwon

Comments: 7 pages, 2 figures

Subjects: High Energy Physics - Phenomenology (hep-ph); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

This short note is written for rapid communication of long context training and to share the idea of how to train it with low memory usage. In the note, we generalize the attention algorithm and neural network of Generative Pre-Trained Transformers and reinterpret it in Path integral formalism. First, the role of the transformer is understood as the time evolution of the token state and second, it is suggested that the all key-token states in the same time as the query-token can attend to the attention with the query token states. As a result of the repetitive time evolution, it is discussed that the token states in the past sequence meats the token states in the present sequence so that the attention between separated sequences becomes possible for maintaining infinite contextual information just by using low memory for limited size of sequence. For the experiment, the $12$ input token window size was taken and one GPU with $24$GB memory was used for the pre-training. It was confirmed that more than $150$ length context is preserved. The sampling result of the training, the code and the other details will be included in the revised version of this note later.
[32] arXiv:2405.04758 (cross-list from cs.CR) [pdf, other]: Title: Honeyfile Camouflage: Hiding Fake Files in Plain Sight

Authors: Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil S. Kanhere

Comments: 3rd Workshop on the security implications of Deepfakes and Cheapfakes (WDC) co-located at ACM ASIACCS 2024

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.
[33] arXiv:2405.04950 (cross-list from cs.CV) [pdf, other]: Title: VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

Authors: Yunxin Li, Baotian Hu, Haoyuan Shi, Wei Wang, Longyue Wang, Min Zhang

Comments: 17 pages; Accepted by ICML 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large Multimodal Models (LMMs) have achieved impressive success in visual understanding and reasoning, remarkably improving the performance of mathematical reasoning in a visual context. Yet, a challenging type of visual math lies in the multimodal graph theory problem, which demands that LMMs understand the graphical structures accurately and perform multi-step reasoning on the visual graph. Additionally, exploring multimodal graph theory problems will lead to more effective strategies in fields like biology, transportation, and robotics planning. To step forward in this direction, we are the first to design a benchmark named VisionGraph, used to explore the capabilities of advanced LMMs in solving multimodal graph theory problems. It encompasses eight complex graph problem tasks, from connectivity to shortest path problems. Subsequently, we present a Description-Program-Reasoning (DPR) chain to enhance the logical accuracy of reasoning processes through graphical structure description generation and algorithm-aware multi-step reasoning. Our extensive study shows that 1) GPT-4V outperforms Gemini Pro in multi-step graph reasoning; 2) All LMMs exhibit inferior perception accuracy for graphical structures, whether in zero/few-shot settings or with supervised fine-tuning (SFT), which further affects problem-solving performance; 3) DPR significantly improves the multi-step graph reasoning capabilities of LMMs and the GPT-4V (DPR) agent achieves SOTA performance.
[34] arXiv:2405.05135 (cross-list from cs.SE) [pdf, ps, other]: Title: Lessons from the Use of Natural Language Inference (NLI) in Requirements Engineering Tasks

Authors: Mohamad Fazelnia, Viktoria Koscinski, Spencer Herzog, Mehdi Mirakhorli

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)

We investigate the use of Natural Language Inference (NLI) in automating requirements engineering tasks. In particular, we focus on three tasks: requirements classification, identification of requirements specification defects, and detection of conflicts in stakeholders' requirements. While previous research has demonstrated significant benefit in using NLI as a universal method for a broad spectrum of natural language processing tasks, these advantages have not been investigated within the context of software requirements engineering. Therefore, we design experiments to evaluate the use of NLI in requirements analysis. We compare the performance of NLI with a spectrum of approaches, including prompt-based models, conventional transfer learning, Large Language Models (LLMs)-powered chatbot models, and probabilistic models. Through experiments conducted under various learning settings including conventional learning and zero-shot, we demonstrate conclusively that our NLI method surpasses classical NLP methods as well as other LLMs-based and chatbot models in the analysis of requirements specifications. Additionally, we share lessons learned characterizing the learning settings that make NLI a suitable approach for automating requirements engineering tasks.
[35] arXiv:2405.05136 (cross-list from cs.CY) [pdf, other]: Title: Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

Authors: Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets containing long-sequence data began to emerge. Recent deep learning based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with large-scale datasets containing long-sequence data. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose a LSTM BERT-based Knowledge Tracing model for long sequence data processing, namely LBKT, which uses a BERT-based architecture with a Rasch model-based embeddings block to deal with different difficulty levels information and an LSTM block to process the sequential characteristic in students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC. Additionally, an ablation study is conducted to analyse the impact of each component of LBKT's overall performance. Moreover, we used t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster, more interpretable, and has a lower memory cost than the traditional deep learning based Knowledge Tracing methods.
[36] arXiv:2405.05175 (cross-list from cs.CR) [pdf, other]: Title: Air Gap: Protecting Privacy-Conscious Conversational Agents

Authors: Eugene Bagdasaryan, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, Daniel Ramage

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand.
Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.

Replacements for Thu, 9 May 24

[37] arXiv:2212.10935 (replaced) [pdf, other]: Title: Commentary Generation from Data Records of Multiplayer Strategy Esports Game

Authors: Zihan Wang, Naoki Yoshinaga

Comments: Accepted by NAACL SRW 2024

Subjects: Computation and Language (cs.CL)
[38] arXiv:2303.03593 (replaced) [pdf, other]: Title: ADELT: Transpilation Between Deep Learning Frameworks

Authors: Linyuan Gong, Jiayi Wang, Alvin Cheung

Comments: 19 pages, to be published in the main track of IJCAI 2024

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[39] arXiv:2304.08448 (replaced) [pdf, other]: Title: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

Authors: Chong Ma, Zihao Wu, Jiaqi Wang, Shaochen Xu, Yaonai Wei, Fang Zeng, Zhengliang Liu, Xi Jiang, Lei Guo, Xiaoyan Cai, Shu Zhang, Tuo Zhang, Dajiang Zhu, Dinggang Shen, Tianming Liu, Xiang Li

Comments: Change to the published version. "ImpressionGPT" has been removed from the title

Journal-ref: IEEE Transactions on Artificial Intelligence (Early Access)(12 February 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[40] arXiv:2310.04743 (replaced) [pdf, other]: Title: Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models

Authors: Song Jiang, Zahra Shakeri, Aaron Chan, Maziar Sanjabi, Hamed Firooz, Yinglong Xia, Bugra Akyildiz, Yizhou Sun, Jinchao Li, Qifan Wang, Asli Celikyilmaz

Comments: 29 pages

Subjects: Computation and Language (cs.CL)
[41] arXiv:2310.11451 (replaced) [pdf, other]: Title: Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

Comments: ICLR 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[42] arXiv:2401.06712 (replaced) [pdf, other]: Title: Few-Shot Detection of Machine-Generated Text using Style Representations

Authors: Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, Nicholas Andrews

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[43] arXiv:2401.14011 (replaced) [pdf, other]: Title: CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning

Authors: Zheqi He, Xinya Wu, Pengfei Zhou, Richeng Xuan, Guang Liu, Xi Yang, Qiannan Zhu, Hua Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
[44] arXiv:2402.05812 (replaced) [pdf, other]: Title: FAQ-Gen: An automated system to generate domain-specific FAQs to aid content comprehension

Authors: Sahil Kale, Gautam Khaire, Jay Patankar

Comments: 16 pages, 4 figures. Accepted for publication in Journal of Computer-Assisted Linguistic Research (Vol. 8, 2024)

Subjects: Computation and Language (cs.CL)
[45] arXiv:2402.06221 (replaced) [pdf, other]: Title: ResumeFlow: An LLM-facilitated Pipeline for Personalized Resume Generation and Refinement

Authors: Saurabh Bhausaheb Zinjad, Amrita Bhattacharjee, Amey Bhilegaonkar, Huan Liu

Comments: Accepted to SIGIR 2024 (Demo)

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[46] arXiv:2402.08403 (replaced) [pdf, other]: Title: LLMs and the Human Condition

Authors: Peter Wallis

Comments: 4th draft. Added images of Zak and the ewe. No destination publication at this stage (missed IVA)

Subjects: Computation and Language (cs.CL)
[47] arXiv:2403.02333 (replaced) [pdf, other]: Title: Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

Authors: Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen

Comments: In progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[48] arXiv:2403.12403 (replaced) [pdf, other]: Title: Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

Authors: Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu

Comments: Camera-ready for NAACL WOAH 2024 (Workshop on Online Abuse and Harms). First two authors contributed equally

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[49] arXiv:2403.13799 (replaced) [pdf, other]: Title: Reverse Training to Nurse the Reversal Curse

Authors: Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[50] arXiv:2404.01626 (replaced) [pdf, other]: Title: Entity Disambiguation via Fusion Entity Decoding

Authors: Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li

Comments: Accepted at NAACL'24 main

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[51] arXiv:2404.01921 (replaced) [pdf, other]: Title: A Rationale-centric Counterfactual Data Augmentation Method for Cross-Document Event Coreference Resolution

Authors: Bowen Ding, Qingkai Min, Shengkun Ma, Yingjie Li, Linyi Yang, Yue Zhang

Comments: Accepted to NAACL-24 Main

Subjects: Computation and Language (cs.CL)
[52] arXiv:2404.05694 (replaced) [pdf, other]: Title: Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding

Authors: Ahmad Idrissi-Yaghir, Amin Dada, Henning Schäfer, Kamyar Arzideh, Giulia Baldini, Jan Trienes, Max Hasin, Jeanette Bewersdorff, Cynthia S. Schmidt, Marie Bauer, Kaleb E. Smith, Jiang Bian, Yonghui Wu, Jörg Schlötterer, Torsten Zesch, Peter A. Horn, Christin Seifert, Felix Nensa, Jens Kleesiek, Christoph M. Friedrich

Comments: Accepted at LREC-COLING 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[53] arXiv:2404.06162 (replaced) [pdf, other]: Title: Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

Authors: Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[54] arXiv:2404.08354 (replaced) [pdf, other]: Title: Gaining More Insight into Neural Semantic Parsing with Challenging Benchmarks

Authors: Xiao Zhang, Chunliu Wang, Rik van Noord, Johan Bos

Subjects: Computation and Language (cs.CL)
[55] arXiv:2404.15667 (replaced) [pdf, other]: Title: The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews

Authors: Aleksi Huotala, Miikka Kuutila, Paul Ralph, Mika Mäntylä

Comments: Accepted to the International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[56] arXiv:2404.17178 (replaced) [pdf, other]: Title: A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition

Authors: Haojie Zhang, Yimeng Zhuang

Subjects: Computation and Language (cs.CL)
[57] arXiv:2405.01379 (replaced) [pdf, other]: Title: Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving

Authors: Xin Quan, Marco Valentino, Louise A. Dennis, André Freitas

Subjects: Computation and Language (cs.CL)
[58] arXiv:2405.03279 (replaced) [pdf, other]: Title: Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning

Authors: Qizhou Chen, Taolin Zhang, Xiaofeng He, Dongyang Li, Chengyu Wang, Longtao Huang, Hui Xue

Comments: 14 pages, 4 figures, 6 tables

Subjects: Computation and Language (cs.CL)
[59] arXiv:2405.04434 (replaced) [pdf, other]: Title: DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[60] arXiv:2309.12342 (replaced) [pdf, other]: Title: Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions

Authors: Reem I. Masoud, Ziquan Liu, Martin Ferianc, Philip Treleaven, Miguel Rodrigues

Comments: 31 pages

Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG)
[61] arXiv:2309.12689 (replaced) [pdf, other]: Title: AMPLIFY:Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer

Authors: Leixin Yang, Yu Xiang

Journal-ref: https://peerj.com/articles/cs-2011/

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[62] arXiv:2401.12428 (replaced) [pdf, other]: Title: CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

Authors: Songyun Qu, Shixin Zhao, Bing Li, Yintao He, Xuyi Cai, Lei Zhang, Ying Wang

Comments: 16 pages, 22 figures

Subjects: Hardware Architecture (cs.AR); Computation and Language (cs.CL)
[63] arXiv:2402.07818 (replaced) [pdf, other]: Title: Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning

Authors: Z Liu, J Lou, W Bao, Y Hu, B Li, Z Qin, K Ren

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[64] arXiv:2403.08115 (replaced) [pdf, ps, other]: Title: Legally Binding but Unfair? Towards Assessing Fairness of Privacy Policies

Authors: Vincent Freiberger, Erik Buchmann

Comments: Accepted at IWSPA 2024

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[65] arXiv:2404.00566 (replaced) [pdf, other]: Title: CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Authors: Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
[66] arXiv:2405.03146 (replaced) [pdf, other]: Title: Quantifying the Capabilities of LLMs across Scale and Precision

Authors: Sher Badshah, Hassan Sajjad

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[67] arXiv:2405.03932 (replaced) [pdf, other]: Title: CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion

Authors: Tyler Bikaun, Michael Stewart, Wei Liu

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

New submissions
Cross-lists
Replacements

[ total of 67 entries: 1-67 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2405, contact, help (Access key information)

> cs > cs.CL

Computation and Language

New submissions

New submissions for Thu, 9 May 24

Cross-lists for Thu, 9 May 24

Replacements for Thu, 9 May 24