
Artificial Intelligence

New submissions

[ total of 112 entries: 1-112 ]

New submissions for Fri, 10 May 24

[1]  arXiv:2405.05594 [pdf, other]
Title: Expected Work Search: Combining Win Rate and Proof Size Estimation
Subjects: Artificial Intelligence (cs.AI)

We propose Expected Work Search (EWS), a new game solving algorithm. EWS combines win rate estimation, as used in Monte Carlo Tree Search, with proof size estimation, as used in Proof Number Search. The search efficiency of EWS stems from minimizing a novel notion of Expected Work, which predicts the expected computation required to solve a position. EWS outperforms traditional solving algorithms on the games of Go and Hex. For Go, we present the first solution to the empty 5x5 board with the commonly used positional superko ruleset. For Hex, our algorithm solves the empty 8x8 board in under 4 minutes. Experiments show that EWS succeeds both with and without extensive domain-specific knowledge.
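
To make the selection criterion concrete, here is a minimal sketch of the general idea of ranking children by expected work (an estimated proof size scaled by how likely the win actually is). This is an illustrative assumption about the scoring, not the authors' EWS algorithm; the formula and field names are hypothetical.

```python
def expected_work(win_rate, proof_size_estimate, eps=1e-6):
    # Illustrative only: positions that look unlikely to be wins (low win rate)
    # or that need large proof trees are expected to cost more work to solve.
    return proof_size_estimate / max(win_rate, eps)

def select_child(children):
    # children: Monte-Carlo win-rate and proof-size estimates per candidate move
    return min(children, key=lambda c: expected_work(c["win_rate"], c["proof_size"]))

# toy usage
children = [
    {"move": "a1", "win_rate": 0.62, "proof_size": 1200},
    {"move": "b2", "win_rate": 0.55, "proof_size": 400},
]
print(select_child(children)["move"])   # prefers the cheaper expected proof
```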

[2]  arXiv:2405.05662 [pdf, other]
Title: Approximate Dec-POMDP Solving Using Multi-Agent A*
Comments: 19 pages, 3 figures. Extended version (with appendix) of the paper to appear in IJCAI 2024
Subjects: Artificial Intelligence (cs.AI)

We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.

Cross-lists for Fri, 10 May 24

[3]  arXiv:2405.05275 (cross-list from cs.SI) [pdf, other]
Title: SoMeR: Multi-View User Representation Learning for Social Media
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

User representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations have widespread applications in recommendation systems and advertising; however, existing methods typically rely on specific features like text content, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address this limitation, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text content, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of timestamped textual features, uses transformers to embed this along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through two applications: 1) Identifying inauthentic accounts involved in coordinated influence operations by detecting users posting similar content simultaneously, and 2) Measuring increased polarization in online discussions after major events by quantifying how users with different beliefs moved farther apart in the embedding space. SoMeR's ability to holistically model users enables new solutions to important problems around disinformation, societal tensions, and online behavior understanding.

[4]  arXiv:2405.05285 (cross-list from cs.HC) [pdf, ps, other]
Title: Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance
Authors: Jelena Pavlovic (University of Belgrade, Faculty of Philosophy and Koucing centar Research Lab), Jugoslav Krstic, Luka Mitrovic, Djordje Babic, Adrijana Milosavljevic, Milena Nikolic, Tijana Karaklic, Tijana Mitrovic (Koucing centar Research Lab)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of the International Coaching Federation (ICF) mimicking exam, a situational judgment test related to coaching competencies. Using a mixed-method approach, we assessed the metacognitive performance, including sensitivity, accuracy in probabilistic predictions, and bias, of human participants and five advanced LLMs (GPT-4, Claude-3-Opus, Mistral Large, Llama 3, and Gemini 1.5 Pro). The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in terms of reduced overconfidence. However, both LLMs and humans showed less adaptability in ambiguous scenarios, adhering closely to predefined decision frameworks. The study suggests that Generative AI can effectively engage in human-like metacognitive processing without conscious awareness. Implications of the study are discussed in relation to the development of AI simulators that scaffold cognitive and metacognitive aspects of mastering coaching competencies. More broadly, implications of these results are discussed in relation to the development of metacognitive modules that lead towards more autonomous and intuitive AI systems.

[5]  arXiv:2405.05286 (cross-list from cs.LG) [pdf, other]
Title: Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfident predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead as a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ across various benchmark datasets, tasks, and state-of-the-art architectures.
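
A minimal PyTorch sketch of the core idea, ensembling only the normalization layers over shared convolution weights, is given below; the layer types and sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TinyEnsembleBlock(nn.Module):
    """Shared conv weights with M independent BatchNorm members (illustrative)."""
    def __init__(self, in_ch, out_ch, m=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)  # shared
        self.norms = nn.ModuleList([nn.BatchNorm2d(out_ch) for _ in range(m)])

    def forward(self, x):
        h = self.conv(x)                                       # one shared forward pass
        return torch.stack([norm(h) for norm in self.norms])   # M ensemble outputs

block = TinyEnsembleBlock(3, 16, m=4)
outs = block(torch.randn(8, 3, 32, 32))        # shape (4, 8, 16, 32, 32)
uncertainty = outs.var(dim=0).mean()           # disagreement across ensemble members
```

Because only the small normalization layers are duplicated, the storage cost is close to that of a single model while the spread of the member outputs still provides an uncertainty signal.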

[6]  arXiv:2405.05292 (cross-list from cs.HC) [pdf, other]
Title: Smart Portable Computer
Authors: Niladri Das
Comments: 34 pages
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Amidst the COVID-19 pandemic, with many organizations, schools, colleges, and universities transitioning to virtual platforms, students encountered difficulties in acquiring PCs such as desktops or laptops. Entry-level machines, with starting prices around 15,000 INR, often failed to offer adequate system specifications, posing a challenge for consumers. Additionally, those reliant on laptops for work found the conventional approach cumbersome. Enter the "Portable Smart Computer," a leap into the future of computing. This innovative device boasts speed and performance comparable to traditional desktops but in a compact, energy-efficient, and cost-effective package. It delivers a seamless desktop experience, whether one is editing documents, browsing multiple tabs, managing spreadsheets, or creating presentations. Moreover, it supports programming languages like Python, C, and C++, as well as compilers such as Keil and Xilinx, catering to the needs of programmers.

[7]  arXiv:2405.05295 (cross-list from cs.CV) [pdf, other]
Title: Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers
Comments: Accepted at IJCAI 2024. arXiv admin note: text overlap with arXiv:2207.09374
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. Yet, to fully understand a decision, knowledge about relevant features is not enough: awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights into how alterfactual explanations can complement counterfactual explanations.

[8]  arXiv:2405.05299 (cross-list from cs.HC) [pdf, other]
Title: Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images to reduce risks of sub-optimally or critically placed NGTs being missed or delayed in their detection, but gaps remain in clinical practice integration. In this study, we present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped understand challenges in existing workflows, and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.

[9]  arXiv:2405.05329 (cross-list from cs.DC) [pdf, other]
Title: KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Comments: preprint for ICML 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large Language Model (LLM) inference has two phases: the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead, to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of the key-value cache (KV-cache). Hence, KV-Runahead parallelizes the prompt phase by orchestrating multiple processes to populate the KV-cache and minimizes the time-to-first-token (TTFT). Dual-purposing the KV-cache scheme has two main benefits. First, since the KV-cache is designed to leverage the causal attention map, we minimize computation and communication automatically. Second, since it already exists for the extension phase, KV-Runahead is easy to implement. We further propose context-level load-balancing to handle uneven KV-cache generation (due to the causal attention) and to optimize TTFT. Compared with an existing parallelization scheme such as tensor or sequential parallelization where keys and values are locally generated and exchanged via all-gather collectives, our experimental results demonstrate that KV-Runahead can offer over 1.4x and 1.6x speedups for Llama 7B and Falcon 7B respectively.
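
The property that makes prefill parallelizable can be illustrated in plain NumPy: key/value projections are per-token, so each prompt chunk's cache entries can be computed independently (e.g., by separate processes) and concatenated before the new token's attention. The sketch below only illustrates that property under toy shapes and random weights; it is not the KV-Runahead scheduler or its load balancing.

```python
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wk, Wv, Wq = (rng.standard_normal((d, d)) for _ in range(3))

def chunk_kv(x_chunk):
    # K/V projections are per-token, so each prompt chunk's cache entries can be
    # produced independently (e.g., by a different process in a parallel prefill).
    return x_chunk @ Wk, x_chunk @ Wv

prompt = rng.standard_normal((48, d))
chunks = np.split(prompt, 4)                      # pretend these go to 4 workers
ks, vs = zip(*(chunk_kv(c) for c in chunks))
K, V = np.concatenate(ks), np.concatenate(vs)     # assembled KV-cache

q = prompt[-1] @ Wq                               # query for the first new token
scores = K @ q / np.sqrt(d)
w = np.exp(scores - scores.max()); w /= w.sum()
first_token_hidden = w @ V                        # attends over the full cache
```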

[10]  arXiv:2405.05336 (cross-list from eess.IV) [pdf, other]
Title: Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Despite their effectiveness, current deep learning models face challenges with images coming from different domains with varying appearance and content. We introduce SegCLR, a versatile framework designed to segment volumetric images across different domains, employing supervised and contrastive learning simultaneously to effectively learn from both labeled and unlabeled data. We demonstrate the superior performance of SegCLR through a comprehensive evaluation involving three diverse clinical datasets of retinal fluid segmentation in 3D Optical Coherence Tomography (OCT), various network configurations, and verification across 10 different network initializations. In an unsupervised domain adaptation context, SegCLR achieves results on par with a supervised upper-bound model trained on the intended target domain. Notably, we discover that the segmentation performance of the SegCLR framework is only marginally impacted by the abundance of unlabeled data from the target domain; we therefore also propose an effective zero-shot domain adaptation extension of SegCLR, eliminating the need for any target domain information. This shows that our proposed addition of contrastive loss in standard supervised training for segmentation leads to superior models, inherently more generalizable to both in- and out-of-domain test data. We additionally propose a pragmatic solution for SegCLR deployment in realistic scenarios with multiple domains containing labeled data. Accordingly, our framework pushes the boundaries of deep-learning based segmentation in multi-domain applications, regardless of data availability - labeled, unlabeled, or nonexistent.

[11]  arXiv:2405.05347 (cross-list from cs.SE) [pdf, other]
Title: Benchmarking Educational Program Repair
Comments: 15 pages, 2 figures, 3 tables. Non-archival report presented at the NeurIPS'23 Workshop on Generative AI for Education (GAIED)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

The emergence of large language models (LLMs) has sparked enormous interest due to their potential application across a range of educational tasks. For example, recent work in programming education has used LLMs to generate learning resources, improve error messages, and provide feedback on code. However, one factor that limits progress within the field is that much of the research uses bespoke datasets and different evaluation metrics, making direct comparisons between results unreliable. Thus, there is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches. One task where LLMs show great promise is program repair, which can be used to provide debugging support and next-step hints to students. In this article, we propose a novel educational program repair benchmark. We curate two high-quality publicly available programming datasets, present a unified evaluation procedure introducing a novel evaluation metric rouge@k for approximating the quality of repairs, and evaluate a set of five recent models to establish baseline performance.

[12]  arXiv:2405.05348 (cross-list from cs.CL) [pdf, other]
Title: The Effect of Model Size on LLM Post-hoc Explainability via LIME
Comments: Published at ICLR 2024 Workshop on Secure and Trustworthy Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
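
For readers unfamiliar with the setup, the following hedged sketch shows how a LIME explanation is typically obtained for a transformer text classifier using the lime and transformers packages; the checkpoint name is a placeholder for a fine-tuned DeBERTaV3 classifier and is not specified by the paper.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# Placeholder checkpoint: substitute any fine-tuned DeBERTaV3 sequence classifier.
clf = pipeline("text-classification", model="your-finetuned-deberta-v3-nli",
               top_k=None)   # return scores for all classes

def predict_proba(texts):
    # LIME expects an (n_samples, n_classes) array of class probabilities.
    outs = clf(list(texts))
    return np.array([[d["score"] for d in sorted(o, key=lambda d: d["label"])]
                     for o in outs])

explainer = LimeTextExplainer()
exp = explainer.explain_instance("A man is playing a guitar on stage.",
                                 predict_proba, num_features=6, num_samples=500)
print(exp.as_list())   # tokens with their contribution weights
```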

[13]  arXiv:2405.05349 (cross-list from cs.LG) [pdf, other]
Title: Offline Model-Based Optimization via Policy-Guided Gradient Search
Comments: Published at AAAI Conference on Artificial Intelligence, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline optimization is an emerging problem in many experimental engineering domains including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid that, one has to optimize an unknown function given only its offline evaluation at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation of the surrogate (possibly due to over-fitting on a biased sample of function evaluation) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than the actual offline data. To fill this important gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
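
The naive surrogate-based optimizer that the paper improves upon can be sketched in a few lines of PyTorch: fit a surrogate to the offline data, then run gradient ascent on the input through the frozen surrogate. The toy dataset and step sizes below are assumptions; the sketch is only meant to show where overestimation outside the offline data can mislead the search.

```python
import torch
import torch.nn as nn

# Toy offline dataset: inputs X and noiseless evaluations y of an unknown function.
X = torch.rand(256, 4)
y = -((X - 0.3) ** 2).sum(dim=1, keepdim=True)

surrogate = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
for _ in range(500):                          # fit the surrogate to offline data
    opt.zero_grad()
    loss = ((surrogate(X) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Naive search: gradient ascent on the input through the frozen surrogate.
x = X[y.argmax()].clone().requires_grad_(True)   # start from the best offline point
for _ in range(100):
    score = surrogate(x).sum()
    score.backward()
    with torch.no_grad():
        x += 0.05 * x.grad        # may drift into regions the surrogate overestimates,
        x.grad.zero_()            # the failure mode that policy-guided search targets
```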

[14]  arXiv:2405.05374 (cross-list from cs.CL) [pdf, other]
Title: Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
Comments: 17 pages, 11 Figures, 9 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

This report describes the training dataset creation and recipe behind the family of \texttt{arctic-embed} text embedding models (a set of five models ranging from 22 to 334 million parameters with weights open-sourced under an Apache-2 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of their size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l, outperforming closed-source embedding models such as Cohere's embed-v3 and OpenAI's text-embed-3-large. In addition to the details of our training recipe, we provide several informative ablation studies, which we believe explain our models' performance.

[15]  arXiv:2405.05378 (cross-list from cs.CL) [pdf, other]
Title: "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.

[16]  arXiv:2405.05429 (cross-list from cs.LG) [pdf, other]
Title: How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression
Comments: Accepted at UAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Machine Learning (stat.ML)

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.

[17]  arXiv:2405.05435 (cross-list from cs.CR) [pdf, other]
Title: Analysis and prevention of AI-based phishing email attacks
Comments: Electronics, accepted
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Phishing email attacks are among the most common and most harmful cybersecurity attacks. With the emergence of generative AI, phishing attacks can be based on emails generated automatically, making it more difficult to detect them. That is, instead of a single email format sent to a large number of recipients, generative AI can be used to send each potential victim a different email, making it more difficult for cybersecurity systems to identify the scam email before it reaches the recipient. Here we describe a corpus of AI-generated phishing emails. We also use different machine learning tools to test the ability of automatic text analysis to identify AI-generated phishing emails. The results are encouraging, and show that machine learning tools can identify an AI-generated phishing email with high accuracy compared to regular emails or human-generated scam emails. By applying descriptive analytics, we profile the specific differences between AI-generated emails and manually crafted scam emails, and show that AI-generated emails differ in style from human-generated phishing email scams. Therefore, automatic identification tools can be used as a warning for the user. The paper also describes the corpus of AI-generated phishing emails, which is made open to the public and can be used for subsequent studies. While the ability of machine learning to detect AI-generated phishing emails is encouraging, AI-generated phishing emails differ from regular phishing emails, and it is therefore important to also train machine learning systems with AI-generated emails in order to repel future phishing attacks that are powered by generative AI.
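
As an illustration of the kind of automatic text analysis described (not the authors' exact corpus or models), a standard scikit-learn pipeline with TF-IDF features and logistic regression can separate phishing from legitimate emails; the example emails and labels below are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data: in practice, load the AI-generated phishing corpus and
# the regular / human-written emails described in the paper.
emails = ["Dear customer, verify your account immediately ...",
          "Hi team, attaching the minutes from today's meeting ...",
          "Your parcel is held at customs, pay the release fee here ...",
          "Lunch on Friday? The usual place works for me."]
labels = [1, 0, 1, 0]   # 1 = phishing, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.5, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```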

[18]  arXiv:2405.05439 (cross-list from cs.RO) [pdf, other]
Title: How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)

With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.
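
One standard ingredient for such distribution-level guarantees is the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which gives a simultaneous confidence band around the empirical CDF from n rollouts. The sketch below shows that generic construction; it is not necessarily the exact bound used in the paper.

```python
import numpy as np

def dkw_cdf_band(scores, delta=0.05):
    """Simultaneous (1 - delta) confidence band around the empirical CDF (DKW)."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW half-width
    ecdf = np.arange(1, n + 1) / n
    lower = np.clip(ecdf - eps, 0.0, 1.0)            # worst-case CDF from below
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return scores, lower, upper

# e.g. 20 rollout scores in [0, 1]; the band holds for the whole performance distribution.
rollouts = np.random.default_rng(0).uniform(size=20)
x, lo, hi = dkw_cdf_band(rollouts, delta=0.05)
```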

[19]  arXiv:2405.05444 (cross-list from cs.CL) [pdf, ps, other]
Title: Evaluating Students' Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
Comments: 18 pages, 6 tables, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Evaluating open-ended written examination responses from students is an essential yet time-intensive task for educators, requiring a high degree of effort, consistency, and precision. Recent developments in Large Language Models (LLMs) present a promising opportunity to balance the need for thorough evaluation with efficient use of educators' time. In our study, we explore the effectiveness of the LLMs ChatGPT-3.5, ChatGPT-4, Claude-3, and Mistral-Large in assessing university students' open-ended answers to questions about reference material they have studied. Each model was instructed to evaluate 54 answers repeatedly under two conditions: 10 times (10-shot) with a temperature setting of 0.0 and 10 times with a temperature of 0.5, for a total of 1,080 evaluations per model and 4,320 evaluations across all models. The Retrieval Augmented Generation (RAG) framework was used to let the LLMs process the evaluation of the answers. As of spring 2024, our analysis revealed notable variations in consistency and in the grading outcomes provided by the studied LLMs. There is a need to understand the strengths and weaknesses of LLMs in educational settings for evaluating open-ended written responses. Further comparative research is essential to determine the accuracy and cost-effectiveness of using LLMs for educational assessments.
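
The evaluation loop has a simple structure (each answer graded repeatedly at two temperatures); a hedged sketch is shown below, where grade_answer is a hypothetical wrapper around a RAG-backed LLM call and is not the authors' code.

```python
from statistics import mean, pstdev

def grade_answer(answer: str, reference: str, temperature: float) -> float:
    """Hypothetical wrapper: send the answer plus retrieved reference material
    to an LLM and parse a numeric grade from its response."""
    raise NotImplementedError

def evaluate(answers, reference, temperatures=(0.0, 0.5), repeats=10):
    results = {}
    for t in temperatures:
        for i, ans in enumerate(answers):
            grades = [grade_answer(ans, reference, t) for _ in range(repeats)]
            # consistency = spread of repeated grades for the same answer
            results[(i, t)] = {"mean": mean(grades), "spread": pstdev(grades)}
    return results   # 54 answers x 2 temperatures x 10 repeats = 1,080 calls per model
```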

[20]  arXiv:2405.05446 (cross-list from cs.CV) [pdf, other]
Title: GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields
Authors: Yuanhao Gong
Comments: arXiv admin note: text overlap with arXiv:2404.09105
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

The 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal instead. The gradients are much sparser than the original signal and therefore require far fewer Gaussian splats, leading to more efficient storage and thus higher computational performance during both training and rendering. Thanks to this sparsity, during view synthesis only a small number of pixels is needed, leading to much higher computational performance ($100\sim 1000\times$ faster). The 2D image can then be recovered from the gradients by solving a Poisson equation with linear computational complexity. Several experiments are performed to confirm the sparseness of the gradients and the computational performance of the proposed method. The method can be applied to various applications, such as human body modeling and indoor environment modeling.
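
The claim that an image can be recovered from its gradient field by solving a Poisson equation can be checked with a short FFT-based solver (assuming periodic boundaries); this sketch is illustrative and is not the paper's splatting renderer.

```python
import numpy as np

def recover_from_gradients(gx, gy):
    """Solve  laplacian(u) = div(g)  with periodic boundaries via FFT."""
    H, W = gx.shape
    # divergence of the gradient field (backward differences, periodic)
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    fx, fy = np.fft.fftfreq(W), np.fft.fftfreq(H)
    denom = (2 * np.cos(2 * np.pi * fx)[None, :] - 2) + (2 * np.cos(2 * np.pi * fy)[:, None] - 2)
    denom[0, 0] = 1.0                       # the mean of u is unconstrained
    u_hat = np.fft.fft2(div) / denom
    u_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(u_hat))

u = np.random.default_rng(0).random((64, 64))
gx = np.roll(u, -1, axis=1) - u             # forward differences
gy = np.roll(u, -1, axis=0) - u
u_rec = recover_from_gradients(gx, gy)
print(np.allclose(u_rec - u_rec.mean(), u - u.mean()))   # True: exact up to a constant
```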

[21]  arXiv:2405.05466 (cross-list from cs.CL) [pdf, other]
Title: Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Like a criminal under investigation, Large Language Models (LLMs) might pretend to be aligned while evaluated and misbehave when they have a good opportunity. Can current interpretability methods catch these 'alignment fakers?' To answer this question, we introduce a benchmark that consists of 324 pairs of LLMs fine-tuned to select actions in role-play scenarios. One model in each pair is consistently benign (aligned). The other model misbehaves in scenarios where it is unlikely to be caught (alignment faking). The task is to identify the alignment faking model using only inputs where the two models behave identically. We test five detection strategies, one of which identifies 98% of alignment-fakers.

[22]  arXiv:2405.05467 (cross-list from cs.SD) [pdf, other]
Title: AFEN: Respiratory Disease Classification using Ensemble Learning
Comments: Under Review Process for MLForHC 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost in an ensemble learning fashion to perform state-of-the-art audio classification for a range of respiratory diseases. We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification. The extracted features are then used as an input to two separate model classifiers 1) a multi-feature CNN classifier and 2) an XGBoost Classifier. The outputs of the two models are then fused with the use of soft voting. Thus, by exploiting ensemble learning, we achieve increased robustness and accuracy. We evaluate the performance of the model on a database of 920 respiratory sounds, which undergoes data augmentation techniques to increase the diversity of the data and generalizability of the model. We empirically verify that AFEN sets a new state-of-the-art using Precision and Recall as metrics, while decreasing training time by 60%.
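
The soft-voting fusion step can be sketched as averaging the class-probability outputs of the two classifiers; in the snippet below a logistic-regression model stands in for the CNN's softmax head, and the feature matrix is a placeholder rather than the paper's audio features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # stand-in for the CNN head
from xgboost import XGBClassifier

# Placeholder feature matrix (e.g. per-recording audio feature statistics) and labels.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 40)), rng.integers(0, 3, 200)

cnn_like = LogisticRegression(max_iter=1000).fit(X, y)   # stands in for CNN softmax
xgb = XGBClassifier(n_estimators=50).fit(X, y)

def soft_vote(x):
    p1 = cnn_like.predict_proba(x)             # (n, n_classes) probabilities
    p2 = xgb.predict_proba(x)
    return np.argmax((p1 + p2) / 2, axis=1)    # average probabilities, pick the max

print(soft_vote(X[:5]))
```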

[23]  arXiv:2405.05480 (cross-list from cs.AR) [pdf, other]
Title: FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs
Comments: 10 pages, 11 figures
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Floorplanning for systems-on-a-chip (SoCs) and its sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10^250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples where each sample is a synthetic floor-plan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent white-space and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grouping constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML driven solutions to such problems. FloorSet is available as an open-source repository for the research community.

[24]  arXiv:2405.05492 (cross-list from math.DG) [pdf, other]
Title: A logifold structure on measure space
Comments: 43 pages, 4 figures
Subjects: Differential Geometry (math.DG); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Probability (math.PR)

In this paper, we develop a local-to-global and measure-theoretical approach to understanding datasets. The idea is to take network models with restricted domains as local charts of datasets. We develop the mathematical foundations for these structures, and show in experiments how it can be used to find fuzzy domains and to improve accuracy in data classification problems.

[25]  arXiv:2405.05493 (cross-list from cs.CL) [pdf, ps, other]
Title: Parameter-Efficient Fine-Tuning With Adapters
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the arena of language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a PromptTuning layer, which significantly reduces the number of trainable parameters while maintaining competitive performance across various benchmarks. Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters. We evaluate our approach using three diverse datasets: the GLUE benchmark, a domain-specific dataset comprising four distinct areas, and the Stanford Question Answering Dataset 1.1 (SQuAD). Our results demonstrate that our customized adapter-based method achieves performance comparable to full model fine-tuning, DAPT+TAPT, and UniPELT strategies while requiring fewer or an equivalent number of parameters. This parameter efficiency not only alleviates the computational burden but also expedites the adaptation process. The study underlines the potential of adapters in achieving high performance with significantly reduced resource consumption, suggesting a promising direction for future research in parameter-efficient fine-tuning.
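
For context, a generic bottleneck adapter, the kind of small trainable module that PEFT frameworks such as UniPELT compose, looks like the following PyTorch sketch; the dimensions and initialization are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, added residually to the hidden state.
    Only these small matrices are trained; the base model stays frozen."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)      # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
out = adapter(torch.randn(2, 16, 768))                 # (batch, seq_len, hidden)
print(sum(p.numel() for p in adapter.parameters()))    # ~0.1M trainable parameters
```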

[26]  arXiv:2405.05499 (cross-list from cs.LG) [pdf, other]
Title: Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate forecasting of long-term time series has important applications for decision making and planning. However, it remains challenging to capture the long-term dependencies in time series data. To better extract long-term dependencies, we propose the Multi-Scale Dilated Convolution Network (MSDCN), a method that utilizes a shallow dilated convolution architecture to capture the period and trend characteristics of long time series. We design different convolution blocks with exponentially growing dilations and varying kernel sizes to sample time series data at different scales. Furthermore, we utilize a traditional autoregressive model to capture the linear relationships within the data. To validate the effectiveness of the proposed approach, we conduct experiments on eight challenging long-term time series forecasting benchmark datasets. The experimental results show that our approach outperforms the prior state-of-the-art approaches and shows significant inference speed improvements compared to several strong baseline methods.
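
A hedged PyTorch sketch of the two ingredients named, dilated convolution blocks with exponentially growing dilations plus a linear autoregressive head, is shown below; channel counts, kernel sizes, and horizons are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2        # keep sequence length
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x))

class TinyMSDC(nn.Module):
    """Multi-scale dilated convolutions + a linear autoregressive head (illustrative)."""
    def __init__(self, channels=1, lookback=96, horizon=24):
        super().__init__()
        self.blocks = nn.Sequential(*[DilatedBlock(channels, 3, d) for d in (1, 2, 4, 8)])
        self.head = nn.Linear(lookback, horizon)       # forecast from conv features
        self.ar = nn.Linear(lookback, horizon)         # linear autoregressive component

    def forward(self, x):                              # x: (batch, channels, lookback)
        return self.head(self.blocks(x)) + self.ar(x)

model = TinyMSDC()
print(model(torch.randn(4, 1, 96)).shape)              # torch.Size([4, 1, 24])
```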

[27]  arXiv:2405.05508 (cross-list from cs.IR) [pdf, other]
Title: Redefining Information Retrieval of Structured Database via Large Language Models
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Retrieval augmentation is critical when Language Models (LMs) exploit non-parametric knowledge related to the query through external knowledge bases before reasoning. The retrieved information is incorporated into LMs as context alongside the query, enhancing the reliability of responses towards factual questions. Prior research on retrieval augmentation typically follows a retriever-generator paradigm. In this context, traditional retrievers encounter challenges in precisely and seamlessly extracting query-relevant information from knowledge bases. To address this issue, this paper introduces a novel retrieval augmentation framework called ChatLR that primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval. Additionally, we construct an LLM-based search and question answering system tailored for the financial domain by fine-tuning the LLM on two tasks including Text2API and API-ID recognition. Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8\%.

[28]  arXiv:2405.05512 (cross-list from cs.LG) [pdf, other]
Title: Characteristic Learning for Provable One Step Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Statistics Theory (math.ST)

We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and utilize the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, ensuring a one-step mapping that effectively pushes the prior distribution towards the target distribution. In the theoretical aspect, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis for simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method on both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.
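
The Euler discretization step can be illustrated with a short NumPy sketch that pushes prior samples along a velocity field to trace discrete characteristics; here the velocity field is a closed-form placeholder, whereas the paper estimates it nonparametrically and then fits a network to the resulting one-step map.

```python
import numpy as np

def velocity(x, t, target_mean=3.0):
    # Placeholder field: a simple interpolation-style drift toward a target mean.
    # In the paper this field is estimated from data, not given in closed form.
    return (target_mean - x) / max(1.0 - t, 1e-3)

def euler_characteristics(x0, n_steps=100):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with the explicit Euler method."""
    x, dt = x0.copy(), 1.0 / n_steps
    traj = [x.copy()]
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
        traj.append(x.copy())
    return np.stack(traj)        # (n_steps+1, n_samples): discrete characteristics

prior = np.random.default_rng(0).standard_normal(1000)
paths = euler_characteristics(prior)
print(paths[-1].mean())          # samples pushed toward the target distribution
```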

[29]  arXiv:2405.05523 (cross-list from cs.CV) [pdf, other]
Title: Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Comments: Accepted by ICMEW 2024. arXiv admin note: text overlap with arXiv:2404.13657
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, Port enhances the baseline model with a Recovering part to predict flipped label sequences and align distributions with a Dual-alignment method. This allows the model to focus on specific temporal regions prompted by ground-truth information. Extensive experiments on the Animal Kingdom dataset demonstrate the effectiveness of Port, achieving an IoU@0.3 of 38.52. It emerges as one of the top performers in the sub-track of MMVRAC in ICME 2024 Grand Challenges.

[30]  arXiv:2405.05553 (cross-list from cs.CV) [pdf, other]
Title: Towards Robust Physical-world Backdoor Attacks on Lane Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint transformations) and environmental conditions (e.g., weather or lighting changes). To tackle this issue, this paper introduces BadLANE, a dynamic scene adaptation backdoor attack for LD designed to withstand changes in real-world dynamic scene factors. To address the challenges posed by changing driving perspectives, we propose an amorphous trigger pattern composed of shapeless pixels. This trigger design allows the backdoor to be activated by various forms or shapes of mud spots or pollution on the road or lens, enabling adaptation to changes in vehicle observation viewpoints during driving. To mitigate the effects of environmental changes, we design a meta-learning framework to train meta-generators tailored to different environmental conditions. These generators produce meta-triggers that incorporate diverse environmental information, such as weather or lighting conditions, as the initialization of the trigger patterns for backdoor implantation, thus enabling adaptation to dynamic environments. Extensive experiments on various commonly used LD models in both digital and physical domains validate the effectiveness of our attacks, outperforming other baselines significantly (+25.15\% on average in Attack Success Rate). Our codes will be available upon paper publication.

[31]  arXiv:2405.05572 (cross-list from cs.CL) [pdf, other]
Title: From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Current computational approaches for analysing or generating code-mixed sentences do not explicitly model "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgement of the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, consisting of samples sourced from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, Number of Switch Points, and Burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-Roberta and Bernice outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT's zero- and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, providing scope for improvement in code-mixed tasks. Zero-shot transfer from English-Hindi to English-Telugu acceptability judgments using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and providing further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mix corpus, and code for data generation and model training.
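
For readers unfamiliar with the metrics mentioned, the Code-Mixing Index (CMI) can be computed from per-token language tags following the commonly used definition sketched below; the tagged example sentence is illustrative and is not drawn from Cline.

```python
from collections import Counter

def cmi(lang_tags):
    """Code-Mixing Index from per-token language tags.
    'univ' marks language-independent tokens (punctuation, named entities, etc.)."""
    n = len(lang_tags)
    counts = Counter(t for t in lang_tags if t != "univ")
    u = n - sum(counts.values())
    if n == u:                       # no language-tagged tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

# "main kal office nahi aa raha because I have a doctor appointment"
tags = ["hi", "hi", "en", "hi", "hi", "hi", "en", "en", "en", "en", "en", "en"]
print(round(cmi(tags), 2))   # ~41.67: higher values indicate heavier mixing
```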

[32]  arXiv:2405.05581 (cross-list from cs.HC) [pdf, other]
Title: One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Comments: Accepted to FAccT 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or alternatives. However, it is not obvious how the user will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistencies. Based on these categories, we conducted a study (N=252) in which participants were given one or more LLM-generated passages to an information-seeking question. We found that inconsistency within multiple LLM-generated outputs lowered the participants' perceived AI capacity, while also increasing their comprehension of the given information. Specifically, we observed that this positive effect of inconsistencies was most significant for participants who read two passages, compared to those who read three. Based on these findings, we present design implications that, instead of regarding LLM output inconsistencies as a drawback, we can reveal the potential inconsistencies to transparently indicate the limitations of these models and promote critical LLM usage.

[33]  arXiv:2405.05584 (cross-list from cs.CV) [pdf, other]
Title: A Survey on Backbones for Deep Video Action Recognition
Comments: This paper has been accepted by ICME workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement the backbones referring to multiple standpoints, which leads to the diversity of methods and encountering new challenges. This paper reviews several action recognition methods based on deep neural networks. We introduce these methods in three parts: 1) Two-Streams networks and their variants, which, specifically in this paper, use RGB video frame and optical flow modality as input; 2) 3D convolutional networks, which make efforts in taking advantage of RGB modality directly while extracting different motion information is no longer necessary; 3) Transformer-based methods, which introduce the model from natural language processing into computer vision and video understanding. We offer objective sights in this review and hopefully provide a reference for future research.

[34]  arXiv:2405.05616 (cross-list from cs.CL) [pdf, other]
Title: G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LMs) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability. Some studies have explored combining LMs with Knowledge Graphs (KGs) by coarsely fusing the two modalities to perform Graph Neural Network (GNN)-based reasoning that lacks a profound interaction between heterogeneous modalities. In this paper, we propose a novel Graph-based Structure-Aware Prompt Learning Model for commonsense reasoning, named G-SAP, aiming to maintain a balance between heterogeneous knowledge and enhance the cross-modal interaction within the LM+GNNs model. In particular, an evidence graph is constructed by integrating multiple knowledge sources, i.e., ConceptNet, Wikipedia, and Cambridge Dictionary, to boost the performance. Afterward, a structure-aware frozen PLM is employed to fully incorporate the structured and textual information from the evidence graph, where the generation of prompts is driven by graph entities and relations. Finally, a heterogeneous message-passing reasoning module is used to facilitate deep interaction of knowledge between the LM and graph-based networks. Empirical validation, conducted through extensive experiments on three benchmark datasets, demonstrates the notable performance of the proposed model. The results reveal a significant advancement over the existing models, especially with a 6.12% improvement over the SoTA LM+GNNs model on the OpenbookQA dataset.

[35]  arXiv:2405.05636 (cross-list from cs.CV) [pdf, other]
Title: SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Combining face swapping with lip synchronization technology offers a cost-effective solution for customized talking face generation. However, directly cascading existing models together tends to introduce significant interference between tasks and reduce video clarity because the interaction space is limited to the low-level semantic RGB space. To address this issue, we propose an innovative unified framework, SwapTalk, which accomplishes both face swapping and lip synchronization tasks in the same latent space. Referring to recent work on face generation, we choose the VQ-embedding space due to its excellent editability and fidelity performance. To enhance the framework's generalization capabilities for unseen identities, we incorporate identity loss during the training of the face swapping module. Additionally, we introduce expert discriminator supervision within the latent space during the training of the lip synchronization module to elevate synchronization quality. In the evaluation phase, previous studies primarily focused on the self-reconstruction of lip movements in synchronous audio-visual videos. To better approximate real-world applications, we expand the evaluation scope to asynchronous audio-video scenarios. Furthermore, we introduce a novel identity consistency metric to more comprehensively assess the identity consistency over time series in generated facial videos. Experimental results on the HDTF demonstrate that our method significantly surpasses existing techniques in video quality, lip synchronization accuracy, face swapping fidelity, and identity consistency. Our demo is available at this http URL

[36]  arXiv:2405.05695 (cross-list from cs.LG) [pdf, other]
Title: Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost
Comments: Accepted to ICLR 2024
Journal-ref: International Conference on Learning Representations (ICLR), 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure for the primary and auxiliary tasks, which produces different networks for training and inference. Specifically, starting from two single task networks/branches (each representing a task), we propose a novel method with evolving networks where only primary-to-auxiliary links exist as the cross-task connections after convergence. These connections can be removed during the primary task inference, resulting in a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, where we initialize bi-directional connections in the search space and guide the NAS optimization converging to an architecture with only the single-side primary-to-auxiliary connections. Moreover, our method can be incorporated with optimization-based auxiliary learning approaches. Extensive experiments with six tasks on NYU v2, CityScapes, and Taskonomy datasets using VGG, ResNet, and ViT backbones validate the promising performance. The codes are available at https://github.com/ethanygao/Aux-NAS.

[37]  arXiv:2405.05723 (cross-list from cs.CL) [pdf, other]
Title: Computational lexical analysis of Flamenco genres
Comments: 21 pages, 29 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Flamenco, recognized by UNESCO as part of the Intangible Cultural Heritage of Humanity, is a profound expression of cultural identity rooted in Andalusia, Spain. However, there is a lack of quantitative studies that help identify characteristic patterns in this long-lived music tradition. In this work, we present a computational analysis of Flamenco lyrics, employing natural language processing and machine learning to categorize over 2000 lyrics into their respective Flamenco genres, termed as $\textit{palos}$. Using a Multinomial Naive Bayes classifier, we find that lexical variation across styles enables us to accurately identify distinct $\textit{palos}$. More importantly, from an automatic analysis of word usage, we obtain the semantic fields that characterize each style. Further, applying a metric that quantifies the inter-genre distance, we perform a network analysis that sheds light on the relationship between Flamenco styles. Remarkably, our results suggest historical connections and $\textit{palo}$ evolutions. Overall, our work illuminates the intricate relationships and cultural significance embedded within Flamenco lyrics, complementing previous qualitative discussions with quantitative analyses and sparking new discussions on the origin and development of traditional music genres.
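
The classification setup named in the abstract can be sketched with scikit-learn's Multinomial Naive Bayes over bag-of-words lyric features; the lyrics and palo labels below are placeholders, not the authors' corpus of 2000+ lyrics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder lyrics and illustrative genre labels; the paper uses a real corpus.
lyrics = ["ay pena penita pena de mi corazon",
          "a la mar fui por naranjas cosa que la mar no tiene",
          "que yo no tengo la culpa de quererte como te quiero",
          "companera de mi alma me dejaste en el camino"]
palos = ["solea", "alegrias", "solea", "siguiriya"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(lyrics, palos)
print(clf.predict(["pena que me parte el alma"]))   # predicted palo for a new lyric
```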

[38]  arXiv:2405.05741 (cross-list from cs.CL) [pdf, ps, other]
Title: Can large language models understand uncommon meanings of common words?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, in the absence of widely acknowledged testing mechanisms, the question of whether LLMs are stochastic parrots or genuinely comprehend the world remains open, fostering numerous studies and sparking heated debate. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding LLMs' unique comprehension mechanisms, aligning with human cognition, and ultimately enhancing their general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication within psychology, which underscore accurate shared understandings of word semantics. Specifically, this paper presents the construction of a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Evaluating open-source and closed-source models of varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models on this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively. We also introduce multiple advanced prompting techniques and retrieval-augmented generation to help alleviate this shortcoming, yet limitations persist. By highlighting these critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.
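
To make the evaluation setup concrete, here is a hedged sketch of scoring a model on multiple-choice items probing uncommon senses of common words; the items, the `query_model` stub, and the scoring loop are hypothetical stand-ins, not the LeSC benchmark or its metrics.

```python
# Sketch of accuracy scoring on multiple-choice lexical-meaning items.
# `query_model` is a placeholder for an actual LLM API call.
items = [
    {"sentence": "The ship listed to port after taking on water.",
     "word": "listed", "options": ["enumerated", "tilted", "priced", "registered"],
     "answer": "tilted"},
    {"sentence": "He tabled the motion until the next meeting.",
     "word": "tabled", "options": ["postponed", "displayed", "built", "served"],
     "answer": "postponed"},
]

def query_model(prompt: str) -> str:
    # Placeholder: a real evaluation would call an LLM here.
    return "tilted"

def evaluate(items):
    correct = 0
    for it in items:
        prompt = (f"In the sentence '{it['sentence']}', what does '{it['word']}' mean? "
                  f"Options: {', '.join(it['options'])}. Answer with one option.")
        if query_model(prompt).strip().lower() == it["answer"]:
            correct += 1
    return correct / len(items)

print(f"accuracy = {evaluate(items):.2f}")
```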

[39]  arXiv:2405.05751 (cross-list from cs.LG) [pdf, other]
Title: A Multi-Level Superoptimizer for Tensor Programs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $\mu$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $\mu$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.5$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
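
The following sketch only conveys the flavor of checking a candidate optimization against a reference program on random inputs; it is not Mirage's verification procedure, which operates on $\mu$Graphs and carries formal probabilistic guarantees that a plain random test does not.

```python
# Toy random-input equivalence check between a reference tensor program and an
# algebraically refactored candidate, as a stand-in for probabilistic verification.
import numpy as np

def reference(x, w1, w2):
    return (x @ w1) @ w2          # two chained matmuls

def optimized(x, w1, w2):
    return x @ (w1 @ w2)          # refactored, mathematically equivalent version

def probably_equivalent(f, g, shapes, trials=16, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        args = [rng.standard_normal(s) for s in shapes]
        if not np.allclose(f(*args), g(*args), rtol=tol, atol=tol):
            return False
    return True

print(probably_equivalent(reference, optimized, [(4, 8), (8, 16), (16, 3)]))
```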

[40]  arXiv:2405.05755 (cross-list from cs.CV) [pdf, other]
Title: CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, convolutional neural networks (CNNs) with channel-wise feature refining mechanisms have brought noticeable benefits to modelling channel dependencies. However, current attention paradigms fail to infer an optimal channel descriptor capable of simultaneously exploiting statistical and spatial relationships among feature maps. In this paper, to overcome this shortcoming, we present a novel channel-wise spatially autocorrelated (CSA) attention mechanism. Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor. To the best of our knowledge, this is the first time that the concept of geographical spatial analysis has been utilized in deep CNNs. The proposed CSA imposes negligible learning parameters and light computational overhead on the deep model, making it a powerful yet efficient attention module of choice. We validate the effectiveness of the proposed CSA networks (CSA-Nets) through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets for image classification, object detection, and instance segmentation. The experimental results demonstrate that CSA-Nets consistently achieve competitive performance and superior generalization compared with several state-of-the-art attention-based CNNs across different benchmark tasks and datasets.

[41]  arXiv:2405.05763 (cross-list from cs.CV) [pdf, ps, other]
Title: DP-MDM: Detail-Preserving MR Reconstruction via Multiple Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Detail features of magnetic resonance images play a crucial role in accurate medical diagnosis and treatment, as they capture subtle changes that pose challenges for doctors when making precise judgments. However, the widely utilized naive diffusion model has limitations, as it fails to accurately capture more intricate details. To enhance the quality of MRI reconstruction, we propose a comprehensive detail-preserving reconstruction method using multiple diffusion models to extract structure and detail features in the k-space domain instead of the image domain. Moreover, virtual binary modal masks are utilized to refine the range of values in k-space data through highly adaptive center windows, which allows the model to focus its attention more efficiently. Last but not least, an inverted pyramid structure is employed, where the top-down image information gradually decreases, enabling a cascade representation. The framework effectively represents multi-scale sampled data, taking into account the sparsity of the inverted pyramid architecture, and utilizes the cascade training data distribution to represent multi-scale data. Through a step-by-step refinement approach, the method refines the approximation of details. Finally, the proposed method was evaluated in experiments on clinical and public datasets. The results demonstrate that the proposed method outperforms other methods.
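
To illustrate the general idea of working in k-space with a centered window (not the paper's virtual binary modal masks or diffusion models), the sketch below masks the Fourier transform of a toy image with a fixed central region; the window size and the random image are arbitrary placeholders.

```python
# Illustrative sketch: keep only a centered window of k-space coefficients.
import numpy as np

def center_window_mask(shape, keep_fraction=0.25):
    """Binary mask keeping a centered rectangle of k-space coefficients."""
    h, w = shape
    mask = np.zeros(shape, dtype=bool)
    kh, kw = int(h * keep_fraction), int(w * keep_fraction)
    mask[h // 2 - kh // 2: h // 2 + kh // 2,
         w // 2 - kw // 2: w // 2 + kw // 2] = True
    return mask

image = np.random.rand(128, 128)                    # stand-in for an MR slice
kspace = np.fft.fftshift(np.fft.fft2(image))        # centered k-space
masked = kspace * center_window_mask(kspace.shape)  # keep low-frequency center
low_freq_image = np.abs(np.fft.ifft2(np.fft.ifftshift(masked)))
print(low_freq_image.shape)
```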

[42]  arXiv:2405.05766 (cross-list from cs.CV) [pdf, other]
Title: To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI). These methods seek to enhance end-users' trust in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system for the detection of pneumonia from X-ray images.

[43]  arXiv:2405.05777 (cross-list from cs.CL) [pdf, other]
Title: Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

S\'ami, an indigenous language group comprising multiple languages, faces digital marginalization due to the limited availability of data and of sophisticated language models designed for its linguistic intricacies. This work focuses on increasing technological participation for the S\'ami language. We draw the attention of the ML community to the language modeling problem of Ultra Low Resource (ULR) languages: languages for which the amount of available textual data is very low and the number of speakers is also very small. ULRLs are also not supported by mainstream Large Language Models (LLMs) like ChatGPT, which makes gathering artificial training data for them even more challenging. Mainstream AI foundation model development has given little attention to this category of languages, whose very small speaker populations also make data hard to find. However, it is important to develop foundational models for ULR languages to promote inclusion and to extend the tangible abilities and impact of LLMs. To this end, we have compiled the available S\'ami language resources from the web to create a clean dataset for training language models. To study the behavior of modern LLMs on a ULR language (S\'ami), we experimented with different kinds of LLMs, mainly on the order of $\sim$ seven billion parameters. We also explored the effect of multilingual LLM training for ULRLs. We found that decoder-only models under a sequential multilingual training scenario perform better than joint multilingual training, whereas multilingual training with high semantic overlap, in general, performs better than training from scratch. This is the first study on the S\'ami language to adapt non-statistical language models that use the latest developments in the field of natural language processing (NLP).

[44]  arXiv:2405.05790 (cross-list from cs.CE) [pdf, ps, other]
Title: A Robust eLORETA Technique for Localization of Brain Sources in the Presence of Forward Model Uncertainties
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

In this paper, we present a robust version of the well-known exact low-resolution electromagnetic tomography (eLORETA) technique, named ReLORETA, to localize brain sources in the presence of different forward model uncertainties. Methods: We first assume that the true lead field matrix is a transformation of the existing lead field matrix distorted by uncertainties and propose an iterative approach to estimate this transformation accurately. Major sources of forward model uncertainty, including differences in geometry, conductivity, and source space resolution between the real and simulated head models, and misaligned electrode positions, are then simulated to test the proposed method. Results: ReLORETA and eLORETA are applied to simulated focal sources in different regions of the brain, in the presence of various noise levels, as well as to real data from a patient with focal epilepsy. The results show that ReLORETA is considerably more robust and accurate than eLORETA in all cases. Conclusion: Having successfully dealt with forward model uncertainties, ReLORETA proves to be a promising method for real-world clinical applications. Significance: eLORETA is one of the localization techniques that can be used to study brain activity for medical applications such as determining the epileptogenic zone in patients with medically refractory epilepsy. However, its major limitation is sensitivity to uncertainties in the forward model. Since this problem can substantially undermine its performance in real-world applications, where the exact lead field matrix is unknown, developing a more robust method capable of dealing with these uncertainties is of significant interest.

[45]  arXiv:2405.05792 (cross-list from cs.RO) [pdf, other]
Title: RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation
Comments: Published at ICRA 2024; 9 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Mapping is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on "image segments", which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a "continuous sense of a place", defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of "hops over segments" and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during mapping and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level `hopping' based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/
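
As a hedged sketch of the segment-level topological map described above, the snippet below builds a graph whose nodes are image segments and whose edges come from (a) descriptor matches across consecutive images and (b) centroid proximity within an image; the descriptors, centroids, and thresholds are random placeholders rather than outputs of a real segmentation model.

```python
# Minimal sketch of a segment-level topological graph (nodes = image segments).
import itertools
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n_images, segs_per_image, dim = 3, 4, 8
G = nx.Graph()

# Placeholder segments: node key (image, segment) with a descriptor and centroid.
for i, s in itertools.product(range(n_images), range(segs_per_image)):
    G.add_node((i, s), desc=rng.standard_normal(dim), centroid=rng.uniform(0, 100, 2))

# (a) Intra-image edges: connect segments whose pixel centroids are close.
for i in range(n_images):
    nodes = [(i, s) for s in range(segs_per_image)]
    for a, b in itertools.combinations(nodes, 2):
        if np.linalg.norm(G.nodes[a]["centroid"] - G.nodes[b]["centroid"]) < 50:
            G.add_edge(a, b, kind="intra")

# (b) Inter-image edges: match each segment to its nearest descriptor in the next image.
for i in range(n_images - 1):
    for s in range(segs_per_image):
        d = G.nodes[(i, s)]["desc"]
        best = min(range(segs_per_image),
                   key=lambda t: np.linalg.norm(d - G.nodes[(i + 1, t)]["desc"]))
        G.add_edge((i, s), (i + 1, best), kind="inter")

print(G.number_of_nodes(), "segments,", G.number_of_edges(), "edges")
```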

[46]  arXiv:2405.05802 (cross-list from cs.DC) [pdf, other]
Title: Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
Comments: 5 pages,3 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, the information exchange among neighboring nodes in a GNN poses new challenges in resource-constrained scenarios, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unreliable due to wireless fading and receiver noise, resulting in performance degradation of GNNs. To improve the learning performance of GNNs, we aim to maximize the long-term average (LTA) number of communication links through optimized power control under energy consumption constraints. Using the Lyapunov optimization method, we first transform the intractable long-term problem into a deterministic problem in each time slot by converting the long-term energy constraints into the objective function. Although the resulting problem is a non-convex combinatorial optimization problem, we address it by equivalently solving a sequence of convex feasibility problems together with a greedy-based solver. Simulation results demonstrate the superiority of our proposed scheme over the baselines.
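
The following toy sketch illustrates only the Lyapunov-style conversion of a long-term energy constraint into a per-slot virtual-queue update; the per-slot energy draws and the budget are arbitrary placeholders, and the actual per-slot power control trading links against queue-weighted energy is not implemented here.

```python
# Toy virtual-queue update for a long-term energy budget (Lyapunov optimization idea).
import numpy as np

rng = np.random.default_rng(0)
energy_budget_per_slot = 1.0
queue = 0.0  # virtual queue tracking accumulated energy-budget violation

for t in range(10):
    # Placeholder: draw a random per-slot energy instead of solving the power-control step.
    energy_used = rng.uniform(0.5, 1.5)
    queue = max(queue + energy_used - energy_budget_per_slot, 0.0)
    print(f"slot {t}: energy={energy_used:.2f}, virtual queue={queue:.2f}")
```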

[47]  arXiv:2405.05803 (cross-list from cs.CV) [pdf, other]
Title: Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention sink phenomenon that is prevalent in LLMs also persists in MLLMs, suggesting that initial tokens and nearest tokens receive the majority of attention, while middle vision tokens garner minimal attention in deep layers; (2) the presence of information migration, which implies that visual information is transferred to subsequent text tokens within the first few layers of MLLMs. As per our findings, we conclude that vision tokens are not necessary in the deep layers of MLLMs. Thus, we strategically withdraw them at a certain layer, enabling only text tokens to engage in subsequent layers. To pinpoint the ideal layer for vision tokens withdrawal, we initially analyze a limited set of tiny datasets and choose the first layer that meets the Kullback-Leibler divergence criterion. Our VTW approach can cut computational overhead by over 40\% across diverse multimodal tasks while maintaining performance. Our code is released at https://github.com/lzhxmu/VTW.
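
To illustrate the layer-selection idea, the sketch below picks the first layer at which the output distribution computed without vision tokens stays within a KL-divergence threshold of the full-model distribution; the per-layer logits are synthetic placeholders standing in for real MLLM activations, and the threshold is arbitrary.

```python
# Sketch: choose a vision-token withdrawal layer via a KL-divergence criterion.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
n_layers, vocab, threshold = 12, 50, 0.05

chosen = None
for layer in range(n_layers):
    base = rng.standard_normal(vocab)
    p_full = softmax(base)
    # Synthetic stand-in: the gap to the vision-free distribution shrinks with depth,
    # mimicking the "information migration" observation.
    p_without_vision = softmax(base + rng.standard_normal(vocab) / (layer + 1))
    if kl(p_full, p_without_vision) < threshold:
        chosen = layer
        break

print("withdraw vision tokens after layer:", chosen)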

[48]  arXiv:2405.05809 (cross-list from cs.LG) [pdf, ps, other]
Title: Aequitas Flow: Streamlining Fair ML Experimentation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Aequitas Flow is an open-source framework for end-to-end Fair Machine Learning (ML) experimentation in Python. The package fills integration gaps left by other Fair ML packages, enabling complete and accessible experimentation. It provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling rapid and simple experiments and result analysis. Aimed at ML practitioners and researchers, the framework offers implementations of methods, datasets, and metrics, along with standard interfaces for these components to improve extensibility. By facilitating the development of fair ML practices, Aequitas Flow seeks to enhance the adoption of these concepts in AI technologies.

[49]  arXiv:2405.05852 (cross-list from cs.CV) [pdf, other]
Title: Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. Such capabilities are difficult to learn solely from task-specific data. This has led to the emergence of pre-trained vision-language models as a tool for transferring representations learned from internet-scale data to downstream tasks and new domains. However, commonly used contrastively trained representations such as in CLIP have been shown to fail at enabling embodied agents to gain a sufficiently fine-grained scene understanding -- a capability vital for control. To address this shortcoming, we consider representations from pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts and as such, contain text-conditioned representations that reflect highly fine-grained visuo-spatial information. Using pre-trained text-to-image diffusion models, we construct Stable Control Representations which allow learning downstream control policies that generalize to complex, open-ended environments. We show that policies learned using Stable Control Representations are competitive with state-of-the-art representation learning approaches across a broad range of simulated control settings, encompassing challenging manipulation and navigation tasks. Most notably, we show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.

[50]  arXiv:2405.05858 (cross-list from cs.CV) [pdf, other]
Title: Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO)

We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume a scene prior, hand pose prior, or object-category pose prior, or rely on local optimization with multiple sequence segments. We propose a method that allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without any segments. We progressively optimize the object shape and pose simultaneously based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device. We demonstrate that our approach outperforms most methods significantly, and is on par with recent techniques that assume prior information.

[51]  arXiv:2405.05861 (cross-list from cs.RO) [pdf, other]
Title: ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers
Comments: ICRA Workshop 2024: 3rd Workshop on Future of Construction: Lifelong Learning Robots in Changing Construction Sites
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Excavators are crucial for diverse tasks such as construction and mining, and autonomous excavator systems enhance safety and efficiency, address labor shortages, and improve human working conditions. Departing from existing modularized approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control excavator valves directly. Utilizing the Action Chunking with Transformers (ACT) architecture, ExACT employs imitation learning to take observations from multi-modal sensors as inputs and generate actionable sequences. In our experiments, we build a simulator based on captured real-world data to model the relations between excavator valve states and joint velocities. With a few human-operated demonstration trajectories, ExACT demonstrates the capability of completing different excavation tasks, including reaching, digging, and dumping, through imitation learning in validations with the simulator. To the best of our knowledge, ExACT represents the first step towards building an end-to-end autonomous excavator system via imitation learning with a minimal set of human demonstrations. The video about this work can be accessed at https://youtu.be/NmzR_Rf-aEk.

[52]  arXiv:2405.05870 (cross-list from cs.GT) [pdf, other]
Title: Selecting the Most Conflicting Pair of Candidates
Comments: Accepted for publication at IJCAI-24; 27 pages; 11 figures
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

We study committee elections from a perspective of finding the most conflicting candidates, that is, candidates that imply the largest amount of conflict, as per voter preferences. By proposing basic axioms to capture this objective, we show that none of the prominent multiwinner voting rules meet them. Consequently, we design committee voting rules compliant with our desiderata, introducing conflictual voting rules. A subsequent deepened analysis sheds more light on how they operate. Our investigation identifies various aspects of conflict, for which we come up with relevant axioms and quantitative measures, which may be of independent interest. We support our theoretical study with experiments on both real-life and synthetic data.

[53]  arXiv:2405.05876 (cross-list from cs.RO) [pdf, other]
Title: Composable Part-Based Manipulation
Comments: Presented at CoRL 2023. For videos and additional results, see our website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of different correspondence constraints. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. These diffusion models can generate parameters for manipulation skills based on the specific object parts. Leveraging part-based correspondences coupled with the task decomposition into distinct constraints enables strong generalization to novel objects and object categories. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities.

[54]  arXiv:2405.05890 (cross-list from cs.LG) [pdf, other]
Title: Safe Exploration Using Bayesian World Models and Log-Barrier Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic uncertainty. This makes CERL robust towards model inaccuracies and leads to safe exploration during learning. In our experiments, we demonstrate that CERL outperforms the current state-of-the-art in terms of safety and optimality in solving CMDPs from image observations.
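
As a minimal sketch of the log-barrier ingredient named in the title (not CERL itself, which additionally relies on Bayesian world models and epistemic-uncertainty-pessimistic policies), the snippet below shows a barrier-augmented objective that grows steeply as the expected constraint cost approaches its limit; the numerical values are placeholders.

```python
# Minimal log-barrier objective for a constrained policy objective.
import numpy as np

def log_barrier_objective(expected_return, expected_cost, cost_limit, t=10.0):
    """Barrier-augmented objective; -inf once the constraint is violated."""
    slack = cost_limit - expected_cost
    if slack <= 0:
        return -np.inf
    return expected_return + (1.0 / t) * np.log(slack)

for cost in [0.1, 0.5, 0.9, 0.99]:
    print(cost, log_barrier_objective(expected_return=1.0,
                                      expected_cost=cost, cost_limit=1.0))
```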

[55]  arXiv:2405.05905 (cross-list from cs.GT) [pdf, other]
Title: Truthful Aggregation of LLMs with an Application to Online Advertising
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

We address the challenge of aggregating the preferences of multiple agents over LLM-generated replies to user queries, where agents might modify or exaggerate their preferences. New agents may participate for each new query, making fine-tuning LLMs on these preferences impractical. To overcome these challenges, we propose an auction mechanism that operates without fine-tuning or access to model weights. This mechanism is designed to provably converge to the output of the optimally fine-tuned LLM as computational resources are increased. The mechanism can also incorporate contextual information about the agents when available, which significantly accelerates its convergence. A well-designed payment rule ensures that truthful reporting is the optimal strategy for all agents, while also promoting an equity property by aligning each agent's utility with her contribution to social welfare - an essential feature for the mechanism's long-term viability. While our approach can be applied whenever monetary transactions are permissible, our flagship application is in online advertising. In this context, advertisers try to steer LLM-generated responses towards their brand interests, while the platform aims to maximize advertiser value and ensure user satisfaction. Experimental results confirm that our mechanism not only converges efficiently to the optimally fine-tuned LLM but also significantly boosts advertiser value and platform revenue, all with minimal computational overhead.

[56]  arXiv:2405.05908 (cross-list from physics.plasm-ph) [pdf, other]
Title: Diag2Diag: Multi modal super resolution for physics discovery with application to fusion
Subjects: Plasma Physics (physics.plasm-ph); Artificial Intelligence (cs.AI)

This paper introduces a groundbreaking multi-modal neural network model designed for resolution enhancement, which innovatively leverages inter-diagnostic correlations within a system. Traditional approaches have primarily focused on uni-modal enhancement strategies, such as pixel-based image enhancement or heuristic signal interpolation. In contrast, our model employs a novel methodology by harnessing the diagnostic relationships within the physics of fusion plasma. Initially, we establish the correlation among diagnostics within the tokamak. Subsequently, we utilize these correlations to substantially enhance the temporal resolution of the Thomson Scattering diagnostic, which assesses plasma density and temperature. By increasing its resolution from conventional 200Hz to 500kHz, we facilitate a new level of insight into plasma behavior, previously attainable only through computationally intensive simulations. This enhancement goes beyond simple interpolation, offering novel perspectives on the underlying physical phenomena governing plasma dynamics.

[57]  arXiv:2405.05925 (cross-list from cs.LG) [pdf, other]
Title: FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)

Ensemble weather forecasting is essential for weather predictions and mitigating the impacts of extreme weather events. Constructing an ensemble prediction system (EPS) based on conventional numerical weather prediction (NWP) models is highly computationally expensive. Machine learning (ML) models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and the SEEDS model, rely on the ERA5 Ensemble of Data Assimilations (EDA) or two operational NWP ensemble members for forecast generation. The spatial resolution of 1{\deg} or 2{\deg} in these models is often considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days ahead. The model runs at a significantly improved spatial resolution of 0.25{\deg}, incorporating 5 upper-air atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of the Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the continuous ranked probability score (CRPS) and the KL divergence between the predicted and target distributions. This approach represents an advancement over the traditional use of L1 loss combined with the KL loss in standard VAE models when using VAEs for ensemble weather forecasting. Evaluation results demonstrate that FuXi-ENS outperforms ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), a world-leading NWP model, on 98.1% of 360 combinations of variable and forecast lead time in terms of CRPS.
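
For reference, the CRPS term mentioned above is commonly estimated from ensemble members as CRPS ~ E|X - y| - 0.5 E|X - X'|, with X, X' drawn from the ensemble and y the observation; the sketch below uses that standard sample-based estimator on synthetic values and is not FuXi-ENS's training loss, which also includes a KL term.

```python
# Sample-based CRPS estimator for a single observation and an ensemble forecast.
import numpy as np

def crps_ensemble(ensemble, observation):
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - observation))
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
members = rng.normal(loc=15.0, scale=2.0, size=50)   # e.g. synthetic temperature members
print(f"CRPS = {crps_ensemble(members, observation=16.2):.3f}")
```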

[58]  arXiv:2405.05930 (cross-list from cs.CR) [pdf, other]
Title: Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

AI-generated content (AIGC) models, represented by large language models (LLM), have brought revolutionary changes to the content generation fields. The high-speed and extensive 6G technology is an ideal platform for providing powerful AIGC mobile service applications, while future 6G mobile networks also need to support intelligent and personalized mobile generation services. However, the significant ethical and security issues of current AIGC models, such as adversarial attacks, privacy, and fairness, greatly affect the credibility of 6G intelligent networks, especially in ensuring secure, private, and fair AIGC applications. In this paper, we propose TrustGAIN, a novel paradigm for trustworthy AIGC in 6G networks, to ensure trustworthy large-scale AIGC services in future 6G networks. We first discuss the adversarial attacks and privacy threats faced by AIGC systems in 6G networks, as well as the corresponding protection issues. Subsequently, we emphasize the importance of ensuring the unbiasedness and fairness of the mobile generative service in future intelligent networks. In particular, we conduct a use case to demonstrate that TrustGAIN can effectively guide the resistance against malicious or generated false information. We believe that TrustGAIN is a necessary paradigm for intelligent and trustworthy 6G networks to support AIGC services, ensuring the security, privacy, and fairness of AIGC network services.

[59]  arXiv:2405.05950 (cross-list from cs.LG) [pdf, other]
Title: Federated Combinatorial Multi-Agent Multi-Armed Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(\alpha-\epsilon)$-approximation algorithm, having a complexity of $\tilde{\mathcal{O}}(\frac{\psi}{\epsilon^\beta})$, where the logarithm is omitted, for some function $\psi$ and constant $\beta$, into an online multi-agent algorithm with $m$ communicating agents and an $\alpha$-regret of no more than $\tilde{\mathcal{O}}(m^{-\frac{1}{3+\beta}} \psi^\frac{1}{3+\beta} T^\frac{2+\beta}{3+\beta})$. This approach not only eliminates the $\epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $\tilde{\mathcal{O}}\left(\psi T^\frac{\beta}{\beta+1}\right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach on a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.

[60]  arXiv:2405.05959 (cross-list from cs.LG) [pdf, other]
Title: Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask
Comments: 11 (main paper) + 10 (appendix) pages. Source code available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
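
As a rough illustration of the Imputation-Interpolation-Forecasting masking idea (not the authors' implementation; proportions and spans are arbitrary placeholders), the sketch below builds a boolean mask for a univariate series with scattered hidden points, a hidden middle block, and a hidden tail.

```python
# Sketch of an IIF-style mask: imputation points, an interpolation block, a forecasting tail.
import numpy as np

def iif_mask(length, p_impute=0.1, interp_span=(0.45, 0.55), forecast_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    observed = np.ones(length, dtype=bool)
    observed[rng.random(length) < p_impute] = False                               # imputation
    observed[int(length * interp_span[0]): int(length * interp_span[1])] = False  # interpolation
    observed[int(length * (1 - forecast_frac)):] = False                          # forecasting
    return observed  # True = observed (conditioning), False = masked (prediction target)

mask = iif_mask(100)
print("observed:", mask.sum(), "masked:", (~mask).sum())
```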

[61]  arXiv:2405.05966 (cross-list from cs.CL) [pdf, other]
Title: Natural Language Processing RELIES on Linguistics
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have become capable of generating highly fluent text in certain languages, without modules specially designed to capture grammar or semantic coherence. What does this mean for the future of linguistic expertise in NLP? We highlight several aspects in which NLP (still) relies on linguistics, or where linguistic thinking can illuminate new directions. We argue our case around the acronym $RELIES$ that encapsulates six major facets where linguistics contributes to NLP: $R$esources, $E$valuation, $L$ow-resource settings, $I$nterpretability, $E$xplanation, and the $S$tudy of language. This list is not exhaustive, nor is linguistics the main point of reference for every effort under these themes; but at a macro level, these facets highlight the enduring importance of studying machine systems vis-a-vis systems of human language.

Replacements for Fri, 10 May 24

[62]  arXiv:2401.09851 (replaced) [pdf, other]
Title: Behavioural Rehearsing Illuminates Scientific Problems of Organised Complexity
Subjects: Artificial Intelligence (cs.AI)
[63]  arXiv:2401.15356 (replaced) [pdf, other]
Title: A Decision Theoretic Framework for Measuring AI Reliance
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[64]  arXiv:2403.20151 (replaced) [pdf, ps, other]
Title: A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles
Comments: 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall)
Subjects: Artificial Intelligence (cs.AI)
[65]  arXiv:2404.17749 (replaced) [pdf, other]
Title: UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis
Comments: Accepted at NAACL-ClinicalNLP workshop 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[66]  arXiv:2405.00099 (replaced) [pdf, other]
Title: Creative Beam Search: LLM-as-a-Judge For Improving Response Generation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[67]  arXiv:2405.03524 (replaced) [pdf, other]
Title: Exploring knowledge graph-based neural-symbolic system from application perspective
Authors: Shenzhe Zhu
Subjects: Artificial Intelligence (cs.AI)
[68]  arXiv:2405.04064 (replaced) [pdf, other]
Title: MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation
Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024
Subjects: Artificial Intelligence (cs.AI)
[69]  arXiv:2405.04344 (replaced) [pdf, other]
Title: Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Authors: Chenxi Qiu
Comments: To be published in IJCAI 2024
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[70]  arXiv:2405.05146 (replaced) [pdf, ps, other]
Title: Hybrid Convolutional Neural Networks with Reliability Guarantee
Comments: 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2024). Dependable and Secure Machine Learning Workshop (DSML 2024), Brisbane, Australia, June 24-27, 2024
Subjects: Artificial Intelligence (cs.AI)
[71]  arXiv:2206.06661 (replaced) [pdf, other]
Title: Toward Student-Oriented Teacher Network Training For Knowledge Distillation
Comments: ICLR 2024; poster link this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[72]  arXiv:2303.11278 (replaced) [pdf, other]
Title: Bayesian Pseudo-Coresets via Contrastive Divergence
Comments: Accepted at UAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[73]  arXiv:2304.09639 (replaced) [pdf, ps, other]
Title: The Transformation Logics
Authors: Alessandro Ronca
Comments: Extended version with appendix of a paper with the same title that will appear in the proceedings of IJCAI 2024
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
[74]  arXiv:2306.03311 (replaced) [pdf, other]
Title: Learning Embeddings for Sequential Tasks Using Population of Agents
Comments: IJCAI'24 paper (longer version)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[75]  arXiv:2306.07285 (replaced) [pdf, other]
Title: TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills
Comments: Accepted by LREC-COLING 2024
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[76]  arXiv:2307.06945 (replaced) [pdf, other]
Title: In-context Autoencoder for Context Compression in a Large Language Model
Comments: v4: Final camera ready for ICLR'24
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[77]  arXiv:2308.00264 (replaced) [pdf, other]
Title: Multimodal Multi-loss Fusion Network for Sentiment Analysis
Comments: First two authors contributed equally to the paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[78]  arXiv:2309.10818 (replaced) [pdf, other]
Title: SlimPajama-DC: Understanding Data Combinations for LLM Training
Comments: Technical report. Models at: this https URL and dataset at: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[79]  arXiv:2310.04486 (replaced) [pdf, other]
Title: T-Rep: Representation Learning for Time Series using Time-Embeddings
Comments: Accepted at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[80]  arXiv:2311.05304 (replaced) [pdf, other]
Title: Data Valuation and Detections in Federated Learning
Comments: CVPR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[81]  arXiv:2311.07590 (replaced) [pdf, other]
Title: Large Language Models can Strategically Deceive their Users when Put Under Pressure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[82]  arXiv:2311.12871 (replaced) [pdf, other]
Title: An Embodied Generalist Agent in 3D World
Comments: ICML 2024. The first four authors contribute equally. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[83]  arXiv:2312.07751 (replaced) [pdf, other]
Title: Large Human Language Models: A Need and the Challenges
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[84]  arXiv:2312.11834 (replaced) [pdf, other]
Title: Multi-agent reinforcement learning using echo-state network and its application to pedestrian dynamics
Authors: Hisato Komatsu
Comments: 23 pages, 17 figures
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
[85]  arXiv:2401.15889 (replaced) [pdf, other]
Title: Sliced Wasserstein with Random-Path Projecting Directions
Comments: Accepted to ICML 2024, 21 pages, 5 figures, 2 tables
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[86]  arXiv:2402.01295 (replaced) [pdf, other]
Title: ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[87]  arXiv:2402.01662 (replaced) [pdf, ps, other]
Title: Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives
Comments: version 2, updated May 8, 2024 to included updated references and new case study pointers as the trend of generative ghosts accelerates
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[88]  arXiv:2402.07818 (replaced) [pdf, other]
Title: Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
Authors: Z Liu, J Lou, W Bao, Y Hu, B Li, Z Qin, K Ren
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[89]  arXiv:2402.11472 (replaced) [pdf, other]
Title: DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[90]  arXiv:2402.18609 (replaced) [pdf, other]
Title: ICE-SEARCH: A Language Model-Driven Feature Selection Approach
Authors: Tianze Yang, Tianyi Yang, Fuyuan Lyu, Shaoshan Liu, Xue (Steve) Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[91]  arXiv:2403.01954 (replaced) [pdf, other]
Title: DECIDER: A Rule-Controllable Decoding Strategy for Language Generation by Imitating Dual-System Cognitive Theory
Comments: Submitted to IEEE TKDE (Major Revision), 12 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[92]  arXiv:2403.02939 (replaced) [pdf, other]
Title: PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers
Comments: Accepted to CHI 2024
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[93]  arXiv:2403.03835 (replaced) [pdf, other]
Title: Cobweb: An Incremental and Hierarchical Model of Human-Like Category Learning
Comments: Accepted by CogSci-24
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[94]  arXiv:2403.09998 (replaced) [pdf, other]
Title: FBPT: A Fully Binary Point Transformer
Comments: Accepted to ICRA 2024. arXiv admin note: substantial text overlap with arXiv:2303.01166
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[95]  arXiv:2403.19838 (replaced) [pdf, other]
Title: Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
Comments: 9 pages, 3 figures, Accepted at CVPR 2024 Vision and Language for Autonomous Driving and Robotics Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[96]  arXiv:2404.03888 (replaced) [pdf, other]
Title: A proximal policy optimization based intelligent home solar management
Comments: This manuscript has been accepted for IEEE EIT Conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[97]  arXiv:2404.04292 (replaced) [pdf, other]
Title: Conversational Disease Diagnosis via External Planner-Controlled Large Language Models
Comments: Work in Progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[98]  arXiv:2404.05553 (replaced) [pdf, other]
Title: Alljoined1 -- A dataset for EEG-to-Image decoding
Comments: 8 Pages, 6 Figures
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
[99]  arXiv:2404.08412 (replaced) [pdf, other]
Title: PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction
Comments: 22 pages
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)
[100]  arXiv:2404.15899 (replaced) [pdf, other]
Title: ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction
Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[101]  arXiv:2404.17525 (replaced) [pdf, ps, other]
Title: Large Language Model Agent as a Mechanical Designer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[102]  arXiv:2404.17735 (replaced) [pdf, other]
Title: Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models
Comments: Short version accepted to CVPR 2024 Workshop on Generative Models for Computer Vision
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
[103]  arXiv:2404.19232 (replaced) [pdf, other]
Title: GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[104]  arXiv:2405.01582 (replaced) [pdf, other]
Title: Text Quality-Based Pruning for Efficient Training of Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[105]  arXiv:2405.01589 (replaced) [pdf, ps, other]
Title: GPT-4 passes most of the 297 written Polish Board Certification Examinations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[106]  arXiv:2405.02228 (replaced) [pdf, other]
Title: REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[107]  arXiv:2405.03192 (replaced) [pdf, other]
Title: QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[108]  arXiv:2405.03341 (replaced) [pdf, other]
Title: Enhancing Q-Learning with Large Language Model Heuristics
Authors: Xiefeng Wu
Comments: Note:Arxiv,Draft
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[109]  arXiv:2405.03547 (replaced) [pdf, other]
Title: Position: Leverage Foundational Models for Black-Box Optimization
Comments: International Conference on Machine Learning (ICML) 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[110]  arXiv:2405.04372 (replaced) [pdf, ps, other]
Title: Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[111]  arXiv:2405.04760 (replaced) [pdf, other]
Title: Large Language Models for Cyber Security: A Systematic Literature Review
Comments: 46 pages,6 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[112]  arXiv:2405.05248 (replaced) [pdf, other]
Title: LLMs with Personalities in Multi-issue Negotiation Games
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
