
Artificial Intelligence

New submissions

[ total of 112 entries: 1-112 ]

New submissions for Fri, 10 May 24

[1]  arXiv:2405.05594 [pdf, other]
Title: Expected Work Search: Combining Win Rate and Proof Size Estimation
Subjects: Artificial Intelligence (cs.AI)

We propose Expected Work Search (EWS), a new game solving algorithm. EWS combines win rate estimation, as used in Monte Carlo Tree Search, with proof size estimation, as used in Proof Number Search. The search efficiency of EWS stems from minimizing a novel notion of Expected Work, which predicts the expected computation required to solve a position. EWS outperforms traditional solving algorithms on the games of Go and Hex. For Go, we present the first solution to the empty 5x5 board with the commonly used positional superko ruleset. For Hex, our algorithm solves the empty 8x8 board in under 4 minutes. Experiments show that EWS succeeds both with and without extensive domain-specific knowledge.
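
To make the selection criterion concrete, here is a minimal sketch of the general idea of ranking children by expected work (an estimated proof size scaled by how likely the win actually is). This is an illustrative assumption about the scoring, not the authors' EWS algorithm; the formula and field names are hypothetical.

```python
def expected_work(win_rate, proof_size_estimate, eps=1e-6):
    # Illustrative only: positions that look unlikely to be wins (low win rate)
    # or that need large proof trees are expected to cost more work to solve.
    return proof_size_estimate / max(win_rate, eps)

def select_child(children):
    # children: Monte-Carlo win-rate and proof-size estimates per candidate move
    return min(children, key=lambda c: expected_work(c["win_rate"], c["proof_size"]))

# toy usage
children = [
    {"move": "a1", "win_rate": 0.62, "proof_size": 1200},
    {"move": "b2", "win_rate": 0.55, "proof_size": 400},
]
print(select_child(children)["move"])   # prefers the cheaper expected proof
```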

[2]  arXiv:2405.05662 [pdf, other]
Title: Approximate Dec-POMDP Solving Using Multi-Agent A*
Comments: 19 pages, 3 figures. Extended version (with appendix) of the paper to appear in IJCAI 2024
Subjects: Artificial Intelligence (cs.AI)

We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.

Cross-lists for Fri, 10 May 24

[3]  arXiv:2405.05275 (cross-list from cs.SI) [pdf, other]
Title: SoMeR: Multi-View User Representation Learning for Social Media
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

User representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations have widespread applications in recommendation systems and advertising; however, existing methods typically rely on specific features like text content, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address this limitation, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text content, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of timestamped textual features, uses transformers to embed this along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through two applications: 1) Identifying inauthentic accounts involved in coordinated influence operations by detecting users posting similar content simultaneously, and 2) Measuring increased polarization in online discussions after major events by quantifying how users with different beliefs moved farther apart in the embedding space. SoMeR's ability to holistically model users enables new solutions to important problems around disinformation, societal tensions, and online behavior understanding.

[4]  arXiv:2405.05285 (cross-list from cs.HC) [pdf, ps, other]
Title: Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance
Authors: Jelena Pavlovic (University of Belgrade, Faculty of Philosophy and Koucing centar Research Lab), Jugoslav Krstic, Luka Mitrovic, Djordje Babic, Adrijana Milosavljevic, Milena Nikolic, Tijana Karaklic, Tijana Mitrovic (Koucing centar Research Lab)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This study investigates the metacognitive capabilities of Large Language Models relative to human metacognition in the context of the International Coaching Federation (ICF) mimicking exam, a situational judgment test related to coaching competencies. Using a mixed-method approach, we assessed the metacognitive performance, including sensitivity, accuracy in probabilistic predictions, and bias, of human participants and five advanced LLMs (GPT-4, Claude-3-Opus, Mistral Large, Llama 3, and Gemini 1.5 Pro). The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in terms of reduced overconfidence. However, both LLMs and humans showed less adaptability in ambiguous scenarios, adhering closely to predefined decision frameworks. The study suggests that Generative AI can effectively engage in human-like metacognitive processing without conscious awareness. Implications of the study are discussed in relation to the development of AI simulators that scaffold cognitive and metacognitive aspects of mastering coaching competencies. More broadly, implications of these results are discussed in relation to the development of metacognitive modules that lead towards more autonomous and intuitive AI systems.

[5]  arXiv:2405.05286 (cross-list from cs.LG) [pdf, other]
Title: Tiny Deep Ensemble: Uncertainty Estimation in Edge AI Accelerators via Ensembling Normalization Layers with Shared Weights
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfident predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead as a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ across various benchmark datasets, tasks, and state-of-the-art architectures.
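
A minimal PyTorch sketch of the core idea, ensembling only the normalization layers over shared convolution weights, is given below; the layer types and sizes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TinyEnsembleBlock(nn.Module):
    """Shared conv weights with M independent BatchNorm members (illustrative)."""
    def __init__(self, in_ch, out_ch, m=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)  # shared
        self.norms = nn.ModuleList([nn.BatchNorm2d(out_ch) for _ in range(m)])

    def forward(self, x):
        h = self.conv(x)                                       # one shared forward pass
        return torch.stack([norm(h) for norm in self.norms])   # M ensemble outputs

block = TinyEnsembleBlock(3, 16, m=4)
outs = block(torch.randn(8, 3, 32, 32))        # shape (4, 8, 16, 32, 32)
uncertainty = outs.var(dim=0).mean()           # disagreement across ensemble members
```

Because only the small normalization layers are duplicated, the storage cost is close to that of a single model while the spread of the member outputs still provides an uncertainty signal.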

[6]  arXiv:2405.05292 (cross-list from cs.HC) [pdf, other]
Title: Smart Portable Computer
Authors: Niladri Das
Comments: 34 pages
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Amidst the COVID-19 pandemic, with many organizations, schools, colleges, and universities transitioning to virtual platforms, students encountered difficulties in acquiring PCs such as desktops or laptops. Entry-level machines, with starting prices around 15,000 INR, often failed to offer adequate system specifications, posing a challenge for consumers. Additionally, those reliant on laptops for work found the conventional approach cumbersome. Enter the "Portable Smart Computer," a leap into the future of computing. This innovative device boasts speed and performance comparable to traditional desktops but in a compact, energy-efficient, and cost-effective package. It delivers a seamless desktop experience, whether one is editing documents, browsing multiple tabs, managing spreadsheets, or creating presentations. Moreover, it supports programming languages like Python, C, and C++, as well as compilers such as Keil and Xilinx, catering to the needs of programmers.

[7]  arXiv:2405.05295 (cross-list from cs.CV) [pdf, other]
Title: Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers
Comments: Accepted at IJCAI 2024. arXiv admin note: text overlap with arXiv:2207.09374
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we demonstrate the feasibility of alterfactual explanations for black box image classifiers. Traditional explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, most common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. Yet, to fully understand a decision, knowledge about relevant features is not enough: awareness of irrelevant information also contributes substantially to a user's mental model of an AI system. To this end, a novel approach for explaining AI systems called alterfactual explanations was recently proposed on a conceptual level. It is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which input data characteristics can change arbitrarily without influencing the AI's decision. In this paper, we show for the first time that it is possible to apply this idea to black box models based on neural networks. To this end, we present a GAN-based approach to generate these alterfactual explanations for binary image classifiers. Further, we present a user study that gives interesting insights into how alterfactual explanations can complement counterfactual explanations.

[8]  arXiv:2405.05299 (cross-list from cs.HC) [pdf, other]
Title: Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause serious harm, even death to patients. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from Chest X-ray images to reduce risks of sub-optimally or critically placed NGTs being missed or delayed in their detection, but gaps remain in clinical practice integration. In this study, we present a human-centered approach to the problem and describe insights derived following contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped understand challenges in existing workflows, and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.

[9]  arXiv:2405.05329 (cross-list from cs.DC) [pdf, other]
Title: KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Comments: preprint for ICML 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large Language Model (LLM) inference has two phases: the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead, to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of the key-value cache (KV-cache). Hence, KV-Runahead parallelizes the prompt phase by orchestrating multiple processes to populate the KV-cache and minimizes the time-to-first-token (TTFT). Dual-purposing the KV-cache scheme has two main benefits. First, since the KV-cache is designed to leverage the causal attention map, we minimize computation and communication automatically. Second, since it already exists for the extension phase, KV-Runahead is easy to implement. We further propose context-level load-balancing to handle uneven KV-cache generation (due to the causal attention) and to optimize TTFT. Compared with an existing parallelization scheme such as tensor or sequential parallelization where keys and values are locally generated and exchanged via all-gather collectives, our experimental results demonstrate that KV-Runahead can offer over 1.4x and 1.6x speedups for Llama 7B and Falcon 7B respectively.
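
The property that makes prefill parallelizable can be illustrated in plain NumPy: key/value projections are per-token, so each prompt chunk's cache entries can be computed independently (e.g., by separate processes) and concatenated before the new token's attention. The sketch below only illustrates that property under toy shapes and random weights; it is not the KV-Runahead scheduler or its load balancing.

```python
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wk, Wv, Wq = (rng.standard_normal((d, d)) for _ in range(3))

def chunk_kv(x_chunk):
    # K/V projections are per-token, so each prompt chunk's cache entries can be
    # produced independently (e.g., by a different process in a parallel prefill).
    return x_chunk @ Wk, x_chunk @ Wv

prompt = rng.standard_normal((48, d))
chunks = np.split(prompt, 4)                      # pretend these go to 4 workers
ks, vs = zip(*(chunk_kv(c) for c in chunks))
K, V = np.concatenate(ks), np.concatenate(vs)     # assembled KV-cache

q = prompt[-1] @ Wq                               # query for the first new token
scores = K @ q / np.sqrt(d)
w = np.exp(scores - scores.max()); w /= w.sum()
first_token_hidden = w @ V                        # attends over the full cache
```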

[10]  arXiv:2405.05336 (cross-list from eess.IV) [pdf, other]
Title: Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Despite their effectiveness, current deep learning models face challenges with images coming from different domains with varying appearance and content. We introduce SegCLR, a versatile framework designed to segment volumetric images across different domains, employing supervised and contrastive learning simultaneously to effectively learn from both labeled and unlabeled data. We demonstrate the superior performance of SegCLR through a comprehensive evaluation involving three diverse clinical datasets of retinal fluid segmentation in 3D Optical Coherence Tomography (OCT), various network configurations, and verification across 10 different network initializations. In an unsupervised domain adaptation context, SegCLR achieves results on par with a supervised upper-bound model trained on the intended target domain. Notably, we discover that the segmentation performance of the SegCLR framework is only marginally impacted by the abundance of unlabeled data from the target domain; we therefore also propose an effective zero-shot domain adaptation extension of SegCLR, eliminating the need for any target domain information. This shows that our proposed addition of contrastive loss in standard supervised training for segmentation leads to superior models, inherently more generalizable to both in- and out-of-domain test data. We additionally propose a pragmatic solution for SegCLR deployment in realistic scenarios with multiple domains containing labeled data. Accordingly, our framework pushes the boundaries of deep-learning based segmentation in multi-domain applications, regardless of data availability - labeled, unlabeled, or nonexistent.

[11]  arXiv:2405.05347 (cross-list from cs.SE) [pdf, other]
Title: Benchmarking Educational Program Repair
Comments: 15 pages, 2 figures, 3 tables. Non-archival report presented at the NeurIPS'23 Workshop on Generative AI for Education (GAIED)
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)

The emergence of large language models (LLMs) has sparked enormous interest due to their potential application across a range of educational tasks. For example, recent work in programming education has used LLMs to generate learning resources, improve error messages, and provide feedback on code. However, one factor that limits progress within the field is that much of the research uses bespoke datasets and different evaluation metrics, making direct comparisons between results unreliable. Thus, there is a pressing need for standardization and benchmarks that facilitate the equitable comparison of competing approaches. One task where LLMs show great promise is program repair, which can be used to provide debugging support and next-step hints to students. In this article, we propose a novel educational program repair benchmark. We curate two high-quality publicly available programming datasets, present a unified evaluation procedure introducing a novel evaluation metric rouge@k for approximating the quality of repairs, and evaluate a set of five recent models to establish baseline performance.

[12]  arXiv:2405.05348 (cross-list from cs.CL) [pdf, other]
Title: The Effect of Model Size on LLM Post-hoc Explainability via LIME
Comments: Published at ICLR 2024 Workshop on Secure and Trustworthy Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
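
For readers unfamiliar with the setup, the following hedged sketch shows how a LIME explanation is typically obtained for a transformer text classifier using the lime and transformers packages; the checkpoint name is a placeholder for a fine-tuned DeBERTaV3 classifier and is not specified by the paper.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# Placeholder checkpoint: substitute any fine-tuned DeBERTaV3 sequence classifier.
clf = pipeline("text-classification", model="your-finetuned-deberta-v3-nli",
               top_k=None)   # return scores for all classes

def predict_proba(texts):
    # LIME expects an (n_samples, n_classes) array of class probabilities.
    outs = clf(list(texts))
    return np.array([[d["score"] for d in sorted(o, key=lambda d: d["label"])]
                     for o in outs])

explainer = LimeTextExplainer()
exp = explainer.explain_instance("A man is playing a guitar on stage.",
                                 predict_proba, num_features=6, num_samples=500)
print(exp.as_list())   # tokens with their contribution weights
```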

[13]  arXiv:2405.05349 (cross-list from cs.LG) [pdf, other]
Title: Offline Model-Based Optimization via Policy-Guided Gradient Search
Comments: Published at AAAI Conference on Artificial Intelligence, 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline optimization is an emerging problem in many experimental engineering domains including protein, drug or aircraft design, where online experimentation to collect evaluation data is too expensive or dangerous. To avoid that, one has to optimize an unknown function given only its offline evaluation at a fixed set of inputs. A naive solution to this problem is to learn a surrogate model of the unknown function and optimize this surrogate instead. However, such a naive optimizer is prone to erroneous overestimation of the surrogate (possibly due to over-fitting on a biased sample of function evaluation) on inputs outside the offline dataset. Prior approaches addressing this challenge have primarily focused on learning robust surrogate models. However, their search strategies are derived from the surrogate model rather than the actual offline data. To fill this important gap, we introduce a new learning-to-search perspective for offline optimization by reformulating it as an offline reinforcement learning problem. Our proposed policy-guided gradient search approach explicitly learns the best policy for a given surrogate model created from the offline data. Our empirical results on multiple benchmarks demonstrate that the learned optimization policy can be combined with existing offline surrogates to significantly improve the optimization performance.
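
The naive surrogate-based optimizer that the paper improves upon can be sketched in a few lines of PyTorch: fit a surrogate to the offline data, then run gradient ascent on the input through the frozen surrogate. The toy dataset and step sizes below are assumptions; the sketch is only meant to show where overestimation outside the offline data can mislead the search.

```python
import torch
import torch.nn as nn

# Toy offline dataset: inputs X and noiseless evaluations y of an unknown function.
X = torch.rand(256, 4)
y = -((X - 0.3) ** 2).sum(dim=1, keepdim=True)

surrogate = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
for _ in range(500):                          # fit the surrogate to offline data
    opt.zero_grad()
    loss = ((surrogate(X) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Naive search: gradient ascent on the input through the frozen surrogate.
x = X[y.argmax()].clone().requires_grad_(True)   # start from the best offline point
for _ in range(100):
    score = surrogate(x).sum()
    score.backward()
    with torch.no_grad():
        x += 0.05 * x.grad        # may drift into regions the surrogate overestimates,
        x.grad.zero_()            # the failure mode that policy-guided search targets
```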

[14]  arXiv:2405.05374 (cross-list from cs.CL) [pdf, other]
Title: Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models
Comments: 17 pages, 11 Figures, 9 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

This report describes the training dataset creation and recipe behind the family of \texttt{arctic-embed} text embedding models (a set of five models ranging from 22 to 334 million parameters with weights open-sourced under an Apache-2 license). At the time of their release, each model achieved state-of-the-art retrieval accuracy for models of their size on the MTEB Retrieval leaderboard, with the largest model, arctic-embed-l, outperforming closed-source embedding models such as Cohere's embed-v3 and OpenAI's text-embed-3-large. In addition to the details of our training recipe, we provide several informative ablation studies, which we believe explain our models' performance.

[15]  arXiv:2405.05378 (cross-list from cs.CL) [pdf, other]
Title: "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate "harm" as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.

[16]  arXiv:2405.05429 (cross-list from cs.LG) [pdf, other]
Title: How Inverse Conditional Flows Can Serve as a Substitute for Distributional Regression
Comments: Accepted at UAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Machine Learning (stat.ML)

Neural network representations of simple models, such as linear regression, are being studied increasingly to better understand the underlying principles of deep learning algorithms. However, neural representations of distributional regression models, such as the Cox model, have received little attention so far. We close this gap by proposing a framework for distributional regression using inverse flow transformations (DRIFT), which includes neural representations of the aforementioned models. We empirically demonstrate that the neural representations of models in DRIFT can serve as a substitute for their classical statistical counterparts in several applications involving continuous, ordered, time-series, and survival outcomes. We confirm that models in DRIFT empirically match the performance of several statistical methods in terms of estimation of partial effects, prediction, and aleatoric uncertainty quantification. DRIFT covers both interpretable statistical models and flexible neural networks opening up new avenues in both statistical modeling and deep learning.

[17]  arXiv:2405.05435 (cross-list from cs.CR) [pdf, other]
Title: Analysis and prevention of AI-based phishing email attacks
Comments: Electronics, accepted
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Phishing email attacks are among the most common and most harmful cybersecurity attacks. With the emergence of generative AI, phishing attacks can be based on emails generated automatically, making it more difficult to detect them. That is, instead of a single email format sent to a large number of recipients, generative AI can be used to send each potential victim a different email, making it more difficult for cybersecurity systems to identify the scam email before it reaches the recipient. Here we describe a corpus of AI-generated phishing emails. We also use different machine learning tools to test the ability of automatic text analysis to identify AI-generated phishing emails. The results are encouraging, and show that machine learning tools can identify an AI-generated phishing email with high accuracy compared to regular emails or human-generated scam emails. By applying descriptive analytics, we profile the specific differences between AI-generated emails and manually crafted scam emails, and show that AI-generated emails differ in style from human-generated phishing email scams. Therefore, automatic identification tools can be used as a warning for the user. The paper also describes the corpus of AI-generated phishing emails, which is made open to the public and can be used for subsequent studies. While the ability of machine learning to detect AI-generated phishing emails is encouraging, AI-generated phishing emails differ from regular phishing emails, and it is therefore important to also train machine learning systems with AI-generated emails in order to repel future phishing attacks that are powered by generative AI.
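
As an illustration of the kind of automatic text analysis described (not the authors' exact corpus or models), a standard scikit-learn pipeline with TF-IDF features and logistic regression can separate phishing from legitimate emails; the example emails and labels below are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data: in practice, load the AI-generated phishing corpus and
# the regular / human-written emails described in the paper.
emails = ["Dear customer, verify your account immediately ...",
          "Hi team, attaching the minutes from today's meeting ...",
          "Your parcel is held at customs, pay the release fee here ...",
          "Lunch on Friday? The usual place works for me."]
labels = [1, 0, 1, 0]   # 1 = phishing, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.5, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```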

[18]  arXiv:2405.05439 (cross-list from cs.RO) [pdf, other]
Title: How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)

With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.
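
One standard ingredient for such distribution-level guarantees is the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which gives a simultaneous confidence band around the empirical CDF from n rollouts. The sketch below shows that generic construction; it is not necessarily the exact bound used in the paper.

```python
import numpy as np

def dkw_cdf_band(scores, delta=0.05):
    """Simultaneous (1 - delta) confidence band around the empirical CDF (DKW)."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW half-width
    ecdf = np.arange(1, n + 1) / n
    lower = np.clip(ecdf - eps, 0.0, 1.0)            # worst-case CDF from below
    upper = np.clip(ecdf + eps, 0.0, 1.0)
    return scores, lower, upper

# e.g. 20 rollout scores in [0, 1]; the band holds for the whole performance distribution.
rollouts = np.random.default_rng(0).uniform(size=20)
x, lo, hi = dkw_cdf_band(rollouts, delta=0.05)
```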

[19]  arXiv:2405.05444 (cross-list from cs.CL) [pdf, ps, other]
Title: Evaluating Students' Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
Comments: 18 pages, 6 tables, 1 figure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Evaluating open-ended written examination responses from students is an essential yet time-intensive task for educators, requiring a high degree of effort, consistency, and precision. Recent developments in Large Language Models (LLMs) present a promising opportunity to balance the need for thorough evaluation with efficient use of educators' time. In our study, we explore the effectiveness of the LLMs ChatGPT-3.5, ChatGPT-4, Claude-3, and Mistral-Large in assessing university students' open-ended answers to questions about reference material they have studied. Each model was instructed to evaluate 54 answers repeatedly under two conditions: 10 times (10-shot) with a temperature setting of 0.0 and 10 times with a temperature of 0.5, for a total of 1,080 evaluations per model and 4,320 evaluations across all models. The Retrieval Augmented Generation (RAG) framework was used to let the LLMs process the evaluation of the answers. As of spring 2024, our analysis revealed notable variations in consistency and in the grading outcomes provided by the studied LLMs. There is a need to understand the strengths and weaknesses of LLMs in educational settings for evaluating open-ended written responses. Further comparative research is essential to determine the accuracy and cost-effectiveness of using LLMs for educational assessments.
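
The evaluation loop has a simple structure (each answer graded repeatedly at two temperatures); a hedged sketch is shown below, where grade_answer is a hypothetical wrapper around a RAG-backed LLM call and is not the authors' code.

```python
from statistics import mean, pstdev

def grade_answer(answer: str, reference: str, temperature: float) -> float:
    """Hypothetical wrapper: send the answer plus retrieved reference material
    to an LLM and parse a numeric grade from its response."""
    raise NotImplementedError

def evaluate(answers, reference, temperatures=(0.0, 0.5), repeats=10):
    results = {}
    for t in temperatures:
        for i, ans in enumerate(answers):
            grades = [grade_answer(ans, reference, t) for _ in range(repeats)]
            # consistency = spread of repeated grades for the same answer
            results[(i, t)] = {"mean": mean(grades), "spread": pstdev(grades)}
    return results   # 54 answers x 2 temperatures x 10 repeats = 1,080 calls per model
```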

[20]  arXiv:2405.05446 (cross-list from cs.CV) [pdf, other]
Title: GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields
Authors: Yuanhao Gong
Comments: arXiv admin note: text overlap with arXiv:2404.09105
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

The 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal instead. The gradients are much sparser than the original signal and therefore require far fewer Gaussian splats, leading to more efficient storage and thus higher computational performance during both training and rendering. Thanks to this sparsity, during view synthesis only a small number of pixels is needed, leading to much higher computational performance ($100\sim 1000\times$ faster). The 2D image can then be recovered from the gradients by solving a Poisson equation with linear computational complexity. Several experiments are performed to confirm the sparseness of the gradients and the computational performance of the proposed method. The method can be applied to various applications, such as human body modeling and indoor environment modeling.
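
The claim that an image can be recovered from its gradient field by solving a Poisson equation can be checked with a short FFT-based solver (assuming periodic boundaries); this sketch is illustrative and is not the paper's splatting renderer.

```python
import numpy as np

def recover_from_gradients(gx, gy):
    """Solve  laplacian(u) = div(g)  with periodic boundaries via FFT."""
    H, W = gx.shape
    # divergence of the gradient field (backward differences, periodic)
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    fx, fy = np.fft.fftfreq(W), np.fft.fftfreq(H)
    denom = (2 * np.cos(2 * np.pi * fx)[None, :] - 2) + (2 * np.cos(2 * np.pi * fy)[:, None] - 2)
    denom[0, 0] = 1.0                       # the mean of u is unconstrained
    u_hat = np.fft.fft2(div) / denom
    u_hat[0, 0] = 0.0
    return np.real(np.fft.ifft2(u_hat))

u = np.random.default_rng(0).random((64, 64))
gx = np.roll(u, -1, axis=1) - u             # forward differences
gy = np.roll(u, -1, axis=0) - u
u_rec = recover_from_gradients(gx, gy)
print(np.allclose(u_rec - u_rec.mean(), u - u.mean()))   # True: exact up to a constant
```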

[21]  arXiv:2405.05466 (cross-list from cs.CL) [pdf, other]
Title: Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Like a criminal under investigation, Large Language Models (LLMs) might pretend to be aligned while evaluated and misbehave when they have a good opportunity. Can current interpretability methods catch these 'alignment fakers?' To answer this question, we introduce a benchmark that consists of 324 pairs of LLMs fine-tuned to select actions in role-play scenarios. One model in each pair is consistently benign (aligned). The other model misbehaves in scenarios where it is unlikely to be caught (alignment faking). The task is to identify the alignment faking model using only inputs where the two models behave identically. We test five detection strategies, one of which identifies 98% of alignment-fakers.

[22]  arXiv:2405.05467 (cross-list from cs.SD) [pdf, other]
Title: AFEN: Respiratory Disease Classification using Ensemble Learning
Comments: Under Review Process for MLForHC 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

We present AFEN (Audio Feature Ensemble Learning), a model that leverages Convolutional Neural Networks (CNN) and XGBoost in an ensemble learning fashion to perform state-of-the-art audio classification for a range of respiratory diseases. We use a meticulously selected mix of audio features which provide the salient attributes of the data and allow for accurate classification. The extracted features are then used as an input to two separate model classifiers 1) a multi-feature CNN classifier and 2) an XGBoost Classifier. The outputs of the two models are then fused with the use of soft voting. Thus, by exploiting ensemble learning, we achieve increased robustness and accuracy. We evaluate the performance of the model on a database of 920 respiratory sounds, which undergoes data augmentation techniques to increase the diversity of the data and generalizability of the model. We empirically verify that AFEN sets a new state-of-the-art using Precision and Recall as metrics, while decreasing training time by 60%.
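
The soft-voting fusion step can be sketched as averaging the class-probability outputs of the two classifiers; in the snippet below a logistic-regression model stands in for the CNN's softmax head, and the feature matrix is a placeholder rather than the paper's audio features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # stand-in for the CNN head
from xgboost import XGBClassifier

# Placeholder feature matrix (e.g. per-recording audio feature statistics) and labels.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((200, 40)), rng.integers(0, 3, 200)

cnn_like = LogisticRegression(max_iter=1000).fit(X, y)   # stands in for CNN softmax
xgb = XGBClassifier(n_estimators=50).fit(X, y)

def soft_vote(x):
    p1 = cnn_like.predict_proba(x)             # (n, n_classes) probabilities
    p2 = xgb.predict_proba(x)
    return np.argmax((p1 + p2) / 2, axis=1)    # average probabilities, pick the max

print(soft_vote(X[:5]))
```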

[23]  arXiv:2405.05480 (cross-list from cs.AR) [pdf, other]
Title: FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs
Comments: 10 pages, 11 figures
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Floorplanning for systems-on-a-chip (SoCs) and its sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10^250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples where each sample is a synthetic floor-plan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent white-space and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grouping constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML driven solutions to such problems. FloorSet is available as an open-source repository for the research community.

[24]  arXiv:2405.05492 (cross-list from math.DG) [pdf, other]
Title: A logifold structure on measure space
Comments: 43 pages, 4 figures
Subjects: Differential Geometry (math.DG); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Probability (math.PR)

In this paper, we develop a local-to-global and measure-theoretical approach to understanding datasets. The idea is to take network models with restricted domains as local charts of datasets. We develop the mathematical foundations for these structures, and show in experiments how it can be used to find fuzzy domains and to improve accuracy in data classification problems.

[25]  arXiv:2405.05493 (cross-list from cs.CL) [pdf, ps, other]
Title: Parameter-Efficient Fine-Tuning With Adapters
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the arena of language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a PromptTuning layer, which significantly reduces the number of trainable parameters while maintaining competitive performance across various benchmarks. Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters. We evaluate our approach using three diverse datasets: the GLUE benchmark, a domain-specific dataset comprising four distinct areas, and the Stanford Question Answering Dataset 1.1 (SQuAD). Our results demonstrate that our customized adapter-based method achieves performance comparable to full model fine-tuning, DAPT+TAPT, and UniPELT strategies while requiring fewer or an equivalent number of parameters. This parameter efficiency not only alleviates the computational burden but also expedites the adaptation process. The study underlines the potential of adapters in achieving high performance with significantly reduced resource consumption, suggesting a promising direction for future research in parameter-efficient fine-tuning.
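
For context, a generic bottleneck adapter, the kind of small trainable module that PEFT frameworks such as UniPELT compose, looks like the following PyTorch sketch; the dimensions and initialization are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, added residually to the hidden state.
    Only these small matrices are trained; the base model stays frozen."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)      # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
out = adapter(torch.randn(2, 16, 768))                 # (batch, seq_len, hidden)
print(sum(p.numel() for p in adapter.parameters()))    # ~0.1M trainable parameters
```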

[26]  arXiv:2405.05499 (cross-list from cs.LG) [pdf, other]
Title: Multi-Scale Dilated Convolution Network for Long-Term Time Series Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate forecasting of long-term time series has important applications for decision making and planning. However, it remains challenging to capture the long-term dependencies in time series data. To better extract long-term dependencies, we propose the Multi-Scale Dilated Convolution Network (MSDCN), a method that utilizes a shallow dilated convolution architecture to capture the period and trend characteristics of long time series. We design different convolution blocks with exponentially growing dilations and varying kernel sizes to sample time series data at different scales. Furthermore, we utilize a traditional autoregressive model to capture the linear relationships within the data. To validate the effectiveness of the proposed approach, we conduct experiments on eight challenging long-term time series forecasting benchmark datasets. The experimental results show that our approach outperforms the prior state-of-the-art approaches and shows significant inference speed improvements compared to several strong baseline methods.
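
A hedged PyTorch sketch of the two ingredients named, dilated convolution blocks with exponentially growing dilations plus a linear autoregressive head, is shown below; channel counts, kernel sizes, and horizons are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2        # keep sequence length
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(x))

class TinyMSDC(nn.Module):
    """Multi-scale dilated convolutions + a linear autoregressive head (illustrative)."""
    def __init__(self, channels=1, lookback=96, horizon=24):
        super().__init__()
        self.blocks = nn.Sequential(*[DilatedBlock(channels, 3, d) for d in (1, 2, 4, 8)])
        self.head = nn.Linear(lookback, horizon)       # forecast from conv features
        self.ar = nn.Linear(lookback, horizon)         # linear autoregressive component

    def forward(self, x):                              # x: (batch, channels, lookback)
        return self.head(self.blocks(x)) + self.ar(x)

model = TinyMSDC()
print(model(torch.randn(4, 1, 96)).shape)              # torch.Size([4, 1, 24])
```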

[27]  arXiv:2405.05508 (cross-list from cs.IR) [pdf, other]
Title: Redefining Information Retrieval of Structured Database via Large Language Models
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

Retrieval augmentation is critical when Language Models (LMs) exploit non-parametric knowledge related to the query through external knowledge bases before reasoning. The retrieved information is incorporated into LMs as context alongside the query, enhancing the reliability of responses towards factual questions. Prior research on retrieval augmentation typically follows a retriever-generator paradigm. In this context, traditional retrievers encounter challenges in precisely and seamlessly extracting query-relevant information from knowledge bases. To address this issue, this paper introduces a novel retrieval augmentation framework called ChatLR that primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval. Additionally, we construct an LLM-based search and question answering system tailored for the financial domain by fine-tuning the LLM on two tasks including Text2API and API-ID recognition. Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8\%.

[28]  arXiv:2405.05512 (cross-list from cs.LG) [pdf, other]
Title: Characteristic Learning for Provable One Step Generation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Statistics Theory (math.ST)

We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field through nonparametric regression and utilize the Euler method to solve the probability flow ODE, generating a series of discrete approximations to the characteristics. We then use a deep neural network to fit these characteristics, ensuring a one-step mapping that effectively pushes the prior distribution towards the target distribution. In the theoretical aspect, we analyze the errors in velocity matching, Euler discretization, and characteristic fitting to establish a non-asymptotic convergence rate for the characteristic generator in 2-Wasserstein distance. To the best of our knowledge, this is the first thorough analysis for simulation-free one-step generative models. Additionally, our analysis refines the error analysis of flow-based generative models in prior works. We apply our method on both synthetic and real datasets, and the results demonstrate that the characteristic generator achieves high generation quality with just a single evaluation of the neural network.
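
The Euler discretization step can be illustrated with a short NumPy sketch that pushes prior samples along a velocity field to trace discrete characteristics; here the velocity field is a closed-form placeholder, whereas the paper estimates it nonparametrically and then fits a network to the resulting one-step map.

```python
import numpy as np

def velocity(x, t, target_mean=3.0):
    # Placeholder field: a simple interpolation-style drift toward a target mean.
    # In the paper this field is estimated from data, not given in closed form.
    return (target_mean - x) / max(1.0 - t, 1e-3)

def euler_characteristics(x0, n_steps=100):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with the explicit Euler method."""
    x, dt = x0.copy(), 1.0 / n_steps
    traj = [x.copy()]
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
        traj.append(x.copy())
    return np.stack(traj)        # (n_steps+1, n_samples): discrete characteristics

prior = np.random.default_rng(0).standard_normal(1000)
paths = euler_characteristics(prior)
print(paths[-1].mean())          # samples pushed toward the target distribution
```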

[29]  arXiv:2405.05523 (cross-list from cs.CV) [pdf, other]
Title: Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Comments: Accepted by ICMEW 2024. arXiv admin note: text overlap with arXiv:2404.13657
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, Port enhances the baseline model with a Recovering part to predict flipped label sequences and align distributions with a Dual-alignment method. This allows the model to focus on specific temporal regions prompted by ground-truth information. Extensive experiments on the Animal Kingdom dataset demonstrate the effectiveness of Port, achieving an IoU@0.3 of 38.52. It emerges as one of the top performers in the sub-track of MMVRAC in ICME 2024 Grand Challenges.

[30]  arXiv:2405.05553 (cross-list from cs.CV) [pdf, other]
Title: Towards Robust Physical-world Backdoor Attacks on Lane Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint transformations) and environmental conditions (e.g., weather or lighting changes). To tackle this issue, this paper introduces BadLANE, a dynamic scene adaptation backdoor attack for LD designed to withstand changes in real-world dynamic scene factors. To address the challenges posed by changing driving perspectives, we propose an amorphous trigger pattern composed of shapeless pixels. This trigger design allows the backdoor to be activated by various forms or shapes of mud spots or pollution on the road or lens, enabling adaptation to changes in vehicle observation viewpoints during driving. To mitigate the effects of environmental changes, we design a meta-learning framework to train meta-generators tailored to different environmental conditions. These generators produce meta-triggers that incorporate diverse environmental information, such as weather or lighting conditions, as the initialization of the trigger patterns for backdoor implantation, thus enabling adaptation to dynamic environments. Extensive experiments on various commonly used LD models in both digital and physical domains validate the effectiveness of our attacks, outperforming other baselines significantly (+25.15\% on average in Attack Success Rate). Our codes will be available upon paper publication.

[31]  arXiv:2405.05572 (cross-list from cs.CL) [pdf, other]
Title: From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Current computational approaches for analysing or generating code-mixed sentences do not explicitly model "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgement of the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, consisting of samples sourced from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, Number of Switch Points, and Burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-Roberta and Bernice outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT's zero- and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, providing scope for improvement in code-mixed tasks. Zero-shot transfer from English-Hindi to English-Telugu acceptability judgments using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and providing further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mix corpus, and code for data generation and model training.
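
For readers unfamiliar with the metrics mentioned, the Code-Mixing Index (CMI) can be computed from per-token language tags following the commonly used definition sketched below; the tagged example sentence is illustrative and is not drawn from Cline.

```python
from collections import Counter

def cmi(lang_tags):
    """Code-Mixing Index from per-token language tags.
    'univ' marks language-independent tokens (punctuation, named entities, etc.)."""
    n = len(lang_tags)
    counts = Counter(t for t in lang_tags if t != "univ")
    u = n - sum(counts.values())
    if n == u:                       # no language-tagged tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

# "main kal office nahi aa raha because I have a doctor appointment"
tags = ["hi", "hi", "en", "hi", "hi", "hi", "en", "en", "en", "en", "en", "en"]
print(round(cmi(tags), 2))   # ~41.67: higher values indicate heavier mixing
```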

[32]  arXiv:2405.05581 (cross-list from cs.HC) [pdf, other]
Title: One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Comments: Accepted to FAccT 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or alternatives. However, it is not obvious how the user will interpret conflicts or inconsistencies. To this end, we investigate how users perceive the AI model and comprehend the generated information when they receive multiple, potentially inconsistent, outputs. Through a preliminary study, we identified five types of output inconsistencies. Based on these categories, we conducted a study (N=252) in which participants were given one or more LLM-generated passages to an information-seeking question. We found that inconsistency within multiple LLM-generated outputs lowered the participants' perceived AI capacity, while also increasing their comprehension of the given information. Specifically, we observed that this positive effect of inconsistencies was most significant for participants who read two passages, compared to those who read three. Based on these findings, we present design implications that, instead of regarding LLM output inconsistencies as a drawback, we can reveal the potential inconsistencies to transparently indicate the limitations of these models and promote critical LLM usage.

[33]  arXiv:2405.05584 (cross-list from cs.CV) [pdf, other]
Title: A Survey on Backbones for Deep Video Action Recognition
Comments: This paper has been accepted by ICME workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement the backbones referring to multiple standpoints, which leads to the diversity of methods and encountering new challenges. This paper reviews several action recognition methods based on deep neural networks. We introduce these methods in three parts: 1) Two-Streams networks and their variants, which, specifically in this paper, use RGB video frame and optical flow modality as input; 2) 3D convolutional networks, which make efforts in taking advantage of RGB modality directly while extracting different motion information is no longer necessary; 3) Transformer-based methods, which introduce the model from natural language processing into computer vision and video understanding. We offer objective sights in this review and hopefully provide a reference for future research.

[34]  arXiv:2405.05616 (cross-list from cs.CL) [pdf, other]
Title: G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LMs) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability. Some studies have explored combining LMs with Knowledge Graphs (KGs) by coarsely fusing the two modalities to perform Graph Neural Network (GNN)-based reasoning that lacks a profound interaction between heterogeneous modalities. In this paper, we propose a novel Graph-based Structure-Aware Prompt Learning Model for commonsense reasoning, named G-SAP, aiming to maintain a balance between heterogeneous knowledge and enhance the cross-modal interaction within the LM+GNNs model. In particular, an evidence graph is constructed by integrating multiple knowledge sources, i.e., ConceptNet, Wikipedia, and Cambridge Dictionary, to boost the performance. Afterward, a structure-aware frozen PLM is employed to fully incorporate the structured and textual information from the evidence graph, where the generation of prompts is driven by graph entities and relations. Finally, a heterogeneous message-passing reasoning module is used to facilitate deep interaction of knowledge between the LM and graph-based networks. Empirical validation, conducted through extensive experiments on three benchmark datasets, demonstrates the notable performance of the proposed model. The results reveal a significant advancement over the existing models, especially with a 6.12% improvement over the SoTA LM+GNNs model on the OpenbookQA dataset.

[35]  arXiv:2405.05636 (cross-list from cs.CV) [pdf, other]
Title: SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Combining face swapping with lip synchronization technology offers a cost-effective solution for customized talking face generation. However, directly cascading existing models together tends to introduce significant interference between tasks and reduce video clarity because the interaction space is limited to the low-level semantic RGB space. To address this issue, we propose an innovative unified framework, SwapTalk, which accomplishes both face swapping and lip synchronization tasks in the same latent space. Referring to recent work on face generation, we choose the VQ-embedding space due to its excellent editability and fidelity performance. To enhance the framework's generalization capabilities for unseen identities, we incorporate identity loss during the training of the face swapping module. Additionally, we introduce expert discriminator supervision within the latent space during the training of the lip synchronization module to elevate synchronization quality. In the evaluation phase, previous studies primarily focused on the self-reconstruction of lip movements in synchronous audio-visual videos. To better approximate real-world applications, we expand the evaluation scope to asynchronous audio-video scenarios. Furthermore, we introduce a novel identity consistency metric to more comprehensively assess the identity consistency over time series in generated facial videos. Experimental results on the HDTF demonstrate that our method significantly surpasses existing techniques in video quality, lip synchronization accuracy, face swapping fidelity, and identity consistency. Our demo is available at this http URL

[36]  arXiv:2405.05695 (cross-list from cs.LG) [pdf, other]
Title: Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost
Comments: Accepted to ICLR 2024
Journal-ref: International Conference on Learning Representations (ICLR), 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure for the primary and auxiliary tasks, which produces different networks for training and inference. Specifically, starting from two single task networks/branches (each representing a task), we propose a novel method with evolving networks where only primary-to-auxiliary links exist as the cross-task connections after convergence. These connections can be removed during the primary task inference, resulting in a single-task inference cost. We achieve this by formulating a Neural Architecture Search (NAS) problem, where we initialize bi-directional connections in the search space and guide the NAS optimization converging to an architecture with only the single-side primary-to-auxiliary connections. Moreover, our method can be incorporated with optimization-based auxiliary learning approaches. Extensive experiments with six tasks on NYU v2, CityScapes, and Taskonomy datasets using VGG, ResNet, and ViT backbones validate the promising performance. The codes are available at https://github.com/ethanygao/Aux-NAS.

[37]  arXiv:2405.05723 (cross-list from cs.CL) [pdf, other]
Title: Computational lexical analysis of Flamenco genres
Comments: 21 pages, 29 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Flamenco, recognized by UNESCO as part of the Intangible Cultural Heritage of Humanity, is a profound expression of cultural identity rooted in Andalusia, Spain. However, there is a lack of quantitative studies that help identify characteristic patterns in this long-lived music tradition. In this work, we present a computational analysis of Flamenco lyrics, employing natural language processing and machine learning to categorize over 2000 lyrics into their respective Flamenco genres, termed as $\textit{palos}$. Using a Multinomial Naive Bayes classifier, we find that lexical variation across styles enables us to accurately identify distinct $\textit{palos}$. More importantly, from an automatic analysis of word usage, we obtain the semantic fields that characterize each style. Further, applying a metric that quantifies the inter-genre distance, we perform a network analysis that sheds light on the relationship between Flamenco styles. Remarkably, our results suggest historical connections and $\textit{palo}$ evolutions. Overall, our work illuminates the intricate relationships and cultural significance embedded within Flamenco lyrics, complementing previous qualitative discussions with quantitative analyses and sparking new discussions on the origin and development of traditional music genres.
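
The classification setup named in the abstract can be sketched with scikit-learn's Multinomial Naive Bayes over bag-of-words lyric features; the lyrics and palo labels below are placeholders, not the authors' corpus of 2000+ lyrics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder lyrics and illustrative genre labels; the paper uses a real corpus.
lyrics = ["ay pena penita pena de mi corazon",
          "a la mar fui por naranjas cosa que la mar no tiene",
          "que yo no tengo la culpa de quererte como te quiero",
          "companera de mi alma me dejaste en el camino"]
palos = ["solea", "alegrias", "solea", "siguiriya"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(lyrics, palos)
print(clf.predict(["pena que me parte el alma"]))   # predicted palo for a new lyric
```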

[38]  arXiv:2405.05741 (cross-list from cs.CL) [pdf, ps, other]
Title: Can large language models understand uncommon meanings of common words?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, in the absence of widely acknowledged testing mechanisms, the question of whether LLMs are stochastic parrots or genuinely comprehend the world remains open, fostering numerous studies and sparking heated debate. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding LLMs' unique comprehension mechanisms, aligning with human cognition, and ultimately enhancing their general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication within psychology, which underscore accurate shared understandings of word semantics. Specifically, this paper presents the construction of a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Evaluating open-source and closed-source models of varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models on this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively. We also introduce multiple advanced prompting techniques and retrieval-augmented generation to help alleviate this shortcoming, yet limitations persist. By highlighting these critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.
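
To make the evaluation setup concrete, here is a hedged sketch of scoring a model on multiple-choice items probing uncommon senses of common words; the items, the `query_model` stub, and the scoring loop are hypothetical stand-ins, not the LeSC benchmark or its metrics.

```python
# Sketch of accuracy scoring on multiple-choice lexical-meaning items.
# `query_model` is a placeholder for an actual LLM API call.
items = [
    {"sentence": "The ship listed to port after taking on water.",
     "word": "listed", "options": ["enumerated", "tilted", "priced", "registered"],
     "answer": "tilted"},
    {"sentence": "He tabled the motion until the next meeting.",
     "word": "tabled", "options": ["postponed", "displayed", "built", "served"],
     "answer": "postponed"},
]

def query_model(prompt: str) -> str:
    # Placeholder: a real evaluation would call an LLM here.
    return "tilted"

def evaluate(items):
    correct = 0
    for it in items:
        prompt = (f"In the sentence '{it['sentence']}', what does '{it['word']}' mean? "
                  f"Options: {', '.join(it['options'])}. Answer with one option.")
        if query_model(prompt).strip().lower() == it["answer"]:
            correct += 1
    return correct / len(items)

print(f"accuracy = {evaluate(items):.2f}")
```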

[39]  arXiv:2405.05751 (cross-list from cs.LG) [pdf, other]
Title: A Multi-Level Superoptimizer for Tensor Programs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $\mu$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $\mu$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.5$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
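
The following sketch only conveys the flavor of checking a candidate optimization against a reference program on random inputs; it is not Mirage's verification procedure, which operates on $\mu$Graphs and carries formal probabilistic guarantees that a plain random test does not.

```python
# Toy random-input equivalence check between a reference tensor program and an
# algebraically refactored candidate, as a stand-in for probabilistic verification.
import numpy as np

def reference(x, w1, w2):
    return (x @ w1) @ w2          # two chained matmuls

def optimized(x, w1, w2):
    return x @ (w1 @ w2)          # refactored, mathematically equivalent version

def probably_equivalent(f, g, shapes, trials=16, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        args = [rng.standard_normal(s) for s in shapes]
        if not np.allclose(f(*args), g(*args), rtol=tol, atol=tol):
            return False
    return True

print(probably_equivalent(reference, optimized, [(4, 8), (8, 16), (16, 3)]))
```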

[40]  arXiv:2405.05755 (cross-list from cs.CV) [pdf, other]
Title: CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, convolutional neural networks (CNNs) with channel-wise feature refining mechanisms have brought noticeable benefits to modelling channel dependencies. However, current attention paradigms fail to infer an optimal channel descriptor capable of simultaneously exploiting statistical and spatial relationships among feature maps. In this paper, to overcome this shortcoming, we present a novel channel-wise spatially autocorrelated (CSA) attention mechanism. Inspired by geographical analysis, the proposed CSA exploits the spatial relationships between channels of feature maps to produce an effective channel descriptor. To the best of our knowledge, this is the first time that the concept of geographical spatial analysis has been utilized in deep CNNs. The proposed CSA imposes negligible learning parameters and light computational overhead on the deep model, making it a powerful yet efficient attention module of choice. We validate the effectiveness of the proposed CSA networks (CSA-Nets) through extensive experiments and analysis on the ImageNet and MS COCO benchmark datasets for image classification, object detection, and instance segmentation. The experimental results demonstrate that CSA-Nets consistently achieve competitive performance and superior generalization compared with several state-of-the-art attention-based CNNs across different benchmark tasks and datasets.

[41]  arXiv:2405.05763 (cross-list from cs.CV) [pdf, ps, other]
Title: DP-MDM: Detail-Preserving MR Reconstruction via Multiple Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Detail features of magnetic resonance images play a crucial role in accurate medical diagnosis and treatment, as they capture subtle changes that pose challenges for doctors when making precise judgments. However, the widely utilized naive diffusion model has limitations, as it fails to accurately capture more intricate details. To enhance the quality of MRI reconstruction, we propose a comprehensive detail-preserving reconstruction method using multiple diffusion models to extract structure and detail features in the k-space domain instead of the image domain. Moreover, virtual binary modal masks are utilized to refine the range of values in k-space data through highly adaptive center windows, which allows the model to focus its attention more efficiently. Last but not least, an inverted pyramid structure is employed, where the top-down image information gradually decreases, enabling a cascade representation. The framework effectively represents multi-scale sampled data, taking into account the sparsity of the inverted pyramid architecture, and utilizes the cascade training data distribution to represent multi-scale data. Through a step-by-step refinement approach, the method refines the approximation of details. Finally, the proposed method was evaluated in experiments on clinical and public datasets. The results demonstrate that the proposed method outperforms other methods.
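
To illustrate the general idea of working in k-space with a centered window (not the paper's virtual binary modal masks or diffusion models), the sketch below masks the Fourier transform of a toy image with a fixed central region; the window size and the random image are arbitrary placeholders.

```python
# Illustrative sketch: keep only a centered window of k-space coefficients.
import numpy as np

def center_window_mask(shape, keep_fraction=0.25):
    """Binary mask keeping a centered rectangle of k-space coefficients."""
    h, w = shape
    mask = np.zeros(shape, dtype=bool)
    kh, kw = int(h * keep_fraction), int(w * keep_fraction)
    mask[h // 2 - kh // 2: h // 2 + kh // 2,
         w // 2 - kw // 2: w // 2 + kw // 2] = True
    return mask

image = np.random.rand(128, 128)                    # stand-in for an MR slice
kspace = np.fft.fftshift(np.fft.fft2(image))        # centered k-space
masked = kspace * center_window_mask(kspace.shape)  # keep low-frequency center
low_freq_image = np.abs(np.fft.ifft2(np.fft.ifftshift(masked)))
print(low_freq_image.shape)
```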

[42]  arXiv:2405.05766 (cross-list from cs.CV) [pdf, other]
Title: To Trust or Not to Trust: Towards a novel approach to measure trust for XAI systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The increasing reliance on Deep Learning models, combined with their inherent lack of transparency, has spurred the development of a novel field of study known as eXplainable AI (XAI). These methods seek to enhance end-users' trust in automated systems by providing insights into the rationale behind their decisions. This paper presents a novel approach for measuring user trust in XAI systems, allowing their refinement. Our proposed metric combines both performance metrics and trust indicators from an objective perspective. To validate this methodology, we conducted a case study in a realistic medical scenario: the use of an XAI system for the detection of pneumonia from X-ray images.

[43]  arXiv:2405.05777 (cross-list from cs.CL) [pdf, other]
Title: Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

S\'ami, an indigenous language group comprising multiple languages, faces digital marginalization due to the limited availability of data and of sophisticated language models designed for its linguistic intricacies. This work focuses on increasing technological participation for the S\'ami language. We draw the attention of the ML community to the language modeling problem of Ultra Low Resource (ULR) languages: languages for which the amount of available textual data is very low and the number of speakers is also very small. ULRLs are also not supported by mainstream Large Language Models (LLMs) like ChatGPT, which makes gathering artificial training data for them even more challenging. Mainstream AI foundation model development has given little attention to this category of languages, whose very small speaker populations also make data hard to find. However, it is important to develop foundational models for ULR languages to promote inclusion and to extend the tangible abilities and impact of LLMs. To this end, we have compiled the available S\'ami language resources from the web to create a clean dataset for training language models. To study the behavior of modern LLMs on a ULR language (S\'ami), we experimented with different kinds of LLMs, mainly on the order of $\sim$ seven billion parameters. We also explored the effect of multilingual LLM training for ULRLs. We found that decoder-only models under a sequential multilingual training scenario perform better than joint multilingual training, whereas multilingual training with high semantic overlap, in general, performs better than training from scratch. This is the first study on the S\'ami language to adapt non-statistical language models that use the latest developments in the field of natural language processing (NLP).

[44]  arXiv:2405.05790 (cross-list from cs.CE) [pdf, ps, other]
Title: A Robust eLORETA Technique for Localization of Brain Sources in the Presence of Forward Model Uncertainties
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

In this paper, we present a robust version of the well-known exact low-resolution electromagnetic tomography (eLORETA) technique, named ReLORETA, to localize brain sources in the presence of different forward model uncertainties. Methods: We first assume that the true lead field matrix is a transformation of the existing lead field matrix distorted by uncertainties and propose an iterative approach to estimate this transformation accurately. Major sources of forward model uncertainty, including differences in geometry, conductivity, and source space resolution between the real and simulated head models, and misaligned electrode positions, are then simulated to test the proposed method. Results: ReLORETA and eLORETA are applied to simulated focal sources in different regions of the brain, in the presence of various noise levels, as well as to real data from a patient with focal epilepsy. The results show that ReLORETA is considerably more robust and accurate than eLORETA in all cases. Conclusion: Having successfully dealt with forward model uncertainties, ReLORETA proves to be a promising method for real-world clinical applications. Significance: eLORETA is one of the localization techniques that can be used to study brain activity for medical applications such as determining the epileptogenic zone in patients with medically refractory epilepsy. However, its major limitation is sensitivity to uncertainties in the forward model. Since this problem can substantially undermine its performance in real-world applications, where the exact lead field matrix is unknown, developing a more robust method capable of dealing with these uncertainties is of significant interest.

[45]  arXiv:2405.05792 (cross-list from cs.RO) [pdf, other]
Title: RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation
Comments: Published at ICRA 2024; 9 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Mapping is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on "image segments", which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a "continuous sense of a place", defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of "hops over segments" and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during mapping and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level `hopping' based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/
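
As a hedged sketch of the segment-level topological map described above, the snippet below builds a graph whose nodes are image segments and whose edges come from (a) descriptor matches across consecutive images and (b) centroid proximity within an image; the descriptors, centroids, and thresholds are random placeholders rather than outputs of a real segmentation model.

```python
# Minimal sketch of a segment-level topological graph (nodes = image segments).
import itertools
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n_images, segs_per_image, dim = 3, 4, 8
G = nx.Graph()

# Placeholder segments: node key (image, segment) with a descriptor and centroid.
for i, s in itertools.product(range(n_images), range(segs_per_image)):
    G.add_node((i, s), desc=rng.standard_normal(dim), centroid=rng.uniform(0, 100, 2))

# (a) Intra-image edges: connect segments whose pixel centroids are close.
for i in range(n_images):
    nodes = [(i, s) for s in range(segs_per_image)]
    for a, b in itertools.combinations(nodes, 2):
        if np.linalg.norm(G.nodes[a]["centroid"] - G.nodes[b]["centroid"]) < 50:
            G.add_edge(a, b, kind="intra")

# (b) Inter-image edges: match each segment to its nearest descriptor in the next image.
for i in range(n_images - 1):
    for s in range(segs_per_image):
        d = G.nodes[(i, s)]["desc"]
        best = min(range(segs_per_image),
                   key=lambda t: np.linalg.norm(d - G.nodes[(i + 1, t)]["desc"]))
        G.add_edge((i, s), (i + 1, best), kind="inter")

print(G.number_of_nodes(), "segments,", G.number_of_edges(), "edges")
```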

[46]  arXiv:2405.05802 (cross-list from cs.DC) [pdf, other]
Title: Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint
Comments: 5 pages,3 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)

As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, the information exchange among neighboring nodes in a GNN poses new challenges in resource-constrained scenarios, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unreliable due to wireless fading and receiver noise, resulting in performance degradation of GNNs. To improve the learning performance of GNNs, we aim to maximize the long-term average (LTA) number of communication links through optimized power control under energy consumption constraints. Using the Lyapunov optimization method, we first transform the intractable long-term problem into a deterministic problem in each time slot by converting the long-term energy constraints into the objective function. Although the resulting problem is a non-convex combinatorial optimization problem, we address it by equivalently solving a sequence of convex feasibility problems together with a greedy-based solver. Simulation results demonstrate the superiority of our proposed scheme over the baselines.
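
The following toy sketch illustrates only the Lyapunov-style conversion of a long-term energy constraint into a per-slot virtual-queue update; the per-slot energy draws and the budget are arbitrary placeholders, and the actual per-slot power control trading links against queue-weighted energy is not implemented here.

```python
# Toy virtual-queue update for a long-term energy budget (Lyapunov optimization idea).
import numpy as np

rng = np.random.default_rng(0)
energy_budget_per_slot = 1.0
queue = 0.0  # virtual queue tracking accumulated energy-budget violation

for t in range(10):
    # Placeholder: draw a random per-slot energy instead of solving the power-control step.
    energy_used = rng.uniform(0.5, 1.5)
    queue = max(queue + energy_used - energy_budget_per_slot, 0.0)
    print(f"slot {t}: energy={energy_used:.2f}, virtual queue={queue:.2f}")
```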

[47]  arXiv:2405.05803 (cross-list from cs.CV) [pdf, other]
Title: Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention sink phenomenon that is prevalent in LLMs also persists in MLLMs, suggesting that initial tokens and nearest tokens receive the majority of attention, while middle vision tokens garner minimal attention in deep layers; (2) the presence of information migration, which implies that visual information is transferred to subsequent text tokens within the first few layers of MLLMs. As per our findings, we conclude that vision tokens are not necessary in the deep layers of MLLMs. Thus, we strategically withdraw them at a certain layer, enabling only text tokens to engage in subsequent layers. To pinpoint the ideal layer for vision tokens withdrawal, we initially analyze a limited set of tiny datasets and choose the first layer that meets the Kullback-Leibler divergence criterion. Our VTW approach can cut computational overhead by over 40\% across diverse multimodal tasks while maintaining performance. Our code is released at https://github.com/lzhxmu/VTW.
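
To illustrate the layer-selection idea, the sketch below picks the first layer at which the output distribution computed without vision tokens stays within a KL-divergence threshold of the full-model distribution; the per-layer logits are synthetic placeholders standing in for real MLLM activations, and the threshold is arbitrary.

```python
# Sketch: choose a vision-token withdrawal layer via a KL-divergence criterion.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
n_layers, vocab, threshold = 12, 50, 0.05

chosen = None
for layer in range(n_layers):
    base = rng.standard_normal(vocab)
    p_full = softmax(base)
    # Synthetic stand-in: the gap to the vision-free distribution shrinks with depth,
    # mimicking the "information migration" observation.
    p_without_vision = softmax(base + rng.standard_normal(vocab) / (layer + 1))
    if kl(p_full, p_without_vision) < threshold:
        chosen = layer
        break

print("withdraw vision tokens after layer:", chosen)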

[48]  arXiv:2405.05809 (cross-list from cs.LG) [pdf, ps, other]
Title: Aequitas Flow: Streamlining Fair ML Experimentation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Aequitas Flow is an open-source framework for end-to-end Fair Machine Learning (ML) experimentation in Python. The package fills integration gaps left by other Fair ML packages, enabling complete and accessible experimentation. It provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling rapid and simple experiments and result analysis. Aimed at ML practitioners and researchers, the framework offers implementations of methods, datasets, and metrics, along with standard interfaces for these components to improve extensibility. By facilitating the development of fair ML practices, Aequitas Flow seeks to enhance the adoption of these concepts in AI technologies.

[49]  arXiv:2405.05852 (cross-list from cs.CV) [pdf, other]
Title: Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)

Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. Such capabilities are difficult to learn solely from task-specific data. This has led to the emergence of pre-trained vision-language models as a tool for transferring representations learned from internet-scale data to downstream tasks and new domains. However, commonly used contrastively trained representations such as in CLIP have been shown to fail at enabling embodied agents to gain a sufficiently fine-grained scene understanding -- a capability vital for control. To address this shortcoming, we consider representations from pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts and as such, contain text-conditioned representations that reflect highly fine-grained visuo-spatial information. Using pre-trained text-to-image diffusion models, we construct Stable Control Representations which allow learning downstream control policies that generalize to complex, open-ended environments. We show that policies learned using Stable Control Representations are competitive with state-of-the-art representation learning approaches across a broad range of simulated control settings, encompassing challenging manipulation and navigation tasks. Most notably, we show that Stable Control Representations enable learning policies that exhibit state-of-the-art performance on OVMM, a difficult open-vocabulary navigation benchmark.

[50]  arXiv:2405.05858 (cross-list from cs.CV) [pdf, other]
Title: Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO)

We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume a scene prior, hand pose prior, or object-category pose prior, or rely on local optimization with multiple sequence segments. We propose a method that allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence globally without any segments. We progressively optimize the object shape and pose simultaneously based on an implicit neural representation. A key aspect of our method is a virtual camera system that significantly reduces the search space of the optimization. We evaluate our method on the standard HO3D dataset and on a collection of egocentric RGB sequences captured with a head-mounted device. We demonstrate that our approach outperforms most methods significantly, and is on par with recent techniques that assume prior information.

[51]  arXiv:2405.05861 (cross-list from cs.RO) [pdf, other]
Title: ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers
Comments: ICRA Workshop 2024: 3rd Workshop on Future of Construction: Lifelong Learning Robots in Changing Construction Sites
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Excavators are crucial for diverse tasks such as construction and mining, and autonomous excavator systems enhance safety and efficiency, address labor shortages, and improve human working conditions. Departing from existing modularized approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control excavator valves directly. Utilizing the Action Chunking with Transformers (ACT) architecture, ExACT employs imitation learning to take observations from multi-modal sensors as inputs and generate actionable sequences. In our experiments, we build a simulator based on captured real-world data to model the relations between excavator valve states and joint velocities. With a few human-operated demonstration trajectories, ExACT demonstrates the capability of completing different excavation tasks, including reaching, digging, and dumping, through imitation learning in validations with the simulator. To the best of our knowledge, ExACT represents the first step towards building an end-to-end autonomous excavator system via imitation learning with a minimal set of human demonstrations. The video about this work can be accessed at https://youtu.be/NmzR_Rf-aEk.

[52]  arXiv:2405.05870 (cross-list from cs.GT) [pdf, other]
Title: Selecting the Most Conflicting Pair of Candidates
Comments: Accepted for publication at IJCAI-24; 27 pages; 11 figures
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

We study committee elections from a perspective of finding the most conflicting candidates, that is, candidates that imply the largest amount of conflict, as per voter preferences. By proposing basic axioms to capture this objective, we show that none of the prominent multiwinner voting rules meet them. Consequently, we design committee voting rules compliant with our desiderata, introducing conflictual voting rules. A subsequent deepened analysis sheds more light on how they operate. Our investigation identifies various aspects of conflict, for which we come up with relevant axioms and quantitative measures, which may be of independent interest. We support our theoretical study with experiments on both real-life and synthetic data.

[53]  arXiv:2405.05876 (cross-list from cs.RO) [pdf, other]
Title: Composable Part-Based Manipulation
Comments: Presented at CoRL 2023. For videos and additional results, see our website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills. By considering the functional correspondences between object parts, we conceptualize functional actions, such as pouring and constrained placing, as combinations of different correspondence constraints. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. These diffusion models can generate parameters for manipulation skills based on the specific object parts. Leveraging part-based correspondences coupled with the task decomposition into distinct constraints enables strong generalization to novel objects and object categories. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities.

[54]  arXiv:2405.05890 (cross-list from cs.LG) [pdf, other]
Title: Safe Exploration Using Bayesian World Models and Log-Barrier Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic uncertainty. This makes CERL robust towards model inaccuracies and leads to safe exploration during learning. In our experiments, we demonstrate that CERL outperforms the current state-of-the-art in terms of safety and optimality in solving CMDPs from image observations.
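
As a minimal sketch of the log-barrier ingredient named in the title (not CERL itself, which additionally relies on Bayesian world models and epistemic-uncertainty-pessimistic policies), the snippet below shows a barrier-augmented objective that grows steeply as the expected constraint cost approaches its limit; the numerical values are placeholders.

```python
# Minimal log-barrier objective for a constrained policy objective.
import numpy as np

def log_barrier_objective(expected_return, expected_cost, cost_limit, t=10.0):
    """Barrier-augmented objective; -inf once the constraint is violated."""
    slack = cost_limit - expected_cost
    if slack <= 0:
        return -np.inf
    return expected_return + (1.0 / t) * np.log(slack)

for cost in [0.1, 0.5, 0.9, 0.99]:
    print(cost, log_barrier_objective(expected_return=1.0,
                                      expected_cost=cost, cost_limit=1.0))
```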

[55]  arXiv:2405.05905 (cross-list from cs.GT) [pdf, other]
Title: Truthful Aggregation of LLMs with an Application to Online Advertising
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI)

We address the challenge of aggregating the preferences of multiple agents over LLM-generated replies to user queries, where agents might modify or exaggerate their preferences. New agents may participate for each new query, making fine-tuning LLMs on these preferences impractical. To overcome these challenges, we propose an auction mechanism that operates without fine-tuning or access to model weights. This mechanism is designed to provably converge to the output of the optimally fine-tuned LLM as computational resources are increased. The mechanism can also incorporate contextual information about the agents when available, which significantly accelerates its convergence. A well-designed payment rule ensures that truthful reporting is the optimal strategy for all agents, while also promoting an equity property by aligning each agent's utility with her contribution to social welfare - an essential feature for the mechanism's long-term viability. While our approach can be applied whenever monetary transactions are permissible, our flagship application is in online advertising. In this context, advertisers try to steer LLM-generated responses towards their brand interests, while the platform aims to maximize advertiser value and ensure user satisfaction. Experimental results confirm that our mechanism not only converges efficiently to the optimally fine-tuned LLM but also significantly boosts advertiser value and platform revenue, all with minimal computational overhead.

[56]  arXiv:2405.05908 (cross-list from physics.plasm-ph) [pdf, other]
Title: Diag2Diag: Multi modal super resolution for physics discovery with application to fusion
Subjects: Plasma Physics (physics.plasm-ph); Artificial Intelligence (cs.AI)

This paper introduces a groundbreaking multi-modal neural network model designed for resolution enhancement, which innovatively leverages inter-diagnostic correlations within a system. Traditional approaches have primarily focused on uni-modal enhancement strategies, such as pixel-based image enhancement or heuristic signal interpolation. In contrast, our model employs a novel methodology by harnessing the diagnostic relationships within the physics of fusion plasma. Initially, we establish the correlation among diagnostics within the tokamak. Subsequently, we utilize these correlations to substantially enhance the temporal resolution of the Thomson Scattering diagnostic, which assesses plasma density and temperature. By increasing its resolution from conventional 200Hz to 500kHz, we facilitate a new level of insight into plasma behavior, previously attainable only through computationally intensive simulations. This enhancement goes beyond simple interpolation, offering novel perspectives on the underlying physical phenomena governing plasma dynamics.

[57]  arXiv:2405.05925 (cross-list from cs.LG) [pdf, other]
Title: FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)

Ensemble weather forecasting is essential for weather predictions and mitigating the impacts of extreme weather events. Constructing an ensemble prediction system (EPS) based on conventional numerical weather prediction (NWP) models is highly computationally expensive. Machine learning (ML) models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and the SEEDS model, rely on the ERA5 Ensemble of Data Assimilations (EDA) or two operational NWP ensemble members for forecast generation. The spatial resolution of 1{\deg} or 2{\deg} in these models is often considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days ahead. The model runs at a significantly improved spatial resolution of 0.25{\deg}, incorporating 5 upper-air atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of the Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the continuous ranked probability score (CRPS) and the KL divergence between the predicted and target distributions. This approach represents an advancement over the traditional use of L1 loss combined with the KL loss in standard VAE models when using VAEs for ensemble weather forecasting. Evaluation results demonstrate that FuXi-ENS outperforms ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), a world-leading NWP model, on 98.1% of 360 combinations of variable and forecast lead time in terms of CRPS.
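
For reference, the CRPS term mentioned above is commonly estimated from ensemble members as CRPS ~ E|X - y| - 0.5 E|X - X'|, with X, X' drawn from the ensemble and y the observation; the sketch below uses that standard sample-based estimator on synthetic values and is not FuXi-ENS's training loss, which also includes a KL term.

```python
# Sample-based CRPS estimator for a single observation and an ensemble forecast.
import numpy as np

def crps_ensemble(ensemble, observation):
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - observation))
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))
    return term1 - term2

rng = np.random.default_rng(0)
members = rng.normal(loc=15.0, scale=2.0, size=50)   # e.g. synthetic temperature members
print(f"CRPS = {crps_ensemble(members, observation=16.2):.3f}")
```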

[58]  arXiv:2405.05930 (cross-list from cs.CR) [pdf, other]
Title: Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

AI-generated content (AIGC) models, represented by large language models (LLM), have brought revolutionary changes to the content generation fields. The high-speed and extensive 6G technology is an ideal platform for providing powerful AIGC mobile service applications, while future 6G mobile networks also need to support intelligent and personalized mobile generation services. However, the significant ethical and security issues of current AIGC models, such as adversarial attacks, privacy, and fairness, greatly affect the credibility of 6G intelligent networks, especially in ensuring secure, private, and fair AIGC applications. In this paper, we propose TrustGAIN, a novel paradigm for trustworthy AIGC in 6G networks, to ensure trustworthy large-scale AIGC services in future 6G networks. We first discuss the adversarial attacks and privacy threats faced by AIGC systems in 6G networks, as well as the corresponding protection issues. Subsequently, we emphasize the importance of ensuring the unbiasedness and fairness of the mobile generative service in future intelligent networks. In particular, we conduct a use case to demonstrate that TrustGAIN can effectively guide the resistance against malicious or generated false information. We believe that TrustGAIN is a necessary paradigm for intelligent and trustworthy 6G networks to support AIGC services, ensuring the security, privacy, and fairness of AIGC network services.

[59]  arXiv:2405.05950 (cross-list from cs.LG) [pdf, other]
Title: Federated Combinatorial Multi-Agent Multi-Armed Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

This paper introduces a federated learning framework tailored for online combinatorial optimization with bandit feedback. In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals. Our framework transforms any offline resilient single-agent $(\alpha-\epsilon)$-approximation algorithm, having a complexity of $\tilde{\mathcal{O}}(\frac{\psi}{\epsilon^\beta})$, where the logarithm is omitted, for some function $\psi$ and constant $\beta$, into an online multi-agent algorithm with $m$ communicating agents and an $\alpha$-regret of no more than $\tilde{\mathcal{O}}(m^{-\frac{1}{3+\beta}} \psi^\frac{1}{3+\beta} T^\frac{2+\beta}{3+\beta})$. This approach not only eliminates the $\epsilon$ approximation error but also ensures sublinear growth with respect to the time horizon $T$ and demonstrates a linear speedup with an increasing number of communicating agents. Additionally, the algorithm is notably communication-efficient, requiring only a sublinear number of communication rounds, quantified as $\tilde{\mathcal{O}}\left(\psi T^\frac{\beta}{\beta+1}\right)$. Furthermore, the framework has been successfully applied to online stochastic submodular maximization using various offline algorithms, yielding the first results for both single-agent and multi-agent settings and recovering specialized single-agent theoretical guarantees. We empirically validate our approach on a stochastic data summarization problem, illustrating the effectiveness of the proposed framework, even in single-agent scenarios.

[60]  arXiv:2405.05959 (cross-list from cs.LG) [pdf, other]
Title: Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask
Comments: 11 (main paper) + 10 (appendix) pages. Source code available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
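
As a rough illustration of the Imputation-Interpolation-Forecasting masking idea (not the authors' implementation; proportions and spans are arbitrary placeholders), the sketch below builds a boolean mask for a univariate series with scattered hidden points, a hidden middle block, and a hidden tail.

```python
# Sketch of an IIF-style mask: imputation points, an interpolation block, a forecasting tail.
import numpy as np

def iif_mask(length, p_impute=0.1, interp_span=(0.45, 0.55), forecast_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    observed = np.ones(length, dtype=bool)
    observed[rng.random(length) < p_impute] = False                               # imputation
    observed[int(length * interp_span[0]): int(length * interp_span[1])] = False  # interpolation
    observed[int(length * (1 - forecast_frac)):] = False                          # forecasting
    return observed  # True = observed (conditioning), False = masked (prediction target)

mask = iif_mask(100)
print("observed:", mask.sum(), "masked:", (~mask).sum())
```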

[61]  arXiv:2405.05966 (cross-list from cs.CL) [pdf, other]
Title: Natural Language Processing RELIES on Linguistics
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have become capable of generating highly fluent text in certain languages, without modules specially designed to capture grammar or semantic coherence. What does this mean for the future of linguistic expertise in NLP? We highlight several aspects in which NLP (still) relies on linguistics, or where linguistic thinking can illuminate new directions. We argue our case around the acronym $RELIES$ that encapsulates six major facets where linguistics contributes to NLP: $R$esources, $E$valuation, $L$ow-resource settings, $I$nterpretability, $E$xplanation, and the $S$tudy of language. This list is not exhaustive, nor is linguistics the main point of reference for every effort under these themes; but at a macro level, these facets highlight the enduring importance of studying machine systems vis-a-vis systems of human language.

Replacements for Fri, 10 May 24

[62]  arXiv:2401.09851 (replaced) [pdf, other]
Title: Behavioural Rehearsing Illuminates Scientific Problems of Organised Complexity
Subjects: Artificial Intelligence (cs.AI)
[63]  arXiv:2401.15356 (replaced) [pdf, other]
Title: A Decision Theoretic Framework for Measuring AI Reliance
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[64]  arXiv:2403.20151 (replaced) [pdf, ps, other]
Title: A Learning-based Incentive Mechanism for Mobile AIGC Service in Decentralized Internet of Vehicles
Comments: 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall)
Subjects: Artificial Intelligence (cs.AI)
[65]  arXiv:2404.17749 (replaced) [pdf, other]
Title: UMass-BioNLP at MEDIQA-M3G 2024: DermPrompt -- A Systematic Exploration of Prompt Engineering with GPT-4V for Dermatological Diagnosis
Comments: Accepted at NAACL-ClinicalNLP workshop 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[66]  arXiv:2405.00099 (replaced) [pdf, other]
Title: Creative Beam Search: LLM-as-a-Judge For Improving Response Generation
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[67]  arXiv:2405.03524 (replaced) [pdf, other]
Title: Exploring knowledge graph-based neural-symbolic system from application perspective
Authors: Shenzhe Zhu
Subjects: Artificial Intelligence (cs.AI)
[68]  arXiv:2405.04064 (replaced) [pdf, other]
Title: MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation
Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024
Subjects: Artificial Intelligence (cs.AI)
[69]  arXiv:2405.04344 (replaced) [pdf, other]
Title: Enhancing Scalability of Metric Differential Privacy via Secret Dataset Partitioning and Benders Decomposition
Authors: Chenxi Qiu
Comments: To be published in IJCAI 2024
Subjects: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[70]  arXiv:2405.05146 (replaced) [pdf, ps, other]
Title: Hybrid Convolutional Neural Networks with Reliability Guarantee
Comments: 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2024). Dependable and Secure Machine Learning Workshop (DSML 2024), Brisbane, Australia, June 24-27, 2024
Subjects: Artificial Intelligence (cs.AI)
[71]  arXiv:2206.06661 (replaced) [pdf, other]
Title: Toward Student-Oriented Teacher Network Training For Knowledge Distillation
Comments: ICLR 2024; poster link this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[72]  arXiv:2303.11278 (replaced) [pdf, other]
Title: Bayesian Pseudo-Coresets via Contrastive Divergence
Comments: Accepted at UAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[73]  arXiv:2304.09639 (replaced) [pdf, ps, other]
Title: The Transformation Logics
Authors: Alessandro Ronca
Comments: Extended version with appendix of a paper with the same title that will appear in the proceedings of IJCAI 2024
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
[74]  arXiv:2306.03311 (replaced) [pdf, other]
Title: Learning Embeddings for Sequential Tasks Using Population of Agents
Comments: IJCAI'24 paper (longer version)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[75]  arXiv:2306.07285 (replaced) [pdf, other]
Title: TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills
Comments: Accepted by LREC-COLING 2024
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[76]  arXiv:2307.06945 (replaced) [pdf, other]
Title: In-context Autoencoder for Context Compression in a Large Language Model
Comments: v4: Final camera ready for ICLR'24
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[77]  arXiv:2308.00264 (replaced) [pdf, other]
Title: Multimodal Multi-loss Fusion Network for Sentiment Analysis
Comments: First two authors contributed equally to the paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[78]  arXiv:2309.10818 (replaced) [pdf, other]
Title: SlimPajama-DC: Understanding Data Combinations for LLM Training
Comments: Technical report. Models at: this https URL and dataset at: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[79]  arXiv:2310.04486 (replaced) [pdf, other]
Title: T-Rep: Representation Learning for Time Series using Time-Embeddings
Comments: Accepted at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[80]  arXiv:2311.05304 (replaced) [pdf, other]
Title: Data Valuation and Detections in Federated Learning
Comments: CVPR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[81]  arXiv:2311.07590 (replaced) [pdf, other]
Title: Large Language Models can Strategically Deceive their Users when Put Under Pressure
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[82]  arXiv:2311.12871 (replaced) [pdf, other]
Title: An Embodied Generalist Agent in 3D World
Comments: ICML 2024. The first four authors contribute equally. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[83]  arXiv:2312.07751 (replaced) [pdf, other]
Title: Large Human Language Models: A Need and the Challenges
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[84]  arXiv:2312.11834 (replaced) [pdf, other]
Title: Multi-agent reinforcement learning using echo-state network and its application to pedestrian dynamics
Authors: Hisato Komatsu
Comments: 23 pages, 17 figures
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
[85]  arXiv:2401.15889 (replaced) [pdf, other]
Title: Sliced Wasserstein with Random-Path Projecting Directions
Comments: Accepted to ICML 2024, 21 pages, 5 figures, 2 tables
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[86]  arXiv:2402.01295 (replaced) [pdf, other]
Title: ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[87]  arXiv:2402.01662 (replaced) [pdf, ps, other]
Title: Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives
Comments: version 2, updated May 8, 2024 to included updated references and new case study pointers as the trend of generative ghosts accelerates
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[88]  arXiv:2402.07818 (replaced) [pdf, other]
Title: Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
Authors: Z Liu, J Lou, W Bao, Y Hu, B Li, Z Qin, K Ren
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[89]  arXiv:2402.11472 (replaced) [pdf, other]
Title: DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[90]  arXiv:2402.18609 (replaced) [pdf, other]
Title: ICE-SEARCH: A Language Model-Driven Feature Selection Approach
Authors: Tianze Yang, Tianyi Yang, Fuyuan Lyu, Shaoshan Liu, Xue (Steve) Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[91]  arXiv:2403.01954 (replaced) [pdf, other]
Title: DECIDER: A Rule-Controllable Decoding Strategy for Language Generation by Imitating Dual-System Cognitive Theory
Comments: Submitted to IEEE TKDE (Major Revision), 12 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
[92]  arXiv:2403.02939 (replaced) [pdf, other]
Title: PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers
Comments: Accepted to CHI 2024
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[93]  arXiv:2403.03835 (replaced) [pdf, other]
Title: Cobweb: An Incremental and Hierarchical Model of Human-Like Category Learning
Comments: Accepted by CogSci-24
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[94]  arXiv:2403.09998 (replaced) [pdf, other]
Title: FBPT: A Fully Binary Point Transformer
Comments: Accepted to ICRA 2024. arXiv admin note: substantial text overlap with arXiv:2303.01166
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[95]  arXiv:2403.19838 (replaced) [pdf, other]
Title: Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
Comments: 9 pages, 3 figures, Accepted at CVPR 2024 Vision and Language for Autonomous Driving and Robotics Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[96]  arXiv:2404.03888 (replaced) [pdf, other]
Title: A proximal policy optimization based intelligent home solar management
Comments: This manuscript has been accepted for IEEE EIT Conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[97]  arXiv:2404.04292 (replaced) [pdf, other]
Title: Conversational Disease Diagnosis via External Planner-Controlled Large Language Models
Comments: Work in Progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[98]  arXiv:2404.05553 (replaced) [pdf, other]
Title: Alljoined1 -- A dataset for EEG-to-Image decoding
Comments: 8 Pages, 6 Figures
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)
[99]  arXiv:2404.08412 (replaced) [pdf, other]
Title: PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction
Comments: 22 pages
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)
[100]  arXiv:2404.15899 (replaced) [pdf, other]
Title: ST-MambaSync: The Complement of Mamba and Transformers for Spatial-Temporal in Traffic Flow Prediction
Comments: 11 pages. arXiv admin note: substantial text overlap with arXiv:2404.13257
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[101]  arXiv:2404.17525 (replaced) [pdf, ps, other]
Title: Large Language Model Agent as a Mechanical Designer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[102]  arXiv:2404.17735 (replaced) [pdf, other]
Title: Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models
Comments: Short version accepted to CVPR 2024 Workshop on Generative Models for Computer Vision
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
[103]  arXiv:2404.19232 (replaced) [pdf, other]
Title: GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[104]  arXiv:2405.01582 (replaced) [pdf, other]
Title: Text Quality-Based Pruning for Efficient Training of Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[105]  arXiv:2405.01589 (replaced) [pdf, ps, other]
Title: GPT-4 passes most of the 297 written Polish Board Certification Examinations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[106]  arXiv:2405.02228 (replaced) [pdf, other]
Title: REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[107]  arXiv:2405.03192 (replaced) [pdf, other]
Title: QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[108]  arXiv:2405.03341 (replaced) [pdf, other]
Title: Enhancing Q-Learning with Large Language Model Heuristics
Authors: Xiefeng Wu
Comments: Note:Arxiv,Draft
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[109]  arXiv:2405.03547 (replaced) [pdf, other]
Title: Position: Leverage Foundational Models for Black-Box Optimization
Comments: International Conference on Machine Learning (ICML) 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[110]  arXiv:2405.04372 (replaced) [pdf, ps, other]
Title: Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[111]  arXiv:2405.04760 (replaced) [pdf, other]
Title: Large Language Models for Cyber Security: A Systematic Literature Review
Comments: 46 pages,6 figures
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[112]  arXiv:2405.05248 (replaced) [pdf, other]
Title: LLMs with Personalities in Multi-issue Negotiation Games
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
