
Artificial Intelligence

New submissions


New submissions for Mon, 13 May 24

[1]  arXiv:2405.06058 [pdf, other]
Title: Large Language Models Show Human-like Social Desirability Biases in Survey Responses
Comments: 3 pages, 2 figures, submitted to PNAS Nexus
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

As Large Language Models (LLMs) become widely used to model and simulate human behavior, understanding their biases becomes critical. We developed an experimental framework using Big Five personality surveys and uncovered a previously undetected social desirability bias in a wide range of LLMs. By systematically varying the number of questions LLMs were exposed to, we demonstrate their ability to infer when they are being evaluated. When personality evaluation is inferred, LLMs skew their scores towards the desirable ends of trait dimensions (i.e., increased extraversion, decreased neuroticism, etc.). This bias exists in all tested models, including GPT-4/3.5, Claude 3, Llama 3, and PaLM-2. Bias levels appear to increase in more recent models, with GPT-4's survey responses changing by 1.20 (human) standard deviations and Llama 3's by 0.98 standard deviations, both very large effects. This bias is robust to randomization of question order and paraphrasing. Reverse-coding all the questions decreases bias levels but does not eliminate them, suggesting that this effect cannot be attributed to acquiescence bias. Our findings reveal an emergent social desirability bias and suggest constraints on profiling LLMs with psychometric tests and on using LLMs as proxies for human participants.
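For readers unfamiliar with the reverse-coding manipulation mentioned in this abstract, a minimal sketch (scale bounds and responses are invented illustrations, not the paper's survey data): on a 1-5 Likert scale, a reverse-keyed item's score is reflected around the scale midpoint, so uniform "agree" responses no longer all push in the same trait direction.

```python
# Hypothetical sketch of reverse-coding a Likert item; the 1-5 scale is an
# illustrative assumption, not taken from the paper's instrument.
def reverse_code(score, lo=1, hi=5):
    """Reflect a rating around the scale midpoint: 5 -> 1, 4 -> 2, ..."""
    return lo + hi - score

responses = [5, 4, 2]                        # made-up agreement ratings
print([reverse_code(r) for r in responses])  # [1, 2, 4]
```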

[2]  arXiv:2405.06064 [pdf, other]
Title: LLMs for XAI: Future Directions for Explaining Explanations
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

In response to the demand for Explainable Artificial Intelligence (XAI), we investigate the use of Large Language Models (LLMs) to transform ML explanations into natural, human-readable narratives. Rather than directly explaining ML models using LLMs, we focus on refining explanations computed using existing XAI algorithms. We outline several research directions, including defining evaluation metrics, prompt design, comparing LLMs, exploring further training methods, and integrating external data. Initial experiments and a user study suggest that LLMs offer a promising way to enhance the interpretability and usability of XAI.

[3]  arXiv:2405.06109 [pdf, other]
Title: Scalable Exact Verification of Optimization Proxies for Large-Scale Optimal Power Flow
Subjects: Artificial Intelligence (cs.AI)

Optimal Power Flow (OPF) is a valuable tool for power system operators, but it is a difficult problem to solve for large systems.
Machine Learning (ML) algorithms, especially Neural Networks-based (NN) optimization proxies, have emerged as a promising new tool for solving OPF, by estimating the OPF solution much faster than traditional methods.
However, these ML algorithms act as black boxes, and it is hard to assess their worst-case performance across the entire range of possible inputs that an OPF can have.
Previous work has proposed a mixed-integer programming-based methodology to quantify the worst-case violations caused by a NN trained to estimate the OPF solution, throughout the entire input domain.
This approach, however, does not scale well to large power systems and more complex NN models.
This paper addresses these issues by proposing a scalable algorithm to compute worst-case violations of NN proxies used for approximating large power systems within a reasonable time limit.
This will help build trust in ML models to be deployed in large industry-scale power grids.

[4]  arXiv:2405.06149 [pdf, other]
Title: DisBeaNet: A Deep Neural Network to augment Unmanned Surface Vessels for maritime situational awareness
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Intelligent detection and tracking of vessels at sea plays a significant role in traffic avoidance for unmanned surface vessels (USVs). Current traffic-avoidance software relies mainly on the Automatic Identification System (AIS) and radar to track other vessels and avoid collisions, acting as a typical perception system for detecting targets. However, in a contested environment, emitting radar energy exposes the USV to detection by adversaries, while deactivating these radio-frequency transmitting sources degrades the USV's ability to monitor shipping traffic in the vicinity. Therefore, this paper presents an intelligent visual perception system based on an onboard camera with passive sensing capabilities that aims to assist USVs in addressing this problem: a novel low-cost vision perception system, built on a deep learning framework, for detecting and tracking vessels in the maritime environment. The underlying neural network, DisBeaNet, detects and tracks vessels and estimates their distance and bearing from a monocular camera; its outputs are then used to determine the latitude and longitude of the identified vessel.
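As a rough point of reference for the quantities DisBeaNet estimates, a classical pinhole-camera baseline relates a vessel's known height to its apparent height in pixels; everything below (function names, focal length, vessel height, field of view) is an illustrative assumption, not the paper's learned model or calibration.

```python
# Hedged sketch of classical monocular range/bearing estimation; a learned
# network like DisBeaNet replaces these closed-form approximations.

def estimate_distance(focal_px, real_height_m, bbox_height_px):
    # Pinhole model: distance ~= focal length (px) * true height / image height
    return focal_px * real_height_m / bbox_height_px

def estimate_bearing(cx_px, image_width_px, hfov_deg):
    # Offset of the bounding-box centre from the optical axis, in degrees
    return (cx_px - image_width_px / 2) / image_width_px * hfov_deg

print(estimate_distance(1000, 10.0, 50))  # 200.0 m for a 10 m-tall vessel
print(estimate_bearing(960, 1280, 90.0))  # 22.5 degrees off the bow
```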

[5]  arXiv:2405.06203 [pdf, other]
Title: A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments
Subjects: Artificial Intelligence (cs.AI)

Investigating children's embodied learning in mixed-reality environments, where they collaboratively simulate scientific processes, requires analyzing complex multimodal data to interpret their learning and coordination behaviors. Learning scientists have developed Interaction Analysis (IA) methodologies for analyzing such data, but this requires researchers to watch hours of videos to extract and interpret students' learning patterns. Our study aims to simplify researchers' tasks by using Machine Learning and Multimodal Learning Analytics to support the IA process, combining machine learning algorithms and multimodal analyses to support and streamline researcher efforts in developing a comprehensive understanding of students' scientific engagement through their movements, gaze, and affective responses in a simulated scenario. To facilitate an effective researcher-AI partnership, we present an initial case study to determine the feasibility of visually representing students' states, actions, gaze, affect, and movement on a timeline. Our case study focuses on a specific science scenario where students learn about photosynthesis. The timeline allows us to investigate the alignment of critical learning moments identified by multimodal and interaction analysis, and uncover insights into students' temporal learning progressions.

[6]  arXiv:2405.06218 [pdf, ps, other]
Title: Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

This work investigates how tutoring discourse interacts with students' proximal knowledge to explain and predict students' learning outcomes. Our work is conducted in the context of high-dosage human tutoring where 9th-grade students (N = 1080) attended small group tutorials and individually practiced problems on an Intelligent Tutoring System (ITS). We analyzed whether tutors' talk moves and students' performance on the ITS predicted scores on math learning assessments. We trained Random Forest Classifiers (RFCs) to distinguish high and low assessment scores based on tutor talk moves, students' ITS performance metrics, and their combination. A decision tree was extracted from each RFC to yield an interpretable model. We found AUCs of 0.63 for talk moves, 0.66 for ITS, and 0.77 for their combination, suggesting an interaction between the two feature sources. Specifically, the best decision tree emerged from combining the tutor talk moves that encouraged rigorous thinking with students' ITS mastery. In essence, tutor talk that encouraged mathematical reasoning predicted achievement for students who demonstrated high mastery on the ITS, whereas tutors' revoicing of students' mathematical ideas and contributions was predictive for students with low ITS mastery. Implications for practice are discussed.
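For readers unfamiliar with the AUC figures reported above, a minimal dependency-free sketch of the metric: the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count half). The labels and scores below are invented illustrations, not the study's data.

```python
# Toy rank-based AUC computation (area under the ROC curve), no dependencies.

def auc(labels, scores):
    """P(random positive scored above random negative), ties counting 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]  # hypothetical classifier outputs
print(auc(labels, scores))               # ~0.83
```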

[7]  arXiv:2405.06232 [pdf, other]
Title: Learning to Solve Geometry Problems via Simulating Human Dual-Reasoning Process
Comments: IJCAI 2024 Accepted
Subjects: Artificial Intelligence (cs.AI)

Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand both text and diagram, master essential geometry knowledge, and appropriately apply it in reasoning. However, existing works follow a paradigm of neural machine translation and only focus on enhancing the capability of encoders, which neglects the essential characteristics of human geometry reasoning. In this paper, inspired by dual-process theory, we propose a Dual-Reasoning Geometry Solver (DualGeoSolver) to simulate the dual-reasoning process of humans for GPS. Specifically, we construct two systems in DualGeoSolver, namely Knowledge System and Inference System. Knowledge System controls an implicit reasoning process, which is responsible for providing diagram information and geometry knowledge according to a step-wise reasoning goal generated by Inference System. Inference System conducts an explicit reasoning process, which specifies the goal in each reasoning step and applies the knowledge to generate program tokens for resolving it. The two systems carry out the above process iteratively, which behaves more in line with human cognition. We conduct extensive experiments on two benchmark datasets, GeoQA and GeoQA+. The results demonstrate the superiority of DualGeoSolver in both solving accuracy and robustness, which derives from its explicit modeling of the human reasoning process and knowledge application.

[8]  arXiv:2405.06266 [pdf, other]
Title: A Multi-Channel Spatial-Temporal Transformer Model for Traffic Flow Forecasting
Journal-ref: Xiao J, Long B. A Multi-Channel Spatial-Temporal Transformer Model for Traffic Flow Forecasting[J]. Information Sciences, 2024: 120648
Subjects: Artificial Intelligence (cs.AI)

Traffic flow forecasting is a crucial task in transportation management and planning. The main challenges for traffic flow forecasting are that (1) as the length of prediction time increases, the accuracy of prediction will decrease; (2) the predicted results greatly rely on the extraction of temporal and spatial dependencies from the road networks. To overcome the challenges mentioned above, we propose a multi-channel spatial-temporal transformer model for traffic flow forecasting, which improves the accuracy of the prediction by fusing results from different channels of traffic data. Our approach leverages a graph convolutional network to extract spatial features from each channel while using a transformer-based architecture to capture temporal dependencies across channels. We introduce an adaptive adjacency matrix to overcome limitations in feature extraction from fixed topological structures. Experimental results on six real-world datasets demonstrate that introducing a multi-channel mechanism into the temporal model enhances performance and our proposed model outperforms state-of-the-art models in terms of accuracy.

[9]  arXiv:2405.06296 [pdf, other]
Title: Fast Evaluation of DNN for Past Dataset in Incremental Learning
Authors: Naoto Sato
Subjects: Artificial Intelligence (cs.AI)

During the operation of a system including a deep neural network (DNN), new input values that were not included in the training dataset are given to the DNN. In such a case, the DNN may be incrementally trained with the new input values; however, that training may reduce the accuracy of the DNN in regard to the dataset that was previously obtained and used for the past training. It is necessary to evaluate the effect of the additional training on the accuracy for the past dataset. However, evaluation by testing all the input values included in the past dataset takes time. Therefore, we propose a new method to quickly evaluate the effect on the accuracy for the past dataset. In the proposed method, the gradient of the parameter values (such as weights and biases) for the past dataset is extracted by running the DNN before the training. Then, after the training, its effect on the accuracy with respect to the past dataset is calculated from the gradient and the update differences of the parameter values. To show the usefulness of the proposed method, we present experimental results with several datasets. The results show that the proposed method can estimate the accuracy change caused by additional training in constant time.
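The core idea, estimating the effect of a parameter update from a gradient computed once before training, can be caricatured with a first-order Taylor expansion. This is an illustrative reading of the abstract, not the authors' exact formula; all names and numbers below are assumptions.

```python
# Hypothetical first-order estimate of the change in past-dataset loss:
# dL ~= sum_i g_i * (theta_after_i - theta_before_i), where g is the gradient
# of the past-dataset loss, computed before the incremental training run.

def estimate_loss_change(grad_past, theta_before, theta_after):
    return sum(g * (a - b)
               for g, a, b in zip(grad_past, theta_after, theta_before))

# Toy check with a quadratic loss L(t) = t^2 around t = 1 (gradient 2):
grad = [2.0]
before, after = [1.0], [0.9]
print(estimate_loss_change(grad, before, after))  # negative: loss estimated to drop
```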

[10]  arXiv:2405.06413 [pdf, other]
Title: Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data
Comments: 14 pages, 10 figures
Subjects: Artificial Intelligence (cs.AI)

Federated learning (FL) offers a privacy-centric distributed learning framework, enabling model training on individual clients and central aggregation without necessitating data exchange. Nonetheless, FL implementations often suffer from non-i.i.d. and long-tailed class distributions across mobile applications, e.g., autonomous vehicles, which leads models to overfit as local training may converge to sub-optimal solutions. In our study, we explore the impact of data heterogeneity on model bias and introduce an innovative personalized FL framework, Multi-level Personalized Federated Learning (MuPFL), which leverages the hierarchical architecture of FL to fully harness computational resources at various levels. This framework integrates three pivotal modules: Biased Activation Value Dropout (BAVD) to mitigate overfitting and accelerate training; Adaptive Cluster-based Model Update (ACMU) to refine local models, ensuring coherent global aggregation; and Prior Knowledge-assisted Classifier Fine-tuning (PKCF) to bolster classification and personalize models in accord with skewed local data and shared knowledge. Extensive experiments on diverse real-world datasets for image classification and semantic segmentation validate that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions, enhancing accuracy by as much as 7.39% and accelerating training by up to 80%, marking significant advancements in both efficiency and effectiveness.

[11]  arXiv:2405.06510 [pdf, other]
Title: UniDM: A Unified Framework for Data Manipulation with Large Language Models
Comments: MLSys24
Subjects: Artificial Intelligence (cs.AI)

Designing effective data manipulation methods is a long-standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human effort on training data collection and model tuning. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They exhibit clear benefits in terms of performance but still require customized designs to fit each specific task. This is very costly and cannot catch up with the requirements of big data lake platforms. In this paper, inspired by the cross-task generality of LLMs on NLP tasks, we take the first step toward designing an automatic, general solution for data manipulation tasks. We propose UniDM, a unified framework which establishes a new paradigm to process data manipulation tasks using LLMs. UniDM formalizes a number of data manipulation tasks in a unified form and abstracts three main general steps to solve each task. We develop an automatic context retrieval step to allow the LLMs to retrieve data from data lakes that potentially contains evidence and factual information. For each step, we design effective prompts to guide LLMs to produce high-quality results. In a comprehensive evaluation on a variety of benchmarks, UniDM exhibits great generality and state-of-the-art performance on a wide variety of data manipulation tasks.

[12]  arXiv:2405.06624 [pdf, other]
Title: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Subjects: Artificial Intelligence (cs.AI)

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.

Cross-lists for Mon, 13 May 24

[13]  arXiv:2405.05978 (cross-list from math.OC) [pdf, other]
Title: Addressing Unboundedness in Quadratically-Constrained Mixed-Integer Problems
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Quadratically-constrained unbounded integer programs hold the distinction of being undecidable, suggesting a possible soft-spot for Mathematical Programming (MP) techniques, which otherwise constitute a good choice to treat integer or mixed-integer (MI) problems. We consider the challenge of minimizing MI convex quadratic objective functions subject to unbounded decision variables and quadratic constraints. Given the theoretical weakness of white-box MP solvers to handle such models, we turn to black-box meta-heuristics of the Evolution Strategies (ESs) family, and question their capacity to solve this challenge. Through an empirical assessment of quadratically-constrained quadratic objective functions, across varying Hessian forms and condition numbers, we compare the performance of the CPLEX solver to state-of-the-art MI ESs, which handle constraints by penalty. Our systematic investigation begins where the CPLEX solver encounters difficulties (timeouts as the search-space dimensionality increases, D >= 30), on which we report by means of detailed analyses. Overall, the empirical observations confirm that black-box and white-box solvers can be competitive, especially when the constraint function is separable, and that two common ESs' mutation operators can effectively handle the integer unboundedness. We also conclude that conditioning and separability are not intuitive factors in determining the complexity of this class of MI problems, where regular versus rough landscape structures can pose mirrored degrees of challenge for MP versus ESs.
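To make the black-box setup concrete, here is a toy (1+1)-ES on an invented integer quadratic problem with a quadratic constraint handled by penalty. The objective, constraint, mutation operator, and penalty weight are all illustrative assumptions, far simpler than the paper's state-of-the-art MI ESs.

```python
# Toy (1+1)-ES with integer mutation and a static penalty for the
# quadratic constraint; all problem data is made up for illustration.
import random

random.seed(0)

def f(x):                      # convex quadratic objective
    return sum(xi * xi for xi in x)

def g(x):                      # quadratic constraint, feasible iff g(x) <= 0
    return 9 - sum(xi * xi for xi in x)   # i.e. require ||x||^2 >= 9

def penalized(x, rho=1e3):
    return f(x) + rho * max(0.0, g(x))

def mutate(x, step=2):
    # symmetric integer mutation: add a small signed random step per coordinate
    return [xi + random.choice((-1, 1)) * random.randint(0, step) for xi in x]

x = [10, -10]                  # unbounded integer variables: any start is legal
best = penalized(x)
for _ in range(2000):
    y = mutate(x)
    fy = penalized(y)
    if fy <= best:             # elitist acceptance
        x, best = y, fy

print(x, best)                 # expect a feasible point with ||x||^2 close to 9
```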

[14]  arXiv:2405.05983 (cross-list from cs.CV) [pdf, ps, other]
Title: Real-Time Pill Identification for the Visually Impaired Using Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to accurately recognize and differentiate between various pill types through real-time image processing on mobile devices. The system incorporates Text-to-Speech (TTS) to provide immediate auditory feedback, enhancing usability and independence for visually impaired users. Our study evaluates the application's effectiveness in terms of detection accuracy and user experience, highlighting its potential to improve medication management and safety among the visually impaired community. Keywords: Deep Learning; YOLO Framework; Mobile Application; Visual Impairment; Pill Identification; Healthcare

[15]  arXiv:2405.05984 (cross-list from cs.LG) [pdf, other]
Title: Few-Shot Class Incremental Learning via Robust Transformer Approach
Comments: Under Review in Information Sciences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Few-Shot Class-Incremental Learning presents an extension of the Class Incremental Learning problem where a model is faced with the problem of data scarcity while addressing the catastrophic forgetting problem. This problem remains an open problem because all recent works are built upon the convolutional neural networks performing sub-optimally compared to the transformer approaches. Our paper presents Robust Transformer Approach built upon the Compact Convolution Transformer. The issue of overfitting due to few samples is overcome with the notion of the stochastic classifier, where the classifier's weights are sampled from a distribution with mean and variance vectors, thus increasing the likelihood of correct classifications, and the batch-norm layer to stabilize the training process. The issue of CF is dealt with the idea of delta parameters, small task-specific trainable parameters while keeping the backbone networks frozen. A non-parametric approach is developed to infer the delta parameters for the model's predictions. The prototype rectification approach is applied to avoid biased prototype calculations due to the issue of data scarcity. The advantage of ROBUSTA is demonstrated through a series of experiments in the benchmark problems where it is capable of outperforming prior arts with big margins without any data augmentation protocols.

[16]  arXiv:2405.05985 (cross-list from cs.LG) [pdf, other]
Title: TrafficGPT: Towards Multi-Scale Traffic Analysis and Generation with Spatial-Temporal Agent Framework
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The precise prediction of multi-scale traffic is a ubiquitous challenge in the urbanization process for car owners, road administrators, and governments. In the case of complex road networks, current and past traffic information from both upstream and downstream roads is crucial, since various road networks carry different semantic information about traffic. Rationalizing the utilization of semantic information can enable short-term, long-term, and unseen-road traffic prediction. As the demands of multi-scale traffic analysis increase, on-demand interactions and visualizations are expected to be available for transportation participants. We have designed a multi-scale traffic generation system, namely TrafficGPT, using three AI agents to process multi-scale traffic data, conduct multi-scale traffic analysis, and present multi-scale visualization results. TrafficGPT consists of three essential AI agents: 1) a text-to-demand agent that employs Question & Answer AI to interact with users and extract prediction tasks from text; 2) a traffic prediction agent that leverages multi-scale traffic data to generate temporal features and similarity, and fuses them with limited spatial features and similarity, to achieve accurate prediction for three tasks; and 3) a suggestion and visualization agent that uses the prediction results to generate suggestions and visualizations, providing users with a comprehensive understanding of traffic conditions. Our TrafficGPT system focuses on addressing transportation participants' concerns about traffic prediction, and we conducted extensive experiments on five real-world road datasets to demonstrate its superior predictive and interactive performance.

[17]  arXiv:2405.05989 (cross-list from cs.LG) [pdf, other]
Title: Clustering-based Multitasking Deep Neural Network for Solar Photovoltaics Power Generation Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The increasing installation of Photovoltaics (PV) cells leads to more generation of renewable energy sources (RES), but results in increased uncertainties of energy scheduling. Predicting PV power generation is important for energy management and dispatch optimization in the smart grid. However, PV power generation data is often collected across different types of customers (e.g., residential, agricultural, industrial, and commercial), while the customer information is always de-identified. This often results in a forecasting model trained with all PV power generation data, allowing the predictor to learn various patterns through intra-model self-learning, instead of constructing a separate predictor for each customer type. In this paper, we propose a clustering-based multitasking deep neural network (CM-DNN) framework for PV power generation prediction. K-means is applied to cluster the data into different customer types. For each type, a deep neural network (DNN) is employed and trained until the accuracy cannot be improved. Subsequently, for a specified customer type (i.e., the target task), inter-model knowledge transfer is conducted to enhance its training accuracy. During this process, source task selection is designed to choose the optimal subset of tasks (excluding the target customer), and each selected source task uses a coefficient to determine the amount of DNN model knowledge (weights and biases) transferred to the aimed prediction task. The proposed CM-DNN is tested on a real-world PV power generation dataset, and its superiority is demonstrated by comparing its prediction performance against a single model trained on the whole dataset without clustering.
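The clustering step can be sketched with a tiny, dependency-free k-means over made-up generation "profiles"; a real system would cluster daily PV curves, and everything below (profiles, k, the naive initialization) is an illustrative assumption.

```python
# Minimal pure-Python k-means: group de-identified generation profiles
# before training one predictor per cluster. Toy data and naive init.

def kmeans(points, k, iters=20):
    centers = points[:k]                      # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assign to nearest center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        centers = [                           # recompute centroids
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious groups: low flat output vs. high midday peak (toy "profiles")
profiles = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [1.0, 2.1, 1.0], [0.9, 2.0, 1.1]]
centers, clusters = kmeans(profiles, k=2)
print([len(c) for c in clusters])             # the two groups separate, 2 and 2
```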

[18]  arXiv:2405.05990 (cross-list from cs.CR) [pdf, other]
Title: Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks. However, recent studies have shown that LLMs can memorize training data and simple repeated tokens can trick the model into leaking the data. In this paper, we take a step further and show that certain special characters or their combinations with English letters are stronger memory triggers, leading to more severe data leakage. The intuition is that, since LLMs are trained with massive data that contains a substantial amount of special characters (e.g. structural symbols {, } of JSON files, and @, # in emails and online posts), the model may memorize the co-occurrence between these special characters and the raw texts. This motivates us to propose a simple but effective Special Characters Attack (SCA) to induce training data leakage. Our experiments verify the high effectiveness of SCA against state-of-the-art LLMs: they can leak diverse training data, such as code corpus, web pages, and personally identifiable information, and sometimes generate non-stop outputs as a byproduct. We further show that the composition of the training data corpus can be revealed by inspecting the leaked data -- one crucial piece of information for pre-training high-performance LLMs. Our work can help understand the sensitivity of LLMs to special characters and identify potential areas for improvement.

[19]  arXiv:2405.05991 (cross-list from cs.LG) [pdf, other]
Title: Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

Auction-based Federated Learning (AFL) has attracted extensive research interest due to its ability to motivate data owners (DOs) to join FL through economic means. While many existing AFL methods focus on providing decision support to model users (MUs) and the AFL auctioneer, decision support for data owners remains open. To bridge this gap, we propose a first-of-its-kind agent-oriented joint Pricing, Acceptance and Sub-delegation decision support approach for data owners in AFL (PAS-AFL). By considering a DO's current reputation, pending FL tasks, willingness to train FL models, and its trust relationships with other DOs, it provides a systematic approach for a DO to make joint decisions on AFL bid acceptance, task sub-delegation and pricing, based on Lyapunov optimization, to maximize its utility. It is the first approach to enable each DO to take on multiple FL tasks simultaneously, earning higher income and enhancing the throughput of FL tasks in the AFL ecosystem. Extensive experiments on six benchmark datasets demonstrate significant advantages of PAS-AFL over six alternative strategies, beating the best baseline by 28.77% and 2.64% on average in terms of utility and test accuracy of the resulting FL models, respectively.

[20]  arXiv:2405.05993 (cross-list from cs.LG) [pdf, ps, other]
Title: Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients' post-stroke functional abilities, as well as to forecast the improvement in functional abilities. Our dataset comprises patients' rehabilitation exercises and demographic information recorded in unstructured electronic health records (EHRs) and free-text rehabilitation procedure notes. We collected data for 265 stroke patients from the University of Pittsburgh Medical Center. We employed a pre-existing natural language processing (NLP) algorithm to extract data on rehabilitation exercises and developed a rule-based NLP algorithm to extract Activity Measure for Post-Acute Care (AM-PAC) scores, covering the basic mobility (BM) and applied cognitive (AC) domains, from procedure notes. Changes in AM-PAC scores were classified based on the minimal clinically important difference (MCID), and significance was assessed using Friedman and Wilcoxon tests. To identify impactful exercises, we used Chi-square tests, Fisher's exact tests, and logistic regression for odds ratios. Additionally, we developed five machine learning models, logistic regression (LR), Adaboost (ADB), support vector machine (SVM), gradient boosting (GB), and random forest (RF), to predict outcomes in functional ability. Statistical analyses revealed significant associations between functional improvements and specific exercises. The RF model achieved the best performance in predicting functional outcomes. In this study, we identified three rehabilitation exercises that significantly contributed to patients' post-stroke functional ability improvement in the first two months. Additionally, the successful application of a machine learning model to predict patient-specific functional outcomes underscores the potential for precision rehabilitation.
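The MCID-based outcome labelling described above amounts to a simple threshold rule on score changes. The threshold values and scores below are placeholders for illustration, not the study's actual MCIDs or patient data.

```python
# Toy sketch of MCID-based labelling: a change in AM-PAC score counts as
# improvement only if it meets the minimal clinically important difference.
MCID = {"basic_mobility": 3.3, "applied_cognitive": 2.4}  # hypothetical values

def label_change(domain, before, after):
    delta = after - before
    if delta >= MCID[domain]:
        return "improved"
    if delta <= -MCID[domain]:
        return "declined"
    return "no_change"

print(label_change("basic_mobility", 40.0, 45.0))     # improved
print(label_change("applied_cognitive", 50.0, 49.0))  # below threshold: no_change
```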

[21]  arXiv:2405.06001 (cross-list from cs.LG) [pdf, other]
Title: LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence, thanks to their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements of LLMs limit their widespread adoption. Quantization, a key compression technique, offers a viable solution to mitigate these demands by compressing and accelerating LLMs, albeit with potential risks to model accuracy. Numerous studies have aimed to minimize the accuracy loss associated with quantization. However, the quantization configurations in these studies vary and may not be optimized for hardware compatibility. In this paper, we focus on identifying the most effective practices for quantizing LLMs, with the goal of balancing performance with computational efficiency. For a fair analysis, we develop a quantization toolkit, LLMC, and design four crucial principles covering inference efficiency, quantized accuracy, calibration cost, and modularization. By benchmarking various models and datasets with over 500 experiments, we derive three takeaways concerning calibration data, quantization algorithms, and quantization schemes. Finally, we construct a best-practice pipeline for LLM post-training quantization (PTQ). All the benchmark results and the toolkit can be found at https://github.com/ModelTC/llmc.
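As a concrete illustration of the kind of scheme such a benchmark compares, here is a minimal sketch of symmetric per-tensor int8 weight quantization; the function names are illustrative and do not reflect the LLMC toolkit's actual API.

```python
import numpy as np

# Minimal sketch of symmetric per-tensor int8 weight quantization, one of
# the basic post-training quantization schemes a PTQ benchmark would cover.
# Calibration here is trivial (max-abs over the tensor); real pipelines use
# calibration data to pick scales.

def quantize_int8(w: np.ndarray):
    """Quantize float weights to int8 with a single symmetric scale."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale
```

The round-trip error of such a scheme is bounded by half a quantization step, which is the kind of accuracy/efficiency trade-off the benchmark's 500+ experiments quantify at scale.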

[22]  arXiv:2405.06004 (cross-list from physics.ao-ph) [pdf, other]
Title: EWMoE: An effective model for global weather forecasting with mixture-of-experts
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Weather forecasting is a crucial task for meteorological research, with direct social and economic impacts. Recently, data-driven weather forecasting models based on deep learning have shown great potential, achieving superior performance compared with traditional numerical weather prediction methods. However, these models often require massive training data and computational resources. In this paper, we propose EWMoE, an effective model for accurate global weather forecasting, which requires significantly less training data and computational resources. Our model incorporates three key components to enhance prediction accuracy: meteorology-specific embedding, a core Mixture-of-Experts (MoE) layer, and two specific loss functions. We conduct our evaluation on the ERA5 dataset using only two years of training data. Extensive experiments demonstrate that EWMoE outperforms current models such as FourCastNet and ClimaX at all forecast lead times, achieving competitive performance compared with the state-of-the-art Pangu-Weather model in evaluation metrics such as Anomaly Correlation Coefficient (ACC) and Root Mean Square Error (RMSE). Additionally, ablation studies indicate that applying the MoE architecture to weather forecasting offers significant advantages in improving accuracy and resource efficiency.
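The core Mixture-of-Experts idea can be sketched with a toy top-k gated layer; the expert count, dimensions, and initialization below are generic assumptions, not EWMoE's configuration.

```python
import numpy as np

# Toy sketch of a Mixture-of-Experts layer with top-k gating: a gate
# network produces a distribution over experts, and only the top-k experts
# process the input, weighted by their gate scores. This is the generic
# mechanism, not EWMoE's specific layer.

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class MoELayer:
    def __init__(self, dim: int, n_experts: int, k: int = 2):
        self.experts = [rng.standard_normal((dim, dim)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((dim, n_experts)) * 0.1
        self.k = k

    def __call__(self, x: np.ndarray) -> np.ndarray:
        scores = softmax(x @ self.gate)        # gating distribution
        top = np.argsort(scores)[-self.k:]     # route to top-k experts only
        out = np.zeros_like(x)
        for i in top:
            out += scores[i] * (x @ self.experts[i])
        return out
```

Because only k of the experts run per input, the layer adds capacity without a proportional increase in compute, which is the resource-efficiency argument the ablations support.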

[23]  arXiv:2405.06038 (cross-list from cs.LG) [pdf, other]
Title: From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks
Comments: This manuscript is the accepted version for TNNLS(IEEE Transactions on Neural Networks and Learning Systems)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research of compression methods to achieve model efficiency while retaining the performance. Furthermore, more and more works focus on customizing the DNN hardware accelerators to better leverage the model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related works can be overwhelming. This inspires us to conduct a comprehensive survey on recent research toward the goal of high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers the mainstream model compression techniques such as model quantization, model pruning, knowledge distillation, and optimizations of non-linear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. Additionally, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several issues, such as hardware evaluation, generalization, and integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs, from algorithm to hardware accelerators and security perspectives.

[24]  arXiv:2405.06059 (cross-list from cs.CL) [pdf, other]
Title: A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Open-ended worlds are those in which there are no pre-specified goals or environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn behaviors associated with new character role types. We show that our agent both obtains more reward in the zero-shot setting and discovers these rewards with greater sample efficiency in few-shot learning settings.
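A minimal sketch of attending over a mix of frozen and trainable experts might look like the following; the policy shapes and attention parameterization are assumptions, not the paper's architecture.

```python
import numpy as np

# Illustrative sketch: combine per-task policy logits with attention over
# expert keys. "Frozen" experts are pretrained task policies whose weights
# are not updated; the new expert (and the attention parameters) would be
# the trainable parts. Shapes and the linear policies are assumptions.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mixture_action_logits(state, frozen_experts, new_expert, keys, query_w):
    """Attention-weighted mixture of expert policy logits for one state."""
    experts = frozen_experts + [new_expert]     # frozen + trainable experts
    query = state @ query_w                     # state-conditioned query
    attn = softmax(np.array([query @ k for k in keys]))
    logits = sum(a * (state @ e) for a, e in zip(attn, experts))
    return logits, attn
```

Training would update only the new expert, keys, and query weights, so knowledge stored in the frozen experts is reused rather than overwritten when a new character role appears.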

[25]  arXiv:2405.06087 (cross-list from cs.HC) [pdf, other]
Title: When Are Combinations of Humans and AI Useful?
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Inspired by the increasing use of AI to augment humans, researchers have studied human-AI systems involving different tasks, systems, and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here, we addressed this question by conducting a meta-analysis of over 100 recent experimental studies reporting over 300 effect sizes. First, we found that, on average, human-AI combinations performed significantly worse than the best of humans or AI alone. Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when the AI outperformed humans alone we found losses. These findings highlight the heterogeneity of the effects of human-AI collaboration and point to promising avenues for improving human-AI systems.

[26]  arXiv:2405.06145 (cross-list from cs.CL) [pdf, other]
Title: Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media
Comments: 7 pages, 1 figure, 4 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder. The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use: its clinical and social impacts. We collected data from chosen subreddits using the publicly available Application Programming Interface for Reddit. We manually annotated text spans representing clinical and social impacts reported by people who also reported personal nonmedical use of substances including but not limited to opioids, stimulants, and benzodiazepines. Our objective is to create a resource that can enable the development of systems that can automatically detect clinical and social impacts of substance use from text-based social media data. The successful development of such systems may enable us to better understand how nonmedical use of substances affects individual health and societal dynamics, aiding the development of effective public health strategies. In addition to creating the annotated dataset, we applied several machine learning models to establish baseline performances. Specifically, we experimented with transformer models such as BERT and RoBERTa, a few-shot learning model (DANN) leveraging the full training dataset, and GPT-3.5 using one-shot learning, for automatic NER of clinical and social impacts. The dataset has been made available through the 2024 SMM4H shared tasks.

[27]  arXiv:2405.06164 (cross-list from cs.SE) [pdf, other]
Title: Skeet: Towards a Lightweight Serverless Framework Supporting Modern AI-Driven App Development
Journal-ref: Fumitake, K.; Kishi, S. and Neve, J. (2024). In Proceedings of the 19th International Conference on Evaluation of Novel Approaches to Software Engineering - ENASE
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

The field of web and mobile software frameworks is relatively mature, with a large variety of tools in different languages that facilitate traditional app development where data in a relational database is displayed and modified. Our position is that many current frameworks became popular during single server deployment of MVC architecture apps, and do not facilitate modern aspects of app development such as cloud computing and the incorporation of emerging technologies such as AI. We present a novel framework which accomplishes these purposes, Skeet, which was recently released to general use, alongside an initial evaluation. Skeet provides an app structure that reflects current trends in architecture, and tool suites that allow developers with minimal knowledge of AI internals to easily incorporate such technologies into their apps and deploy them.

[28]  arXiv:2405.06192 (cross-list from cs.LG) [pdf, other]
Title: Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning
Comments: This paper has been accepted by ICML2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cross-domain offline reinforcement learning leverages source domain data with diverse transition dynamics to alleviate the data requirement for the target domain. However, simply merging the data of two domains leads to performance degradation due to the dynamics mismatch. Existing methods address this problem by measuring the dynamics gap via domain classifiers while relying on assumptions about the transferability of paired domains. In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. We show that such an objective recovers the mutual-information gap of transition functions in two domains without suffering from the unbounded issue of the dynamics gap in handling significantly different domains. Based on the representations, we introduce a data filtering algorithm that selectively shares transitions from the source domain according to the contrastive score functions. Empirical results on various tasks demonstrate that our method achieves superior performance, using only 10% of the target data to reach 89.2% of the performance that state-of-the-art methods achieve with the full target dataset.
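The selective data-sharing step can be sketched as filtering source transitions by their contrastive scores; the score function itself, which the paper learns with a contrastive objective, is treated as given here.

```python
import numpy as np

# Hedged sketch of the data-filtering idea: each source-domain transition
# carries a contrastive score (higher = dynamics more consistent with the
# target domain), and only high-scoring transitions are shared with the
# target-domain dataset. The scores are assumed to come from a separately
# learned contrastive model.

def filter_transitions(transitions, scores, keep_fraction=0.5):
    """Keep the top `keep_fraction` of transitions by contrastive score."""
    cutoff = np.quantile(scores, 1.0 - keep_fraction)
    return [t for t, s in zip(transitions, scores) if s >= cutoff]
```

The filtered set is then merged with target-domain data for offline RL training, so dynamics-mismatched source transitions never enter the replay buffer.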

[29]  arXiv:2405.06196 (cross-list from cs.CV) [pdf, other]
Title: VLSM-Adapter: Finetuning Vision-Language Segmentation Efficiently with Lightweight Blocks
Comments: 12 pages, 5 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Foundation Vision-Language Models (VLMs) trained using large-scale open-domain images and text pairs have recently been adapted to develop Vision-Language Segmentation Models (VLSMs) that allow providing text prompts during inference to guide image segmentation. If robust and powerful VLSMs can be built for medical images, it could aid medical professionals in many clinical tasks where they must spend substantial time delineating the target structure of interest. VLSMs for medical images resort to fine-tuning base VLM or VLSM pretrained on open-domain natural image datasets due to fewer annotated medical image datasets; this fine-tuning is resource-consuming and expensive as it usually requires updating all or a significant fraction of the pretrained parameters. Recently, lightweight blocks called adapters have been proposed in VLMs that keep the pretrained model frozen and only train adapters during fine-tuning, substantially reducing the computing resources required. We introduce a novel adapter, VLSM-Adapter, that can fine-tune pretrained vision-language segmentation models using transformer encoders. Our experiments in widely used CLIP-based segmentation models show that with only 3 million trainable parameters, the VLSM-Adapter outperforms state-of-the-art and is comparable to the upper bound end-to-end fine-tuning. The source code is available at: https://github.com/naamiinepal/vlsm-adapter.
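A bottleneck adapter of the general kind described, i.e. a small down-project/up-project MLP with a residual connection around a frozen backbone's hidden states, can be sketched as follows; dimensions and initialization are illustrative, not VLSM-Adapter's.

```python
import numpy as np

# Generic bottleneck adapter sketch: only the two small projection matrices
# are trainable, while the surrounding pretrained encoder stays frozen.
# Dimensions and initialization are illustrative assumptions.

rng = np.random.default_rng(0)

class Adapter:
    def __init__(self, dim: int, bottleneck: int):
        # Trainable parameters: dim*bottleneck down + bottleneck*dim up.
        self.down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, h: np.ndarray) -> np.ndarray:
        z = np.maximum(h @ self.down, 0.0)   # down-project + ReLU
        return h + z @ self.up               # up-project + residual
```

With a small bottleneck, the trainable parameter count stays tiny relative to the frozen backbone, which is how adapter tuning reaches the few-million-parameter budget the abstract cites.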

[30]  arXiv:2405.06204 (cross-list from cs.CL) [pdf, other]
Title: HC$^2$L: Hybrid and Cooperative Contrastive Learning for Cross-lingual Spoken Language Understanding
Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). arXiv admin note: text overlap with arXiv:2312.03716
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The state-of-the-art model for zero-shot cross-lingual spoken language understanding performs cross-lingual unsupervised contrastive learning to achieve label-agnostic semantic alignment between each utterance and its code-switched data. However, it ignores the precious intent/slot labels, whose label information promises to help capture the label-aware semantic structure and to leverage supervised contrastive learning to improve the semantics of both source and target languages. In this paper, we propose Hybrid and Cooperative Contrastive Learning to address this problem. Apart from cross-lingual unsupervised contrastive learning, we design a holistic approach that exploits source-language supervised contrastive learning, cross-lingual supervised contrastive learning, and multilingual supervised contrastive learning to perform label-aware semantic alignment in a comprehensive manner. Each supervised contrastive learning mechanism covers both single-task and joint-task scenarios. In our model, each contrastive learning mechanism's input is enhanced by the others, so the four mechanisms cooperate to learn more consistent and discriminative representations in a virtuous cycle during training. Experiments show that our model obtains consistent improvements over 9 languages, achieving new state-of-the-art performance.

[31]  arXiv:2405.06206 (cross-list from cs.CR) [pdf, other]
Title: Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks, where adversaries poison the local training data of a subset of clients using a backdoor trigger, aiming to make the aggregated model produce malicious results when the same backdoor condition is met by an inference-time input. Existing backdoor attacks in FL suffer from common deficiencies: fixed trigger patterns and reliance on the assistance of model poisoning. State-of-the-art defenses based on Byzantine-robust aggregation exhibit a good defense performance on these attacks because of the significant divergence between malicious and benign model updates. To effectively conceal malicious model updates among benign ones, we propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger, making backdoor data have minimal effect on model updates. We provide theoretical justifications for DPOT's attacking principle and display experimental results showing that DPOT, via only a data-poisoning attack, effectively undermines state-of-the-art defenses and outperforms existing backdoor attack techniques on various datasets.
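A basic trigger-stamping poisoning step, the substrate DPOT builds on, can be sketched as below; note that this sketch uses a fixed trigger patch for illustration, whereas DPOT's contribution is to optimize the trigger each round so the poisoned updates blend in with benign ones.

```python
import numpy as np

# Illustrative backdoor data-poisoning step: stamp a trigger patch onto a
# fraction of a client's training inputs and relabel them with the
# attacker's target class. The fixed corner patch here is a simplification;
# DPOT optimizes the trigger, which this sketch does not implement.

def poison_batch(images, labels, trigger, target_label, fraction=0.2):
    """Return copies of (images, labels) with the first `fraction` poisoned."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * fraction)
    ph, pw = trigger.shape
    for i in range(n_poison):
        images[i, :ph, :pw] = trigger   # stamp trigger in top-left corner
        labels[i] = target_label        # flip label to attacker's target
    return images, labels
```

At inference time, any input carrying the same trigger would then be steered toward the target label by the aggregated model, which is the backdoor condition the abstract describes.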

[32]  arXiv:2405.06211 (cross-list from cs.CL) [pdf, other]
Title: A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing substantial benefits for numerous tasks. Particularly in the era of AI-generated content (AIGC), the powerful retrieval capacity of RAG in providing additional knowledge enables retrieval-augmented generation to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, retrieval-augmented large language models have emerged to harness external and authoritative knowledge bases, rather than relying solely on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research on retrieval-augmented large language models (RA-LLMs), covering three primary technical perspectives: architectures, training strategies, and applications. As preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we categorize mainstream relevant work by application area, detailing the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research.
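The basic retrieve-then-generate loop underlying RA-LLMs can be sketched schematically; the word-overlap retriever and the `generate` callback below are toy stand-ins for a real dense retriever and LLM API call.

```python
# Schematic retrieval-augmented generation loop: retrieve the top-k most
# relevant documents for the query, then condition generation on them.
# The word-overlap scorer is a toy retriever; `generate` is a placeholder
# for an actual LLM call.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(query: str, corpus: list[str], generate) -> str:
    """Build a context-augmented prompt and pass it to the generator."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```

Swapping the toy retriever for a dense embedding index and `generate` for an LLM yields the standard RA-LLM architecture the survey categorizes.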

[33]  arXiv:2405.06239 (cross-list from cs.CL) [pdf, other]
Title: SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora
Authors: Faisal Qarah
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we introduce SaudiBERT, a monodialect Arabic language model pretrained exclusively on Saudi dialectal text. To demonstrate the model's effectiveness, we compared SaudiBERT with six different multidialect Arabic language models across 11 evaluation datasets, which are divided into two groups: sentiment analysis and text classification. SaudiBERT achieved average F1-scores of 86.15% and 87.86% in these groups respectively, significantly outperforming all other comparative models. Additionally, we present two novel Saudi dialectal corpora: the Saudi Tweets Mega Corpus (STMC), which contains over 141 million tweets in Saudi dialect, and the Saudi Forums Corpus (SFC), which includes 15.2 GB of text collected from five Saudi online forums. Both corpora are used in pretraining the proposed model, and they are the largest Saudi dialectal corpora ever reported in the literature. The results confirm the effectiveness of SaudiBERT in understanding and analyzing Arabic text expressed in Saudi dialect, achieving state-of-the-art results in most tasks and surpassing other language models included in the study. The SaudiBERT model is publicly available at https://huggingface.co/faisalq/SaudiBERT.

[34]  arXiv:2405.06247 (cross-list from cs.LG) [pdf, other]
Title: Disttack: Graph Adversarial Attacks Toward Distributed GNN Training
Comments: Accepted by 30th International European Conference on Parallel and Distributed Computing(Euro-Par 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Graph Neural Networks (GNNs) have emerged as potent models for graph learning. Distributing the training process across multiple computing nodes is the most promising solution to address the challenges of ever-growing real-world graphs. However, current adversarial attack methods on GNNs neglect the characteristics and applications of the distributed scenario, leading to suboptimal performance and inefficiency in attacking distributed GNN training.
In this study, we introduce Disttack, the first framework of adversarial attacks for distributed GNN training that leverages the characteristics of frequent gradient updates in a distributed system. Specifically, Disttack corrupts distributed GNN training by injecting adversarial attacks into one single computing node. The attacked subgraphs are precisely perturbed to induce an abnormal gradient ascent in backpropagation, disrupting gradient synchronization between computing nodes and thus leading to a significant performance decline of the trained GNN. We evaluate Disttack on four large real-world graphs by attacking five widely adopted GNNs. Compared with the state-of-the-art attack method, experimental results demonstrate that Disttack amplifies the model accuracy degradation by 2.75$\times$ and achieves speedup by 17.33$\times$ on average while maintaining unnoticeability.

[35]  arXiv:2405.06260 (cross-list from cs.CV) [pdf, other]
Title: Precise Apple Detection and Localization in Orchards using YOLOv5 for Robotic Harvesting Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The advancement of agricultural robotics holds immense promise for transforming fruit harvesting practices, particularly within the apple industry. The accurate detection and localization of fruits are pivotal for the successful implementation of robotic harvesting systems. In this paper, we propose a novel approach to apple detection and position estimation utilizing an object detection model, YOLOv5. Our primary objective is to develop a robust system capable of identifying apples in complex orchard environments and providing precise location information. To achieve this, we curated an autonomously labeled dataset comprising diverse apple tree images, which was utilized for both training and evaluation purposes. Through rigorous experimentation, we compared the performance of our YOLOv5-based system with other popular object detection models, including SSD. Our results demonstrate that the YOLOv5 model outperforms its counterparts, achieving an impressive apple detection accuracy of approximately 85%. We believe that our proposed system's accurate apple detection and position estimation capabilities represent a significant advancement in agricultural robotics, laying the groundwork for more efficient and sustainable fruit harvesting practices.
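Turning detector outputs into position estimates can be as simple as taking bounding-box centers, a common simplification for grasp targeting; the (x1, y1, x2, y2) box format assumed here is illustrative, not necessarily the paper's exact pipeline.

```python
# Sketch of converting a detector's output boxes into 2D apple positions:
# take each box's center in pixel coordinates. A robotic harvester would
# then map these pixel centers to 3D via camera calibration and depth.
# The (x1, y1, x2, y2) corner format is an assumption.

def box_centers(boxes):
    """Return (cx, cy) pixel centers for a list of (x1, y1, x2, y2) boxes."""
    return [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
```

In practice these centers would be filtered by the detector's confidence scores before being handed to the robot's motion planner.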

[36]  arXiv:2405.06263 (cross-list from cs.LG) [pdf, other]
Title: Learning Latent Dynamic Robust Representations for World Models
Journal-ref: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Visual Model-Based Reinforcement Learning (MBRL) promises to encapsulate agent's knowledge about the underlying dynamics of the environment, enabling learning a world model as a useful planner. However, top MBRL agents such as Dreamer often struggle with visual pixel-based inputs in the presence of exogenous or irrelevant noise in the observation space, due to failure to capture task-specific features while filtering out irrelevant spatio-temporal details. To tackle this problem, we apply a spatio-temporal masking strategy, a bisimulation principle, combined with latent reconstruction, to capture endogenous task-specific aspects of the environment for world models, effectively eliminating non-essential information. Joint training of representations, dynamics, and policy often leads to instabilities. To further address this issue, we develop a Hybrid Recurrent State-Space Model (HRSSM) structure, enhancing state representation robustness for effective policy learning. Our empirical evaluation demonstrates significant performance improvements over existing methods in a range of visually complex control tasks such as Maniskill \cite{gu2023maniskill2} with exogenous distractors from the Matterport environment. Our code is avaliable at https://github.com/bit1029public/HRSSM.

[37]  arXiv:2405.06270 (cross-list from cs.LG) [pdf, other]
Title: XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.

[38]  arXiv:2405.06289 (cross-list from cs.SD) [pdf, other]
Title: Look Once to Hear: Target Speech Hearing with Noisy Examples
Comments: Honorable mention at CHI 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing that ignores all interfering speech and noise except the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This, however, is not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user-interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing human auditory perception with artificial intelligence. We provide code and data at: https://github.com/vb000/LookOnceToHear.
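The reported dB improvement refers to a signal-quality metric; a scale-invariant signal-to-noise ratio (SI-SNR), a common measure in speech enhancement, can be sketched as below. Whether the paper uses exactly this variant is an assumption.

```python
import numpy as np

# Hedged sketch of a scale-invariant SNR (SI-SNR): project the estimate
# onto the reference signal and compare the energy of that projection to
# the residual. Widely used in speech separation/extraction; assumed here
# as an illustration of "signal quality improvement in dB".

def si_snr_db(estimate: np.ndarray, target: np.ndarray) -> float:
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    s = (estimate @ target) / (target @ target) * target  # projection onto target
    noise = estimate - s
    return 10.0 * np.log10((s @ s) / (noise @ noise))
```

A "7.01 dB improvement" would then mean the SI-SNR of the extracted target speech exceeds that of the raw noisy mixture by about 7 dB on average.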

[39]  arXiv:2405.06295 (cross-list from cs.CL) [pdf, other]
Title: Aspect-oriented Consumer Health Answer Summarization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs, placing their trust in the collective wisdom of the public. However, there can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern. Typically, CQA forums feature a single top-voted answer as a representative summary for each query. However, a single answer overlooks the alternative solutions and other information frequently offered in other responses. Our research focuses on aspect-based summarization of health answers to address this limitation. Summarization of responses under different aspects such as suggestions, information, personal experiences, and questions can enhance the usability of the platforms. We formalize a multi-stage annotation guideline and contribute a unique dataset comprising aspect-based human-written health answer summaries. We build an automated multi-faceted answer summarization pipeline with this dataset based on task-specific fine-tuning of several state-of-the-art models. The pipeline leverages question similarity to retrieve relevant answer sentences, subsequently classifying them into the appropriate aspect type. Following this, we employ several recent abstractive summarization models to generate aspect-based summaries. Finally, we present a comprehensive human analysis and find that our summaries rank high in capturing relevant content and a wide range of solutions.

[40]  arXiv:2405.06299 (cross-list from eess.SP) [pdf, other]
Title: Cross-domain Learning Framework for Tracking Users in RIS-aided Multi-band ISAC Systems with Sparse Labeled Data
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI)

Integrated sensing and communications (ISAC) is pivotal for 6G communications and is boosted by the rapid development of reconfigurable intelligent surfaces (RISs). Using the channel state information (CSI) across multiple frequency bands, RIS-aided multi-band ISAC systems can potentially track users' positions with high precision. Though tracking with CSI is desirable as no communication overheads are incurred, it faces challenges due to the multi-modalities of CSI samples, irregular and asynchronous data traffic, and sparse labeled data for learning the tracking function. This paper proposes the X2Track framework, where we model the tracking function by a hierarchical architecture, jointly utilizing multi-modal CSI indicators across multiple bands, and optimize it in a cross-domain manner, tackling the sparsity of labeled data for the target deployment environment (namely, target domain) by adapting the knowledge learned from another environment (namely, source domain). Under X2Track, we design an efficient deep learning algorithm to minimize tracking errors, based on transformer neural networks and adversarial learning techniques. Simulation results verify that X2Track achieves decimeter-level axial tracking errors even under scarce UL data traffic and strong interference conditions and can adapt to diverse deployment environments with fewer than 5% training data, or equivalently, 5 minutes of UE tracks, being labeled.

[41]  arXiv:2405.06301 (cross-list from cs.LG) [pdf, ps, other]
Title: Learning from String Sequences
Comments: 10 pages, 1 figure, 4 tables, Technical Report
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

The Universal Similarity Metric (USM) has been demonstrated to give practically useful measures of "similarity" between sequence data. Here we have used the USM as an alternative distance metric in a K-Nearest Neighbours (K-NN) learner to allow effective pattern recognition of variable length sequence data. We compare this USM approach with the commonly used string-to-word vector approach. Our experiments have used two data sets of divergent domains: (1) spam email filtering and (2) protein subcellular localization. Our results with this data reveal that the USM-based K-NN learner (1) gives predictions with higher classification accuracy than those output by techniques that use the string-to-word vector approach, and (2) can be used to generate reliable probability forecasts.
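The USM itself is uncomputable, so in practice it is approximated by compression-based distances such as the Normalized Compression Distance (NCD). A minimal sketch of a K-NN learner over variable-length byte sequences in this spirit, using zlib as the compressor (illustrative names, not the authors' implementation):

```python
import zlib
from collections import Counter

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: a computable stand-in for the
    Universal Similarity Metric, here using zlib as the compressor."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_predict(query: bytes, train: list, k: int = 3) -> str:
    """Classify a variable-length sequence by majority vote among its k
    nearest training sequences under NCD."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the distance operates on raw bytes, no fixed-length string-to-word vectorization is required, which is the property the paper exploits.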

[42]  arXiv:2405.06321 (cross-list from cs.CL) [pdf, other]
Title: Correlation Dimension of Natural Language in a Statistical Manifold
Comments: Published at Physical Review Research
Journal-ref: Du, X., & Tanaka-Ishii, K. (2024). Correlation dimension of natural language in a statistical manifold. Physical Review Research, 6(2), L022028
Subjects: Computation and Language (cs.CL); Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI)

The correlation dimension of natural language is measured by applying the Grassberger-Procaccia algorithm to high-dimensional sequences produced by a large-scale language model. This method, previously studied only in a Euclidean space, is reformulated in a statistical manifold via the Fisher-Rao distance. Language exhibits a multifractal structure, with global self-similarity and a universal dimension around 6.5, which is smaller than those of simple discrete random sequences and larger than that of a Barab\'asi-Albert process. Long memory is the key to producing self-similarity. Our method is applicable to any probabilistic model of real-world discrete sequences, and we show an application to music data.
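For intuition, the classical Euclidean-space Grassberger-Procaccia estimator (which the paper generalizes to the Fisher-Rao distance) can be sketched as follows; the implementation details are illustrative, not the authors':

```python
import numpy as np

def correlation_integral(points: np.ndarray, r: float) -> float:
    """C(r): fraction of distinct point pairs at distance < r (Euclidean here;
    the paper instead uses the Fisher-Rao distance on a statistical manifold)."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    mask = np.triu(np.ones(d.shape, dtype=bool), k=1)  # count each pair once
    return float((d[mask] < r).mean())

def correlation_dimension(points: np.ndarray, r_lo: float, r_hi: float,
                          num: int = 10) -> float:
    """Estimate the correlation dimension as the slope of log C(r) vs log r,
    using the scaling C(r) ~ r^D for a set of dimension D."""
    rs = np.geomspace(r_lo, r_hi, num)
    cs = np.array([correlation_integral(points, r) for r in rs])
    slope, _ = np.polyfit(np.log(rs), np.log(cs), 1)
    return float(slope)
```

On points sampled from a one-dimensional subset of a higher-dimensional space, the estimated slope comes out near 1, which is the sanity check usually run before applying the estimator to language data.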

[43]  arXiv:2405.06329 (cross-list from cs.CY) [pdf, ps, other]
Title: ChatGPTest: opportunities and cautionary tales of utilizing AI for questionnaire pretesting
Comments: 11 pages, 2 Figures
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

The rapid advancements in generative artificial intelligence have opened up new avenues for enhancing various aspects of research, including the design and evaluation of survey questionnaires. However, the recent pioneering applications have not considered questionnaire pretesting. This article explores the use of GPT models as a useful tool for pretesting survey questionnaires, particularly in the early stages of survey design. Illustrated with two applications, the article suggests incorporating GPT feedback as an additional stage before human pretesting, potentially reducing successive iterations. The article also emphasizes the indispensable role of researchers' judgment in interpreting and implementing AI-generated feedback.

[44]  arXiv:2405.06354 (cross-list from cs.CV) [pdf, other]
Title: KeepOriginalAugment: Single Image-based Better Information-Preserving Data Augmentation Approach
Comments: This paper has been accepted at 20th International Conference on Artificial Intelligence Applications and Innovations 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Advanced image data augmentation techniques play a pivotal role in enhancing the training of models for diverse computer vision tasks. Notably, SalfMix and KeepAugment have emerged as popular strategies, showcasing their efficacy in boosting model performance. However, SalfMix's reliance on duplicating salient features poses a risk of overfitting, potentially compromising the model's generalization capabilities. Conversely, KeepAugment, which selectively preserves salient regions and augments non-salient ones, introduces a domain shift that hinders the exchange of crucial contextual information, impeding overall model understanding. In response to these challenges, we introduce KeepOriginalAugment, a novel data augmentation approach. This method intelligently incorporates the most salient region within the non-salient area, allowing augmentation to be applied to either region. Striking a balance between data diversity and information preservation, KeepOriginalAugment enables models to leverage both diverse salient and non-salient regions, leading to enhanced performance. We explore three strategies for determining the placement of the salient region (minimum, maximum, or random) and investigate swapping perspective strategies to decide which part (salient or non-salient) undergoes augmentation. Our experimental evaluations, conducted on classification datasets such as CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate the superior performance of KeepOriginalAugment compared to existing state-of-the-art techniques.

[45]  arXiv:2405.06363 (cross-list from cs.LG) [pdf, ps, other]
Title: Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators. Given access to a generative model, we achieve rate-optimal sample complexity by performing a simple, \emph{perturbed} version of least-squares value iteration with orthogonal trigonometric polynomials as features. Key to our solution is a novel projection technique based on ideas from harmonic analysis. Our~$\widetilde{\mathcal{O}}(\epsilon^{-2-d/(\nu+1)})$ sample complexity, where $d$ is the dimension of the state-action space and $\nu$ the order of smoothness, recovers the state-of-the-art result of discretization approaches for the special case of Lipschitz MDPs $(\nu=0)$. At the same time, for $\nu\to\infty$, it recovers and greatly generalizes the $\mathcal{O}(\epsilon^{-2})$ rate of low-rank MDPs, which are more amenable to regression approaches. In this sense, our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
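Both special cases quoted in the abstract follow from a direct check of the exponent in the stated rate:

```latex
\widetilde{\mathcal{O}}\!\left(\epsilon^{-2-\frac{d}{\nu+1}}\right)\Big|_{\nu=0}
  = \widetilde{\mathcal{O}}\!\left(\epsilon^{-2-d}\right)
  \quad \text{(Lipschitz MDPs: the discretization rate)},
\qquad
\lim_{\nu\to\infty}\epsilon^{-2-\frac{d}{\nu+1}}
  = \epsilon^{-2}
  \quad \text{(infinitely smooth: the low-rank/parametric rate)}.
```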

[46]  arXiv:2405.06372 (cross-list from eess.SY) [pdf, other]
Title: Intelligent Duty Cycling Management and Wake-up for Energy Harvesting IoT Networks with Correlated Activity
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)

This paper presents an approach for energy-neutral Internet of Things (IoT) scenarios where the IoT devices (IoTDs) rely entirely on their energy harvesting capabilities to sustain operation. We use a Markov chain to represent the operation and transmission states of the IoTDs, a modulated Poisson process to model their energy harvesting process, and a discrete-time Markov chain to model their battery state. The aim is to efficiently manage the duty cycling of the IoTDs, so as to prolong their battery life and reduce instances of low-energy availability. We propose a duty-cycling management scheme based on K-nearest neighbors, aiming to strike a trade-off between energy efficiency and detection accuracy. This is done by incorporating spatial and temporal correlations among IoTDs' activity, as well as their energy harvesting capabilities. We also allow the base station to wake up specific IoTDs if more information about an event is needed upon initial detection. Our proposed scheme shows significant improvements in energy savings and performance, with up to 11 times lower misdetection probability and 50\% lower energy consumption for high-density scenarios compared to a random duty cycling benchmark.

[47]  arXiv:2405.06373 (cross-list from cs.CL) [pdf, other]
Title: LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Comments: 10 pages, 6 figures, Under Review of COLM
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs. We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test through both LLM evaluation and human study. Our proposed framework outperforms single-LLM approaches and existing multi-LLM frameworks across various creativity metrics.

[48]  arXiv:2405.06389 (cross-list from cs.CV) [pdf, other]
Title: Continual Novel Class Discovery via Feature Enhancement and Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Continual Novel Class Discovery (CNCD) aims to continually discover novel classes without labels while maintaining the recognition capability for previously learned classes. The main challenges faced by CNCD include the feature-discrepancy problem, the inter-session confusion problem, etc. In this paper, we propose a novel Feature Enhancement and Adaptation method for the CNCD to tackle the above challenges, which consists of a guide-to-novel framework, a centroid-to-samples similarity constraint (CSS), and a boundary-aware prototype constraint (BAP). More specifically, the guide-to-novel framework is established to continually discover novel classes under the guidance of prior distribution. Afterward, the CSS is designed to constrain the relationship between centroid-to-samples similarities of different classes, thereby enhancing the distinctiveness of features among novel classes. Finally, the BAP is proposed to keep novel class features aware of the positions of other class prototypes during incremental sessions, and better adapt novel class features to the shared feature space. Experimental results on three benchmark datasets demonstrate the superiority of our method, especially in more challenging protocols with more incremental sessions.

[49]  arXiv:2405.06394 (cross-list from cs.LG) [pdf, other]
Title: Memory Mosaics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Memory Mosaics are networks of associative memories working in concert to achieve a prediction task of interest. Like transformers, memory mosaics possess compositional capabilities and in-context learning capabilities. Unlike transformers, memory mosaics achieve these capabilities in comparatively transparent ways. We demonstrate these capabilities on toy examples and we also show that memory mosaics perform as well or better than transformers on medium-scale language modeling tasks.

[50]  arXiv:2405.06399 (cross-list from cs.LG) [pdf, other]
Title: Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

The Abstraction and Reasoning Corpus (ARC) is a general artificial intelligence benchmark that is currently unsolved by any machine learning method, including Large Language Models (LLMs). It demands strong generalization and reasoning capabilities, which are known weaknesses of neural-network-based systems. In this work, we propose a program synthesis system that uses Inductive Logic Programming (ILP), a branch of symbolic AI, to solve ARC. We have manually defined a simple Domain Specific Language (DSL) that corresponds to a small set of object-centric abstractions relevant to ARC. This is the background knowledge used by ILP to create logic programs that provide reasoning capabilities to our system. The full system is capable of generalizing to unseen tasks, since ILP can create logic programs from few examples; in the case of ARC, these are the pairs of input-output grid examples given for each task. These logic programs can generate the objects present in the output grid, and the combination of these can form a complete program that transforms an input grid into an output grid. We randomly chose some ARC tasks that require no more than the small set of object primitives we implemented, and we show that, given only these primitives, our system can solve tasks that each demand quite different reasoning.

[51]  arXiv:2405.06409 (cross-list from cs.LG) [pdf, other]
Title: Visualizing Neural Network Imagination
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In certain situations, neural networks will represent environment states in their hidden activations. Our goal is to visualize what environment states the networks are representing. We experiment with a recurrent neural network (RNN) architecture with a decoder network at the end. After training, we apply the decoder to the intermediate representations of the network to visualize what they represent. We define a quantitative interpretability metric and use it to demonstrate that hidden states can be highly interpretable on a simple task. We also develop autoencoder and adversarial techniques and show that they benefit interpretability.

[52]  arXiv:2405.06418 (cross-list from cs.LG) [pdf, other]
Title: PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning
Comments: Accepted to ICML 2024. This is not the final version of the paper. The camera-ready version will be uploaded soon
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
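The abstract names RotatE as one of the shallow-architecture decoders that ReED subsumes. For concreteness, its triple-scoring rule (defined in the original RotatE work, not in this paper) treats each relation as a rotation in the complex plane; a sketch with illustrative parameter names:

```python
import numpy as np

def rotate_score(head: np.ndarray, relation_phase: np.ndarray,
                 tail: np.ndarray, gamma: float = 12.0) -> float:
    """RotatE triple score: rotate the complex head embedding by the
    relation's per-dimension phase angles and measure how close the result
    lands to the tail. Higher score = more plausible triple."""
    rotated = head * np.exp(1j * relation_phase)  # unit-modulus rotation
    distance = np.linalg.norm(rotated - tail, ord=1)  # sum of complex moduli
    return float(gamma - distance)
```

A triplet classification decoder of the kind ReED analyzes would then threshold this score to decide whether a candidate triple holds.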

[53]  arXiv:2405.06419 (cross-list from cs.LG) [pdf, other]
Title: Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

In real-world scenarios, time series forecasting often demands timeliness, making research on model backbones a perennially hot topic. To meet these performance demands, we propose a novel backbone from the perspective of information fusion. Introducing the Basic Probability Assignment (BPA) Module and the Time Evidence Fusion Network (TEFN), based on evidence theory, allows us to achieve superior performance; the perspective of multi-source information fusion effectively improves forecasting accuracy. Because the BPA is generated by fuzzy theory, TEFN also has considerable interpretability. In experiments on real data, TEFN achieves state-of-the-art results on some benchmarks, with errors comparable to PatchTST and operating efficiency surpassing models such as DLinear. TEFN is also highly robust, with small error fluctuations under random hyperparameter selection. TEFN is not a model that pushes any single aspect to its limit, but one that balances performance, accuracy, stability, and interpretability.

[54]  arXiv:2405.06422 (cross-list from cs.RO) [pdf, other]
Title: Contextual Affordances for Safe Exploration in Robotic Scenarios
Comments: 5 pages, 2 figures. Accepted at the 2nd Workshop on Human-aligned Reinforcement Learning for Autonomous Agents and Robots HARL, at the IEEE International Conference on Robotics and Automation ICRA, Yokohama, Japan, 2024
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Robotics has been a popular field of research in the past few decades, with much success in industrial applications such as manufacturing and logistics. This success is led by clearly defined use cases and controlled operating environments. However, robotics has yet to make a large impact in domestic settings. This is due in part to the difficulty and complexity of designing mass-manufactured robots that can succeed in the variety of homes and environments that humans live in and that can operate safely in close proximity to humans. This paper explores the use of contextual affordances to enable safe exploration and learning in robotic scenarios targeted in the home. In particular, we propose a simple state representation that allows us to extend contextual affordances to larger state spaces and showcase how affordances can improve the success and convergence rate of a reinforcement learning algorithm in simulation. Our results suggest that after further iterations, it is possible to consider the implementation of this approach in a real robot manipulator. Furthermore, in the long term, this work could be the foundation for future explorations of human-robot interactions in complex domestic environments. This could be possible once state-of-the-art robot manipulators achieve the required level of dexterity for the described affordances in this paper.

[55]  arXiv:2405.06424 (cross-list from cs.CL) [pdf, other]
Title: Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Comments: Accepted to ICML 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.

[56]  arXiv:2405.06459 (cross-list from cs.CL) [pdf, other]
Title: Are EEG-to-Text Models Working?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This work critically analyzes existing models for open-vocabulary EEG-to-Text translation. We identify a crucial limitation: previous studies often employed implicit teacher-forcing during evaluation, artificially inflating performance metrics. Additionally, they lacked a critical benchmark - comparing model performance on pure noise inputs. We propose a methodology to differentiate between models that truly learn from EEG signals and those that simply memorize training data. Our analysis reveals that model performance on noise data can be comparable to that on EEG data. These findings highlight the need for stricter evaluation practices in EEG-to-Text research, emphasizing transparent reporting and rigorous benchmarking with noise inputs. This approach will lead to more reliable assessments of model capabilities and pave the way for robust EEG-to-Text communication systems.

[57]  arXiv:2405.06478 (cross-list from cs.CY) [pdf, ps, other]
Title: Attention is all they need: Cognitive science and the (techno)political economy of attention in humans and machines
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

This paper critically analyses the "attention economy" within the framework of cognitive science and techno-political economics, as applied to both human and machine interactions. We explore how current business models, particularly in digital platform capitalism, harness user engagement by strategically shaping attentional patterns. These platforms utilize advanced AI and massive data analytics to enhance user engagement, creating a cycle of attention capture and data extraction. We review contemporary (neuro)cognitive theories of attention and platform engagement design techniques and criticize classical cognitivist and behaviourist theories for their inadequacies in addressing the potential harms of such engagement on user autonomy and wellbeing. By contrast, 4E approaches to cognitive science, which emphasize the embodied, extended, enactive, and ecological aspects of cognition, offer an intrinsic normative standpoint and a more integrated understanding of how attentional patterns are actively constituted by adaptive digital environments. By examining the precarious nature of habit formation in digital contexts, we reveal the techno-economic underpinnings that threaten personal autonomy by disaggregating habits away from the individual, into an AI-managed collection of behavioural patterns. Our current predicament suggests the necessity of a paradigm shift towards an ecology of attention. This shift aims to foster environments that respect and preserve human cognitive and social capacities, countering the exploitative tendencies of cognitive capitalism.

[58]  arXiv:2405.06485 (cross-list from cs.CC) [pdf, ps, other]
Title: Solving Quantified Boolean Formulas with Few Existential Variables
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)

The quantified Boolean formula (QBF) problem is an important decision problem generally viewed as the archetype for PSPACE-completeness. Many problems of central interest in AI are in general not included in NP, e.g., planning, model checking, and non-monotonic reasoning, and for such problems QBF has successfully been used as a modelling tool. However, solvers for QBF are not as advanced as state-of-the-art SAT solvers, which has prevented QBF from becoming a universal modelling language for PSPACE-complete problems. A theoretical explanation is that QBF (as well as many other PSPACE-complete problems) lacks natural parameters guaranteeing fixed-parameter tractability (FPT).
In this paper we tackle this problem and consider a simple but overlooked parameter: the number of existentially quantified variables. This natural parameter is virtually unexplored in the literature which one might find surprising given the general scarcity of FPT algorithms for QBF. Via this parameterization we then develop a novel FPT algorithm applicable to QBF instances in conjunctive normal form (CNF) of bounded clause length. We complement this by a W[1]-hardness result for QBF in CNF of unbounded clause length as well as sharper lower bounds for the bounded arity case under the (strong) exponential-time hypothesis.

[59]  arXiv:2405.06511 (cross-list from q-bio.QM) [pdf, other]
Title: Towards Less Biased Data-driven Scoring with Deep Learning-Based End-to-end Database Search in Tandem Mass Spectrometry
Authors: Yonghan Yu, Ming Li
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI)

Peptide identification in mass spectrometry-based proteomics is crucial for understanding protein function and dynamics. Traditional database search methods, though widely used, rely on heuristic scoring functions, and statistical estimation must be introduced to achieve a higher identification rate. Here, we introduce DeepSearch, the first deep learning-based end-to-end database search method for tandem mass spectrometry. DeepSearch leverages a modified transformer-based encoder-decoder architecture under the contrastive learning framework. Unlike conventional methods that rely on ion-to-ion matching, DeepSearch adopts a data-driven approach to score peptide spectrum matches. DeepSearch is also the first deep learning-based method that can profile variable post-translational modifications in a zero-shot manner. We showed that DeepSearch's scoring scheme expressed less bias and did not require any statistical estimation. We validated DeepSearch's accuracy and robustness across various datasets, including those from species with diverse protein compositions and a modification-enriched dataset. DeepSearch sheds new light on database search methods in tandem mass spectrometry.

[60]  arXiv:2405.06522 (cross-list from cs.LG) [pdf, other]
Title: Heterogeneous Graph Neural Networks with Loss-decrease-aware Curriculum Learning
Authors: Yili Wang
Comments: arXiv admin note: substantial text overlap with arXiv:2402.18875 by other authors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In recent years, heterogeneous graph neural networks (HGNNs) have achieved excellent performance in handling heterogeneous information networks (HINs). Curriculum learning is a machine learning strategy where training examples are presented to a model in a structured order, starting with easy examples and gradually increasing difficulty, aiming to improve learning efficiency and generalization. To better exploit the rich information in HINs, previous methods have started to explore the use of curriculum learning strategy to train HGNNs. Specifically, these works utilize the absolute value of the loss at each training epoch to evaluate the learning difficulty of each training sample. However, the relative loss, rather than the absolute value of loss, reveals the learning difficulty. Therefore, we propose a novel loss-decrease-aware training schedule (LDTS). LDTS uses the trend of loss decrease between training epochs to better evaluate the difficulty of training samples, thereby enhancing the curriculum learning of HGNNs for downstream tasks. Additionally, we propose a sampling strategy to alleviate training imbalance issues. Our method further demonstrates the efficacy of curriculum learning in enhancing HGNNs' capabilities. We call our method Loss-decrease-aware Heterogeneous Graph Neural Networks (LDHGNN). The code is public at https://github.com/wangyili00/LDHGNN.
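The exact schedule is specified in the paper and repository; as a hedged sketch (all names hypothetical), ranking samples by how much their loss dropped since the previous epoch, rather than by the absolute loss value, might look like:

```python
def rank_by_loss_decrease(prev_losses, curr_losses):
    """Order training samples easiest-first by the per-sample loss decrease
    between two consecutive epochs: a large decrease suggests the model is
    learning that sample readily, a small or negative one marks it as hard."""
    decreases = [prev - curr for prev, curr in zip(prev_losses, curr_losses)]
    return sorted(range(len(decreases)), key=decreases.__getitem__, reverse=True)
```

A curriculum scheduler would then admit samples from the front of this ordering first, expanding toward the harder tail as training progresses.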

[61]  arXiv:2405.06553 (cross-list from cs.LG) [pdf, other]
Title: Scalable Property Valuation Models via Graph-based Deep Learning
Comments: 18 pages, 3 figures, Submitted to Expert Systems with Applications
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper aims to enrich the capabilities of existing deep learning-based automated valuation models through an efficient graph representation of peer dependencies, thus capturing intricate spatial relationships. In particular, we develop two novel graph neural network models that effectively identify sequences of neighboring houses with similar features, employing different message passing algorithms. The first strategy considers standard spatial graph convolutions, while the second utilizes transformer graph convolutions. This approach confers scalability to the modeling process. The experimental evaluation is conducted using a proprietary dataset comprising approximately 200,000 houses located in Santiago, Chile. We show that employing tailored graph neural networks significantly improves the accuracy of house price prediction, especially when utilizing transformer convolutional message passing layers.

[62]  arXiv:2405.06573 (cross-list from cs.SD) [pdf, other]
Title: An Investigation of Incorporating Mamba for Speech Enhancement
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the properties of Mamba by integrating it as the core model in both basic and advanced SE systems, along with utilizing signal-level distances as well as metric-oriented loss functions. SEMamba demonstrates promising results and attains a PESQ score of 3.55 on the VoiceBank-DEMAND dataset. When combined with the perceptual contrast stretching technique, the proposed SEMamba yields a new state-of-the-art PESQ score of 3.69.

[63]  arXiv:2405.06611 (cross-list from cs.HC) [pdf, other]
Title: "We are at the mercy of others' opinion": Supporting Blind People in Recreational Window Shopping with AI-infused Technology
Comments: Preprint, W4A'24, Proceedings of the 21st International Web for All Conference
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Engaging in recreational activities in public spaces poses challenges for blind people, often involving dependency on sighted help. Window shopping is a key recreational activity that remains inaccessible. In this paper, we investigate the information needs, challenges, and current approaches blind people have to recreational window shopping to inform the design of existing wayfinding and navigation technology for supporting blind shoppers in exploration and serendipitous discovery. We conduct a formative study with a total of 18 blind participants that include both focus groups (N=8) and interviews for requirements analysis (N=10). We find that there is a desire for push notifications of promotional information and pull notifications about shops of interest such as the targeted audience of a brand. Information about obstacles and points-of-interest required customization depending on one's mobility aid as well as presence of a crowd, children, and wheelchair users. We translate these findings into specific information modalities and rendering in the context of two existing AI-infused assistive applications: NavCog (a turn-by-turn navigation app) and Cabot (a navigation robot).

[64]  arXiv:2405.06627 (cross-list from cs.LG) [pdf, other]
Title: Conformal Validity Guarantees Exist for Any Data Distribution
Comments: ICML 2024. Code available at this https URL conformal-mfcs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

As machine learning (ML) gains widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when ML systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction has emerged as a promising approach to uncertainty and risk quantification, but existing variants either fail to accommodate sequences of data-dependent shifts, or do not fully exploit the fact that agent-induced shift is under our control. In this work we prove that conformal prediction can theoretically be extended to \textit{any} joint data distribution, not just exchangeable or quasi-exchangeable ones, although it is exceedingly impractical to compute in the most general case. For practical applications, we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
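As background for what is being extended: the classical split conformal procedure for exchangeable data reduces to taking a particular order statistic of calibration residuals. This exchangeable baseline (not the paper's generalized algorithms) can be sketched as:

```python
import numpy as np

def split_conformal_quantile(cal_residuals: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal for exchangeable data: return the threshold q such that
    [y_hat - q, y_hat + q] covers a fresh point with probability >= 1 - alpha.
    q is the ceil((n+1)(1-alpha))-th smallest calibration residual; assumes
    n is large enough that this rank does not exceed n."""
    n = len(cal_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    return float(np.sort(cal_residuals)[k - 1])
```

The paper's point is that this kind of guarantee need not stop at exchangeability: the quantile computation changes, but a valid analogue exists for any joint data distribution, including agent-induced shifts.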

[65]  arXiv:2405.06634 (cross-list from cs.CV) [pdf, other]
Title: Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
Comments: 11 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We evaluate the zero-shot ability of GPT-4 and LLaVa to perform simple Visual Network Analysis (VNA) tasks on small-scale graphs. We evaluate the Vision Language Models (VLMs) on 5 tasks related to three foundational network science concepts: identifying nodes of maximal degree on a rendered graph, identifying whether signed triads are balanced or unbalanced, and counting components. The tasks are structured to be easy for a human who understands the underlying graph theoretic concepts, and can all be solved by counting the appropriate elements in graphs. We find that while GPT-4 consistently outperforms LLaVa, both models struggle with every visual network analysis task we propose. We publicly release the first benchmark for the evaluation of VLMs on foundational VNA tasks.
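Each of the three ground-truth quantities behind these tasks reduces to simple counting on the underlying graph. A plain-Python sketch of those computations (our illustration, not the benchmark's released code):

```python
from collections import defaultdict

def max_degree_nodes(edges):
    """Return the set of nodes of maximal degree in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    top = max(deg.values())
    return {n for n, d in deg.items() if d == top}

def triad_is_balanced(signs):
    """A signed triad is balanced iff the product of its three edge signs is +1."""
    product = 1
    for s in signs:
        product *= s
    return product == 1

def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {n: n for n in nodes}
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})
```

For example, in a star graph with center 1 and leaves 0, 2, 3, the unique maximal-degree node is 1, and a triad with signs (+1, -1, -1) is balanced while (+1, +1, -1) is not.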

[66]  arXiv:2405.06636 (cross-list from cs.CV) [pdf, other]
Title: Federated Document Visual Question Answering: A Pilot Study
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

An important handicap of document analysis research is that documents tend to be copyrighted or contain private information, which prohibits their open publication and the creation of centralised, large-scale document datasets. Instead, documents are scattered in private data silos, making extensive training over heterogeneous data a tedious task. In this work, we explore the use of a federated learning (FL) scheme as a way to train a shared model on decentralised private document data. We focus on the problem of Document VQA, a task particularly suited to this approach, as the type of reasoning capabilities required from the model can be quite different in diverse domains. Enabling training over heterogeneous document datasets can thus substantially enrich DocVQA models. We assemble existing DocVQA datasets from diverse domains to reflect the data heterogeneity in real-world applications. We explore the self-pretraining technique in this multi-modal setting, where the same data is used for both pretraining and finetuning, making it relevant for privacy preservation. We further propose combining self-pretraining with a Federated DocVQA training method using centralized adaptive optimization that outperforms the FedAvg baseline. With extensive experiments, we also present a multi-faceted analysis on training DocVQA models with FL, which provides insights for future research on this task. We show that our pretraining strategies can effectively learn and scale up under federated training with diverse DocVQA datasets, and that tuning hyperparameters is essential for practical document tasks under federation.
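The FedAvg baseline mentioned above aggregates client models by averaging their parameters, weighted by each client's local dataset size. A minimal sketch of that aggregation step (illustrative only; real FL systems average full model state across communication rounds):

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """FedAvg aggregation: weighted average of per-client parameter vectors.

    client_params: list of 1-D parameter arrays, one per client.
    client_sizes:  number of local training examples per client (the weights).
    """
    total = sum(client_sizes)
    agg = np.zeros_like(client_params[0], dtype=float)
    for params, n in zip(client_params, client_sizes):
        agg += (n / total) * params
    return agg

# Two clients: the second holds 3x as much data, so it dominates the average.
merged = fedavg([np.array([0.0, 0.0]), np.array([2.0, 4.0])], [1, 3])
```

The paper's centralized adaptive optimization replaces this plain weighted average on the server side, which is where it gains over the FedAvg baseline.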

[67]  arXiv:2405.06639 (cross-list from cs.LG) [pdf, other]
Title: Value Augmented Sampling for Language Model Alignment and Personalization
Comments: Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Aligning Large Language Models (LLMs) to cater to different human preferences, learning new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant, but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally efficient, but performs worse due to the optimization challenges in co-training the value function and the policy. We present a new framework for reward optimization, Value Augmented Sampling (VAS), that can maximize different reward functions using data sampled from only the initial, frozen LLM. VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function, making the optimization stable; it outperforms established baselines, such as PPO and DPO, on standard benchmarks, and achieves results comparable to Best-of-128 at lower inference cost. Unlike existing RL methods that require changing the weights of the LLM, VAS does not require access to the weights of the pre-trained LLM. Thus, it can even adapt LLMs (e.g., ChatGPT) that are available only as APIs. In addition, our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one at deployment time, paving the way for aligned, personalized LLMs.
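The Best-of-N baseline that VAS is compared against is conceptually simple: draw N candidates and keep the one with the highest reward, at an inference cost linear in N. A toy sketch with stand-in sampler and reward functions (both hypothetical; real usage would sample LLM completions and score them with a reward model):

```python
import random

def best_of_n(sample, reward, n=8, seed=0):
    """Draw n candidates from `sample` and return the one maximizing `reward`.

    Inference cost grows linearly in n, which is what motivates cheaper
    reward-guided alternatives such as value-augmented sampling.
    """
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: "responses" are numbers, and the reward prefers values near 3.
pick = best_of_n(lambda rng: rng.uniform(0, 10), lambda x: -abs(x - 3), n=64)
```

With n=64 uniform draws over [0, 10], the selected candidate lands close to the reward optimum at 3, but only by paying for 64 samples per query.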

Replacements for Mon, 13 May 24

[68]  arXiv:2208.09951 (replaced) [pdf, ps, other]
Title: Individual Fairness under Varied Notions of Group Fairness in Bipartite Matching - One Framework to Approximate Them All
Comments: 31 pages. To appear in IJCAI'24
Subjects: Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
[69]  arXiv:2404.14304 (replaced) [pdf, other]
Title: Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)
Comments: This paper has been accepted at IJCAI 2024 (the 33rd International Joint Conference on Artificial Intelligence)
Subjects: Artificial Intelligence (cs.AI)
[70]  arXiv:2205.09430 (replaced) [pdf, other]
Title: Action Conditioned Tactile Prediction: case study on slip prediction
Comments: Proceeding of Robotics: Science and Systems (RSS 2022)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[71]  arXiv:2303.16045 (replaced) [pdf, other]
Title: An Optimal, Universal and Agnostic Decoding Method for Message Reconstruction, Bio and Technosignature Detection
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI)
[72]  arXiv:2306.00530 (replaced) [pdf, ps, other]
Title: CL-MRI: Self-Supervised Contrastive Learning to Improve the Accuracy of Undersampled MRI Reconstruction
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[73]  arXiv:2308.01222 (replaced) [pdf, other]
Title: Calibration in Deep Learning: A Survey of the State-of-the-Art
Authors: Cheng Wang
Comments: 30 pages, Ongoing work
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[74]  arXiv:2309.01211 (replaced) [pdf, other]
Title: Physics-inspired Neural Networks for Parameter Learning of Adaptive Cruise Control Systems
Comments: 11 pages, 8 figures, 3 tables
Journal-ref: IEEE Transactions on Vehicular Technology (2024)
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
[75]  arXiv:2309.12559 (replaced) [pdf, other]
Title: Invariant Learning via Probability of Sufficient and Necessary Causes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[76]  arXiv:2309.13063 (replaced) [pdf, other]
Title: Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[77]  arXiv:2310.00752 (replaced) [pdf, other]
Title: TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[78]  arXiv:2310.12690 (replaced) [pdf, other]
Title: Neurosymbolic Grounding for Compositional World Models
Comments: ICLR 2024 camera-ready version
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[79]  arXiv:2311.10319 (replaced) [pdf, other]
Title: Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification
Comments: Seventeen pages (incl. references), five figures, and one table. (Under Review)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[80]  arXiv:2311.11039 (replaced) [pdf, other]
Title: Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment
Comments: 17 pages, 9 figures, LaTeX; typos corrected; has not been presented at any conference or published in a journal
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[81]  arXiv:2311.14688 (replaced) [pdf, other]
Title: Procedural Fairness Through Decoupling Objectionable Data Generating Components
Journal-ref: The 12th International Conference on Learning Representations (ICLR 2024)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[82]  arXiv:2312.07399 (replaced) [pdf, other]
Title: Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Comments: Accepted to AAAI 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[83]  arXiv:2401.04122 (replaced) [pdf, other]
Title: From Prompt Engineering to Prompt Science With Human in the Loop
Authors: Chirag Shah
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
[84]  arXiv:2401.09556 (replaced) [pdf, ps, other]
Title: Deep learning enhanced mixed integer optimization: Learning to reduce model dimensionality
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[85]  arXiv:2401.15108 (replaced) [pdf, other]
Title: Tacit algorithmic collusion in deep reinforcement learning guided price competition: A study using EV charge pricing game
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); General Economics (econ.GN); Systems and Control (eess.SY)
[86]  arXiv:2402.01002 (replaced) [pdf, other]
Title: AI-generated faces influence gender stereotypes and racial homogenization
Comments: 47 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[87]  arXiv:2402.01446 (replaced) [pdf, other]
Title: Guidance Graph Optimization for Lifelong Multi-Agent Path Finding
Comments: Accepted to International Joint Conference on Artificial Intelligence (IJCAI), 2024
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[88]  arXiv:2402.15347 (replaced) [pdf, other]
Title: Information-Theoretic Safe Bayesian Optimization
Comments: arXiv admin note: text overlap with arXiv:2212.04914
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[89]  arXiv:2402.17045 (replaced) [pdf, other]
Title: An Investigation into the Performances of the State-of-the-art Machine Learning Approaches for Various Cyber-attack Detection: A Survey
Comments: 10
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[90]  arXiv:2402.17531 (replaced) [pdf, other]
Title: Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides
Comments: Work in progress
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[91]  arXiv:2403.10698 (replaced) [pdf, other]
Title: Robust Influence-based Training Methods for Noisy Brain MRI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[92]  arXiv:2403.11894 (replaced) [pdf, other]
Title: From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?
Comments: This paper has been accepted by Computational and Structural Biotechnology Journal
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[93]  arXiv:2403.19669 (replaced) [pdf, other]
Title: Analyzing the Roles of Language and Vision in Learning from Limited Data
Comments: 8 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[94]  arXiv:2404.12744 (replaced) [pdf, other]
Title: Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
Comments: 16 pages, work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[95]  arXiv:2404.14851 (replaced) [pdf, other]
Title: From Matching to Generation: A Survey on Generative Information Retrieval
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[96]  arXiv:2404.15518 (replaced) [pdf, other]
Title: An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[97]  arXiv:2404.18534 (replaced) [pdf, other]
Title: Evaluating and Mitigating Linguistic Discrimination in Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[98]  arXiv:2405.00704 (replaced) [pdf, ps, other]
Title: A Survey on the Real Power of ChatGPT
Comments: 18 pages, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[99]  arXiv:2405.04532 (replaced) [pdf, other]
Title: QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Comments: The first three authors contribute equally to this project and are listed in the alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
[100]  arXiv:2405.04620 (replaced) [pdf, ps, other]
Title: Folded context condensation in Path Integral formalism for infinite context transformers
Comments: 7 pages, 2 figures
Subjects: High Energy Physics - Phenomenology (hep-ph); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[101]  arXiv:2405.04758 (replaced) [pdf, other]
Title: Honeyfile Camouflage: Hiding Fake Files in Plain Sight
Comments: 3rd Workshop on the security implications of Deepfakes and Cheapfakes (WDC) co-located at ACM ASIACCS 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[102]  arXiv:2405.04883 (replaced) [pdf, other]
Title: FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Comments: Accepted by ICML 2024. The code and checkpoints will be released at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[103]  arXiv:2405.05858 (replaced) [pdf, other]
Title: Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Robotics (cs.RO)