
Machine Learning

New submissions


New submissions for Wed, 24 Apr 24

[1]  arXiv:2404.14433 [pdf, other]
Title: KATO: Knowledge Alignment and Transfer for Transistor Sizing of Different Design and Technology
Comments: 6 pages, received by DAC2024
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Automatic transistor sizing in circuit design continues to be a formidable challenge. Although Bayesian optimization (BO) has achieved significant success, it remains circuit-specific, limiting the accumulation and transfer of design knowledge to broader applications. This paper proposes (1) efficient automatic kernel construction, (2) the first transfer learning across different circuits and technology nodes for BO, and (3) a selective transfer learning scheme to ensure only useful knowledge is utilized. These three novel components are integrated into BO with the Multi-objective Acquisition Ensemble (MACE) to form Knowledge Alignment and Transfer Optimization (KATO), which delivers state-of-the-art performance: up to 2x simulation reduction and 1.2x design improvement over the baselines.

[2]  arXiv:2404.14436 [pdf, other]
Title: Investigating Resource-efficient Neutron/Gamma Classification ML Models Targeting eFPGAs
Subjects: Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Nuclear Experiment (nucl-ex); Instrumentation and Detectors (physics.ins-det)

Over the last several years, the particle and nuclear physics communities have shown considerable interest in, and made substantial progress toward, implementing machine learning (ML) models in hardware. A major driver has been the release of the Python package hls4ml, which has enabled porting models specified and trained using Python ML libraries to register transfer level (RTL) code. So far, the primary end targets have been commercial FPGAs or synthesized custom blocks on ASICs. However, recent developments in open-source embedded FPGA (eFPGA) frameworks now provide an alternate, more flexible pathway for implementing ML models in hardware. These customized eFPGA fabrics can be integrated as part of an overall chip design. In general, the decision between a fully custom, eFPGA, or commercial FPGA ML implementation will depend on the details of the end-use application. In this work, we explored the parameter space for eFPGA implementations of fully-connected neural network (fcNN) and boosted decision tree (BDT) models using the task of neutron/gamma classification with a specific focus on resource efficiency. We used data collected using an AmBe sealed source incident on Stilbene, which was optically coupled to an OnSemi J-series SiPM to generate training and test data for this study. We investigated relevant input features and the effects of bit-resolution and sampling rate as well as trade-offs in hyperparameters for both ML architectures while tracking total resource usage. The performance metric used to track model performance was the calculated neutron efficiency at a gamma leakage of 10$^{-3}$. The results of the study will be used to aid the specification of an eFPGA fabric, which will be integrated as part of a test chip.

[3]  arXiv:2404.14442 [pdf, ps, other]
Title: Unified ODE Analysis of Smooth Q-Learning Algorithms
Authors: Donghwan Lee
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Convergence of Q-learning has been the focus of extensive research over the past several decades. Recently, an asymptotic convergence analysis for Q-learning was introduced using a switching system framework. This approach applies the so-called ordinary differential equation (ODE) approach to prove the convergence of asynchronous Q-learning modeled as a continuous-time switching system, where notions from switching system theory are used to prove its asymptotic stability without using explicit Lyapunov arguments. However, to prove stability, restrictive conditions, such as quasi-monotonicity, must be satisfied for the underlying switching systems, which makes it difficult to generalize the analysis method to other reinforcement learning algorithms, such as the smooth Q-learning variants. In this paper, we present a more general and unified convergence analysis that improves upon the switching system approach and can analyze Q-learning and its smooth variants. The proposed analysis is motivated by previous work on the convergence of synchronous Q-learning based on the $p$-norm serving as a Lyapunov function. However, the proposed analysis addresses more general ODE models that can cover both asynchronous Q-learning and its smooth versions with simpler frameworks.

[4]  arXiv:2404.14444 [pdf, other]
Title: Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network
Comments: 6 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)

Battery health monitoring and prediction are critically important in the era of electric mobility with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop models based on the Bayesian neural network for predicting battery end-of-life. Our models use sensor data related to battery health and apply distributions, rather than single-point estimates, to each parameter of the models. This allows the models to capture the inherent randomness and uncertainty of battery health, which leads to not only accurate predictions but also quantifiable uncertainty. We conducted an experimental study and demonstrated the effectiveness of our proposed models, with a prediction error rate averaging 13.9%, and as low as 2.9% for certain tested batteries. Additionally, all predictions include quantifiable certainty, which improved by 66% from the initial to the mid-life stage of the battery. This research has practical value for battery technologies and contributes to accelerating their adoption in industry.
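
As an editorial illustration of the parameter-distribution idea described above (not the authors' code; the layer size and the Gaussian form of the weight distributions are assumptions), a minimal Monte Carlo sketch in Python of how distributions over parameters yield both a prediction and a quantifiable uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy posterior over the weights of a single linear layer:
# each parameter is a distribution (mean, std) rather than a point estimate.
w_mean, w_std = rng.normal(size=(4, 1)), 0.1 * np.ones((4, 1))
b_mean, b_std = 0.5, 0.05

def predict_with_uncertainty(x, n_samples=1000):
    """Monte Carlo forward passes through sampled weights.

    Returns the predictive mean and standard deviation, so every
    end-of-life prediction comes with a quantifiable uncertainty.
    """
    preds = []
    for _ in range(n_samples):
        w = rng.normal(w_mean, w_std)          # sample one weight vector
        b = rng.normal(b_mean, b_std)          # sample one bias
        preds.append(x @ w + b)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

x = rng.normal(size=(3, 4))                    # 3 batteries, 4 health features
mean, std = predict_with_uncertainty(x)
print(mean.ravel(), std.ravel())
```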

[5]  arXiv:2404.14445 [pdf, other]
Title: A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models
Comments: 10 pages, 1 figure, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework, SynEval, by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.

[6]  arXiv:2404.14447 [pdf, ps, other]
Title: A Novel A.I Enhanced Reservoir Characterization with a Combined Mixture of Experts -- NVIDIA Modulus based Physics Informed Neural Operator Forward Model
Comments: 55 pages, 46 figures
Subjects: Machine Learning (cs.LG)

We have developed an advanced workflow for reservoir characterization, effectively addressing the challenges of reservoir history matching through a novel approach. This method integrates a Physics Informed Neural Operator (PINO) as a forward model within a sophisticated Cluster Classify Regress (CCR) framework. The process is enhanced by an adaptive Regularized Ensemble Kalman Inversion (aREKI), optimized for rapid uncertainty quantification in reservoir history matching. This innovative workflow parameterizes unknown permeability and porosity fields, capturing non-Gaussian posterior measures with techniques such as a variational convolution autoencoder and the CCR. Serving as exotic priors and a supervised model, the CCR synergizes with the PINO surrogate to accurately simulate the nonlinear dynamics of Peaceman well equations. The CCR approach allows for flexibility in applying distinct machine learning algorithms across its stages. Updates to the PINO reservoir surrogate are driven by a loss function derived from supervised data, initial conditions, and residuals of governing black oil PDEs. Our integrated model, termed PINO-Res-Sim, outputs crucial parameters including pressures, saturations, and production rates for oil, water, and gas. Validated against traditional simulators through controlled experiments on synthetic reservoirs and the Norne field, the methodology showed remarkable accuracy. Additionally, the PINO-Res-Sim in the aREKI workflow efficiently recovered unknown fields with a computational speedup of 100 to 6000 times faster than conventional methods. The learning phase for PINO-Res-Sim, conducted on an NVIDIA H100, was impressively efficient, compatible with ensemble-based methods for complex computational tasks.

[7]  arXiv:2404.14451 [pdf, other]
Title: Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data
Comments: 16 pages, Pre-print
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Outlier detection in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised outlier detection algorithms face one or more problems, including inlier assumption (IA), curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling the IA and CD, being the only method to do so. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular OD methods, especially in MV scenarios.

[8]  arXiv:2404.14455 [pdf, other]
Title: A Neuro-Symbolic Explainer for Rare Events: A Case Study on Predictive Maintenance
Comments: 26 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Predictive Maintenance applications are increasingly complex, with interactions between many components. Black box models are popular approaches based on deep learning techniques due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black box model predicts failures. The proposed system solves two problems in parallel: anomaly detection and explanation of the anomaly. For the first problem, we use an unsupervised, state-of-the-art autoencoder. For the second problem, we train a rule learning system that learns a mapping from the input features to the autoencoder reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for the examples with a reconstruction error that exceeds a threshold. The causes of the signaled alarm are hard for humans to understand because they result from a nonlinear combination of sensor data. The rule triggered by that example describes the relationship between the input features and the autoencoder reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm and allowing the identification of the component involved in the failure. The system can present global explanations for the black box model and local explanations for why the black box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
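
To make the two-model pipeline concrete, here is an illustrative offline sketch in Python (the paper's system runs online and uses a rule learner; the MLP-as-autoencoder, the decision-tree surrogate, and the 95th-percentile threshold below are stand-ins chosen for brevity, not the authors' implementation):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                    # synthetic sensor readings

# First model: an autoencoder (here an MLP trained to reproduce its input).
ae = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
ae.fit(X, X)
recon_error = ((ae.predict(X) - X) ** 2).mean(axis=1)

# Alarm when the reconstruction error exceeds a threshold.
threshold = np.quantile(recon_error, 0.95)
alarms = recon_error > threshold

# Second model: a rule-like surrogate mapping input features to the error,
# so each alarm can be traced back to the sensors driving it.
explainer = DecisionTreeRegressor(max_depth=3, random_state=0)
explainer.fit(X, recon_error)
print(export_text(explainer, feature_names=[f"sensor_{i}" for i in range(5)]))
print("alarms raised:", alarms.sum())
```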

[9]  arXiv:2404.14456 [pdf, ps, other]
Title: Multifidelity Surrogate Models: A New Data Fusion Perspective
Authors: Daniel N Wilke
Comments: 8 pages, 4 figures, SACAM2024 Conference, 22-23 January 2024
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

Multifidelity surrogate modelling combines data of varying accuracy and cost from different sources. It strategically uses low-fidelity models for rapid evaluations, saving computational resources, and high-fidelity models for detailed refinement. It improves decision-making by addressing uncertainties and surpassing the limits of single-fidelity models, which either oversimplify or are computationally intensive. Blending high-fidelity data for detailed responses with frequent low-fidelity data for quick approximations facilitates design optimisation in various domains.
Despite progress in interpolation, regression, enhanced sampling, error estimation, variable fidelity, and data fusion techniques, challenges persist in selecting fidelity levels and developing efficient data fusion methods. This study proposes a new fusion approach that builds multi-fidelity surrogate models from gradient-only surrogates, which use only gradient information to construct regression surfaces. Results are demonstrated on foundational example problems that isolate and illustrate the fusion approach's efficacy, avoiding the need for complex examples that obfuscate the main concept.

[10]  arXiv:2404.14457 [pdf, ps, other]
Title: Graph Coloring Using Heat Diffusion
Authors: Vivek Chaudhary
Comments: 5 Pages, 3 Figures
Subjects: Machine Learning (cs.LG)

Graph coloring is a problem with varied applications in industry and science such as scheduling, resource allocation, and circuit design. The purpose of this paper is to establish whether a new gradient-based iterative solver framework known as heat diffusion can solve the graph coloring problem. We propose a solution to the graph coloring problem using the heat diffusion framework. We compare the solutions against popular methods and establish the competitiveness of the heat diffusion method for the graph coloring problem.

[11]  arXiv:2404.14462 [pdf, other]
Title: Towards smaller, faster decoder-only transformers: Architectural variants and their implications
Comments: 8 pages, 6 figures
Subjects: Machine Learning (cs.LG)

Research on Large Language Models (LLMs) has recently seen exponential growth, largely focused on transformer-based architectures, as introduced by [1] and further advanced by the decoder-only variations in [2]. Contemporary studies typically aim to improve model capabilities by increasing both the architecture's complexity and the volume of training data. However, research exploring how to reduce model sizes while maintaining performance is limited. This study introduces three modifications to the decoder-only transformer architecture: ParallelGPT (p-gpt), LinearlyCompressedGPT (lc-gpt), and ConvCompressedGPT (cc-gpt). These variants achieve comparable performance to conventional architectures in code generation tasks while benefiting from reduced model sizes and faster training times. We open-source the model weights and codebase to support future research and development in this domain.

[12]  arXiv:2404.14523 [pdf, ps, other]
Title: Edge-Assisted ML-Aided Uncertainty-Aware Vehicle Collision Avoidance at Urban Intersections
Comments: Accepted in IEEE Transactions on Intelligent Vehicles
Subjects: Machine Learning (cs.LG)

Intersection crossing represents one of the most dangerous sections of the road infrastructure, and Connected Vehicles (CVs) can serve as a revolutionary solution to the problem. In this work, we present a novel framework that preemptively detects collisions at urban crossroads, exploiting the Multi-access Edge Computing (MEC) platform of 5G networks. At the MEC, an Intersection Manager (IM) collects information from both vehicles and the road infrastructure to create a holistic view of the area of interest. Based on the historical data collected, the IM leverages the capabilities of an encoder-decoder recurrent neural network to predict, with high accuracy, the future vehicles' trajectories. However, since accuracy alone is not a sufficient measure of how much we can trust a model, trajectory predictions are additionally associated with a measure of uncertainty towards confident collision forecasting and avoidance. Hence, contrary to any other approach in the state of the art, an uncertainty-aware collision prediction framework is developed that is shown to detect well in advance (and with high reliability) whether two vehicles are on a collision course. Subsequently, collision detection triggers a number of alarms that signal the colliding vehicles to brake. Under real-world settings, thanks to the preemptive capabilities of the proposed approach, all the simulated imminent dangers are averted.
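
A minimal sketch of what an uncertainty-aware collision check could look like, purely as an editorial illustration (the safety margin, the way the predictive spread inflates the required separation, and the toy trajectories are assumptions, not the paper's framework):

```python
import numpy as np

def collision_risk(mean_a, mean_b, std_a, std_b, safety_margin=2.0):
    """Flag a collision course from predicted trajectories with uncertainty.

    mean_*: (T, 2) predicted future positions of each vehicle
    std_*:  (T, 2) per-step standard deviation of those predictions
    The uncertainty shrinks the effective gap, so the alarm fires when the
    vehicles may be close even under a pessimistic reading of the forecast.
    """
    gaps = np.linalg.norm(mean_a - mean_b, axis=1)
    spread = np.linalg.norm(std_a, axis=1) + np.linalg.norm(std_b, axis=1)
    return np.any(gaps - spread < safety_margin)

t = np.linspace(0, 1, 20)[:, None]
veh_a = np.hstack([t * 30, np.zeros_like(t)])          # heading east
veh_b = np.hstack([np.full_like(t, 15), 15 - t * 30])  # crossing from the north
std = np.full((20, 2), 0.5)
print(collision_risk(veh_a, veh_b, std, std))
```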

[13]  arXiv:2404.14552 [pdf, other]
Title: Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Discovering an informative, or agent-centric, state representation that encodes only the relevant information while discarding the irrelevant is a key challenge towards scaling reinforcement learning algorithms and efficiently applying them to downstream tasks. Prior works studied this problem in high-dimensional Markovian environments, when the current observation may be a complex object but is sufficient to decode the informative state. In this work, we consider the problem of discovering the agent-centric state in the more challenging high-dimensional non-Markovian setting, when the state can be decoded from a sequence of past observations. We establish that generalized inverse models can be adapted for learning agent-centric state representation for this task. Our results include asymptotic theory in the deterministic dynamics setting as well as counter-examples for alternative intuitive algorithms. We complement these findings with a thorough empirical study on the agent-centric state discovery abilities of the different alternatives we put forward. Particularly notable is our analysis of past actions, where we show that these can be a double-edged sword: making the algorithms more successful when used correctly and causing dramatic failure when used incorrectly.

[14]  arXiv:2404.14588 [pdf, ps, other]
Title: Brain-Inspired Continual Learning-Robust Feature Distillation and Re-Consolidation for Class Incremental Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Artificial intelligence (AI) and neuroscience share a rich history, with advancements in neuroscience shaping the development of AI systems capable of human-like knowledge retention. Leveraging insights from neuroscience and existing research in adversarial and continual learning, we introduce a novel framework comprising two core concepts: feature distillation and re-consolidation. Our framework, named Robust Rehearsal, addresses the challenge of catastrophic forgetting inherent in continual learning (CL) systems by distilling and rehearsing robust features. Inspired by the mammalian brain's memory consolidation process, Robust Rehearsal aims to emulate the rehearsal of distilled experiences during learning tasks. Additionally, it mimics memory re-consolidation, where new experiences influence the integration of past experiences to mitigate forgetting. Extensive experiments conducted on CIFAR10, CIFAR100, and real-world helicopter attitude datasets showcase the superior performance of CL models trained with Robust Rehearsal compared to baseline methods. Furthermore, examining different optimization training objectives (joint, continual, and adversarial learning), we highlight the crucial role of feature learning in model performance. This underscores the significance of rehearsing CL-robust samples in mitigating catastrophic forgetting. In conclusion, aligning CL approaches with neuroscience insights offers promising solutions to the challenge of catastrophic forgetting, paving the way for more robust and human-like AI systems.

[15]  arXiv:2404.14618 [pdf, other]
Title: Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Comments: Accepted to ICLR 2024 (main conference)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Large language models (LLMs) excel in most NLP tasks but also require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower-cost (e.g., edge) devices tend to lag behind in terms of response quality. Therefore, in this work, we propose a hybrid inference approach which combines their respective strengths to save cost and maintain quality. Our approach uses a router that assigns queries to the small or large model based on the predicted query difficulty and the desired quality level. The desired quality level can be tuned dynamically at test time to seamlessly trade quality for cost as per the scenario requirements. In experiments, our approach allows us to make up to 40% fewer calls to the large model, with no drop in response quality.
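
A minimal sketch of the routing idea described above (the difficulty scorer, the threshold rule, and the toy models are placeholders for illustration, not the paper's trained router):

```python
from typing import Callable

def make_router(difficulty: Callable[[str], float]):
    """Return a router that dispatches queries by predicted difficulty.

    `difficulty` stands in for a learned scorer in [0, 1]; the quality
    level trades response quality for cost and can be tuned at test time.
    """
    def route(query: str, small_llm, large_llm, quality_level: float = 0.5):
        # easy queries (below the tunable threshold) go to the cheap model
        if difficulty(query) <= 1.0 - quality_level:
            return small_llm(query)
        return large_llm(query)
    return route

# toy stand-ins for the two models and the difficulty predictor
small = lambda q: f"[small model] {q[:25]}"
large = lambda q: f"[large model] {q[:25]}"
score = lambda q: min(len(q.split()) / 30.0, 1.0)   # longer query ~ harder

route = make_router(score)
print(route("What is 2 + 2?", small, large, quality_level=0.3))
print(route("Explain the trade-offs of mixture-of-experts routing in detail "
            "for deployment on edge devices with strict latency budgets.",
            small, large, quality_level=0.9))
```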

[16]  arXiv:2404.14620 [pdf, other]
Title: Fairness Incentives in Response to Unfair Dynamic Pricing
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, policy makers might leverage tools such as taxation and subsidy to adapt policy mechanisms dependent upon their social objective. In this paper, we explore the potential for AI methods to assist such intervention strategies. To this end, we design a basic simulated economy, wherein we introduce a dynamic social planner (SP) to generate corporate taxation schedules geared to incentivizing firms towards adopting fair pricing behaviours, and to use the collected tax budget to subsidize consumption among underrepresented groups. To cover a range of possible policy scenarios, we formulate our social planner's learning problem as a multi-armed bandit, a contextual bandit and finally as a full reinforcement learning (RL) problem, evaluating welfare outcomes from each case. To alleviate the difficulty in retaining meaningful tax rates that apply to less frequently occurring brackets, we introduce FairReplayBuffer, which ensures that our RL agent samples experiences uniformly across a discretized fairness space. We find that, upon deploying a learned tax and redistribution policy, social welfare improves on that of the fairness-agnostic baseline, approaches that of the analytically optimal fairness-aware baseline in the multi-armed and contextual bandit settings, and surpasses it by 13.19% in the full RL setting.
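
The FairReplayBuffer idea, sampling experiences uniformly across a discretized fairness space, can be sketched as follows (the bin count, the contents of each experience tuple, and the two-stage sampling rule are illustrative assumptions, not the paper's exact design):

```python
import random
from collections import defaultdict

class FairnessBinnedBuffer:
    """Replay buffer keyed by a discretized fairness value.

    Sampling first picks a fairness bin uniformly at random, then an
    experience within that bin, so rarely occurring brackets are not
    drowned out by the frequent ones.
    """
    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        self.bins = defaultdict(list)

    def add(self, experience, fairness_value):
        b = min(int(fairness_value * self.n_bins), self.n_bins - 1)
        self.bins[b].append(experience)

    def sample(self, batch_size):
        occupied = [b for b in self.bins if self.bins[b]]
        return [random.choice(self.bins[random.choice(occupied)])
                for _ in range(batch_size)]

buf = FairnessBinnedBuffer()
for i in range(1000):
    fairness = random.random() ** 3          # skewed: high-fairness bins are rare
    buf.add((f"state_{i}", f"action_{i}", 0.0), fairness)
print(buf.sample(4))
```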

[17]  arXiv:2404.14635 [pdf, ps, other]
Title: Digital Twins for forecasting and decision optimisation with machine learning: applications in wastewater treatment
Comments: A bit thin, but an interesting application of ML methods for decision making
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Prediction and optimisation are two widely used techniques that have found many applications in solving real-world problems. While prediction is concerned with estimating the unknown future values of a variable, optimisation is concerned with optimising the decision given all the available data. These methods are used together to solve problems for sequential decision-making where often we need to predict the future values of variables and then use them for determining the optimal decisions. This paradigm is known as forecast and optimise and has numerous applications, e.g., forecast demand for a product and then optimise inventory, forecast energy demand and schedule generations, forecast demand for a service and schedule staff, to name a few. In this extended abstract, we review a digital twin that was developed and applied in wastewater treatment in Urban Utility to improve their operational efficiency. While the current study is tailored to the case study problem, the underlying principles can be used to solve similar problems in other domains.

[18]  arXiv:2404.14642 [pdf, other]
Title: Uncertainty Quantification on Graph Learning: A Survey
Subjects: Machine Learning (cs.LG)

Graphical models, including Graph Neural Networks (GNNs) and Probabilistic Graphical Models (PGMs), have demonstrated their exceptional capabilities across numerous fields. These models necessitate effective uncertainty quantification to ensure reliable decision-making amid the challenges posed by model training discrepancies and unpredictable testing scenarios. This survey examines recent works that address uncertainty quantification within the model architectures, training, and inference of GNNs and PGMs. We aim to provide an overview of the current landscape of uncertainty in graphical models by organizing the recent methods into uncertainty representation and handling. By summarizing state-of-the-art methods, this survey seeks to deepen the understanding of uncertainty quantification in graphical models, thereby increasing their effectiveness and safety in critical applications.

[19]  arXiv:2404.14662 [pdf, other]
Title: NExT: Teaching Large Language Models to Reason about Code Execution
Comments: 35 pages
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. Specifically, NExT uses self-training to bootstrap a synthetic training set of execution-aware rationales that lead to correct task solutions (e.g., fixed programs) without laborious manual annotation. Experiments on program repair tasks based on MBPP and HumanEval demonstrate that NExT improves the fix rate of a PaLM 2 model, by 26.1% and 14.3% absolute, respectively, with significantly improved rationale quality as verified by automated metrics and human raters. Our model can also generalize to scenarios where program traces are absent at test-time.

[20]  arXiv:2404.14664 [pdf, ps, other]
Title: Employing Layerwised Unsupervised Learning to Lessen Data and Loss Requirements in Forward-Forward Algorithms
Comments: 8 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent deep learning models such as ChatGPT utilizing the back-propagation algorithm have exhibited remarkable performance. However, the disparity between the biological brain processes and the back-propagation algorithm has been noted. The Forward-Forward algorithm, which trains deep learning models solely through the forward pass, has emerged to address this. Although the Forward-Forward algorithm cannot replace back-propagation due to limitations such as having to use special input and loss functions, it has the potential to be useful in special situations where back-propagation is difficult to use. To work around this limitation and verify usability, we propose an Unsupervised Forward-Forward algorithm. Using an unsupervised learning model enables training with usual loss functions and inputs without restriction. Through this approach, we achieve stable learning and enable versatile use across various datasets and tasks. From a usability perspective, given the characteristics of the Forward-Forward algorithm and the advantages of the proposed method, we anticipate its practical application even in scenarios such as federated learning, where deep learning layers need to be trained separately in physically distributed environments.

[21]  arXiv:2404.14674 [pdf, other]
Title: HOIN: High-Order Implicit Neural Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Implicit neural representations (INR) suffer from worsening spectral bias, which results in overly smooth solutions to the inverse problem. To deal with this problem, we propose a universal framework for processing inverse problems called \textbf{High-Order Implicit Neural Representations (HOIN)}. By refining the traditional cascade structure to foster high-order interactions among features, HOIN enhances the model's expressive power and mitigates spectral bias through its neural tangent kernel's (NTK) strong diagonal properties, accelerating and optimizing inverse problem resolution. By analyzing the model's expression space, high-order derivatives, and the NTK matrix, we theoretically validate the feasibility of HOIN. HOIN realizes 1 to 3 dB improvements in most inverse problems, establishing a new state-of-the-art recovery quality and training efficiency, thus providing a new general paradigm for INR and paving the way for it to solve the inverse problem.

[22]  arXiv:2404.14688 [pdf, other]
Title: FMint: Bridging Human Designed and Data Pretrained Models for Differential Equation Foundation Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Dynamical Systems (math.DS); Numerical Analysis (math.NA)

Human-designed algorithms have long been fundamental in solving a variety of scientific and engineering challenges. Recently, data-driven deep learning methods have also risen to prominence, offering innovative solutions across numerous scientific fields. While traditional algorithms excel in capturing the core aspects of specific problems, they often lack the flexibility needed for varying problem conditions due to the absence of specific data. Conversely, while data-driven approaches utilize vast datasets, they frequently fall short in domain-specific knowledge. To bridge these gaps, we introduce \textbf{FMint} (Foundation Model based on Initialization), a generative pre-trained model that synergizes the precision of human-designed algorithms with the adaptability of data-driven methods. This model is specifically engineered for high-accuracy simulation of dynamical systems. Starting from initial trajectories provided by conventional methods, FMint quickly delivers highly accurate solutions. It incorporates in-context learning and has been pre-trained on a diverse corpus of 500,000 dynamical systems, showcasing exceptional generalization across a broad spectrum of real-world applications. By effectively combining algorithmic rigor with data-driven flexibility, FMint sets the stage for the next generation of scientific foundation models, tackling complex problems with both efficiency and high accuracy.

[23]  arXiv:2404.14689 [pdf, other]
Title: Interpretable Prediction and Feature Selection for Survival Analysis
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Survival analysis is widely used as a technique to model time-to-event data when some data is censored, particularly in healthcare for predicting future patient risk. In such settings, survival models must be both accurate and interpretable so that users (such as doctors) can trust the model and understand model predictions. While most literature focuses on discrimination, interpretability is equally important. A successful interpretable model should be able to describe how changing each feature impacts the outcome, and should only use a small number of features. In this paper, we present DyS (pronounced ``dice''), a new survival analysis model that achieves both strong discrimination and interpretability. DyS is a feature-sparse Generalized Additive Model, combining feature selection and interpretable prediction into one model. While DyS works well for all survival analysis problems, it is particularly useful for large (in $n$ and $p$) survival datasets such as those commonly found in observational healthcare studies. Empirical studies show that DyS competes with other state-of-the-art machine learning models for survival analysis, while being highly interpretable.

[24]  arXiv:2404.14701 [pdf, other]
Title: Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization
Subjects: Machine Learning (cs.LG)

Deep neural networks (DNNs) frequently present behaviorally irregular patterns, significantly limiting their practical potential and theoretical validity in travel behavior modeling. This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions (a.k.a. law of demand), and further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity. The proposed framework is applied to travel survey data from Chicago and London to examine the trade-off between predictive power and behavioral regularity for large vs. small sample scenarios and in-domain vs. out-of-domain generalizations. The results demonstrate that, unlike models with strong behavioral foundations such as the multinomial logit, the benchmark DNNs cannot guarantee behavioral regularity. However, gradient regularization (GR) increases DNNs' behavioral regularity by around 6 percentage points (pp) while retaining their relatively high predictive power. In the small sample scenario, GR is more effective than in the large sample scenario, simultaneously improving behavioral regularity by about 20 pp and log-likelihood by around 1.7%. Compared with in-domain generalization, GR works more effectively in out-of-domain generalization: it drastically improves the behavioral regularity of poorly performing benchmark DNNs by around 65 pp, indicating the criticality of behavioral regularization for enhancing model transferability and application in forecasting. Moreover, the proposed framework is applicable to other NN-based choice models such as TasteNets. Future studies could use behavioral regularity as a metric along with log-likelihood in evaluating travel demand models, and investigate other methods to further enhance behavioral regularity when adopting complex machine learning models.
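
To illustrate the kind of gradient regularizer the abstract refers to, a minimal PyTorch sketch that penalizes violations of the law of demand, i.e., positive sensitivity of predicted demand to its own cost (the network, the cost-feature index, the MSE fitting loss, and the penalty weight are assumptions for the example, not the paper's specification of its six regularizers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))

def demand_monotonicity_penalty(x, cost_idx=0):
    """Penalize positive gradients of predicted demand w.r.t. the cost feature.

    The law of demand says predicted demand should not increase with cost;
    any positive gradient along the cost dimension adds to the penalty.
    """
    x = x.clone().requires_grad_(True)
    y = net(x).sum()
    grad = torch.autograd.grad(y, x, create_graph=True)[0]
    return torch.relu(grad[:, cost_idx]).mean()

x = torch.randn(32, 5)           # 32 alternatives, feature 0 = cost
target = torch.randn(32, 1)
pred = net(x)
loss = nn.functional.mse_loss(pred, target) + 10.0 * demand_monotonicity_penalty(x)
loss.backward()
print(float(loss))
```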

[25]  arXiv:2404.14721 [pdf, other]
Title: Dynamically Anchored Prompting for Task-Imbalanced Continual Learning
Comments: Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG)

Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity from the perspective of recent prompt-based continual learning methods. On top of the above finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that only maintains a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called boosting anchor and stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by only storing a prompt across the data stream, therefore offering a substantial advantage in rehearsal-free CL. Extensive experiments demonstrate that the proposed DAP results in 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP

[26]  arXiv:2404.14728 [pdf, ps, other]
Title: Novel Topological Machine Learning Methodology for Stream-of-Quality Modeling in Smart Manufacturing
Comments: The paper has been submitted to Manufacturing Letters (Under Review)
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

This paper presents a topological analytics approach within the 5-level Cyber-Physical Systems (CPS) architecture for the Stream-of-Quality assessment in smart manufacturing. The proposed methodology not only enables real-time quality monitoring and predictive analytics but also discovers the hidden relationships between quality features and process parameters across different manufacturing processes. A case study in additive manufacturing was used to demonstrate the feasibility of the proposed methodology to maintain high product quality and adapt to product quality variations. This paper demonstrates how topological graph visualization can be effectively used for the real-time identification of new representative data through the Stream-of-Quality assessment.

[27]  arXiv:2404.14746 [pdf, ps, other]
Title: A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation
Comments: 12 pages, 3 figures, 1 table
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

In the field of fraud detection, the availability of comprehensive and privacy-compliant datasets is crucial for advancing machine learning research and developing effective anti-fraud systems. Traditional datasets often focus on transaction-level information, which, while useful, overlooks the broader context of customer behavior patterns that are essential for detecting sophisticated fraud schemes. The scarcity of such data, primarily due to privacy concerns, significantly hampers the development and testing of predictive models that can operate effectively at the customer level. Addressing this gap, our study introduces a benchmark that contains structured datasets specifically designed for customer-level fraud detection. The benchmark not only adheres to strict privacy guidelines to ensure user confidentiality but also provides a rich source of information by encapsulating customer-centric features. We have developed the benchmark that allows for the comprehensive evaluation of various machine learning models, facilitating a deeper understanding of their strengths and weaknesses in predicting fraudulent activities. Through this work, we seek to bridge the existing gap in data availability, offering researchers and practitioners a valuable resource that empowers the development of next-generation fraud detection techniques.

[28]  arXiv:2404.14749 [pdf, ps, other]
Title: Semantic Cells: Evolutional Process to Acquire Sense Diversity of Items
Comments: 18 pages, 3 figures, 1 table
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Previous models for learning the semantic vectors of items and their groups, such as words, sentences, nodes, and graphs, using distributed representation have been based on the assumption that an item corresponds to one vector composed of dimensions corresponding to hidden contexts in the target. Multiple senses of an item are represented by assigning a vector to each of the domains where the item may appear or reflecting the context to the sense of the item. However, there may be multiple distinct senses of an item that change or evolve dynamically, according to the contextual shift or the emergence of novel contexts even within one domain, similar to a living entity evolving with environmental shifts. Setting the scope of disambiguity of items for sensemaking, the author presents a method in which a word or item in the data embraces multiple semantic vectors that evolve via interaction with others, similar to a cell embracing chromosomes crossing over with each other. We obtained two preliminary results: (1) the role of a word that evolves to acquire the largest or lower-middle variance of semantic vectors tends to be explainable by the author of the text; (2) the epicenters of earthquakes that acquire larger variance via crossover, corresponding to the interaction with diverse areas of land crust, are likely to correspond to the epicenters of forthcoming large earthquakes.

[29]  arXiv:2404.14754 [pdf, other]
Title: Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning
Comments: Accepted at Great Lakes Symposium on VLSI 2024 (GLSVLSI 24)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for efficiently exploring Pareto-optimal and optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. Unfortunately, these resources are limited and may not be sufficient for complex, multi-component system-level explorations. Generating new data using existing HLS benchmarks can be cumbersome, given the expertise and time required to effectively generate data for different HLS designs and directives. As a result, synthetic data has been used in prior work to evaluate system-level HLS DSE. However, the fidelity of the synthetic data to real data is often unclear, leading to uncertainty about the quality of system-level HLS DSE. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments that would be unattainable with only the currently available data. We explore and adapt a Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) for this task and evaluate our approach using state-of-the-art datasets and metrics. We compare our approach to prior works and show that Vaegan effectively generates synthetic HLS data that closely mirrors the ground truth's distribution.

[30]  arXiv:2404.14757 [pdf, other]
Title: Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Time series forecasting is an important problem and plays a key role in a variety of applications including weather forecasting, stock market, and scientific simulations. Although transformers have proven effective at capturing dependencies, the quadratic complexity of their attention mechanism prevents their further adoption in long-range time series forecasting, limiting them to short-range dependencies. Recent progress on state space models (SSMs) has shown impressive performance in modeling long-range dependencies due to their subquadratic complexity. Mamba, as a representative SSM, enjoys linear time complexity and has achieved strong scalability on tasks that require scaling to long sequences, such as language, audio, and genomics. In this paper, we propose a hybrid framework, Mambaformer, that internally combines Mamba for long-range dependencies and the Transformer for short-range dependencies, for long-short range forecasting. To the best of our knowledge, this is the first paper to combine the Mamba and Transformer architectures in time series data. We investigate possible hybrid architectures to combine the Mamba layer and attention layer for long-short range time series forecasting. The comparative study shows that the Mambaformer family can outperform Mamba and Transformer on the long-short range time series forecasting problem. The code is available at https://github.com/XiongxiaoXu/Mambaformerin-Time-Series.

[31]  arXiv:2404.14815 [pdf, other]
Title: Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction
Comments: 38 pages, 7 figures, 5 tables
Subjects: Machine Learning (cs.LG)

The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence of many medical codes within EHR data, limiting their clinical applicability. Current research often falls short in critical areas: 1) incorporating disease domain knowledge; 2) heterogeneously learning disease representations with rich meanings; 3) capturing the temporal dynamics of disease progression. To overcome these limitations, we introduce a novel heterogeneous graph learning model designed to assimilate disease domain knowledge and elucidate the intricate relationships between drugs and diseases. This model innovatively incorporates temporal data into visit-level embeddings and leverages a time-aware transformer alongside an adaptive attention mechanism to produce patient representations. When evaluated on two healthcare datasets, our approach demonstrated notable enhancements in both prediction accuracy and interpretability over existing methodologies, signifying a substantial advancement towards personalized and proactive healthcare management.

[32]  arXiv:2404.14829 [pdf, other]
Title: Revisiting Neural Networks for Continual Learning: An Architectural Perspective
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Efforts to overcome catastrophic forgetting have primarily centered around developing more effective Continual Learning (CL) methods. In contrast, less attention was devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge this gap between network architecture design and CL, and to present a holistic study on the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights through systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer a CL-friendly architecture, namely, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that improved architectures are parameter-efficient, achieving state-of-the-art performance of CL while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Class IL and Task IL. Code is available at https://github.com/byyx666/ArchCraft.

[33]  arXiv:2404.14855 [pdf, other]
Title: The Geometry of the Set of Equivalent Linear Neural Networks
Comments: 99 pages, 14 figures
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG)

We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber--that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces.
To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that show how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space.
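
To make the object under study concrete, the fiber described in this abstract can be written compactly (an editorial restatement in standard notation, with $\mu$ denoting the matrix multiplication map of an $L$-layer linear network):

$$\mu(W_1,\dots,W_L) = W_L W_{L-1}\cdots W_1, \qquad \operatorname{fiber}(W) = \mu^{-1}(W) = \{(W_1,\dots,W_L) : W_L\cdots W_1 = W\}.$$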

[34]  arXiv:2404.14875 [pdf, other]
Title: Regularized Gauss-Newton for Optimizing Overparameterized Neural Networks
Comments: 27 pages, 9 figures, 2 tables
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

The generalized Gauss-Newton (GGN) optimization method incorporates curvature estimates into its solution steps, and provides a good approximation to the Newton method for large-scale optimization problems. GGN has been found particularly interesting for practical training of deep neural networks, not only for its impressive convergence speed, but also for its close relation with neural tangent kernel regression, which is central to recent studies that aim to understand the optimization and generalization properties of neural networks. This work studies a GGN method for optimizing a two-layer neural network with explicit regularization. In particular, we consider a class of generalized self-concordant (GSC) functions that provide smooth approximations to commonly-used penalty terms in the objective function of the optimization problem. This approach provides an adaptive learning rate selection technique that requires little to no tuning for optimal performance. We study the convergence of the two-layer neural network, considered to be overparameterized, in the optimization loop of the resulting GGN method for a given scaling of the network parameters. Our numerical experiments highlight specific aspects of GSC regularization that help to improve generalization of the optimized neural network. The code to reproduce the experimental results is available at https://github.com/adeyemiadeoye/ggn-score-nn.
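
For context, the standard generalized Gauss-Newton step that the abstract builds on can be written as follows; this is the textbook formulation, not the paper's regularized GSC variant, and the damping term $\lambda I$ is included only as the usual stabilizer:

$$\Delta\theta = -\left(J^\top H J + \lambda I\right)^{-1} \nabla_\theta \mathcal{L}(\theta),$$

where $J$ is the Jacobian of the network outputs with respect to the parameters $\theta$ and $H$ is the Hessian of the loss with respect to those outputs.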

[35]  arXiv:2404.14886 [pdf, other]
Title: GCEPNet: Graph Convolution-Enhanced Expectation Propagation for Massive MIMO Detection
Subjects: Machine Learning (cs.LG)

Massive MIMO (multiple-input multiple-output) detection is an important topic in wireless communication and various machine learning based methods have been developed recently for this task. Expectation propagation (EP) and its variants are widely used for MIMO detection and have achieved the best performance. However, EP-based solvers fail to capture the correlation between unknown variables, leading to loss of information, and in addition, they are computationally expensive. In this paper, we show that the real-valued system can be modeled as spectral signal convolution on graph, through which the correlation between unknown variables can be captured. Based on this analysis, we propose graph convolution-enhanced expectation propagation (GCEPNet), a graph convolution-enhanced EP detector. GCEPNet incorporates data-dependent attention scores into Chebyshev polynomial for powerful graph convolution with better generalization capacity. It enables a better estimation of the cavity distribution for EP and empirically achieves the state-of-the-art (SOTA) MIMO detection performance with much faster inference speed. To our knowledge, we are the first to shed light on the connection between the system model and graph convolution, and the first to design the data-dependent attention scores for graph convolution.
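
As background for the Chebyshev-polynomial graph convolution mentioned above, a small numpy sketch of the polynomial filter $\sum_k c_k T_k(\tilde{L})x$ (the data-dependent attention weighting of the coefficients is the paper's contribution and is only stood in for by fixed coefficients here):

```python
import numpy as np

def chebyshev_filter(L_scaled, x, coeffs):
    """Apply a Chebyshev polynomial graph filter sum_k c_k T_k(L~) x.

    L_scaled: rescaled graph Laplacian with spectrum in [-1, 1]
    x:        node signal, shape (n,)
    coeffs:   filter coefficients c_0..c_K (GCEPNet would make these
              data-dependent via attention; fixed here for illustration)
    """
    t_prev, t_curr = x, L_scaled @ x              # T_0 x and T_1 x
    out = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2 * L_scaled @ t_curr - t_prev  # recurrence
        out += c * t_curr
    return out

# toy 4-node path graph
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
L = np.diag(A.sum(1)) - A
L_scaled = 2 * L / np.linalg.eigvalsh(L).max() - np.eye(4)   # spectrum in [-1, 1]
print(chebyshev_filter(L_scaled, np.array([1.0, 0.0, 0.0, 0.0]), [0.5, 0.3, 0.2]))
```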

[36]  arXiv:2404.14909 [pdf, other]
Title: MultiSTOP: Solving Functional Equations with Reinforcement Learning
Comments: ICLR 2024 Workshop on AI4DifferentialEquations In Science
Subjects: Machine Learning (cs.LG); High Energy Physics - Theory (hep-th)

We develop MultiSTOP, a Reinforcement Learning framework for solving functional equations in physics. This new methodology produces actual numerical solutions instead of bounds on them. We extend the original BootSTOP algorithm by adding multiple constraints derived from domain-specific knowledge, even in integral form, to improve the accuracy of the solution. We investigate a particular equation in a one-dimensional Conformal Field Theory.

[37]  arXiv:2404.14928 [pdf, other]
Title: Graph Machine Learning in the Era of Large Language Models (LLMs)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

[38]  arXiv:2404.14933 [pdf, other]
Title: Fin-Fed-OD: Federated Outlier Detection on Financial Tabular Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Anomaly detection in real-world scenarios poses challenges due to dynamic and often unknown anomaly distributions, requiring robust methods that operate under an open-world assumption. This challenge is exacerbated in practical settings, where models are employed by private organizations, precluding data sharing due to privacy and competitive concerns. Despite potential benefits, the sharing of anomaly information across organizations is restricted. This paper addresses the question of enhancing outlier detection within individual organizations without compromising data confidentiality. We propose a novel method leveraging representation learning and federated learning techniques to improve the detection of unknown anomalies. Specifically, our approach utilizes latent representations obtained from client-owned autoencoders to refine the decision boundary of inliers. Notably, only model parameters are shared between organizations, preserving data privacy. The efficacy of our proposed method is evaluated on two standard financial tabular datasets and an image dataset for anomaly detection in a distributed setting. The results demonstrate a strong improvement in the classification of unknown outliers during the inference phase for each organization's model.

[39]  arXiv:2404.14941 [pdf, other]
Title: Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works have focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they face an inevitable question: traditional pre-training strategies that aim at extracting useful information about the pre-training tasks may not extract all the information that is useful for the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of the Information Bottleneck (IB) and confirm that forgetting during the pre-training phase can have detrimental effects on downstream tasks. We therefore propose a novel Delayed Bottlenecking Pre-training (DBP) framework, which maintains as much mutual information as possible between latent representations and training data during pre-training by suppressing the compression operation, and delays compression to the fine-tuning phase so that it can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.

[40]  arXiv:2404.14953 [pdf, other]
Title: Dynamic pricing with Bayesian updates from online reviews
Subjects: Machine Learning (cs.LG)

When launching new products, firms face uncertainty about market reception. Online reviews provide valuable information not only to consumers but also to firms, allowing firms to adjust product characteristics, including the selling price. In this paper, we consider a pricing model with online reviews in which the quality of the product is uncertain, and both the seller and the buyers update their beliefs in a Bayesian manner to make purchasing and pricing decisions. We model the seller's pricing problem as a basic bandit problem and show a close connection with the celebrated Catalan numbers, allowing us to efficiently compute the seller's overall future discounted reward. With this tool, we analyze and compare the optimal static and dynamic pricing strategies in terms of the probability of effectively learning the quality of the product.
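
For concreteness, the Catalan numbers referenced above have the closed form $C_n = \binom{2n}{n}/(n+1)$; the snippet below evaluates that formula together with a generic discounted-reward sum, and is not the paper's pricing model.

```python
# Illustrative only: the closed form for Catalan numbers mentioned in the
# abstract, plus a generic discounted-reward sum. This is not the paper's
# pricing model; the reward sequence below is a made-up placeholder.
from math import comb

def catalan(n: int) -> int:
    """C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def discounted_value(rewards, gamma: float = 0.95) -> float:
    """Standard discounted sum: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print([catalan(n) for n in range(8)])        # 1, 1, 2, 5, 14, 42, 132, 429
print(discounted_value([1.0, 0.8, 1.2, 0.9]))
```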

[41]  arXiv:2404.14961 [pdf, other]
Title: Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems
Comments: 8 pages, 8 figures
Subjects: Machine Learning (cs.LG)

Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise result cache is used to provide recommendations when the recommender system cannot afford a real-time computation. However, the cached recommendations are usually suboptimal compared to real-time computation, and it is challenging to determine the items in the cache for each user. In this paper, we provide a cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. We formulate the problem as a Markov decision process with user states and a cache state, where the cache state represents whether the recommender system performs recommendations by real-time computation or by the cache. The computational load of the recommender system determines the cache state. We perform reinforcement learning based on such a model to improve user engagement over multiple requests. Moreover, we show that the cache will introduce a challenge called critic dependency, which deteriorates the performance of reinforcement learning. To tackle this challenge, we propose an eigenfunction learning (EL) method to learn independent critics for CARL. Experiments show that CARL can significantly improve the users' engagement when considering the result cache. CARL has been fully launched in Kwai app, serving over 100 million users.

[42]  arXiv:2404.14970 [pdf, other]
Title: Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction
Comments: 11 pages, 4 figures, 7th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics at ESWC2024
Subjects: Machine Learning (cs.LG)

Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types such as gene expression data. While gene expression data can provide valuable insights, challenges arise from the fact that sample sizes in expression datasets are usually limited and that data from different datasets, measuring different sets of genes, cannot be easily combined.
This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.
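
A rough sketch of the described pipeline is given below, with random placeholder vectors standing in for real KG embeddings and a generic scikit-learn classifier; the embedding method, features, and data here are illustrative assumptions rather than the paper's setup.

```python
# Rough sketch of the described pipeline: pre-computed KG embedding vectors
# are used as classifier inputs. The embeddings below are random placeholders
# standing in for real KG embedding output, and labels are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # placeholder: 200 samples, 64-dim KG embeddings
y = rng.integers(0, 2, size=200)        # placeholder: diabetes label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```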

[43]  arXiv:2404.14973 [pdf, ps, other]
Title: Symbolic Integration Algorithm Selection with Machine Learning: LSTMs vs Tree LSTMs
Subjects: Machine Learning (cs.LG); Mathematical Software (cs.MS); Symbolic Computation (cs.SC)

Computer Algebra Systems (e.g. Maple) are used in research, education, and industrial settings. One of their key functionalities is symbolic integration, where there are many sub-algorithms to choose from that can affect the form of the output integral, and the runtime. Choosing the right sub-algorithm for a given problem is challenging: we hypothesise that Machine Learning can guide this sub-algorithm choice. A key consideration of our methodology is how to represent the mathematics to the ML model: we hypothesise that a representation which encodes the tree structure of mathematical expressions would be well suited. We trained both an LSTM and a TreeLSTM model for sub-algorithm prediction and compared them to Maple's existing approach. Our TreeLSTM performs much better than the LSTM, highlighting the benefit of using an informed representation of mathematical expressions. It is able to produce better outputs than Maple's current state-of-the-art meta-algorithm, giving a strong basis for further research.

[44]  arXiv:2404.14986 [pdf, other]
Title: $\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In biological tasks, data is rarely plentiful as it is generated from hard-to-gather measurements. Therefore, pre-training foundation models on large quantities of available data and then transferring them to low-data downstream tasks is a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on models with large parameter capacities. In this work, we propose $\texttt{MiniMol}$, a foundation model for molecular learning with 10 million parameters. $\texttt{MiniMol}$ is pre-trained on a mix of roughly 3300 sparsely defined graph- and node-level tasks of both quantum and biological nature. The pre-training dataset includes approximately 6 million molecules and 500 million labels. To demonstrate the generalizability of $\texttt{MiniMol}$ across tasks, we evaluate it on downstream tasks from the Therapeutic Data Commons (TDC) ADMET group, showing significant improvements over the prior state-of-the-art foundation model across 17 tasks. $\texttt{MiniMol}$ will be a public and open-sourced model for future research.

[45]  arXiv:2404.15018 [pdf, other]
Title: Conformal Predictive Systems Under Covariate Shift
Comments: 13 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Conformal Predictive Systems (CPS) offer a versatile framework for constructing predictive distributions, allowing for calibrated inference and informative decision-making. However, their applicability has been limited to scenarios adhering to the Independent and Identically Distributed (IID) model assumption. This paper extends CPS to accommodate scenarios characterized by covariate shifts. We therefore propose Weighted CPS (WCPS), akin to Weighted Conformal Prediction (WCP), leveraging likelihood ratios between training and testing covariate distributions. This extension enables the construction of nonparametric predictive distributions capable of handling covariate shifts. We present theoretical underpinnings and conjectures regarding the validity and efficacy of WCPS and demonstrate its utility through empirical evaluations on both synthetic and real-world datasets. Our simulation experiments indicate that WCPS are probabilistically calibrated under covariate shift.
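
A minimal sketch of the likelihood-ratio weighting step, in the spirit of weighted conformal methods: calibration nonconformity scores are reweighted by an estimate of $p_{\text{test}}(x)/p_{\text{train}}(x)$ before a quantile is taken. The scores and weights below are synthetic placeholders, and this is not the authors' WCPS implementation.

```python
# Sketch of likelihood-ratio weighting for conformal calibration under covariate
# shift. Scores and weights are synthetic placeholders; this shows only the
# generic weighted-quantile step, not the paper's WCPS procedure.
import numpy as np

def weighted_quantile(scores, weights, alpha=0.1):
    """Smallest score s such that the normalized weight of {scores <= s}
    reaches at least 1 - alpha."""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cum, 1 - alpha)
    return s[min(idx, len(s) - 1)]

rng = np.random.default_rng(0)
cal_scores = np.abs(rng.normal(size=500))            # |y - y_hat| on a calibration set
lik_ratios = np.exp(rng.normal(scale=0.3, size=500)) # placeholder w(x) = p_test(x) / p_train(x)

q = weighted_quantile(cal_scores, lik_ratios, alpha=0.1)
print("Weighted conformal radius:", round(float(q), 3))
```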

[46]  arXiv:2404.15029 [pdf, other]
Title: Explainable LightGBM Approach for Predicting Myocardial Infarction Mortality
Comments: This article has been accepted at the 2023 International Conference on Computational Science and Computational Intelligence (CSCI 23)
Subjects: Machine Learning (cs.LG)

Myocardial infarction is a leading cause of mortality worldwide, and accurate risk prediction is crucial for improving patient outcomes. Machine learning techniques have shown promise in identifying high-risk patients and predicting outcomes. However, patient data often contain vast amounts of information and missing values, posing challenges for feature selection and imputation methods. In this article, we investigate the impact of the data preprocessing task and compare three ensemble boosted-tree methods to predict the risk of mortality in patients with myocardial infarction. Further, we use the Tree Shapley Additive Explanations (SHAP) method to identify relationships among all the features for the performed predictions, leveraging the entirety of the available data in the analysis. Notably, our approach achieved superior performance compared to other existing machine learning approaches, with an F1-score of 91.2% and an accuracy of 91.8% for LightGBM without data preprocessing.
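
The generic LightGBM plus Tree SHAP workflow referenced above looks roughly as follows on synthetic data; the clinical dataset, preprocessing choices, and tuned hyperparameters of the study are not reproduced here.

```python
# Generic LightGBM + Tree SHAP workflow on synthetic data; this mirrors the
# kind of pipeline described (boosted trees + Tree SHAP explanations) but is
# not the paper's clinical dataset or tuned configuration.
import lightgbm as lgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_tr, y_tr)

explainer = shap.TreeExplainer(model)        # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_te)    # per-feature attributions for each sample
print("Accuracy:", model.score(X_te, y_te))
```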

[47]  arXiv:2404.15034 [pdf, other]
Title: Deep Multi-View Channel-Wise Spatio-Temporal Network for Traffic Flow Prediction
Comments: Accepted by AAAI2020 workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurately forecasting traffic flows is critically important to many real applications including public safety and intelligent transportation systems. The challenges of this problem include both the dynamic mobility patterns of people and the complex spatial-temporal correlations of urban traffic data. Meanwhile, most existing models ignore the diverse impacts of various traffic observations (e.g., vehicle speed and road occupancy) on traffic flow prediction, and different traffic observations can be considered as different channels of input features. We argue that analyzing multi-channel traffic observations can help to better address this problem. In this paper, we study the novel problem of multi-channel traffic flow prediction and propose a deep Multi-View Channel-wise Spatio-Temporal Network (MVC-STNet) model to effectively address it. Specifically, we first construct localized and globalized spatial graphs, where a multi-view fusion module is used to effectively extract the local and global spatial dependencies. An LSTM is then used to learn the temporal correlations. To effectively model the different impacts of various traffic observations on traffic flow prediction, a channel-wise graph convolutional network is also designed. Extensive experiments are conducted on the PEMS04 and PEMS08 datasets. The results demonstrate that the proposed MVC-STNet outperforms state-of-the-art methods by a large margin.

[48]  arXiv:2404.15065 [pdf, other]
Title: Formal Verification of Graph Convolutional Networks with Uncertain Node Features and Uncertain Graph Structure
Comments: under review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, these perturbations require us to formally verify neural networks before their deployment in safety-critical environments as neural networks are prone to adversarial attacks. While there exists research on the formal verification of neural networks, there is no work verifying the robustness of generic graph convolutional network architectures with uncertainty in the node features and in the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.

[49]  arXiv:2404.15084 [pdf, other]
Title: Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
Comments: IJCAI'24
Subjects: Machine Learning (cs.LG)

There has been growing interest in off-policy evaluation in areas such as recommender systems and personalized medicine. Significant progress has been made in developing estimators that accurately estimate the effectiveness of counterfactual policies from biased logged data. However, in many cases those estimators are used not only to evaluate the value of decision-making policies but also to search for the best hyperparameters from a large candidate space. This work explores the latter hyperparameter optimization (HPO) task for off-policy learning. We empirically show that naively using an unbiased estimator of the generalization performance as a surrogate objective in HPO can cause an unexpected failure, merely pursuing hyperparameters whose generalization performance is greatly overestimated. We then propose simple and computationally efficient corrections to the typical HPO procedure to deal with these issues simultaneously. Empirical investigations demonstrate the effectiveness of our proposed HPO algorithm in situations where the typical procedure fails severely.
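
The surrogate objective in question is typically an off-policy value estimate such as the textbook inverse-propensity-score (IPS) estimator sketched below on synthetic logs; this illustrates the kind of estimator whose naive use the paper warns about, not the proposed correction.

```python
# Textbook inverse-propensity-score (IPS) value estimate for a candidate policy,
# the kind of surrogate objective whose naive use in HPO the paper warns about.
# The logged data below is synthetic; the paper's corrected procedure is not shown.
import numpy as np

def ips_value(rewards, logged_propensities, eval_propensities):
    """V_hat = mean( pi_eval(a|x) / pi_log(a|x) * r ): unbiased but high-variance."""
    w = eval_propensities / logged_propensities
    return float(np.mean(w * rewards))

rng = np.random.default_rng(0)
r = rng.binomial(1, 0.3, size=10_000).astype(float)   # logged rewards
p_log = np.full(10_000, 0.2)                          # logging policy propensities
p_eval = rng.uniform(0.05, 0.6, size=10_000)          # candidate policy propensities

print("IPS estimate:", round(ips_value(r, p_log, p_eval), 4))
```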

[50]  arXiv:2404.15095 [pdf, ps, other]
Title: Using ARIMA to Predict the Expansion of Subscriber Data Consumption
Authors: Mike Wa Nkongolo
Subjects: Machine Learning (cs.LG)

This study discusses how insights retrieved from subscriber data can impact decision-making in telecommunications, focusing on predictive modeling using machine learning techniques such as the ARIMA model. The study explores time series forecasting to predict subscriber usage trends, evaluating the ARIMA model's performance using various metrics. It also compares ARIMA with Convolutional Neural Network (CNN) models, highlighting ARIMA's superiority in accuracy and execution speed. The study suggests future directions for research, including exploring additional forecasting models and considering other factors affecting subscriber data usage.
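
A generic statsmodels ARIMA fit-and-forecast on a synthetic monthly series is sketched below; the order (1, 1, 1) and the data are illustrative and not taken from the study.

```python
# Generic ARIMA fit/forecast with statsmodels on a synthetic monthly series;
# the order (1, 1, 1) and the data are illustrative, not the study's.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
trend = np.linspace(100, 160, 36)                      # upward usage trend
series = pd.Series(trend + rng.normal(0, 3, 36),
                   index=pd.date_range("2021-01-01", periods=36, freq="MS"))

model = ARIMA(series, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=6)                     # next six months
print(forecast.round(1))
```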

[51]  arXiv:2404.15109 [pdf, other]
Title: Compete and Compose: Learning Independent Mechanisms for Modular World Models
Subjects: Machine Learning (cs.LG)

We present COmpetitive Mechanisms for Efficient Transfer (COMET), a modular world model which leverages reusable, independent mechanisms across different environments. COMET is trained on multiple environments with varying dynamics via a two-step process: competition and composition. This enables the model to recognise and learn transferable mechanisms. Specifically, in the competition phase, COMET is trained with a winner-takes-all gradient allocation, encouraging the emergence of independent mechanisms. These are then re-used in the composition phase, where COMET learns to re-compose learnt mechanisms in ways that capture the dynamics of intervened environments. In so doing, COMET explicitly reuses prior knowledge, enabling efficient and interpretable adaptation. We evaluate COMET on environments with image-based observations. In contrast to competitive baselines, we demonstrate that COMET captures recognisable mechanisms without supervision. Moreover, we show that COMET is able to adapt to new environments with varying numbers of objects with improved sample efficiency compared to more conventional finetuning approaches.

[52]  arXiv:2404.15146 [pdf, other]
Title: Rethinking LLM Memorization through the Lens of Adversarial Compression
Comments: this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or whether they integrate many data sources in a way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on $\textit{how we define memorization}$. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs: a given string from the training data is considered memorized if it can be elicited by a prompt shorter than the string itself. In other words, these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. We outline the limitations of existing notions of memorization and show how the ACR overcomes these challenges by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing the flexibility to measure memorization for arbitrary strings at reasonably low compute cost. Our definition serves as a valuable and practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios. Project page: https://locuslab.github.io/acr-memorization.
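
The ACR arithmetic itself is simple, as the toy example below shows; the token counts are made up, and the hard part, searching for the shortest adversarial prompt, is elided.

```python
# Illustration of the Adversarial Compression Ratio (ACR) arithmetic only:
# ACR = (tokens in target string) / (tokens in shortest eliciting prompt).
# Token counts are made up; actually searching for the shortest adversarial
# prompt is the hard part and is not shown here.
def adversarial_compression_ratio(target_tokens: int, shortest_prompt_tokens: int) -> float:
    return target_tokens / shortest_prompt_tokens

examples = {
    "famous quotation": (12, 4),    # elicited by a much shorter prompt -> memorized
    "ordinary sentence": (15, 21),  # needs a prompt longer than itself -> not memorized
}
for name, (tgt, prm) in examples.items():
    acr = adversarial_compression_ratio(tgt, prm)
    print(f"{name}: ACR = {acr:.2f} -> {'memorized' if acr > 1 else 'not memorized'}")
```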

[53]  arXiv:2404.15182 [pdf, other]
Title: FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Comments: 10 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the rapidly evolving field of artificial intelligence, multimodal models, e.g., integrating vision and language into visual-language models (VLMs), have become pivotal for many applications, ranging from image captioning to multimodal search engines. Among these models, the Contrastive Language-Image Pre-training (CLIP) model has demonstrated remarkable performance in understanding and generating nuanced relationships between text and images. However, the conventional training of such models often requires centralized aggregation of vast datasets, posing significant privacy and data governance challenges. To address these concerns, this paper proposes a novel approach that leverages Federated Learning and parameter-efficient adapters, i.e., Low-Rank Adaptation (LoRA), to train VLMs. This methodology preserves data privacy by training models across decentralized data sources and ensures model adaptability and efficiency through LoRA's parameter-efficient fine-tuning. Our approach accelerates training by up to 34.72 times and requires 2.47 times less memory than full fine-tuning.
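
A minimal sketch of the LoRA idea used here: the pre-trained weight is frozen and a trainable low-rank update $BA$ is added. The dimensions, rank, and scaling below are illustrative, and the federated aggregation loop is omitted.

```python
# Minimal LoRA-style linear layer sketch: the frozen base weight W is augmented
# with a trainable low-rank update B @ A scaled by alpha / r. Dimensions, rank,
# and scaling are illustrative; the federated training loop is omitted.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))    # zero init -> no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(512, 512)
out = layer(torch.randn(4, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, "trainable params:", trainable)          # only A and B are trainable
```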

[54]  arXiv:2404.15198 [pdf, other]
Title: Lossless and Near-Lossless Compression for Foundation Models
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network bandwidth and storage to accommodate them. While there is a vast literature on reducing model sizes, we investigate a more traditional type of compression -- one that compresses the model to a smaller form, coupled with a decompression algorithm that returns it to its original size -- namely lossless compression. Somewhat surprisingly, we show that such lossless compression can yield significant network and storage savings for popular models, at times reducing the model size by over $50\%$. We investigate the source of model compressibility, introduce compression variants tailored for models and categorize models into compressibility groups. We also introduce a tunable lossy compression technique that can further reduce size even for the less compressible models, with little to no effect on model accuracy. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like HuggingFace.
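
For intuition, one can losslessly compress a serialized weight buffer and measure the ratio, as in the sketch below using zlib on synthetic float16 weights; the paper's model-tailored compression variants are not reproduced here.

```python
# Intuition only: losslessly compress a serialized weight buffer and report the
# ratio. zlib and synthetic float16 weights are stand-ins; the paper's tailored
# compression variants are not reproduced here.
import zlib
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1_000_000).astype(np.float16)
raw = weights.tobytes()

compressed = zlib.compress(raw, 9)
ratio = len(compressed) / len(raw)
print(f"raw {len(raw)/1e6:.1f} MB -> compressed {len(compressed)/1e6:.1f} MB "
      f"({ratio:.0%} of original size)")
```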

[55]  arXiv:2404.15199 [pdf, other]
Title: Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems
Subjects: Machine Learning (cs.LG)

Reinforcement Learning (RL) is a powerful method for controlling dynamic systems, but its learning mechanism can lead to unpredictable actions that undermine the safety of critical systems. Here, we propose RL with Adaptive Control Regularization (RL-ACR), which ensures RL safety by combining the RL policy with a control regularizer that hard-codes safety constraints over forecasted system behaviors. The adaptability is achieved by using a learnable "focus" weight trained to maximize the cumulative reward of the policy combination. As the RL policy improves through off-policy learning, the focus weight improves on the initial sub-optimal strategy by gradually relying more on the RL policy. We demonstrate the effectiveness of RL-ACR in a critical medical control application and further investigate its performance in four classic control environments.

[56]  arXiv:2404.15201 [pdf, other]
Title: CORE-BEHRT: A Carefully Optimized and Rigorously Evaluated BEHRT
Comments: 11 pages, 5 figures
Subjects: Machine Learning (cs.LG)

BERT-based models for Electronic Health Records (EHR) have surged in popularity following the release of BEHRT and Med-BERT. Subsequent models have largely built on these foundations despite the fundamental design choices of these pioneering models remaining underexplored. To address this issue, we introduce CORE-BEHRT, a Carefully Optimized and Rigorously Evaluated BEHRT. Through incremental optimization, we isolate the sources of improvement for key design choices, giving us insights into the effect of data representation and individual technical components on performance. Evaluating this across a set of generic tasks (death, pain treatment, and general infection), we showed that improving data representation can increase the average downstream performance from 0.785 to 0.797 AUROC, primarily when including medication and timestamps. Improving the architecture and training protocol on top of this increased average downstream performance to 0.801 AUROC. We then demonstrated the consistency of our optimization through a rigorous evaluation across 25 diverse clinical prediction tasks. We observed significant performance increases in 17 out of 25 tasks and improvements in 24 tasks, highlighting the generalizability of our findings. Our findings provide a strong foundation for future work and aim to increase the trustworthiness of BERT-based EHR models.

[57]  arXiv:2404.15209 [pdf, other]
Title: Data-Driven Knowledge Transfer in Batch $Q^*$ Learning
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.

[58]  arXiv:2404.15225 [pdf, other]
Title: PHLP: Sole Persistent Homology for Link Prediction -- Interpretable Feature Extraction
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Algebraic Topology (math.AT); Machine Learning (stat.ML)

Link prediction (LP), inferring the connectivity between nodes, is a significant research area in graph data, where a link represents essential information on relationships between nodes. Although graph neural network (GNN)-based models have achieved high performance in LP, understanding why they perform well is challenging because most comprise complex neural networks. We employ persistent homology (PH), a topological data analysis method that helps analyze the topological information of graphs, to explain the reasons for the high performance. We propose a novel method that employs PH for LP (PHLP) focusing on how the presence or absence of target links influences the overall topology. The PHLP utilizes the angle hop subgraph and new node labeling called degree double radius node labeling (Degree DRNL), distinguishing the information of graphs better than DRNL. Using only a classifier, PHLP performs similarly to state-of-the-art (SOTA) models on most benchmark datasets. Incorporating the outputs calculated using PHLP into the existing GNN-based SOTA models improves performance across all benchmark datasets. To the best of our knowledge, PHLP is the first method of applying PH to LP without GNNs. The proposed approach, employing PH while not relying on neural networks, enables the identification of crucial factors for improving performance.

[59]  arXiv:2404.15242 [pdf, other]
Title: A Hybrid Kernel-Free Boundary Integral Method with Operator Learning for Solving Parametric Partial Differential Equations In Complex Domains
Comments: 30 pages,6 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)

The Kernel-Free Boundary Integral (KFBI) method presents an iterative solution to boundary integral equations arising from elliptic partial differential equations (PDEs). This method effectively addresses elliptic PDEs on irregular domains, including the modified Helmholtz, Stokes, and elasticity equations. The rapid evolution of neural networks and deep learning has invigorated the exploration of numerical PDEs. An increasing interest is observed in deep learning approaches that seamlessly integrate mathematical principles for investigating numerical PDEs. We propose a hybrid KFBI method, integrating the foundational principles of the KFBI method with the capabilities of deep learning. This approach, within the framework of the boundary integral method, designs a network to approximate the solution operator for the corresponding integral equations by mapping the parameters, inhomogeneous terms and boundary information of PDEs to the boundary density functions, which can be regarded as the solution of the integral equations. The models are trained using data generated by the Cartesian grid-based KFBI algorithm, exhibiting robust generalization capabilities. They accurately predict density functions across diverse boundary conditions and parameters within the same class of equations. Experimental results demonstrate that the trained model can directly infer the boundary density function with satisfactory precision, obviating the need for iterative steps in solving boundary integral equations. Furthermore, applying the inference results of the model as initial values for the iterations is also reasonable; this approach retains the inherent second-order accuracy of the KFBI method while accelerating the traditional KFBI approach by reducing the number of iterations by about 50%.

[60]  arXiv:2404.15255 [pdf, other]
Title: How to use and interpret activation patching
Comments: A tutorial on activation patching. 13 pages, 2 figures
Subjects: Machine Learning (cs.LG)

Activation patching is a popular mechanistic interpretability technique, but has many subtleties regarding how it is applied and how one may interpret the results. We provide a summary of advice and best practices, based on our experience using this technique in practice. We include an overview of the different ways to apply activation patching and a discussion on how to interpret the results. We focus on what evidence patching experiments provide about circuits, and on the choice of metric and associated pitfalls.
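
A bare-bones version of the patching operation in PyTorch: cache an activation from a clean run and overwrite it during a corrupted run via a forward hook. The two-layer toy model stands in for a real transformer and is purely illustrative.

```python
# Bare-bones activation patching in PyTorch: cache an activation from a clean
# run, then overwrite the same activation during a corrupted run via a forward
# hook. The two-layer toy model is a placeholder for a real transformer.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
target_layer = model[1]                      # patch the post-ReLU activation

clean_x, corrupted_x = torch.randn(1, 16), torch.randn(1, 16)

cache = {}
def save_hook(module, inputs, output):
    cache["act"] = output.detach()

def patch_hook(module, inputs, output):
    return cache["act"]                      # returning a tensor replaces the output

h = target_layer.register_forward_hook(save_hook)
clean_out = model(clean_x)
h.remove()

h = target_layer.register_forward_hook(patch_hook)
patched_out = model(corrupted_x)
h.remove()

print("clean:", clean_out, "\npatched corrupted:", patched_out)
```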

[61]  arXiv:2404.15274 [pdf, other]
Title: Metric-guided Image Reconstruction Bounds via Conformal Prediction
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

Recent advancements in machine learning have led to novel imaging systems and algorithms that address ill-posed problems. Assessing their trustworthiness and understanding how to deploy them safely at test time remains an important and open problem. We propose a method that leverages conformal prediction to retrieve upper/lower bounds and statistical inliers/outliers of reconstructions based on the prediction intervals of downstream metrics. We apply our method to sparse-view CT for downstream radiotherapy planning and show 1) that metric-guided bounds have valid coverage for downstream metrics while conventional pixel-wise bounds do not and 2) anatomical differences of upper/lower bounds between metric-guided and pixel-wise methods. Our work paves the way for more meaningful reconstruction bounds. Code available at https://github.com/matthewyccheung/conformal-metric

Cross-lists for Wed, 24 Apr 24

[62]  arXiv:2404.13630 (cross-list from cs.SE) [pdf, ps, other]
Title: Utilizing Deep Learning to Optimize Software Development Processes
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

This study explores the application of deep learning technologies in software development processes, particularly in automating code reviews, error prediction, and test generation to enhance code quality and development efficiency. Through a series of empirical studies, experimental groups using deep learning tools and control groups using traditional methods were compared in terms of code error rates and project completion times. The results demonstrated significant improvements in the experimental group, validating the effectiveness of deep learning technologies. The research also discusses potential optimization points, methodologies, and technical challenges of deep learning in software development, as well as how to integrate these technologies into existing software development workflows.

[63]  arXiv:2404.14416 (cross-list from physics.geo-ph) [pdf, other]
Title: Conditional diffusion models for downscaling & bias correction of Earth system model precipitation
Subjects: Geophysics (physics.geo-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)

Climate change exacerbates extreme weather events like heavy rainfall and flooding. As these events cause severe losses of property and lives, accurate high-resolution simulation of precipitation is imperative. However, existing Earth System Models (ESMs) struggle with resolving small-scale dynamics and suffer from biases, especially for extreme events. Traditional statistical bias correction and downscaling methods fall short in improving spatial structure, while recent deep learning methods lack controllability over the output and suffer from unstable training. Here, we propose a novel machine learning framework for simultaneous bias correction and downscaling. We train a generative diffusion model in a supervised way purely on observational data. We map observational and ESM data to a shared embedding space, where both are unbiased towards each other and train a conditional diffusion model to reverse the mapping. Our method can be used to correct any ESM field, as the training is independent of the ESM. Our approach ensures statistical fidelity, preserves large-scale spatial patterns and outperforms existing methods especially regarding extreme events and small-scale spatial features that are crucial for impact assessments.

[64]  arXiv:2404.14418 (cross-list from cs.SI) [pdf, other]
Title: Mitigating Cascading Effects in Large Adversarial Graph Environments
Comments: 10 pages, 7 figures
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

A significant amount of society's infrastructure can be modeled using graph structures, from electric and communication grids, to traffic networks, to social networks. Each of these domains are also susceptible to the cascading spread of negative impacts, whether this be overloaded devices in the power grid or the reach of a social media post containing misinformation. The potential harm of a cascade is compounded when considering a malicious attack by an adversary that is intended to maximize the cascading impact. However, by exploiting knowledge of the cascading dynamics, targets with the largest cascading impact can be preemptively prioritized for defense, and the damage an adversary can inflict can be mitigated. While game theory provides tools for finding an optimal preemptive defense strategy, existing methods struggle to scale to the context of large graph environments because of the combinatorial explosion of possible actions that occurs when the attacker and defender can each choose multiple targets in the graph simultaneously. The proposed method enables a data-driven deep learning approach that uses multi-node representation learning and counterfactual data augmentation to generalize to the full combinatorial action space by training on a variety of small restricted subsets of the action space. We demonstrate through experiments that the proposed method is capable of identifying defense strategies that are less exploitable than SOTA methods for large graphs, while still being able to produce strategies near the Nash equilibrium for small-scale scenarios for which it can be computed. Moreover, the proposed method demonstrates superior prediction accuracy on a validation set of unseen cascades compared to other deep learning approaches.

[65]  arXiv:2404.14419 (cross-list from cs.SE) [pdf, other]
Title: Enhancing Fault Detection for Large Language Models via Mutation-Based Confidence Smoothing
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)

Large language models (LLMs) achieved great success in multiple application domains and attracted huge attention from different research communities recently. Unfortunately, even for the best LLM, there still exist many faults that the LLM cannot correctly predict. Such faults harm the usability of LLMs. Revealing them quickly is important but challenging. The reasons are twofold: 1) the heavy labeling effort for preparing the test data, and 2) accessing closed-source LLMs such as GPT4 incurs monetary costs. To handle this problem, in the traditional deep learning testing field, test selection methods have been proposed for efficiently testing deep learning models by prioritizing faults. However, the usefulness of these methods on LLMs is unclear and under exploration. In this paper, we first study the effectiveness of existing fault detection methods for LLMs. Experimental results on four different tasks (including both code tasks and natural language processing tasks) and four LLMs (e.g., LLaMA and GPT4) demonstrated that existing fault detection methods cannot perform well on LLMs (e.g., seven out of eight methods perform worse than random selection on LLaMA). To enhance existing fault detection methods, we propose MuCS, a prompt Mutation-based prediction Confidence Smoothing method for LLMs. Concretely, we mutate the prompts and compute the average prediction confidence of all mutants as the input of fault detection methods. The results show that our proposed solution significantly enhances existing methods with an improvement of test relative coverage by up to 97.64%.
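
The described idea of averaging prediction confidence over prompt mutants can be sketched as below; word dropout is one simple mutation operator (not necessarily the paper's), and the scoring function is a placeholder rather than a real LLM call.

```python
# Sketch of the described idea: mutate a prompt several times and average the
# model's prediction confidence over the mutants. `model_confidence` is a
# placeholder for a real LLM call, and word dropout is one simple mutation
# operator, not necessarily the one used in the paper.
import random

_rng = random.Random(0)

def mutate(prompt: str, drop_prob: float = 0.15) -> str:
    """Randomly drop words as a cheap prompt mutation."""
    kept = [w for w in prompt.split() if _rng.random() > drop_prob]
    return " ".join(kept) if kept else prompt

def model_confidence(prompt: str) -> float:
    """Placeholder: return a pseudo-confidence in [0, 1] instead of querying an LLM."""
    return (hash(prompt) % 1000) / 1000.0

def smoothed_confidence(prompt: str, n_mutants: int = 8) -> float:
    mutants = [mutate(prompt) for _ in range(n_mutants)]
    return sum(model_confidence(m) for m in mutants) / n_mutants

print(smoothed_confidence("Write a function that reverses a linked list in Python"))
```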

[66]  arXiv:2404.14441 (cross-list from cs.CV) [pdf, ps, other]
Title: Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images to gauge their environmental impact. However, this segmentation task is complex due to the varying appearances of contrails under different atmospheric conditions and potential misalignment issues in predictive modeling. This paper presents an innovative deep-learning approach utilizing the EfficientNet-b4 encoder for feature extraction, seamlessly integrating misalignment correction, soft labeling, and pseudo-labeling techniques to enhance the accuracy and efficiency of contrail detection in satellite imagery. The proposed methodology aims to redefine contrail image analysis and contribute to the objectives of sustainable aviation by providing a robust framework for precise contrail detection and analysis in satellite imagery, thus aiding in the mitigation of aviation's environmental impact.

[67]  arXiv:2404.14449 (cross-list from cs.CL) [pdf, ps, other]
Title: Predicting Question Quality on StackOverflow with Neural Networks
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The wealth of information available through the Internet and social media is unprecedented. Within computing fields, websites such as Stack Overflow are considered important sources for users seeking solutions to their computing and programming issues. However, like other social media platforms, Stack Overflow contains a mixture of relevant and irrelevant information. In this paper, we evaluated neural network models to predict the quality of questions on Stack Overflow, as an example of Question Answering (QA) communities. Our results demonstrate the effectiveness of neural network models compared to baseline machine learning models, achieving an accuracy of 80%. Furthermore, our findings indicate that the number of layers in the neural network model can significantly impact its performance.

[68]  arXiv:2404.14460 (cross-list from stat.ML) [pdf, other]
Title: Inference of Causal Networks using a Topological Threshold
Comments: 17 pages, 12 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

We propose a constraint-based algorithm, which automatically determines causal relevance thresholds, to infer causal networks from data. We call these topological thresholds. We present two methods for determining the threshold: the first seeks a set of edges that leaves no disconnected nodes in the network; the second seeks a causal large connected component in the data.
We tested these methods both for discrete synthetic and real data, and compared the results with those obtained for the PC algorithm, which we took as the benchmark. We show that this novel algorithm is generally faster and more accurate than the PC algorithm.
The algorithm for determining the thresholds requires choosing a measure of causality. We tested our methods with Fisher correlations, commonly used in the PC algorithm (for instance in [kalisch2005]), and further proposed a discrete and asymmetric measure of causality that we call Net Influence, which provided very good results when inferring causal networks from discrete data. This metric also allows the directionality of the edges to be inferred while applying the thresholds, speeding up the inference of causal DAGs.

[69]  arXiv:2404.14461 (cross-list from cs.CL) [pdf, other]
Title: Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
Comments: Competition Report
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Large language models are aligned to be safe, preventing users from generating harmful content like misinformation or instructions for illegal activities. However, previous work has shown that the alignment process is vulnerable to poisoning attacks. Adversaries can manipulate the safety training data to inject backdoors that act like a universal sudo command: adding the backdoor string to any prompt enables harmful responses from models that, otherwise, behave safely. Our competition, co-located at IEEE SaTML 2024, challenged participants to find universal backdoors in several large language models. This report summarizes the key findings and promising ideas for future research.

[70]  arXiv:2404.14463 (cross-list from cs.CL) [pdf, other]
Title: DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews
Comments: Accepted to Clinical NLP workshop at NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating interviewer's prompts into the model. In this work, we hypothesize that this improvement might be mainly due to a bias present in these prompts, rather than the proposed architectures and methods. Through ablation experiments and qualitative analysis, we discover that models using interviewer's prompts learn to focus on a specific region of the interviews, where questions about past experiences with mental health issues are asked, and use them as discriminative shortcuts to detect depressed participants. In contrast, models using participant responses gather evidence from across the entire interview. Finally, to highlight the magnitude of this bias, we achieve a 0.90 F1 score by intentionally exploiting it, the highest result reported to date on this dataset using only textual information. Our findings underline the need for caution when incorporating interviewers' prompts into models, as they may inadvertently learn to exploit targeted prompts, rather than learning to characterize the language and behavior that are genuinely indicative of the patient's mental health condition.

[71]  arXiv:2404.14497 (cross-list from cs.NI) [pdf, other]
Title: Mapping Wireless Networks into Digital Reality through Joint Vertical and Horizontal Learning
Comments: Accepted by IFIP/IEEE Networking 2024
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)

In recent years, the complexity of 5G and beyond wireless networks has escalated, prompting a need for innovative frameworks to facilitate flexible management and efficient deployment. The concept of digital twins (DTs) has emerged as a solution to enable real-time monitoring, predictive configurations, and decision-making processes. While existing works primarily focus on leveraging DTs to optimize wireless networks, a detailed mapping methodology for creating virtual representations of network infrastructure and properties is still lacking. In this context, we introduce VH-Twin, a novel time-series data-driven framework that effectively maps wireless networks into digital reality. VH-Twin distinguishes itself through complementary vertical twinning (V-twinning) and horizontal twinning (H-twinning) stages, followed by a periodic clustering mechanism used to virtualize network regions based on their distinct geological and wireless characteristics. Specifically, V-twinning exploits distributed learning techniques to initialize a global twin model collaboratively from virtualized network clusters. H-twinning, on the other hand, is implemented with an asynchronous mapping scheme that dynamically updates twin models in response to network or environmental changes. Leveraging real-world wireless traffic data within a cellular wireless network, comprehensive experiments are conducted to verify that VH-Twin can effectively construct, deploy, and maintain network DTs. Parametric analysis also offers insights into how to strike a balance between twinning efficiency and model accuracy at scale.

[72]  arXiv:2404.14507 (cross-list from cs.CV) [pdf, other]
Title: Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond. A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks. Sampling from DMs can be seen as solving a differential equation through a discretized set of noise levels known as the sampling schedule. While past works primarily focused on deriving efficient solvers, little attention has been given to finding optimal sampling schedules, and the entire literature relies on hand-crafted heuristics. In this work, for the first time, we propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs, called $\textit{Align Your Steps}$. We leverage methods from stochastic calculus and find optimal schedules specific to different solvers, trained DMs and datasets. We evaluate our novel approach on several image, video as well as 2D toy data synthesis benchmarks, using a variety of different samplers, and observe that our optimized schedules outperform previous hand-crafted schedules in almost all experiments. Our method demonstrates the untapped potential of sampling schedule optimization, especially in the few-step synthesis regime.

[73]  arXiv:2404.14527 (cross-list from cs.DC) [pdf, other]
Title: Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Large language models (LLMs) are increasingly integrated into many online services. However, a major challenge in deploying LLMs is their high cost, due primarily to the use of expensive GPU instances. To address this problem, we find that the significant heterogeneity of GPU types presents an opportunity to increase GPU cost efficiency and reduce deployment costs. The broad and growing market of GPUs creates a diverse option space with varying costs and hardware specifications. Within this space, we show that there is not a linear relationship between GPU cost and performance, and identify three key LLM service characteristics that significantly affect which GPU type is the most cost effective: model request size, request rate, and latency service-level objective (SLO). We then present Mélange, a framework for navigating the diversity of GPUs and LLM service specifications to derive the most cost-efficient set of GPUs for a given LLM service. We frame the task of GPU selection as a cost-aware bin-packing problem, where GPUs are bins with a capacity and cost, and items are request slices defined by a request size and rate. Upon solution, Mélange derives the minimal-cost GPU allocation that adheres to a configurable latency SLO. Our evaluations across both real-world and synthetic datasets demonstrate that Mélange can reduce deployment costs by up to 77% as compared to utilizing only a single GPU type, highlighting the importance of making heterogeneity-aware GPU provisioning decisions for LLM serving. Our source code is publicly available at https://github.com/tyler-griggs/melange-release.

[74]  arXiv:2404.14551 (cross-list from hep-th) [pdf, other]
Title: Learning S-Matrix Phases with Neural Operators
Comments: 36 pages, 8 figures
Subjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG)

We use Fourier Neural Operators (FNOs) to study the relation between the modulus and phase of amplitudes in $2\to 2$ elastic scattering at fixed energies. Unlike previous approaches, we do not employ the integral relation imposed by unitarity, but instead train FNOs to discover it from many samples of amplitudes with finite partial wave expansions. When trained only on true samples, the FNO correctly predicts (unique or ambiguous) phases of amplitudes with infinite partial wave expansions. When also trained on false samples, it can rate the quality of its prediction by producing a true/false classifying index. We observe that the value of this index is strongly correlated with the violation of the unitarity constraint for the predicted phase, and present examples where it delineates the boundary between allowed and disallowed profiles of the modulus. Our application of FNOs is unconventional: it involves a simultaneous regression-classification task and emphasizes the role of statistics in ensembles of NOs. We comment on the merits and limitations of the approach and its potential as a new methodology in Theoretical Physics.

[75]  arXiv:2404.14586 (cross-list from cs.IT) [pdf, other]
Title: Latency-Distortion Tradeoffs in Communicating Classification Results over Noisy Channels
Comments: Submitted to IEEE Transactions on Communications
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

In this work, the problem of communicating decisions of a classifier over a noisy channel is considered. With machine learning based models being used in variety of time-sensitive applications, transmission of these decisions in a reliable and timely manner is of significant importance. To this end, we study the scenario where a probability vector (representing the decisions of a classifier) at the transmitter, needs to be transmitted over a noisy channel. Assuming that the distortion between the original probability vector and the reconstructed one at the receiver is measured via f-divergence, we study the trade-off between transmission latency and the distortion. We completely analyze this trade-off using uniform, lattice, and sparse lattice-based quantization techniques to encode the probability vector by first characterizing bit budgets for each technique given a requirement on the allowed source distortion. These bounds are then combined with results from finite-blocklength literature to provide a framework for analyzing the effects of both quantization distortion and distortion due to decoding error probability (i.e., channel effects) on the incurred transmission latency. Our results show that there is an interesting interplay between source distortion (i.e., distortion for the probability vector measured via f-divergence) and the subsequent channel encoding/decoding parameters; and indicate that a joint design of these parameters is crucial to navigate the latency-distortion tradeoff. We study the impact of changing different parameters (e.g. number of classes, SNR, source distortion) on the latency-distortion tradeoff and perform experiments on AWGN and fading channels. Our results indicate that sparse lattice-based quantization is the most effective at minimizing latency across various regimes and for sparse, high-dimensional probability vectors (i.e., high number of classes).

[76]  arXiv:2404.14602 (cross-list from eess.SY) [pdf, other]
Title: Adaptive Bayesian Optimization for High-Precision Motion Systems
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Robotics (cs.RO)

Controller tuning and parameter optimization are crucial in system design to improve closed-loop system performance. Bayesian optimization has been established as an efficient model-free controller tuning and adaptation method. However, Bayesian optimization methods are computationally expensive and therefore difficult to use in real-time critical scenarios. In this work, we propose a real-time purely data-driven, model-free approach for adaptive control, by online tuning low-level controller parameters. We base our algorithm on GoOSE, an algorithm for safe and sample-efficient Bayesian optimization, for handling performance and stability criteria. We introduce multiple computational and algorithmic modifications for computational efficiency and parallelization of optimization steps. We further evaluate the algorithm's performance on a real precision-motion system utilized in semiconductor industry applications by modifying the payload and reference stepsize and comparing it to an interpolated constrained optimization-based baseline approach.

[77]  arXiv:2404.14619 (cross-list from cs.CL) [pdf, other]
Title: OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring $2\times$ fewer pre-training tokens.
Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
Our source code, along with pre-trained model weights and training recipes, is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at https://huggingface.co/apple/OpenELM.

[78]  arXiv:2404.14625 (cross-list from cs.RO) [pdf, other]
Title: Towards Multi-Morphology Controllers with Diversity and Knowledge Distillation
Comments: Accepted at the Genetic and Evolutionary Computation Conference 2024 Evolutionary Machine Learning track as a full paper
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Finding controllers that perform well across multiple morphologies is an important milestone for large-scale robotics, in line with recent advances via foundation models in other areas of machine learning. However, the challenges of learning a single controller to control multiple morphologies make the `one robot one task' paradigm dominant in the field. To alleviate these challenges, we present a pipeline that: (1) leverages Quality Diversity algorithms like MAP-Elites to create a dataset of many single-task/single-morphology teacher controllers, then (2) distills those diverse controllers into a single multi-morphology controller that performs well across many different body plans by mimicking the sensory-action patterns of the teacher controllers via supervised learning. The distilled controller scales well with the number of teachers/morphologies and shows emergent properties. It generalizes to unseen morphologies in a zero-shot manner, providing robustness to morphological perturbations and instant damage recovery. Lastly, the distilled controller is also independent of the teacher controllers -- we can distill the teacher's knowledge into any controller model, making our approach synergistic with architectural improvements and existing training algorithms for teacher controllers.

[79]  arXiv:2404.14631 (cross-list from cs.CL) [pdf, other]
Title: Learning Word Embedding with Better Distance Weighting and Window Size Scheduling
Authors: Chaohao Yang
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on large datasets. However, Word2Vec does not take into account the distance between the center word and its context words. We propose two novel methods, Learnable Formulated Weights (LFW) and Epoch-based Dynamic Window Size (EDWS), to incorporate distance information into the two variants of Word2Vec, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model. For CBOW, LFW uses a formula with learnable parameters that reflects how a word's influence varies with its distance from the center word, and uses it to compute distance-related weights for average pooling, providing insights for future NLP text-modeling research. For Skip-gram, we improve its dynamic window-size strategy to introduce distance information in a more balanced way. Experiments confirm the effectiveness of LFW and EDWS in enhancing Word2Vec's performance, surpassing previous state-of-the-art methods.
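
To make the CBOW-side idea concrete, below is a minimal PyTorch sketch (not the authors' code) of distance-weighted average pooling. The abstract does not give the exact LFW formula, so a learnable weight per context-word distance, normalized with a softmax, stands in for it; the class name DistanceWeightedCBOW and all sizes are illustrative.

    import torch
    import torch.nn as nn

    class DistanceWeightedCBOW(nn.Module):
        """CBOW whose context average is weighted by learnable per-distance weights."""
        def __init__(self, vocab_size, dim, window):
            super().__init__()
            self.emb_in = nn.Embedding(vocab_size, dim)
            self.emb_out = nn.Linear(dim, vocab_size, bias=False)
            # one learnable weight per absolute distance 1..window (assumed form of LFW)
            self.dist_logits = nn.Parameter(torch.zeros(window))

        def forward(self, context_ids, context_dists):
            # context_ids:   (batch, 2*window) ids of context words
            # context_dists: (batch, 2*window) distances 1..window to the center word
            vecs = self.emb_in(context_ids)                                 # (B, C, D)
            w = torch.softmax(self.dist_logits, dim=0)[context_dists - 1]   # (B, C)
            pooled = (w.unsqueeze(-1) * vecs).sum(1) / w.sum(1, keepdim=True)
            return self.emb_out(pooled)                                     # logits over the vocabulary

    model = DistanceWeightedCBOW(vocab_size=10000, dim=100, window=5)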

[80]  arXiv:2404.14651 (cross-list from nlin.AO) [pdf, other]
Title: Forecasting the Forced Van der Pol Equation with Frequent Phase Shifts Using a Reservoir Computer
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

A reservoir computer (RC) is a recurrent neural network (RNN) framework that achieves computational efficiency because only the readout layer requires training. It effectively predicts nonlinear dynamical systems and has various applications, and it is known to forecast nonautonomous dynamical systems well when the amplitude of the external drive changes gradually. This study investigates the predictability of nonautonomous dynamical systems with rapid changes to the phase of the external drive. The forced Van der Pol equation was employed as the base model, and forecasting tasks were implemented with the RC. The findings suggest that, despite hidden variables, a nonautonomous dynamical system with rapid changes to the phase of the external drive is predictable. Therefore, RC-based forecasting could help design better schedules for individual shift workers.
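
As a rough illustration of the reservoir-computing setup described above, the sketch below trains only the linear readout with ridge regression; the reservoir size, the spectral-radius scaling, and the stand-in input signal (used here instead of actual forced Van der Pol data) are placeholder assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_res, n_in, steps = 300, 2, 2000
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.normal(0.0, 1.0, (n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep the spectral radius below 1

    u = rng.normal(size=(steps, n_in))                # stand-in for the driven input signal
    x = np.zeros(n_res)
    states = np.zeros((steps, n_res))
    for t in range(steps):
        x = np.tanh(W @ x + W_in @ u[t])              # reservoir update; weights stay fixed
        states[t] = x

    target = np.roll(u[:, 0], -1)                     # toy one-step-ahead prediction target
    ridge = 1e-6
    W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                            states.T @ target)        # only the readout is trained
    prediction = states @ W_out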

[81]  arXiv:2404.14653 (cross-list from cs.CV) [pdf, ps, other]
Title: Machine Vision Based Assessment of Fall Color Changes in Apple Trees: Exploring Relationship with Leaf Nitrogen Concentration
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Apple trees, being deciduous, shed their leaves each year, preceded during the fall season by a change in leaf color from green to yellow (also known as senescence). The rate and timing of this color change are affected by a number of factors, including nitrogen (N) deficiencies. The green color of leaves depends strongly on chlorophyll content, which in turn depends on the nitrogen concentration in the leaves. Assessing leaf color can thus provide vital information on the nutrient status of the tree, and a machine-vision-based system that captures and quantifies the timing of these color changes can be a valuable tool for that purpose.
This study is based on data collected during the fall of 2021 and 2023 at a commercial orchard using a ground-based stereo-vision sensor for five weeks. The point cloud obtained from the sensor was segmented to isolate the tree in the foreground against its natural background, and leaf color was quantified using a custom-defined metric, the \textit{yellowness index}, ranging from $-1$ (completely green) to $+1$ (completely yellow), which gives the proportion of yellow leaves on a tree. A K-means-based algorithm and a gradient boosting algorithm were compared for computing the \textit{yellowness index}. The proposed segmentation method estimated the \textit{yellowness index} on the trees with $R^2 = 0.72$. The results showed that the metric captured the gradual color transition from green to yellow over the study duration, and that trees with lower nitrogen transitioned to yellow earlier than trees with higher nitrogen. In both years, the onset of the color transition aligned with the $29^{th}$ week post full bloom.
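
As an illustration of the kind of metric described above, the snippet below computes a proportion-based index in $[-1, +1]$ from per-point RGB colors of a segmented tree; the per-point yellow/green rule and its threshold are assumptions rather than the paper's definition.

    import numpy as np

    def yellowness_index(rgb, red_to_green_thresh=0.9):
        """rgb: (N, 3) colors of segmented leaf points in [0, 1]; returns a value in [-1, +1]."""
        r, g = rgb[:, 0], rgb[:, 1]
        # heuristic: yellow leaves have a red channel close to the green channel
        yellow = r > red_to_green_thresh * g
        n_yellow = int(yellow.sum())
        n_green = len(rgb) - n_yellow
        return (n_yellow - n_green) / max(n_yellow + n_green, 1)

    points = np.random.rand(1000, 3)                  # placeholder foreground leaf points
    print(yellowness_index(points))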

[82]  arXiv:2404.14661 (cross-list from cs.CV) [pdf, other]
Title: First Mapping the Canopy Height of Primeval Forests in the Tallest Tree Area of Asia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG)

We have developed the world's first canopy height map of the distribution area of world-level giant trees. This mapping is crucial for discovering more individual and community world-level giant trees and for analyzing and quantifying the effectiveness of biodiversity conservation measures in the Yarlung Tsangpo Grand Canyon (YTGC) National Nature Reserve. We propose a method to map the canopy height of the primeval forest within the world-level giant tree distribution area using deep learning driven by fused spaceborne LiDAR and satellite imagery (Global Ecosystem Dynamics Investigation (GEDI), ICESat-2, and Sentinel-2). We customized a pyramid receptive fields depth separable CNN (PRFXception), an architecture tailored to mapping primeval forest canopy height, to infer canopy height at the footprint level of GEDI and ICESat-2 from Sentinel-2 optical imagery with a 10-meter spatial resolution. We conducted a field survey of 227 permanent plots using a stratified sampling method and measured several giant trees using UAV-LS. The predicted canopy height was compared with ICESat-2 and GEDI validation data (RMSE = 7.56 m, MAE = 6.07 m, ME = -0.98 m, R^2 = 0.58), UAV-LS point clouds (RMSE = 5.75 m, MAE = 3.72 m, ME = 0.82 m, R^2 = 0.65), and ground survey data (RMSE = 6.75 m, MAE = 5.56 m, ME = 2.14 m, R^2 = 0.60). We mapped the potential distribution of world-level giant trees and discovered two previously undetected giant tree communities with an 89% probability of containing trees 80-100 m tall, potentially taller than Asia's tallest tree. This paper provides scientific evidence that southeastern Tibet--northwestern Yunnan is the fourth global distribution center of world-level giant trees and supports including the YTGC giant tree distribution area within the scope of China's national park conservation.

[83]  arXiv:2404.14680 (cross-list from cs.CL) [pdf, other]
Title: Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Accurate and efficient language translation is an extremely important information-processing task, and machine-learning-enabled automated translation that is both accurate and fast is a topic of long-standing interest in the machine learning and data science communities. In this study, we examine using local Generative Pre-trained Transformer (GPT) models to perform automated zero-shot, black-box, sentence-wise translation from multiple natural languages into English text. We benchmark 16 open-source GPT models from the Huggingface LLM repository, with no custom fine-tuning, on translating 50 non-English languages into English, using translated TED Talk transcripts as the reference dataset. All GPT inference calls are performed strictly locally, each on a single Nvidia A100 GPU. The reported benchmark metrics are translation accuracy, measured with the BLEU, GLEU, METEOR, and chrF text-overlap measures, and wall-clock time per sentence translation. The best-performing model for translating into English is ReMM-v2-L2-13B on the BLEU metric (mean score across all tested languages of $0.152$), on the GLEU metric ($0.256$), and on the METEOR metric ($0.438$), while Llama2-chat-AYT-13B performs best on the chrF metric ($0.448$).
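
A minimal sketch of the scoring side of such a benchmark, using the sacreBLEU library for BLEU and chrF (GLEU and METEOR would come from other toolkits); the translate() stub stands in for a local GPT inference call, and sacreBLEU reports scores on a 0-100 scale rather than the 0-1 scale quoted above.

    import sacrebleu

    def translate(sentence):
        # placeholder for a local GPT inference call returning an English translation
        return sentence

    sources = ["Bonjour le monde.", "Comment allez-vous ?"]
    references = ["Hello world.", "How are you?"]
    hypotheses = [translate(s) for s in sources]

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])
    print(f"BLEU={bleu.score:.2f}  chrF={chrf.score:.2f}")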

[84]  arXiv:2404.14700 (cross-list from eess.AS) [pdf, other]
Title: FlashSpeech: Efficient Zero-Shot Speech Synthesis
Comments: Efficient zero-shot speech synthesis
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis that uses a lower computing budget while achieving quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large-scale zero-shot speech synthesis system requiring approximately 5\% of the inference time of previous work. FlashSpeech is built on the latent consistency model and applies a novel adversarial consistency training approach that can train from scratch without the need for a pre-trained diffusion model as the teacher. Furthermore, a new prosody generator module enhances the diversity of prosody, making the rhythm of the speech sound more natural. FlashSpeech can generate speech efficiently in one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation. Our experimental results demonstrate the superior performance of FlashSpeech. Notably, FlashSpeech can be about 20 times faster than other zero-shot speech synthesis systems while maintaining comparable voice quality and similarity. Furthermore, FlashSpeech demonstrates its versatility by efficiently performing tasks like voice conversion, speech editing, and diverse speech sampling. Audio samples can be found at https://flashspeech.github.io/.

[85]  arXiv:2404.14743 (cross-list from stat.ML) [pdf, other]
Title: Gradient Guidance for Diffusion Models: An Optimization Perspective
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Diffusion models have demonstrated empirical successes in various applications and can be adapted to task-specific needs via guidance. This paper introduces a form of gradient guidance for adapting or fine-tuning diffusion models towards user-specified optimization objectives. We study the theoretical aspects of a guided score-based sampling process, linking the gradient-guided diffusion model to first-order optimization. We show that adding gradient guidance to the sampling process of a pre-trained diffusion model is essentially equivalent to solving a regularized optimization problem, where the regularization term acts as a prior determined by the pre-training data. Diffusion models are able to learn the data's latent subspace; however, explicitly adding the gradient of an external objective function to the sampling process would jeopardize the structure of the generated samples. To remedy this issue, we consider a modified form of gradient guidance based on a forward prediction loss, which leverages the pre-trained score function to preserve the latent structure in generated samples. We further consider an iteratively fine-tuned version of gradient-guided diffusion in which one can query gradients at newly generated data points and update the score network using new samples. This process mimics a first-order optimization iteration in expectation, for which we prove an O(1/K) convergence rate to the global optimum when the objective function is concave.
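
The basic pattern of adding an objective gradient to a learned score during sampling can be sketched as follows; this toy example uses a standard-Gaussian score and a quadratic objective and does not implement the paper's forward-prediction-loss guidance or its iterative fine-tuning.

    import numpy as np

    def score(x, t):
        # placeholder for a pre-trained score network; here the score of N(0, I)
        return -x

    def objective_grad(x):
        # gradient of a user objective f(x) = -||x - target||^2 / 2
        target = np.ones_like(x)
        return target - x

    def guided_reverse_step(x, t, step=0.05, guidance=0.5, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        drift = score(x, t) + guidance * objective_grad(x)      # guided score
        return x + step * drift + np.sqrt(2.0 * step) * rng.normal(size=x.shape)

    x = np.random.default_rng(0).normal(size=4)
    for t in reversed(range(50)):
        x = guided_reverse_step(x, t)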

[86]  arXiv:2404.14758 (cross-list from math.OC) [pdf, other]
Title: Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We show that, for finite-sum minimization problems, incorporating partial second-order information of the objective function can dramatically improve the robustness to mini-batch size of variance-reduced stochastic gradient methods, making them more scalable while retaining their benefits over traditional Newton-type approaches. We demonstrate this phenomenon on a prototypical stochastic second-order algorithm, called Mini-Batch Stochastic Variance-Reduced Newton ($\texttt{Mb-SVRN}$), which combines variance-reduced gradient estimates with access to an approximate Hessian oracle. In particular, we show that when the data size $n$ is sufficiently large, i.e., $n\gg \alpha^2\kappa$, where $\kappa$ is the condition number and $\alpha$ is the Hessian approximation factor, then $\texttt{Mb-SVRN}$ achieves a fast linear convergence rate that is independent of the gradient mini-batch size $b$, as long as $b$ is in the range between $1$ and $b_{\max}=O(n/(\alpha \log n))$. Only after the mini-batch size is increased past this critical point $b_{\max}$ does the method begin to transition into a standard Newton-type algorithm, which is much more sensitive to the Hessian approximation quality. We demonstrate this phenomenon empirically on benchmark optimization tasks, showing that, after tuning the step size, the convergence rate of $\texttt{Mb-SVRN}$ remains fast for a wide range of mini-batch sizes, and that the dependence of the phase transition point $b_{\max}$ on the Hessian approximation factor $\alpha$ aligns with our theoretical predictions.

[87]  arXiv:2404.14760 (cross-list from cs.CL) [pdf, other]
Title: Retrieval Augmented Generation for Domain-specific Question Answering
Comments: AAAI 2024 (Association for the Advancement of Artificial Intelligence) Scientific Document Understanding Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Question answering (QA) has become an important application of large language models. General-purpose pre-trained large language models for question answering are not trained to properly understand the knowledge or terminology of a specific domain, such as finance, healthcare, education, or customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop an approach for retrieval-aware fine-tuning of a large language model. We show that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping the latest retrieved information in context for grounding.

[88]  arXiv:2404.14777 (cross-list from cs.CL) [pdf, other]
Title: CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning
Authors: Ling Yue, Tianfan Fu
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications, primarily due to limited access to external knowledge. Recognizing the potential of advanced clinical trial tools that aggregate and predict based on the latest medical data, we propose an integrated solution to enhance their accessibility and utility. We introduce the Clinical Agent System (CT-Agent), a clinical multi-agent system designed for clinical trial tasks, leveraging GPT-4, multi-agent architectures, LEAST-TO-MOST prompting, and ReAct reasoning. This integration not only boosts LLM performance in clinical contexts but also introduces novel functionalities. Our system autonomously manages the entire clinical trial process, demonstrating significant efficiency improvements in our evaluations, which include both computational benchmarks and expert feedback.

[89]  arXiv:2404.14786 (cross-list from cs.AI) [pdf, other]
Title: LLM-Enhanced Causal Discovery in Temporal Domain from Interventional Data
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for constructing operation and maintenance graphs, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional data. However, existing methods mainly focus on synthetic datasets, rely heavily on intervention targets, and ignore the textual information hidden in real-world systems, failing to conduct causal discovery for real industrial scenarios. To tackle this problem, we propose to investigate temporal causal discovery in industrial scenarios, which faces two critical challenges: 1) how to discover causal relationships without the interventional targets that are costly to obtain in practice, and 2) how to discover causal relations by leveraging the textual information in systems, which can be complex yet abundant in industrial contexts. To address these challenges, we propose the RealTCD framework, which is able to leverage domain knowledge to discover temporal causal relationships without interventional targets. Specifically, we first develop a score-based temporal causal discovery method capable of discovering causal relations for root cause analysis without relying on interventional targets, through strategic masking and regularization. Furthermore, by employing Large Language Models (LLMs) to handle texts and integrate domain knowledge, we introduce LLM-guided meta-initialization to extract meta-knowledge from the textual information hidden in systems and boost the quality of discovery. We conduct extensive experiments on simulated and real-world datasets to show the superiority of our proposed RealTCD framework over existing baselines in discovering temporal causal structures.

[90]  arXiv:2404.14795 (cross-list from cs.CL) [pdf, other]
Title: Talk Too Much: Poisoning Large Language Models under Token Limit
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Mainstream poisoning attacks on large language models (LLMs) typically set a fixed trigger in the input instance and specific responses for triggered queries. However, a fixed trigger setting (e.g., unusual words) may be easily detected by humans, limiting its effectiveness and practicality in real-world scenarios. To enhance the stealthiness of the trigger, we present a poisoning attack against LLMs that is triggered by a generation/output condition, namely a token limit, which is a strategy commonly adopted by users to reduce costs. The poisoned model behaves normally for outputs without a token limit, but becomes harmful for outputs generated under a token limit. To achieve this objective, we introduce BrieFool, an efficient attack framework. It leverages the characteristics of generation limitation through efficient instruction sampling and poisoning data generation, thereby influencing the behavior of LLMs under the target conditions. Our experiments demonstrate that BrieFool is effective across safety domains and knowledge domains. For instance, with only 20 generated poisoning examples against GPT-3.5-turbo, BrieFool achieves a 100% Attack Success Rate (ASR) and a 9.28/10 average Harmfulness Score (HS) under token limitation conditions while maintaining benign performance.

[91]  arXiv:2404.14811 (cross-list from eess.SP) [pdf, other]
Title: FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.

[92]  arXiv:2404.14836 (cross-list from eess.SY) [src]
Title: Probabilistic forecasting of power system imbalance using neural network-based ensembles
Comments: One of the co-authors objected to having it on arXiv already
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Keeping the balance between electricity generation and consumption is becoming increasingly challenging and costly, mainly due to the rising share of renewables, electric vehicles and heat pumps and electrification of industrial processes. Accurate imbalance forecasts, along with reliable uncertainty estimations, enable transmission system operators (TSOs) to dispatch appropriate reserve volumes, reducing balancing costs. Further, market parties can use these probabilistic forecasts to design strategies that exploit asset flexibility to help balance the grid, generating revenue with known risks. Despite its importance, literature regarding system imbalance (SI) forecasting is limited. Further, existing methods do not focus on situations with high imbalance magnitude, which are crucial to forecast accurately for both TSOs and market parties. Hence, we propose an ensemble of C-VSNs, which are our adaptation of variable selection networks (VSNs). Each minute, our model predicts the imbalance of the current and upcoming two quarter-hours, along with uncertainty estimations on these forecasts. We evaluate our approach by forecasting the imbalance of Belgium, where high imbalance magnitude is defined as $|$SI$| > 500\,$MW (occurs 1.3% of the time in Belgium). For high imbalance magnitude situations, our model outperforms the state-of-the-art by 23.4% (in terms of continuous ranked probability score (CRPS), which evaluates probabilistic forecasts), while also attaining a 6.5% improvement in overall CRPS. Similar improvements are achieved in terms of root-mean-squared error. Additionally, we developed a fine-tuning methodology to effectively include new inputs with limited history in our model. This work was performed in collaboration with Elia (the Belgian TSO) to further improve their imbalance forecasts, demonstrating the relevance of our work.

[93]  arXiv:2404.14850 (cross-list from cs.CL) [pdf, other]
Title: Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models
Comments: 30 pages, 4 figures, 8 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

Fine-tuning pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-Efficient Fine-Tuning, a powerful technique widely applied in natural language processing, could potentially enhance the performance of PLMs as well. However, the direct transfer to life science tasks is non-trivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and across diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark datasets across distinct downstream tasks. Results show that, compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, accelerates training speed by a maximum of 1034% and an average of 362%, and improves the convergence rate by approximately 2 times. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.

[94]  arXiv:2404.14869 (cross-list from cs.HC) [pdf, other]
Title: EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification
Authors: Wangdan Liao
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Brain-computer interfaces (BCIs) harness electroencephalographic signals for direct neural control of devices, offering a significant benefit for individuals with motor impairments. Traditional machine learning methods for EEG-based motor imagery (MI) classification encounter challenges such as manual feature extraction and susceptibility to noise. This paper introduces EEGEncoder, a deep learning framework that employs transformer models to surmount these limitations. Our innovative multi-scale fusion architecture captures both immediate and extended temporal features, thereby enhancing MI task classification precision. EEGEncoder's key innovations include the inaugural application of transformers in MI-EEG signal classification, a mixup data augmentation strategy for bolstered generalization, and a multi-task learning approach for refined predictive accuracy. When tested on the BCI Competition IV dataset 2a, our model established a new benchmark with its state-of-the-art performance. EEGEncoder signifies a substantial advancement in BCI technology, offering a robust, efficient, and effective tool for transforming thought into action, with the potential to significantly enhance the quality of life for those dependent on BCIs.

[95]  arXiv:2404.14873 (cross-list from stat.ML) [pdf, ps, other]
Title: Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data
Comments: 16 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

Differential equations are pivotal in modeling and understanding the dynamics of various systems, offering insights into their future states through parameter estimation fitted to time series data. In fields such as economics, politics, and biology, the observed data points in a time series are often obtained independently at each time point (i.e., Repeated Cross-Sectional (RCS) data). With RCS data, we found that traditional methods for parameter estimation in differential equations, such as using mean values of time trajectories or Gaussian Process-based trajectory generation, have limitations in estimating the shape of parameter distributions, often leading to a significant loss of data information. To address this issue, we introduce a novel method, Estimation of Parameter Distribution (EPD), which provides an accurate distribution of parameters without loss of data information. EPD operates in three main steps: generating synthetic time trajectories by randomly selecting observed values at each time point, estimating parameters of a differential equation that minimize the discrepancy between these trajectories and the true solution of the equation, and selecting the parameters depending on the scale of discrepancy. We then evaluated the performance of EPD across several models, including exponential growth, logistic population models, and target cell-limited models with delayed virus production, demonstrating its superiority in capturing the shape of parameter distributions. Furthermore, we applied EPD to real-world datasets, capturing various shapes of parameter distributions rather than a normal distribution. These results effectively address the heterogeneity within systems, marking a substantial step towards accurately modeling systems using RCS data.
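
A condensed sketch of the resampling idea for a toy exponential-growth model dy/dt = r*y is shown below; the discrepancy cutoff, the number of synthetic trajectories, and the use of scipy's bounded scalar minimizer are illustrative choices rather than the paper's settings.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    times = np.linspace(0.0, 2.0, 6)
    true_r = rng.normal(1.0, 0.2, size=200)                # heterogeneous growth rates
    obs = np.stack([np.exp(true_r * t) for t in times])    # RCS data: (time points, individuals)

    def fit_rate(y):
        """Fit r so that exp(r * t) matches one synthetic trajectory y."""
        loss = lambda r: np.sum((np.exp(r * times) - y) ** 2)
        res = minimize_scalar(loss, bounds=(0.0, 3.0), method="bounded")
        return res.x, res.fun

    estimates = []
    for _ in range(500):
        # step 1: build a synthetic trajectory by sampling one observation per time point
        y_synth = np.array([rng.choice(obs[i]) for i in range(len(times))])
        # step 2: fit the ODE parameter to this trajectory
        r_hat, disc = fit_rate(y_synth)
        # step 3: keep fits with small discrepancy (the cutoff is an assumed value)
        if disc < 1.0:
            estimates.append(r_hat)

    print(np.mean(estimates), np.std(estimates))           # recovered parameter distribution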

[96]  arXiv:2404.14901 (cross-list from cs.SE) [pdf, other]
Title: Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice
Comments: Accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.

[97]  arXiv:2404.14906 (cross-list from cs.CV) [pdf, other]
Title: Driver Activity Classification Using Generalizable Representations from Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Driver activity classification is crucial for ensuring road safety, with applications ranging from driver assistance systems to autonomous vehicle control transitions. In this paper, we present a novel approach leveraging generalizable representations from vision-language models for driver activity classification. Our method employs a Semantic Representation Late Fusion Neural Network (SRLF-Net) to process synchronized video frames from multiple perspectives. Each frame is encoded using a pretrained vision-language encoder, and the resulting embeddings are fused to generate class probability predictions. By leveraging contrastively-learned vision-language representations, our approach achieves robust performance across diverse driver activities. We evaluate our method on the Naturalistic Driving Action Recognition Dataset, demonstrating strong accuracy across many classes. Our results suggest that vision-language representations offer a promising avenue for driver monitoring systems, providing both accuracy and interpretability through natural language descriptors.

[98]  arXiv:2404.14913 (cross-list from eess.AS) [pdf, other]
Title: Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
Comments: accepted at Odyssey 2024: The Speaker and Language Recognition Workshop. arXiv admin note: text overlap with arXiv:2306.03664
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Self-Supervised Learning (SSL) frameworks have become the standard for learning robust class representations by benefiting from large unlabeled datasets. For Speaker Verification (SV), most SSL systems rely on contrastive loss functions. We explore different ways to improve the performance of these techniques by revisiting the NT-Xent contrastive loss. Our main contribution is the definition of the NT-Xent-AM loss and the study of the importance of Additive Margin (AM) in the SimCLR and MoCo SSL methods for further separating positive from negative pairs. Despite class collisions, we show that AM enhances the compactness of same-speaker embeddings and reduces the number of false negatives and false positives in SV. Additionally, we demonstrate the effectiveness of the symmetric contrastive loss, which provides more supervision for the SSL task. Implementing these two modifications in SimCLR improves performance and results in a 7.85% EER on VoxCeleb1-O, outperforming other equivalent methods.
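
A minimal PyTorch sketch of an NT-Xent loss with an additive margin applied to the positive-pair similarity, in the spirit of the modification described above; the margin and temperature values are illustrative, and the symmetric variant is omitted.

    import torch
    import torch.nn.functional as F

    def nt_xent_am(z1, z2, margin=0.1, tau=0.07):
        """z1, z2: (B, D) embeddings of two augmented views of the same speakers."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        sim = z1 @ z2.t() / tau                       # (B, B) scaled cosine similarities
        pos = torch.arange(z1.size(0), device=z1.device)
        sim[pos, pos] -= margin / tau                 # additive margin on the positive pairs
        return F.cross_entropy(sim, pos)

    loss = nt_xent_am(torch.randn(8, 192), torch.randn(8, 192))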

[99]  arXiv:2404.14942 (cross-list from cs.CR) [pdf, other]
Title: Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures
Subjects: Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Recommender systems have become an integral part of online services, helping users locate specific information in a sea of data. However, existing studies show that some recommender systems are vulnerable to poisoning attacks, particularly those that involve learning schemes. A poisoning attack is where an adversary injects carefully crafted data into the process of training a model, with the goal of manipulating the system's final recommendations. Based on recent advancements in artificial intelligence, such attacks have recently gained importance. While numerous countermeasures to poisoning attacks have been developed, they have not yet been systematically linked to the properties of the attacks. Consequently, assessing the respective risks and the potential success of mitigation strategies is difficult, if not impossible. This survey aims to fill this gap by focusing primarily on poisoning attacks and their countermeasures, in contrast to prior surveys that mainly focus on attacks and their detection methods. Through an exhaustive literature review, we provide a novel taxonomy for poisoning attacks, formalise their dimensions, and accordingly organise 30+ attacks described in the literature. Further, we review 40+ countermeasures to detect and/or prevent poisoning attacks, evaluating their effectiveness against specific types of attacks. This comprehensive survey should serve as a point of reference for protecting recommender systems against poisoning attacks. The article concludes with a discussion on open issues in the field and impactful directions for future research. A rich repository of resources associated with poisoning attacks is available at https://github.com/tamlhp/awesome-recsys-poisoning.

[100]  arXiv:2404.14943 (cross-list from cs.CL) [pdf, other]
Title: Does It Make Sense to Explain a Black Box With Another Black Box?
Comments: This article was originally published in French at the Journal TAL. VOL 64 n{\deg}3/2023. arXiv admin note: substantial text overlap with arXiv:2402.10888
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Although counterfactual explanations are a popular approach to explain ML black-box classifiers, they are less widespread in NLP. Most methods find those explanations by iteratively perturbing the target document until it is classified differently by the black box. We identify two main families of counterfactual explanation methods in the literature, namely, (a) \emph{transparent} methods that perturb the target by adding, removing, or replacing words, and (b) \emph{opaque} approaches that project the target document into a latent, non-interpretable space where the perturbation is carried out subsequently. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be an overkill for downstream applications such as fake news detection or sentiment analysis since they add an additional level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.

[101]  arXiv:2404.14966 (cross-list from cs.CV) [pdf, other]
Title: Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Existing Transformer-based models for point cloud analysis suffer from quadratic complexity, leading to compromised point cloud resolution and information loss. In contrast, the newly proposed Mamba model, based on state space models (SSM), outperforms Transformer in multiple areas with only linear complexity. However, the straightforward adoption of Mamba does not achieve satisfactory performance on point cloud tasks. In this work, we present Mamba3D, a state space model tailored for point cloud learning to enhance local feature extraction, achieving superior performance, high efficiency, and scalability potential. Specifically, we propose a simple yet effective Local Norm Pooling (LNP) block to extract local geometric features. Additionally, to obtain better global features, we introduce a bidirectional SSM (bi-SSM) with both a token forward SSM and a novel backward SSM that operates on the feature channel. Extensive experimental results show that Mamba3D surpasses Transformer-based counterparts and concurrent works in multiple tasks, with or without pre-training. Notably, Mamba3D achieves multiple SoTA, including an overall accuracy of 92.6% (train from scratch) on the ScanObjectNN and 95.1% (with single-modal pre-training) on the ModelNet40 classification task, with only linear complexity.

[102]  arXiv:2404.14994 (cross-list from cs.CL) [pdf, other]
Title: Transformers Can Represent $n$-gram Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG)

Plenty of existing work has analyzed the abilities of the transformer architecture by describing its representational capacity with formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language \emph{acceptance}. We contend that this is an ill-suited problem in the study of \emph{language models} (LMs), which are definitionally \emph{probability distributions} over strings. In this paper, we focus on the relationship between transformer LMs and $n$-gram LMs, a simple and historically relevant class of language models. We show that transformer LMs using the hard or sparse attention mechanisms can exactly represent any $n$-gram LM, giving us a concrete lower bound on their probabilistic representational capacity. This provides a first step towards understanding the mechanisms that transformer LMs can use to represent probability distributions over strings.

[103]  arXiv:2404.14999 (cross-list from cs.DB) [pdf, other]
Title: A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data
Comments: Accepted by ICDE 2024
Subjects: Databases (cs.DB); Machine Learning (cs.LG)

The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability. Many recent proposals that target deep learning for spatio-temporal prediction suffer from so-called catastrophic forgetting, where previously learned knowledge is entirely forgotten when new data arrives. Such proposals may experience deteriorating prediction performance when applied in settings where data streams into the system. To enable spatio-temporal prediction on streaming data, we propose a unified replay-based continuous learning framework. The framework includes a replay buffer of previously learned samples that are fused with training data using a spatio-temporal mixup mechanism in order to preserve historical knowledge effectively, thus avoiding catastrophic forgetting. To enable holistic representation preservation, the framework also integrates a general spatio-temporal autoencoder with a carefully designed spatio-temporal simple siamese (STSimSiam) network that aims to ensure prediction accuracy and avoid holistic feature loss by means of mutual information maximization. The framework further encompasses five spatio-temporal data augmentation methods to enhance the performance of STSimSiam. Extensive experiments on real data offer insight into the effectiveness of the proposed framework.
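
A schematic sketch of the replay-plus-mixup pattern is given below: a bounded buffer of past samples is mixed into each incoming batch with a Beta-sampled coefficient. The STSimSiam network, the mutual-information objective, and the paper's specific spatio-temporal mixup are not reproduced, and all sizes are placeholders.

    import numpy as np

    class ReplayBuffer:
        def __init__(self, capacity=1024):
            self.capacity, self.data = capacity, []

        def add(self, samples):
            self.data.extend(samples)
            self.data = self.data[-self.capacity:]        # keep only the newest samples

        def sample(self, n, rng):
            idx = rng.integers(0, len(self.data), size=n)
            return np.stack([self.data[i] for i in idx])

    def mixup(new_batch, old_batch, alpha=0.4, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        lam = rng.beta(alpha, alpha)
        return lam * new_batch + (1.0 - lam) * old_batch   # convex combination of samples

    rng = np.random.default_rng(0)
    buffer = ReplayBuffer()
    stream = (rng.normal(size=(32, 12, 20)) for _ in range(5))   # (batch, time, sensors)
    for batch in stream:
        if len(buffer.data) >= 32:
            batch = mixup(batch, buffer.sample(32, rng), rng=rng)
        # ... train the spatio-temporal predictor on `batch` here ...
        buffer.add(list(batch))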

[104]  arXiv:2404.15024 (cross-list from cs.CV) [pdf, other]
Title: A Learning Paradigm for Interpretable Gradients
Comments: VISAPP 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper studies the interpretability of convolutional networks by means of saliency maps. Most approaches based on Class Activation Maps (CAM) combine information from fully connected layers and gradients through variants of backpropagation. However, it is well understood that gradients are noisy, and alternatives like guided backpropagation have been proposed to obtain better visualizations at inference. In this work, we present a novel training approach to improve the quality of gradients for interpretability. In particular, we introduce a regularization loss such that the gradient with respect to the input image obtained by standard backpropagation is similar to the gradient obtained by guided backpropagation. We find that the resulting gradient is qualitatively less noisy and quantitatively improves the interpretability properties of different networks, as measured by several interpretability methods.
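
As a rough sketch of the training-time regularizer described above, the function below adds a cosine-alignment penalty between the standard input gradient and a guided-backpropagation gradient; the guided gradient is assumed to be computed separately (e.g., via ReLU backward hooks), and the weighting lam is an illustrative choice.

    import torch
    import torch.nn.functional as F

    def gradient_alignment_loss(model, images, labels, guided_grad, lam=0.1):
        """guided_grad: precomputed guided-backprop gradient w.r.t. the input, same shape as images."""
        images = images.clone().requires_grad_(True)
        ce = F.cross_entropy(model(images), labels)
        std_grad, = torch.autograd.grad(ce, images, create_graph=True)
        cos = F.cosine_similarity(std_grad.flatten(1), guided_grad.flatten(1), dim=1)
        return ce + lam * (1.0 - cos).mean()      # push the standard gradient toward the guided one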

[105]  arXiv:2404.15045 (cross-list from cs.CL) [pdf, other]
Title: Multi-Head Mixture-of-Experts
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Sparse Mixtures of Experts (SMoE) scale model capacity without significant increases in training and inference costs, but exhibit the following two issues: (1) low expert activation, where only a small subset of experts are activated for optimization, and (2) a lack of fine-grained analytical capability for the multiple semantic concepts within individual tokens. We propose Multi-Head Mixture-of-Experts (MH-MoE), which employs a multi-head mechanism to split each token into multiple sub-tokens. These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form. The multi-head mechanism enables the model to collectively attend to information from various representation spaces within different experts, while significantly enhancing expert activation, thus deepening context understanding and alleviating overfitting. Moreover, MH-MoE is straightforward to implement and decoupled from other SMoE optimization methods, making it easy to integrate with other SMoE models for enhanced performance. Extensive experimental results across three tasks, English-focused language modeling, multi-lingual language modeling, and masked multi-modality modeling, demonstrate the effectiveness of MH-MoE.
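
A rough PyTorch sketch of the split-and-reassemble pattern follows: each token is split into per-head sub-tokens, each sub-token is routed to one expert by a top-1 softmax gate, and the outputs are concatenated back into the token; the expert form (a single linear layer) and all sizes are simplifications, not the paper's configuration.

    import torch
    import torch.nn as nn

    class MHMoELayer(nn.Module):
        def __init__(self, dim=512, heads=4, num_experts=8):
            super().__init__()
            assert dim % heads == 0
            self.heads, self.sub = heads, dim // heads
            self.experts = nn.ModuleList([nn.Linear(self.sub, self.sub) for _ in range(num_experts)])
            self.gate = nn.Linear(self.sub, num_experts)

        def forward(self, x):                          # x: (batch, tokens, dim)
            b, t, d = x.shape
            sub = x.reshape(b, t, self.heads, self.sub).reshape(-1, self.sub)  # split into sub-tokens
            top = self.gate(sub).softmax(dim=-1).argmax(dim=-1)                # top-1 expert per sub-token
            out = torch.zeros_like(sub)
            for e, expert in enumerate(self.experts):
                mask = top == e
                if mask.any():
                    out[mask] = expert(sub[mask])
            return out.reshape(b, t, d)                # reassemble sub-tokens into tokens

    y = MHMoELayer()(torch.randn(2, 16, 512))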

[106]  arXiv:2404.15081 (cross-list from cs.CV) [pdf, other]
Title: Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Comments: Published at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Diffusion models (DMs) have ushered in a new era of generative modeling and offer more opportunities for efficiently generating high-quality, realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attackers on DMs to understand their vulnerability. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation on an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods in a more effective (more noise) and efficient (twice as fast as Anti-DreamBooth and Mist) manner.

[107]  arXiv:2404.15096 (cross-list from cs.RO) [pdf, other]
Title: Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot
Comments: Accepted by Ubiquitous Robots 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-based impedance matching between simulated and real robots. Our framework offers a structured guideline for parameter selection and the range for dynamics randomization in simulation, thus facilitating a safe sim-to-real transfer. The learned policy using our framework enabled jumps across distances of 55 cm and heights of 38 cm. The results are, to the best of our knowledge, one of the highest and longest running jumps demonstrated by an RL-based control policy in a real quadruped robot. Note that the achieved jumping height is approximately 85% of that obtained from a state-of-the-art trajectory optimization method, which can be seen as the physical limit for the given robot hardware. In addition, our control policy accomplished stable walking at speeds up to 2 m/s in the forward and backward directions, and 1 m/s in the sideway direction.

[108]  arXiv:2404.15098 (cross-list from eess.SY) [pdf, other]
Title: Uncertainty Quantification of Data-Driven Output Predictors in the Output Error Setting
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

We revisit the problem of predicting the output of an LTI system directly from offline input-output data (and without the use of a parametric model) in the behavioral setting. Existing works calculate the output predictions by projecting the recent samples of the input and output signals onto the column span of a Hankel matrix consisting of the offline input-output data. However, if the offline data is corrupted by noise, the output prediction is no longer exact. While some prior works propose mitigating noisy data through low-rank matrix approximation heuristics, such as truncated singular value decomposition, the ensuing prediction accuracy remains unquantified. This paper fills these gaps by introducing two upper bounds on the prediction error under the condition that the noise is sufficiently small relative to the offline data's magnitude. The first bound pertains to prediction using the raw offline data directly, while the second applies to the case of the low-rank approximation heuristic. Notably, the bounds do not require ground truth about the system output, relying solely on noisy measurements with a known noise level and system order. Extensive numerical simulations show that both bounds decrease monotonically (and linearly) as a function of the noise level. Furthermore, our results demonstrate that applying the de-noising heuristic in the output error setup does not generally lead to better prediction accuracy than using the raw data directly, nor to a smaller upper bound on the prediction error. However, it allows for a more general upper bound, as the first upper bound requires a specific condition on the partitioning of the Hankel matrix.
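
A toy numpy sketch of the behavioral predictor discussed above (without any de-noising step): block Hankel matrices of offline input/output data are formed, the recent samples and a planned future input are matched by least squares, and the future output is read off. The first-order system, the horizons, and the reuse of the tail of the offline data as the "recent" samples are illustrative shortcuts.

    import numpy as np

    def block_hankel(w, L):
        """Block Hankel matrix with L block rows from a signal w of shape (T, m)."""
        T, m = w.shape
        return np.hstack([w[j:j + L].reshape(L * m, 1) for j in range(T - L + 1)])

    rng = np.random.default_rng(0)
    T, T_ini, T_f = 200, 4, 6
    u_d = rng.normal(size=(T, 1))
    y_d = np.zeros((T, 1))
    for t in range(1, T):                              # toy first-order LTI system for the offline data
        y_d[t] = 0.8 * y_d[t - 1] + 0.5 * u_d[t - 1]

    L = T_ini + T_f
    U, Y = block_hankel(u_d, L), block_hankel(y_d, L)
    U_p, U_f = U[:T_ini], U[T_ini:]
    Y_p, Y_f = Y[:T_ini], Y[T_ini:]

    u_ini, y_ini = u_d[-T_ini:], y_d[-T_ini:]          # recent input/output samples
    u_future = np.zeros((T_f, 1))                      # planned future input
    A = np.vstack([U_p, Y_p, U_f])
    b = np.vstack([u_ini, y_ini, u_future])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    y_pred = Y_f @ g                                   # predicted future outputs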

[109]  arXiv:2404.15149 (cross-list from cs.CL) [pdf, other]
Title: Bias patterns in the application of LLMs for clinical decision support: A comprehensive study
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large Language Models (LLMs) have emerged as powerful candidates to inform clinical decision-making processes. While these models play an increasingly prominent role in shaping the digital landscape, two growing concerns emerge in healthcare applications: 1) to what extent do LLMs exhibit social bias based on patients' protected attributes (like race), and 2) how do design choices (like architecture design and prompting strategies) influence the observed biases? To answer these questions rigorously, we evaluated eight popular LLMs across three question-answering (QA) datasets using clinical vignettes (patient descriptions) standardized for bias evaluations. We employ red-teaming strategies to analyze how demographics affect LLM outputs, comparing both general-purpose and clinically-trained models. Our extensive experiments reveal various disparities (some significant) across protected groups. We also observe several counter-intuitive patterns, such as larger models not necessarily being less biased and models fine-tuned on medical data not necessarily being better than general-purpose models. Furthermore, our study demonstrates the impact of prompt design on bias patterns and shows that specific phrasing can influence bias patterns, and that reflection-type approaches (like Chain of Thought) can reduce biased outcomes effectively. Consistent with prior studies, we call for additional evaluation, scrutiny, and enhancement of LLMs used in clinical decision support applications.

[110]  arXiv:2404.15155 (cross-list from cs.CL) [pdf, other]
Title: Adaptive Collaboration Strategy for LLMs in Medical Decision Making
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Foundation models have become invaluable in advancing the medical field. Despite their promise, the strategic deployment of LLMs for effective utility in complex medical tasks remains an open question. Our novel framework, Medical Decision-making Agents (MDAgents), aims to address this gap by automatically assigning an effective collaboration structure to LLMs. The assigned solo or group collaboration structure is tailored to the complexity of the medical task at hand, emulating real-world medical decision-making processes. We evaluate our framework and baseline methods with state-of-the-art LLMs across a suite of challenging medical benchmarks: MedQA, MedMCQA, PubMedQA, DDXPlus, PMC-VQA, Path-VQA, and MedVidQA, achieving the best performance in 5 out of 7 benchmarks that require an understanding of multi-modal medical reasoning. Ablation studies reveal that MDAgents excels at adapting the number of collaborating agents to optimize efficiency and accuracy, showcasing its robustness in diverse scenarios. We also explore the dynamics of group consensus, offering insights into how collaborative agents could behave in complex clinical team dynamics. Our code can be found at https://github.com/mitmedialab/MDAgents.

[111]  arXiv:2404.15168 (cross-list from eess.AS) [pdf, other]
Title: Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)

Voice-based applications are central to the era of automation because speech carries many features that characterize both the speaker and the spoken content. Modern Automatic Speech Recognition (ASR) is a blessing in the field of Human-Computer Interaction (HCI), enabling efficient communication between humans and devices using Artificial Intelligence technology. Because speech contains many speaker-specific features, it is now possible to determine a speaker's identity from their speech through speaker recognition. In this paper, we present a method that identifies a speaker's geographical identity within a certain region from continuous Bengali speech, considering the eight divisions of Bangladesh as the geographical regions. We applied Mel Frequency Cepstral Coefficient (MFCC) and Delta features with an Artificial Neural Network to classify a speaker's division. Before feature extraction, we performed preprocessing tasks such as noise reduction and segmenting the raw audio into 8-10 second clips. We used our own dataset of more than 45 hours of audio data from 633 individual male and female speakers, and achieved a highest accuracy of 85.44%.
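
A minimal sketch of such a pipeline with librosa and Keras: MFCC and delta features are averaged per clip and fed to a small feed-forward classifier over the eight divisions; the file paths, the per-clip averaging, and the layer sizes are placeholder assumptions rather than the paper's configuration.

    import numpy as np
    import librosa
    from tensorflow import keras

    def extract_features(path, sr=16000, n_mfcc=13):
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        delta = librosa.feature.delta(mfcc)
        feats = np.concatenate([mfcc, delta], axis=0)   # (2 * n_mfcc, frames)
        return feats.mean(axis=1)                       # per-clip average (a simplification)

    model = keras.Sequential([
        keras.layers.Input(shape=(26,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(8, activation="softmax"),    # eight divisions of Bangladesh
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # features = np.stack([extract_features(p) for p in clip_paths]); model.fit(features, labels)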

[112]  arXiv:2404.15176 (cross-list from eess.AS) [pdf, other]
Title: Voice Passing : a Non-Binary Voice Gender Prediction System for evaluating Transgender voice transition
Comments: 5 pages, 1 figure, keywords: Transgender voice, Gender perception, Speaker gender classification, CNN, X-Vector
Journal-ref: Proc. INTERSPEECH 2023, 5207-5211
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)

This paper presents software for describing voices using a continuous Voice Femininity Percentage (VFP). This system is intended for transgender speakers during their voice transition and for the voice therapists supporting them in this process. A corpus of 41 French cisgender and transgender speakers was recorded, and a perceptual evaluation allowed 57 participants to estimate the VFP for each voice. Binary gender classification models were trained on external gender-balanced data and applied to overlapping windows to obtain average gender prediction estimates, which were calibrated to predict VFP and obtained higher accuracy than $F_0$- or vocal-tract-length-based models. Training data speaking style and DNN architecture were shown to impact VFP estimation, and the accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not for building adequate statistical representations of cultural concepts.

[113]  arXiv:2404.15193 (cross-list from cs.NE) [pdf, other]
Title: Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Artificial neural networks used for reinforcement learning are structurally rigid, meaning that each optimized parameter of the network is tied to its specific placement in the network structure. It also means that a network only works with pre-defined and fixed input and output sizes. This is a consequence of the number of optimized parameters being directly dependent on the structure of the network. Structural rigidity limits the ability to optimize parameters of policies across multiple environments that do not share input and output spaces. Here, we evolve a set of neurons and plastic synapses, each represented by a gated recurrent unit (GRU). During optimization, the parameters of these fundamental units of a neural network are optimized in different random structural configurations. Earlier work has shown that parameter sharing between units is important for making neurons structurally flexible. We show that it is possible to optimize a set of distinct neuron and synapse types, allowing for a mitigation of the symmetry dilemma. We demonstrate this by optimizing a single set of neurons and synapses to solve multiple reinforcement learning control tasks simultaneously.

[114]  arXiv:2404.15197 (cross-list from cs.NI) [pdf, other]
Title: Multi-Task Learning as enabler for General-Purpose AI-native RAN
Comments: Accepted for 2024 IEEE ICC Workshop on Edge Learning over 5G Mobile Networks and Beyond
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

The realization of the data-driven AI-native architecture envisioned for 6G and beyond networks can eventually lead to multiple machine learning (ML) workloads distributed at the network edge, driving downstream tasks like secondary carrier prediction, positioning, and channel prediction. The independent life-cycle management of these multiple edge-distributed workloads sharing a resource-constrained compute node, e.g., a base station (BS), is a challenge that will scale with denser deployments. This study explores the effectiveness of multi-task learning (MTL) approaches in facilitating a general-purpose AI-native Radio Access Network (RAN). The investigation focuses on four RAN tasks: (i) secondary carrier prediction, (ii) user location prediction, (iii) indoor link classification, and (iv) line-of-sight link classification. We validate the performance using realistic simulations considering multi-faceted design aspects of MTL, including model architecture, loss and gradient balancing strategies, distributed learning topology, data sparsity, and task groupings. The quantification and insights from simulations reveal that, for the four RAN tasks considered, (i) adopting a customized gate control-based expert architecture with uncertainty-based weighting makes MTL perform either best among all approaches or on par with single-task learning (STL); (ii) the LoS classification task in the MTL setting helps other tasks but its own performance is degraded; (iii) for sparse training data, training a single global MTL model is helpful, but MTL performance is on par with STL; (iv) an optimal set of group pairings exists for each task; and (v) partial federation is much better than full model federation in the MTL setting.
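
One ingredient named above, uncertainty-based loss weighting across tasks, can be sketched as follows; this is a generic homoscedastic-uncertainty-style weighting in PyTorch and not the paper's actual RAN models or gated expert architecture.

import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Combine per-task losses using one learnable log-variance per task."""
    def __init__(self, n_tasks):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, task_losses):
        total = torch.zeros(())
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])       # down-weight high-uncertainty tasks
            total = total + precision * loss + self.log_vars[i]
        return total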

[115]  arXiv:2404.15204 (cross-list from cs.PL) [pdf, other]
Title: Towards a high-performance AI compiler with upstream MLIR
Comments: 13 pages, 8 figures, presented at CGO C4ML 2024 & MLIR Workshop EuroLLVM 2024
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. We demonstrate this flow with a proof-of-concept MLIR project that uses input IR in Linalg-on-Tensor from TensorFlow and PyTorch, performs cache-level optimizations and lowering to micro-kernels for efficient vectorization, achieving over 90% of the performance of ninja-written equivalent programs. The contributions of this work include: (1) Packing primitives on the tensor dialect and passes for cache-aware distribution of tensors (single and multi-core) and type-aware instructions (VNNI, BFDOT, BFMMLA), including propagation of shapes across the entire function; (2) A linear algebra pipeline, including tile, fuse, and bufferization strategies to get model-level IR into hardware-friendly tile calls; (3) A mechanism for micro-kernel lowering to an open-source library that supports various CPUs.

[116]  arXiv:2404.15207 (cross-list from cs.CE) [pdf, other]
Title: Simulation-Free Determination of Microstructure Representative Volume Element Size via Fisher Scores
Journal-ref: APL Mach. Learn. 2(2): 026101 (2024)
Subjects: Computational Engineering, Finance, and Science (cs.CE); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Applications (stat.AP)

A representative volume element (RVE) is a reasonably small unit of microstructure that can be simulated to obtain the same effective properties as the entire microstructure sample. Finite element (FE) simulation of RVEs, as opposed to much larger samples, saves computational expense, especially in multiscale modeling. Therefore, it is desirable to have a framework that determines RVE size prior to FE simulations. Existing methods select the RVE size based on when the FE-simulated properties of samples of increasing size converge with insignificant statistical variations, with the drawback that many samples must be simulated. We propose a simulation-free alternative that determines RVE size based only on a micrograph. The approach utilizes a machine learning model trained to implicitly characterize the stochastic nature of the input micrograph. The underlying rationale is to view RVE size as the smallest moving window size for which the stochastic nature of the microstructure within the window is stationary as the window moves across a large micrograph. For this purpose, we adapt a recently developed Fisher score-based framework for microstructure nonstationarity monitoring. Because the resulting RVE size is based solely on the micrograph and does not involve any FE simulation of specific properties, it constitutes an RVE for any property of interest that solely depends on the microstructure characteristics. Through numerical experiments of simple and complex microstructures, we validate our approach and show that our selected RVE sizes are consistent with when the chosen FE-simulated properties converge.
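
The moving-window rationale above can be caricatured with a much simpler window statistic than the Fisher scores used in the paper; the following sketch only illustrates the idea of finding the smallest window size at which a window-level statistic stops varying as the window moves across a large micrograph, with the statistic, tolerance, and stride as assumed placeholders.

import numpy as np

def smallest_stationary_window(micrograph, candidate_sizes, tol=0.01, stride=32):
    """micrograph: 2D array; returns the smallest window size whose statistic is stationary."""
    for w in sorted(candidate_sizes):
        stats = []
        for i in range(0, micrograph.shape[0] - w + 1, stride):
            for j in range(0, micrograph.shape[1] - w + 1, stride):
                stats.append(micrograph[i:i + w, j:j + w].mean())  # placeholder window statistic
        if np.std(stats) < tol:        # statistic no longer varies as the window moves
            return w
    return None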

[117]  arXiv:2404.15211 (cross-list from cs.DC) [pdf, other]
Title: LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Motivated by an imperative to reduce the carbon emissions of cloud data centers, this paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU) and applies it to carbon-aware resource scaling for executing computing workloads. The task is to dynamically scale resources (e.g., the number of servers) assigned to a job of unknown length such that it is completed before a deadline, with the objective of reducing the carbon emissions of executing the workload. The total carbon emissions of executing a job originate from the emissions of running the job and excess carbon emitted while switching between different scales (e.g., due to checkpoint and resume). Prior work on carbon-aware resource scaling has assumed accurate job length information, while other approaches have ignored switching losses and require carbon intensity forecasts. These assumptions prohibit the practical deployment of prior work for online carbon-aware execution of scalable computing workloads. We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU. To achieve improved practical average-case performance, LACS integrates machine-learned predictions of job length. To achieve solid theoretical performance, LACS extends the recent theoretical advances on online conversion with switching costs to handle a scenario where the job length is unknown. Our experimental evaluations demonstrate that, on average, the carbon footprint of LACS lies within 1.2% of the online baseline that assumes perfect job length information and within 16% of the offline baseline that, in addition to the job length, also requires accurate carbon intensity forecasts. Furthermore, LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.

[118]  arXiv:2404.15213 (cross-list from cs.HC) [pdf, other]
Title: Automatic Classification of Subjective Time Perception Using Multi-modal Physiological Data of Air Traffic Controllers
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

One indicator of well-being can be a person's subjective time perception. In our project ChronoPilot, we aim to develop a device that modulates human subjective time perception. In this study, we present a method to automatically assess the subjective time perception of air traffic controllers, a group often faced with demanding conditions, using their physiological data and eleven state-of-the-art machine learning classifiers. The physiological data consist of photoplethysmogram, electrodermal activity, and temperature signals. We find that the support vector classifier works best, with an accuracy of 79%, and that electrodermal activity provides the most descriptive biomarker. These findings are an important step towards closing the feedback loop of our ChronoPilot device to automatically modulate the user's subjective time perception. This technological advancement may promise improvements in task management, stress reduction, and overall productivity in high-stakes professions.
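
An illustrative sketch of the best-performing setup reported above (a support vector classifier on physiological features); the feature matrix, label encoding, and hyperparameters here are assumptions for demonstration, not the study's data or exact configuration.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: per-window features derived from photoplethysmogram, electrodermal activity, temperature
# y: subjective time perception label (e.g., 0 = time felt slower, 1 = time felt faster)
X = np.random.rand(200, 12)
y = np.random.randint(0, 2, size=200)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())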

[119]  arXiv:2404.15217 (cross-list from cs.CV) [pdf, other]
Title: Towards Large-Scale Training of Pathology Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Driven by the recent advances in deep learning methods and, in particular, by the development of modern self-supervised learning algorithms, increased interest and efforts have been devoted to building foundation models (FMs) for medical images. In this work, we present our scalable training pipeline for large pathology imaging data, and a comprehensive analysis of various hyperparameter choices and training techniques for building pathology FMs. We release and make publicly available the first batch of our pathology FMs (https://github.com/kaiko-ai/towards_large_pathology_fms), trained on open-access TCGA whole slide images, a commonly used collection of pathology images. The experimental evaluation shows that our models reach state-of-the-art performance on various patch-level downstream tasks, ranging from breast cancer subtyping to colorectal nuclear segmentation. Finally, to unify the evaluation approaches used in the field and to simplify future comparisons of different FMs, we present an open-source framework (https://github.com/kaiko-ai/eva) designed for the consistent evaluation of pathology FMs across various downstream tasks.

[120]  arXiv:2404.15224 (cross-list from cs.CV) [pdf, other]
Title: Deep Models for Multi-View 3D Object Recognition: A Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition utilizes information from a single image of the object. However, the information conveyed by a single image may not be sufficient for accurate decision-making, particularly in complex recognition problems. The utilization of multi-view 3D representations for object recognition has thus far demonstrated the most promising results for achieving state-of-the-art performance. This review paper comprehensively covers recent progress in multi-view 3D object recognition methods for 3D classification and retrieval tasks. Specifically, we focus on deep learning-based and transformer-based techniques, as they are widely utilized and have achieved state-of-the-art performance. We provide detailed information about existing deep learning-based and transformer-based multi-view 3D object recognition models, including the most commonly used 3D datasets, camera configurations and number of views, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance on 3D classification and 3D retrieval tasks. Additionally, we examine various computer vision applications that use multi-view classification. Finally, we highlight key findings and future directions for developing multi-view 3D object recognition methods to provide readers with a comprehensive understanding of the field.

[121]  arXiv:2404.15243 (cross-list from cs.NI) [pdf, other]
Title: UCINet0: A Machine Learning based Receiver for 5G NR PUCCH Format 0
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)

Accurate decoding of Uplink Control Information (UCI) on the Physical Uplink Control Channel (PUCCH) is essential for enabling 5G wireless links. This paper explores an AI/ML-based receiver design for PUCCH Format 0. Format 0 signaling encodes the UCI content within the phase of a known base waveform and even supports multiplexing of up to 12 users within the same time-frequency resources. Our first-of-a-kind neural network classifier, which we term UCINet0, is capable of predicting when no user is transmitting on the PUCCH, as well as decoding the UCI content of any number of multiplexed users, up to 12. Inference results with both simulated and hardware-captured field datasets show that the UCINet0 model outperforms conventional DFT-based decoders across all SNR ranges.

[122]  arXiv:2404.15244 (cross-list from cs.CV) [pdf, other]
Title: Efficient Transformer Encoders for Mask2Former-style models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Vision transformer-based models bring significant improvements to image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation tasks, their use of computational resources can be taxing on deployed devices. One way to overcome this challenge is to adapt the computation level to the specific needs of the input image rather than the current one-size-fits-all approach. To this end, we introduce ECO-M2F, or EffiCient TransfOrmer Encoders for Mask2Former-style models. Noting that the encoder module of M2F-style models incurs high resource-intensive computations, ECO-M2F provides a strategy to self-select the number of hidden layers in the encoder, conditioned on the input image. To enable this self-selection ability while balancing performance and computational efficiency, we present a three-step recipe. The first step is to train the parent architecture to enable early exiting from the encoder. The second step is to create a derived dataset of the ideal number of encoder layers required for each training example. The third step is to use this derived dataset to train a gating network that predicts the number of encoder layers to be used, conditioned on the input image. Additionally, to change the computation-accuracy tradeoff, only steps two and three need to be repeated, which significantly reduces retraining time. Experiments on public datasets show that the proposed approach reduces expected encoder computational cost while maintaining performance, adapts to various user compute resources, is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
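
A small sketch of the third step of the recipe above: a gating network that, given image features, predicts how many encoder layers to run, trained on the derived dataset of ideal layer counts. The feature dimension, maximum depth, and head sizes are assumptions, not ECO-M2F's actual design.

import torch
import torch.nn as nn

class LayerCountGate(nn.Module):
    def __init__(self, feat_dim=256, max_layers=6):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, max_layers),            # one logit per candidate exit depth
        )

    def forward(self, image_features):
        return self.head(image_features)           # train with cross-entropy vs. ideal layer count

    def predict_depth(self, image_features):
        return self.forward(image_features).argmax(dim=-1) + 1  # layers to run at inference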

[123]  arXiv:2404.15245 (cross-list from stat.ME) [pdf, other]
Title: Mining Invariance from Nonlinear Multi-Environment Data: Binary Classification
Comments: Accepted to the 2024 International Symposium on Information Theory (ISIT)
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)

Making predictions in an unseen environment given data from multiple training environments is a challenging task. We approach this problem from an invariance perspective, focusing on binary classification to shed light on general nonlinear data generation mechanisms. We identify a unique form of invariance that exists solely in a binary setting that allows us to train models invariant over environments. We provide sufficient conditions for such invariance and show it is robust even when environmental conditions vary greatly. Our formulation admits a causal interpretation, allowing us to compare it with various frameworks. Finally, we propose a heuristic prediction method and conduct experiments using real and synthetic datasets.

[124]  arXiv:2404.15247 (cross-list from cs.CL) [pdf, other]
Title: XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)

We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, XFT introduces a learnable model merging mechanism to compile the upcycled MoE model back to a dense model, achieving upcycled MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art tiny code LLM (<3B) with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+ respectively. With the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements from 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizability. XFT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-Instruct, opening a new dimension for improving code instruction tuning. Codes are available at https://github.com/ise-uiuc/xft .
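
As a very rough sketch of the final merging step, the following shows one way expert feed-forward weights could be collapsed back into a single dense weight with learnable mixing coefficients; this is a simplified illustration under assumed shapes, not XFT's actual learnable merging mechanism.

import torch

def merge_experts(expert_weights, mix_logits):
    """expert_weights: list of same-shaped tensors; mix_logits: (n_experts,) learnable vector."""
    alphas = torch.softmax(mix_logits, dim=0)                   # normalized mixing coefficients
    return sum(a * w for a, w in zip(alphas, expert_weights))   # single dense weight tensor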

[125]  arXiv:2404.15258 (cross-list from math.PR) [pdf, other]
Title: Score matching for sub-Riemannian bridge sampling
Comments: 33 pages, 4 figures
Subjects: Probability (math.PR); Machine Learning (cs.LG); Differential Geometry (math.DG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Simulation of conditioned diffusion processes is an essential tool in inference for stochastic processes, data imputation, generative modelling, and geometric statistics. Whilst simulating diffusion bridge processes is already difficult on Euclidean spaces, when considering diffusion processes on Riemannian manifolds the geometry brings in further complications. In even higher generality, advancing from Riemannian to sub-Riemannian geometries introduces hypoellipticity, and the possibility of finding appropriate explicit approximations for the score of the diffusion process is removed. We handle these challenges and construct a method for bridge simulation on sub-Riemannian manifolds by demonstrating how recent progress in machine learning can be modified to allow for training of score approximators on sub-Riemannian manifolds. Since the gradients depend on the horizontal distribution, we generalise the usual notion of denoising loss to work with non-holonomic frames using a stochastic Taylor expansion, and we demonstrate the resulting scheme both explicitly on the Heisenberg group and more generally using adapted coordinates. We perform numerical experiments exemplifying samples from the bridge process on the Heisenberg group and the concentration of this process for small time.

[126]  arXiv:2404.15261 (cross-list from math.OC) [pdf, other]
Title: All You Need is Resistance: On the Equivalence of Effective Resistance and Certain Optimal Transport Problems on Graphs
Comments: 35 pages, 7 figures
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Probability (math.PR)

The fields of effective resistance and optimal transport on graphs are filled with rich connections to combinatorics, geometry, machine learning, and beyond. In this article we put forth a bold claim: that the two fields should be understood as one and the same, up to a choice of $p$. We make this claim precise by introducing the parameterized family of $p$-Beckmann distances for probability measures on graphs and relate them sharply to certain Wasserstein distances. Then, we break open a suite of results including explicit connections to optimal stopping times and random walks on graphs, graph Sobolev spaces, and a Benamou-Brenier type formula for $2$-Beckmann distance. We further explore empirical implications in the world of unsupervised learning for graph data and propose further study of the usage of these metrics where Wasserstein distance may produce computational bottlenecks.

[127]  arXiv:2404.15269 (cross-list from cs.CL) [pdf, other]
Title: Aligning LLM Agents by Learning Latent Preference from User Edits
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

We study interactive learning of language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, that infers a description of the user's latent preference from historic edit data and uses it to define a prompt policy that drives future response generation. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preferences improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex and vary based on context, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages a large language model (LLM) to infer the user preference for a given context based on user edits. CIPHER then retrieves inferred preferences from the k-closest contexts in the history and forms an aggregate preference for response generation. We introduce two interactive environments -- summarization and email writing -- for evaluation using a GPT-4 simulated user. We compare with algorithms that directly retrieve user edits but do not learn descriptive preferences, and algorithms that learn context-agnostic preferences. On both tasks, CIPHER achieves the lowest edit distance cost and learns preferences that show significant similarity to the ground truth preferences.
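
A simplified sketch of the retrieval-and-aggregation step described above: embed the current context, retrieve the k most similar past contexts, and combine their inferred preferences for use in the prompt. The embedding source and the string-join aggregation are placeholder assumptions, not CIPHER's exact procedure.

import numpy as np

def retrieve_preferences(context_emb, history, k=3):
    """history: list of (embedding, preference_text) pairs from earlier interactions."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    sims = [cos(context_emb, emb) for emb, _ in history]
    top = np.argsort(sims)[-k:]                    # indices of the k closest past contexts
    prefs = [history[i][1] for i in top]
    return "; ".join(prefs)                        # aggregated preference injected into the prompt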

[128]  arXiv:2404.15273 (cross-list from math.OC) [pdf, other]
Title: Estimation Network Design framework for efficient distributed optimization
Comments: 8 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2208.11377
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Distributed decision problems feature a group of agents that can only communicate over a peer-to-peer network, without a central memory. In applications such as network control and data ranking, each agent is only affected by a small portion of the decision vector: this sparsity is typically ignored in distributed algorithms, although it could be leveraged to improve efficiency and scalability. To address this issue, our recent paper introduces Estimation Network Design (END), a graph-theoretical language for the analysis and design of distributed iterations. END algorithms can be tuned to exploit the sparsity of specific problem instances, reducing communication overhead and minimizing redundancy, without requiring case-by-case convergence analysis. In this paper, we showcase the flexibility of END in the context of distributed optimization. In particular, we study the sparsity-aware version of many established methods, including ADMM, AugDGM, and Push-Sum DGD. Simulations on an estimation problem in sensor networks demonstrate that END algorithms can boost convergence speed and greatly reduce communication and memory costs.

[129]  arXiv:2404.15276 (cross-list from cs.CV) [pdf, other]
Title: SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
Comments: Published at TPAMI 2024
Journal-ref: https://www.computer.org/csdl/journal/tp/2024/05/10354384/1SP2qWh8Fq0
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)

Existing Transformers for monocular 3D human shape and pose estimation typically have a quadratic computation and memory complexity with respect to the feature length, which hinders the exploitation of fine-grained information in high-resolution features that is beneficial for accurate reconstruction. In this work, we propose an SMPL-based Transformer framework (SMPLer) to address this issue. SMPLer incorporates two key ingredients: a decoupled attention operation and an SMPL-based target representation, which allow effective utilization of high-resolution features in the Transformer. In addition, based on these two designs, we also introduce several novel modules including a multi-scale attention and a joint-aware attention to further boost the reconstruction performance. Extensive experiments demonstrate the effectiveness of SMPLer against existing 3D human shape and pose estimation methods both quantitatively and qualitatively. Notably, the proposed algorithm achieves an MPJPE of 45.2 mm on the Human3.6M dataset, improving upon Mesh Graphormer by more than 10% with fewer than one-third of the parameters. Code and pretrained models are available at https://github.com/xuxy09/SMPLer.

Replacements for Wed, 24 Apr 24

[130]  arXiv:2104.00170 (replaced) [pdf, other]
Title: Are Bias Mitigation Techniques for Deep Learning Effective?
Comments: Published in WACV 2022 under the title "An Investigation of Critical Issues in Bias Mitigation Techniques"
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[131]  arXiv:2109.09506 (replaced) [pdf, other]
Title: Decoupling Long- and Short-Term Patterns in Spatiotemporal Inference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[132]  arXiv:2212.00484 (replaced) [pdf, other]
Title: Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control
Comments: 21 pages, 6 figures and 2 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[133]  arXiv:2301.05180 (replaced) [pdf, other]
Title: Effective Decision Boundary Learning for Class Incremental Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[134]  arXiv:2301.13833 (replaced) [pdf, other]
Title: A Mathematical Model for Curriculum Learning for Parities
Journal-ref: ICML 2023
Subjects: Machine Learning (cs.LG)
[135]  arXiv:2302.03232 (replaced) [pdf, other]
Title: Linear Optimal Partial Transport Embedding
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[136]  arXiv:2305.01932 (replaced) [pdf, other]
Title: Fully Automatic Neural Network Reduction for Formal Verification
Comments: under review
Subjects: Machine Learning (cs.LG)
[137]  arXiv:2305.07877 (replaced) [pdf, ps, other]
Title: Differentiating Viral and Bacterial Infections: A Machine Learning Model Based on Routine Blood Test Values
Comments: 16 pages
Journal-ref: Heliyon, Volume 10, ISSUE 8, e29372, Cell, April 30, 2024
Subjects: Machine Learning (cs.LG)
[138]  arXiv:2305.15557 (replaced) [pdf, ps, other]
Title: Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
[139]  arXiv:2306.11246 (replaced) [pdf, other]
Title: Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[140]  arXiv:2309.00236 (replaced) [pdf, other]
Title: Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Comments: Project page at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[141]  arXiv:2309.15877 (replaced) [src]
Title: Neuro-Inspired Hierarchical Multimodal Learning
Comments: I am requesting the withdrawal of this submission due to an inadvertent duplication. The paper was submitted twice under different IDs, which was not intentional. The other submission (arXiv:2404.09403) contains the most updated and comprehensive version of the paper, and I would like to retain that as the sole version on the platform
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[142]  arXiv:2310.00105 (replaced) [pdf, other]
Title: Latent Space Symmetry Discovery
Subjects: Machine Learning (cs.LG)
[143]  arXiv:2310.03720 (replaced) [pdf, other]
Title: SteP: Stacked LLM Policies for Web Actions
Comments: 30 pages, 15 figures
Subjects: Machine Learning (cs.LG)
[144]  arXiv:2310.05348 (replaced) [pdf, other]
Title: Continuous Invariance Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[145]  arXiv:2310.18144 (replaced) [pdf, other]
Title: Improving Intrinsic Exploration by Creating Stationary Objectives
Comments: Accepted at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[146]  arXiv:2311.00259 (replaced) [pdf, other]
Title: Solutions to Elliptic and Parabolic Problems via Finite Difference Based Unsupervised Small Linear Convolutional Neural Networks
Comments: Submitted to CMA, under review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
[147]  arXiv:2311.11483 (replaced) [pdf, ps, other]
Title: A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records
Comments: 46 pages, 5 figures, 3 tables, 14 appendices
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[148]  arXiv:2311.16877 (replaced) [pdf, other]
Title: Imputation using training labels and classification via label imputation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[149]  arXiv:2312.16243 (replaced) [pdf, other]
Title: Mixture Data for Training Cannot Ensure Out-of-distribution Generalization
Comments: 13 pages, 9 figures
Subjects: Machine Learning (cs.LG)
[150]  arXiv:2401.16386 (replaced) [pdf, other]
Title: Continual Learning with Pre-Trained Models: A Survey
Comments: Accepted to IJCAI 2024. Code is available at: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[151]  arXiv:2402.03923 (replaced) [pdf, other]
Title: Return-Aligned Decision Transformer
Subjects: Machine Learning (cs.LG)
[152]  arXiv:2402.07232 (replaced) [pdf, other]
Title: UVTM: Universal Vehicle Trajectory Modeling with ST Feature Domain Generation
Subjects: Machine Learning (cs.LG)
[153]  arXiv:2402.16358 (replaced) [pdf, other]
Title: An Integrated Data Processing Framework for Pretraining Foundation Models
Comments: 6 pages, 2 figures; accepted by SIGIR'24 demo track
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[154]  arXiv:2403.01535 (replaced) [pdf, other]
Title: Neural Graph Generator: Feature-Conditioned Graph Generation using Latent Diffusion Models
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[155]  arXiv:2403.03218 (replaced) [pdf, other]
[156]  arXiv:2403.08216 (replaced) [pdf, other]
Title: PaddingFlow: Improving Normalizing Flows with Padding-Dimensional Noise
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[157]  arXiv:2403.12459 (replaced) [pdf, other]
Title: Non-negative Contrastive Learning
Comments: 22 pages. Accepted by ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[158]  arXiv:2403.16829 (replaced) [pdf, ps, other]
Title: Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[159]  arXiv:2404.03180 (replaced) [pdf, other]
Title: Goldfish: An Efficient Federated Unlearning Framework
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[160]  arXiv:2404.06694 (replaced) [pdf, other]
Title: How to Craft Backdoors with Unlabeled Data Alone?
Comments: Accepted at ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[161]  arXiv:2404.07518 (replaced) [pdf, other]
Title: Remembering Transformer for Continual Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[162]  arXiv:2404.08233 (replaced) [pdf, other]
Title: Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
Authors: Hui Bai, Ran Cheng
Comments: IEEE Transactions on Emerging Topics in Computational Intelligence
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
[163]  arXiv:2404.08838 (replaced) [pdf, ps, other]
Title: Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling
Subjects: Machine Learning (cs.LG)
[164]  arXiv:2404.09403 (replaced) [pdf, other]
Title: Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning
Comments: Accepted by ICLR 2024. Camera Ready Version
Subjects: Machine Learning (cs.LG)
[165]  arXiv:2404.10824 (replaced) [pdf, other]
Title: Decoupled Weight Decay for Any $p$ Norm
Comments: GitHub link: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
[166]  arXiv:2404.13327 (replaced) [pdf, other]
Title: Comparative Analysis on Snowmelt-Driven Streamflow Forecasting Using Machine Learning Techniques
Comments: 17 pages, 4 Tables, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[167]  arXiv:2404.13846 (replaced) [pdf, other]
Title: Filtered Direct Preference Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[168]  arXiv:2404.13964 (replaced) [pdf, other]
Title: An Economic Solution to Copyright Challenges of Generative AI
Subjects: Machine Learning (cs.LG); General Economics (econ.GN); Methodology (stat.ME)
[169]  arXiv:2404.14367 (replaced) [pdf, other]
Title: Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Subjects: Machine Learning (cs.LG)
[170]  arXiv:2206.00893 (replaced) [pdf, other]
Title: Leveraging Systematic Knowledge of 2D Transformations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[171]  arXiv:2210.12860 (replaced) [pdf, ps, other]
Title: Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee
Comments: Provide a simple subroutine with a detailed complexity analysis; 30 pages, 9 figures
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC); Machine Learning (cs.LG)
[172]  arXiv:2303.03593 (replaced) [pdf, other]
Title: ADELT: Transpilation Between Deep Learning Frameworks
Comments: 19 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[173]  arXiv:2304.13940 (replaced) [pdf, other]
Title: A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion
Comments: 28 pages, 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[174]  arXiv:2306.00107 (replaced) [pdf, other]
Title: MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Comments: accepted by ICLR 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175]  arXiv:2306.14572 (replaced) [pdf, other]
Title: Feature Imitating Networks Enhance The Performance, Reliability And Speed Of Deep Learning On Biomedical Image Processing Tasks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[176]  arXiv:2307.14642 (replaced) [pdf, ps, other]
Title: Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?
Comments: Accepted to AISTATS'24; fixed missing expectations in iteration complexity statements
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
[177]  arXiv:2309.03291 (replaced) [pdf, other]
Title: CLEANing Cygnus A deep and fast with R2D2
Comments: accepted for publication in ApJL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[178]  arXiv:2309.03847 (replaced) [pdf, other]
Title: Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG)
[179]  arXiv:2309.07136 (replaced) [pdf, other]
Title: Masked Transformer for Electrocardiogram Classification
Comments: more experimental results; more implementation details; different abstracts
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
[180]  arXiv:2310.00542 (replaced) [pdf, other]
Title: Watch Out! Simple Horizontal Class Backdoors Can Trivially Evade Defenses
Comments: To Appear in the 31st ACM Conference on Computer and Communications Security
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[181]  arXiv:2311.08945 (replaced) [pdf, other]
Title: A Single-Loop Algorithm for Decentralized Bilevel Optimization
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[182]  arXiv:2311.16214 (replaced) [pdf, other]
Title: DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting
Comments: 13 pages, 19 figures
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
[183]  arXiv:2311.16333 (replaced) [pdf, other]
Title: From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG)
[184]  arXiv:2312.05181 (replaced) [pdf, other]
Title: Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[185]  arXiv:2312.08083 (replaced) [pdf, ps, other]
Title: Training of Neural Networks with Uncertain Data -- A Mixture of Experts Approach
Authors: Lucas Luttner
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[186]  arXiv:2401.09819 (replaced) [pdf, other]
Title: PPNet: A Two-Stage Neural Network for End-to-end Path Planning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[187]  arXiv:2401.11176 (replaced) [pdf, other]
Title: Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[188]  arXiv:2401.17809 (replaced) [pdf, other]
Title: SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Comments: Under review; Our code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[189]  arXiv:2402.01795 (replaced) [pdf, other]
Title: Few-Shot Scenario Testing for Autonomous Vehicles Based on Neighborhood Coverage and Similarity
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Robotics (cs.RO); Software Engineering (cs.SE)
[190]  arXiv:2402.09126 (replaced) [pdf, other]
Title: MPIrigen: MPI Code Generation through Domain-Specific Language Models
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
[191]  arXiv:2402.13251 (replaced) [pdf, other]
Title: FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Comments: Project page: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[192]  arXiv:2402.15300 (replaced) [pdf, other]
Title: Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
Comments: Code URL: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[193]  arXiv:2403.03100 (replaced) [pdf, other]
Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[194]  arXiv:2403.07890 (replaced) [pdf, other]
Title: $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[195]  arXiv:2403.08755 (replaced) [pdf, other]
Title: DAM: Dynamic Adapter Merging for Continual Video QA Learning
Comments: The first two authors contribute equally
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[196]  arXiv:2403.10558 (replaced) [pdf, other]
Title: Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[197]  arXiv:2403.11330 (replaced) [pdf, other]
Title: Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
Comments: 10 pages, 3 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[198]  arXiv:2404.00399 (replaced) [pdf, other]
[199]  arXiv:2404.01318 (replaced) [pdf, other]
Title: JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[200]  arXiv:2404.06129 (replaced) [pdf, other]
Title: Adaptable Recovery Behaviors in Robotics: A Behavior Trees and Motion Generators(BTMG) Approach for Failure Management
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[201]  arXiv:2404.08968 (replaced) [pdf, other]
Title: MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[202]  arXiv:2404.10180 (replaced) [pdf, other]
Title: Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track
Journal-ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[203]  arXiv:2404.10540 (replaced) [pdf, other]
Title: SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[204]  arXiv:2404.11526 (replaced) [pdf, other]
Title: A Comparison of Traditional and Deep Learning Methods for Parameter Estimation of the Ornstein-Uhlenbeck Process
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG)
[205]  arXiv:2404.11912 (replaced) [pdf, other]
Title: TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[206]  arXiv:2404.12141 (replaced) [pdf, other]
Title: MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
Comments: 19 pages, 11 figures
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
[207]  arXiv:2404.12368 (replaced) [pdf, other]
Title: Gradient-Regularized Out-of-Distribution Detection
Comments: Under review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[208]  arXiv:2404.12486 (replaced) [pdf, other]
Title: Follow-Me AI: Energy-Efficient User Interaction with Smart Environments
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
[209]  arXiv:2404.13078 (replaced) [pdf, other]
Title: Empowering Interdisciplinary Research with BERT-Based Models: An Approach Through SciBERT-CNN with Topic Modeling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[210]  arXiv:2404.13388 (replaced) [pdf, ps, other]
Title: Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[211]  arXiv:2404.14146 (replaced) [pdf, ps, other]
Title: Physics-based reward driven image analysis in microscopy
Comments: 12 pages, 4 figures
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)