Computer Science

New submissions, skipping first 1000

New submissions for Fri, 7 Jun 24 (continued, showing last 48 of 394 entries)

[347]  arXiv:2406.04280 [pdf, other]
Title: xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Multiple instance learning (MIL) is an effective and widely used approach for weakly supervised machine learning. In histopathology, MIL models have achieved remarkable success in tasks like tumor detection, biomarker prediction, and outcome prognostication. However, MIL explanation methods are still lagging behind, as they are limited to small bag sizes or disregard instance interactions. We revisit MIL through the lens of explainable AI (XAI) and introduce xMIL, a refined framework with more general assumptions. We demonstrate how to obtain improved MIL explanations using layer-wise relevance propagation (LRP) and conduct extensive evaluation experiments on three toy settings and four real-world histopathology datasets. Our approach consistently outperforms previous explanation attempts with particularly improved faithfulness scores on challenging biomarker prediction tasks. Finally, we showcase how xMIL explanations enable pathologists to extract insights from MIL models, representing a significant advance for knowledge discovery and model debugging in digital histopathology.

[348]  arXiv:2406.04284 [pdf, other]
Title: What is Dataset Distillation Learning?
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high performing models, little is understood about how the information is stored. In this study, we posit and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. We reveal distilled data cannot serve as a substitute for real data during training outside the standard evaluation setting for dataset distillation. Additionally, the distillation process retains high task performance by compressing information related to the early training dynamics of real models. Finally, we provide a framework for interpreting distilled data and reveal that individual distilled data points contain meaningful semantic information. This investigation sheds light on the intricate nature of distilled data, providing a better understanding of how they can be effectively utilized.

[349]  arXiv:2406.04286 [pdf, other]
Title: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
Comments: ACL 2024 Main Conference. Code and data: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document -- we first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction. To learn the task of expanding abstract descriptions, we first train BART on a large-scale synthetic dataset with abstract-document pairs. Next, to generate abstract descriptions for a document, we propose a simple, controllable, and training-free method based on editing AMR graphs. ABEX brings the best of both worlds: by expanding from abstract representations, it preserves the original semantic properties of the documents, like style and meaning, thereby maintaining alignment with the original label and data distribution. At the same time, the fundamental process of elaborating on abstract descriptions facilitates diverse generations. We demonstrate the effectiveness of ABEX on 4 NLU tasks spanning 12 datasets and 4 low-resource settings. ABEX outperforms all our baselines quantitatively with improvements of 0.04% - 38.8%. Qualitatively, ABEX outperforms all prior methods from the literature in terms of context and length diversity.

[350]  arXiv:2406.04287 [pdf, other]
Title: SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, a key challenge of hyperspectral and multispectral imaging is its large data footprint. We propose both a novel camera design and a vision transformer-based (ViT) algorithm that alleviate both the captured data footprint and the computational load for hyperspectral segmentation. Our camera is able to adaptively sample image regions or patches at different resolutions, instead of capturing the entire hyperspectral cube at one high resolution. Our segmentation algorithm works in concert with the camera, applying ViT-based segmentation only to adaptively selected patches. We show results both in simulation and on a real hardware platform demonstrating both accurate segmentation results and reduced computational burden.

[351]  arXiv:2406.04289 [pdf, other]
Title: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Comments: Accepted to ACL 2024
Subjects: Computation and Language (cs.CL)

What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we seek to understand the empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their home turf -- learning probabilistic languages -- rather than as classifiers of formal languages. In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.
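To make "a distribution over strings" concrete, the sketch below scores strings under a tiny deterministic probabilistic finite-state automaton, one simple way to define a regular LM. The automaton, its state names, and its probabilities are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical deterministic probabilistic finite-state automaton (PFSA).
# Each state has a stopping probability plus arcs: symbol -> (probability, next state).
# In every state, the stop probability and the arc probabilities sum to 1.
PFSA = {
    "q0": {"stop": 0.2, "arcs": {"a": (0.5, "q0"), "b": (0.3, "q1")}},
    "q1": {"stop": 0.6, "arcs": {"b": (0.4, "q1")}},
}

def string_probability(string, start="q0"):
    """Probability that the automaton emits exactly `string` and then stops."""
    state, prob = start, 1.0
    for symbol in string:
        arcs = PFSA[state]["arcs"]
        if symbol not in arcs:
            return 0.0          # no arc for this symbol: the string is impossible
        p, state = arcs[symbol]
        prob *= p
    return prob * PFSA[state]["stop"]

p_empty = string_probability("")    # stop immediately in q0
p_ab = string_probability("ab")     # a: q0->q0, b: q0->q1, then stop in q1
p_ba = string_probability("ba")     # q1 has no arc for "a"
```

Here the empty string has probability 0.2, "ab" has probability 0.5 * 0.3 * 0.6, and "ba" has probability 0.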

[352]  arXiv:2406.04290 [pdf, other]
Title: Providing High-Performance Execution with a Sequential Contract for Cryptographic Programs
Comments: 17 pages, 7 figures, 4 tables
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

Constant-time programming is a widely deployed approach to harden cryptographic programs against side channel attacks. However, modern processors violate the underlying assumptions of constant-time policies by speculatively executing unintended paths of the program.
In this work, we propose Cassandra, a novel hardware-software mechanism to protect constant-time cryptographic code against speculative control flow based attacks. Cassandra explores the radical design point of disabling the branch predictor and recording-and-replaying sequential control flow of the program. Two key insights that enable our design are that (1) the sequential control flow of a constant-time program is constant over different runs, and (2) cryptographic programs are highly looped and their control flow patterns repeat in a highly compressible way. These insights allow us to perform an offline branch analysis that significantly compresses control flow traces. We add a small component to a typical processor design, the Branch Trace Unit, to store compressed traces and determine fetch redirections according to the sequential model of the program. Moreover, we provide a formal security analysis and prove that our methodology adheres to a strong security contract by design. Despite providing a higher security guarantee, Cassandra counter-intuitively improves performance by 1.77% by eliminating branch misprediction penalties.

[353]  arXiv:2406.04291 [pdf, other]
Title: Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
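As a rough illustration of the idea above, the following sketch computes a prediction-powered point estimate of a mean and a stratified variant on toy data. It is not the paper's algorithm: the function names, the assumption that stratum weights are known population shares, and the toy numbers are all hypothetical, and confidence-interval construction is omitted.

```python
from statistics import mean

def ppi_mean(labeled_y, labeled_pred, unlabeled_pred):
    """Mean of autorater predictions on unlabeled data, corrected by the
    average autorater error measured on the human-labeled data."""
    rectifier = mean(y - p for y, p in zip(labeled_y, labeled_pred))
    return mean(unlabeled_pred) + rectifier

def stratified_ppi_mean(strata):
    """Weighted sum of per-stratum PPI estimates.
    `strata` is a list of (weight, labeled_y, labeled_pred, unlabeled_pred);
    weights are assumed to be known population shares that sum to 1."""
    return sum(w * ppi_mean(y, p, u) for w, y, p, u in strata)

# Toy data (hypothetical): the autorater under-scores stratum 1 and
# over-scores stratum 2, so the correction differs per stratum.
strata = [
    (0.5, [1.0, 0.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.5], [0.5, 0.5, 0.0, 0.5]),
    (0.5, [0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 1.0, 0.5], [0.5, 1.0, 0.5, 0.5]),
]
estimate = stratified_ppi_mean(strata)
```

Estimating the correction term separately in each stratum is what lets a stratified estimator benefit when autorater quality varies across parts of the data.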

[354]  arXiv:2406.04292 [pdf, other]
Title: VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Comments: Accepted to ACL 2024 main conference
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Multi-modal retrieval is becoming increasingly popular in practice. However, existing retrievers are mostly text-oriented and lack the capability to process visual information. Despite the presence of vision-language models like CLIP, current methods are severely limited in representing text-only and image-only data. In this work, we present a new embedding model VISTA for universal multi-modal retrieval. Our work brings forth threefold technical contributions. Firstly, we introduce a flexible architecture which extends a powerful text encoder with the image understanding capability by introducing visual token embeddings. Secondly, we develop two data generation strategies, which bring high-quality composed image-text to facilitate the training of the embedding model. Thirdly, we introduce a multi-stage training algorithm, which first aligns the visual token embedding with the text encoder using massive weakly labeled data, and then develops multi-modal representation capability using the generated composed image-text data. In our experiments, VISTA achieves superior performances across a variety of multi-modal retrieval tasks in both zero-shot and supervised settings. Our model, data, and source code are available at https://github.com/FlagOpen/FlagEmbedding.

[355]  arXiv:2406.04295 [pdf, other]
Title: Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Comments: GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Test-time adaptation (TTA) aims to enhance the performance of source-domain pretrained models when tested on unknown shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. Recently, diffusion-driven TTA methods have demonstrated strong performance by using an unconditional diffusion model, also trained on the source domain, to transform target data into synthetic data as a source domain projection. This allows the source model to make predictions without weight adaptation. In this paper, we argue that the domains of the source model and the synthetic data in diffusion-driven TTA methods are not aligned. To adapt the source model to the synthetic domain of the unconditional diffusion model, we introduce a Synthetic-Domain Alignment (SDA) framework to fine-tune the source model with synthetic data. Specifically, we first employ a conditional diffusion model to generate labeled samples, creating a synthetic dataset. Subsequently, we use the aforementioned unconditional diffusion model to add noise to and denoise each sample before fine-tuning. This process mitigates the potential domain gap between the conditional and unconditional models. Extensive experiments across various models and benchmarks demonstrate that SDA achieves superior domain alignment and consistently outperforms existing diffusion-driven TTA methods. Our code is available at https://github.com/SHI-Labs/Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment.

[356]  arXiv:2406.04298 [pdf, other]
Title: Measuring and Addressing Indexical Bias in Information Retrieval
Comments: ACL 2024
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people's opinion, voting patterns, and other behaviors, these issues remain understudied as the field lacks reliable metrics and procedures for automatically measuring indexical bias. Towards this end, we introduce the PAIR framework, which supports automatic bias audits for ranked documents or entire IR systems. After introducing DUO, the first general-purpose automatic bias metric, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents, with 4k queries spanning 1.4k controversial issue topics. A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader's opinion.

[357]  arXiv:2406.04299 [pdf, other]
Title: NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise
Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, due to variations in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL in this paper, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy labeled graph data across various datasets, with unified experimental settings and interface. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark can be found at https://github.com/eaglelab-zju/NoisyGL.

[358]  arXiv:2406.04300 [pdf, other]
Title: Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models
Comments: 14 pages, 7 figures
Subjects: Robotics (cs.RO)

Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems, such as autonomous vehicles. Yet, the task of modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Adopting language descriptions to generate driving behaviors emerges as a promising strategy, offering a scalable and intuitive method for human operators to simulate a wide range of driving interactions. However, the scarcity of large-scale annotated language-trajectory data makes this approach challenging.
To address this gap, we propose Text-to-Drive (T2D) to synthesize diverse driving behaviors via Large Language Models (LLMs). We introduce a knowledge-driven approach that operates in two stages. In the first stage, we employ the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors for a scene. Then, we leverage the LLM's reasoning capabilities to synthesize these behaviors in simulation. At its core, T2D employs an LLM to construct a state chart that maps low-level states to high-level abstractions. This strategy aids in downstream tasks such as summarizing low-level observations, assessing policy alignment with behavior description, and shaping the auxiliary reward, all without needing human supervision. With our knowledge-driven approach, we demonstrate that T2D generates more diverse trajectories compared to other baselines and offers a natural language interface that allows for interactive incorporation of human preference. Please check our website for more examples: https://text-to-drive.github.io/

[359]  arXiv:2406.04301 [pdf, other]
Title: Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry
Authors: Kaichen Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates Epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse information from the cost volume into Epipolar features extracted from multiple source views, enabling the generation of fine-grained Signed Distance Function (SDF)-aware features. Additionally, we employ an attention mechanism along the line dimension to facilitate feature fusion based on the SDF feature. Furthermore, to address the information gaps in sparse conditions, we integrate depth information from monocular depth estimation using global and local regularization techniques. The global regularization utilizes a triplet loss function, while the local regularization employs a derivative loss function. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods, especially in cases with sparse and generalizable conditions.

[360]  arXiv:2406.04302 [pdf, other]
Title: Representational Alignment Supports Effective Machine Teaching
Comments: Preprint
Subjects: Machine Learning (cs.LG)

A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too.

[361]  arXiv:2406.04303 [pdf, other]
Title: Vision-LSTM: xLSTM as Generic Vision Backbone
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Transformers are widely used as generic backbones in computer vision, despite being initially introduced for natural language processing. Recently, the Long Short-Term Memory (LSTM) has been extended to a scalable and performant architecture - the xLSTM - which overcomes long-standing LSTM limitations via exponential gating and parallelizable matrix memory structure. In this report, we introduce Vision-LSTM (ViL), an adaptation of the xLSTM building blocks to computer vision. ViL comprises a stack of xLSTM blocks where odd blocks process the sequence of patch tokens from top to bottom while even blocks go from bottom to top. Experiments show that ViL holds promise to be further deployed as a new generic backbone for computer vision architectures.
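The alternating traversal described above can be sketched in a few lines. This is only a toy illustration: `causal_block` is a hypothetical stand-in for an xLSTM block (a running sum, so each position sees only positions already visited), and the 1-based block numbering is an assumption based on the abstract's wording.

```python
def causal_block(tokens):
    """Hypothetical stand-in for an xLSTM block: a running sum, so each
    output position depends only on positions already visited."""
    out, state = [], 0.0
    for t in tokens:
        state += t
        out.append(state)
    return out

def alternating_stack(tokens, num_blocks):
    """Apply `num_blocks` sequential blocks, flipping the traversal
    direction on every even-numbered block (1-based)."""
    x = list(tokens)
    for i in range(1, num_blocks + 1):
        if i % 2 == 1:
            x = causal_block(x)                # first token to last
        else:
            x = causal_block(x[::-1])[::-1]    # last token to first
    return x

patch_tokens = [1.0, 2.0, 3.0]
one_block = alternating_stack(patch_tokens, 1)    # [1.0, 3.0, 6.0]
two_blocks = alternating_stack(patch_tokens, 2)   # [10.0, 9.0, 6.0]
```

After one block the first position has seen only itself, but after the reversed second block every position has received information from every token, which is the point of alternating directions in a sequential model.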

[362]  arXiv:2406.04306 [pdf, other]
Title: Semantically Diverse Language Generation for Uncertainty Estimation in Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that hallucinations stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.

[363]  arXiv:2406.04308 [pdf, other]
Title: Approximation-Aware Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.
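For readers unfamiliar with the expected improvement acquisition function mentioned above, the sketch below evaluates its standard closed form for maximization under a Gaussian posterior. It illustrates plain EI only, not the paper's joint objective; the function and parameter names are assumptions.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior
    N(mu, sigma^2) at a candidate point, given the best value seen so far."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mu - best) * cdf + sigma * pdf

# Two candidates with the same posterior mean: the more uncertain one scores higher.
ei_confident = expected_improvement(mu=0.0, sigma=1.0, best=0.0)
ei_uncertain = expected_improvement(mu=0.0, sigma=2.0, best=0.0)
```

Because EI depends on the posterior mean and standard deviation at each candidate, errors in an approximate posterior translate directly into different acquisition decisions.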

[364]  arXiv:2406.04309 [pdf, other]
Title: ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
Comments: SIGGRAPH 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)

The common trade-offs of state-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) involve trading modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical formulation that exploits object self-similarity, leading to a highly compressed and efficient shape latent space. Thanks to the recursive formulation, our method supports spatial and global-to-local latent feature fusion without needing to initialize and maintain auxiliary data structures, while still allowing for continuous field queries to enable applications such as raytracing. In experiments on a set of diverse datasets, we provide compelling qualitative results and demonstrate state-of-the-art multi-scene reconstruction and compression results with a single network per dataset.

[365]  arXiv:2406.04312 [pdf, other]
Title: ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Comments: Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-$\alpha$, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time. Code is available at https://github.com/ExplainableML/ReNO.
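The core loop described above, gradient ascent on the initial noise while the generator stays fixed, can be illustrated with a toy example. Everything here is a hypothetical stand-in: the "generator" is a fixed linear map, the "reward" is a negative squared distance to a target, and the gradient is written out by hand rather than obtained from real reward models.

```python
def generate(noise):
    """Hypothetical one-step 'generator': a fixed linear map, image = 2 * noise."""
    return [2.0 * n for n in noise]

def reward(image, target):
    """Hypothetical reward: negative squared distance to a preferred image."""
    return -sum((x - t) ** 2 for x, t in zip(image, target))

def reward_grad_wrt_noise(noise, target):
    """Analytic gradient of reward(generate(noise)) with respect to the noise."""
    return [-4.0 * (2.0 * n - t) for n, t in zip(noise, target)]

def optimize_noise(noise, target, steps=50, lr=0.05):
    """Gradient ascent on the noise only; the generator itself never changes."""
    noise = list(noise)
    for _ in range(steps):
        grad = reward_grad_wrt_noise(noise, target)
        noise = [n + lr * g for n, g in zip(noise, grad)]
    return noise

target = [1.0, -2.0, 0.5]
initial_noise = [0.3, 0.1, -0.7]
tuned_noise = optimize_noise(initial_noise, target)
before = reward(generate(initial_noise), target)
after = reward(generate(tuned_noise), target)
```

In this toy setting `after` is higher than `before` because only the input noise was adjusted; with real models the gradient would come from automatic differentiation through the generator and reward models.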

[366]  arXiv:2406.04313 [pdf, other]
Title: Improving Alignment and Robustness with Short Circuiting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that "short-circuits" models as they respond with harmful outputs. Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, short-circuiting directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, short-circuiting allows the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.

[367]  arXiv:2406.04314 [pdf, other]
Title: Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, Direct Preference Optimization (DPO) has extended its success from aligning large language models (LLMs) to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods that assume all diffusion steps share a consistent preference order with the final generated images, we argue that this assumption neglects step-specific denoising performance and that preference labels should be tailored to each step's contribution. To address this limitation, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. Specifically, at each denoising step, we sample a pool of images, find a suitable win-lose pair, and, most importantly, randomly select a single image from the pool to initialize the next denoising step. This step-wise resampler process ensures the next win-lose image pair comes from the same image, making the win-lose comparison independent of the previous step. To assess the preferences at each step, we train a separate step-aware preference model that can be applied to both noisy and clean images. Our experiments with Stable Diffusion v1.5 and SDXL demonstrate that SPO significantly outperforms the latest Diffusion-DPO in aligning generated images with complex, detailed prompts and enhancing aesthetics, while also training more than 20x faster. Code and model: https://rockeycoss.github.io/spo.github.io/
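The step-wise resampling loop described above can be sketched with toy stand-ins. This is an assumption-laden illustration, not the authors' implementation: `toy_denoise_step` and `toy_score` are hypothetical replacements for a diffusion sampler step and a step-aware preference model, and picking the best and worst pool members as the win-lose pair is one possible reading of "a suitable win-lose pair".

```python
import random

def spo_style_rollout(x0, num_steps, pool_size, denoise_step, score, rng):
    """Collect one (win, lose) pair per step and continue from a random pool member."""
    pairs, current = [], x0
    for t in range(num_steps):
        # Every candidate in the pool is denoised from the same parent `current`.
        pool = [denoise_step(current, t, rng) for _ in range(pool_size)]
        ranked = sorted(pool, key=lambda x: score(x, t))
        pairs.append((ranked[-1], ranked[0]))   # (win, lose)
        current = rng.choice(pool)              # random pick, not necessarily the winner
    return pairs

# Hypothetical stand-ins for a sampler step and a step-aware preference model.
def toy_denoise_step(x, t, rng):
    return 0.5 * x + rng.gauss(0.0, 0.1)

def toy_score(x, t):
    return -abs(x)

pairs = spo_style_rollout(1.0, num_steps=4, pool_size=5,
                          denoise_step=toy_denoise_step, score=toy_score,
                          rng=random.Random(0))
```

Because both members of each pair share the same parent sample, the comparison at a step reflects only that step's denoising, which is the independence property the abstract emphasizes.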

[368]  arXiv:2406.04316 [pdf, other]
Title: Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)

6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to the substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: Semantic-aware feature extraction and Clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.

[369]  arXiv:2406.04317 [pdf, other]
Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian neural networks (BNN) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinite for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.

[370]  arXiv:2406.04318 [pdf, other]
Title: Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction
Comments: ICML 2024. Project website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Magnetic Resonance (MR) imaging, despite its proven diagnostic utility, remains an inaccessible imaging modality for disease surveillance at the population level. A major factor rendering MR inaccessible is lengthy scan times. An MR scanner collects measurements associated with the underlying anatomy in the Fourier space, also known as the k-space. Creating a high-fidelity image requires collecting large quantities of such measurements, increasing the scan time. Traditionally, the method of choice for accelerating an MR scan is image reconstruction from under-sampled k-space data. However, recent works show the feasibility of bypassing image reconstruction and learning to detect disease directly from a sparser learned subset of the k-space measurements. In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. On 6 out of 8 pathology classification tasks spanning the Knee, Brain, and Prostate MR scans, ASMR reaches within 2% of the performance of a fully sampled classifier while using only 8% of the k-space, as well as outperforming prior state-of-the-art work in k-space sampling such as EMRT, LOUPE, and DPS.

[371]  arXiv:2406.04320 [pdf, other]
Title: Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modeling multivariate time series is a well-established problem with a wide range of applications from healthcare to financial markets. Traditional State Space Models (SSMs) are classical approaches for univariate time series modeling due to their simplicity and expressive power to represent linear dependencies. They, however, have fundamentally limited expressive power to capture non-linear dependencies, are slow in practice, and fail to model the inter-variate information flow. Despite recent attempts to improve the expressive power of SSMs by using deep structured SSMs, the existing methods are either limited to univariate time series, fail to model complex patterns (e.g., seasonal patterns), fail to dynamically model the dependencies of variate and time dimensions, and/or are input-independent. We present Chimera, which uses two input-dependent 2-D SSM heads with different discretization processes to learn long-term progression and seasonal patterns. To improve the efficiency of complex 2D recurrence, we present fast training using a new 2-dimensional parallel selective scan. We further present and discuss 2-dimensional Mamba and Mamba-2 as special cases of our 2D SSM. Our experimental evaluation shows the superior performance of Chimera on extensive and diverse benchmarks, including ECG and speech time series classification, long-term and short-term time series forecasting, and time series anomaly detection.

[372]  arXiv:2406.04321 [pdf, other]
Title: VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Comments: The code and datasets will be available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music that is both acoustically and semantically aligned with the video. By incorporating local and global visual cues, VidMuse enables the creation of musically coherent audio tracks that consistently match the video content through Long-Short-Term modeling. Through extensive experiments, VidMuse outperforms existing models in terms of audio quality, diversity, and audio-visual alignment. The code and datasets will be available at https://github.com/ZeyueT/VidMuse/.

[373]  arXiv:2406.04322 [pdf, other]
Title: DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Comments: Accepted to CVPR 2024; code: this https URL; project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation. In particular, DIRECT-3D is a tri-plane diffusion model that integrates two innovations: 1) A novel learning framework where noisy data are filtered and aligned automatically during the training process. Specifically, after an initial warm-up phase using a small set of clean data, an iterative optimization is introduced in the diffusion process to explicitly estimate the 3D pose of objects and select beneficial data based on conditional density. 2) An efficient 3D representation that is achieved by disentangling object geometry and color features with two separate conditional diffusion models that are optimized hierarchically. Given a prompt input, our model generates high-quality, high-resolution, realistic, and complex 3D objects with accurate geometric details in seconds. We achieve state-of-the-art performance in both single-class generation and text-to-3D generation. We also demonstrate that DIRECT-3D can serve as a useful 3D geometric prior of objects, for example to alleviate the well-known Janus problem in 2D-lifting methods such as DreamFusion. The code and models are available for research purposes at: https://github.com/qihao067/direct3d.

[374]  arXiv:2406.04323 [pdf, other]
Title: ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Comments: ICML 2024 Accepted
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io .

[375]  arXiv:2406.04324 [pdf, other]
Title: SF-V: Single Forward Video Generation Model
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.

[376]  arXiv:2406.04325 [pdf, other]
Title: ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V-annotated dense captions of videos with various lengths and sources, developed through a carefully designed data filtering and annotating strategy. 2) ShareCaptioner-Video, an efficient and capable captioning model for arbitrary videos, with 4.8M high-quality aesthetic videos annotated by it. 3) ShareGPT4Video-8B, a simple yet superb LVLM that reached SOTA performance on three advancing video benchmarks. To achieve this, setting aside costly, non-scalable human annotation, we find that using GPT4V to caption video with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporally confused results. We argue the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) Inter-frame precise temporal change understanding. 2) Intra-frame detailed content description. 3) Frame-number scalability for arbitrary-length videos. To this end, we meticulously designed a differential video captioning strategy, which is stable, scalable, and efficient for generating captions for videos with arbitrary resolution, aspect ratios, and length. Based on it, we construct ShareGPT4Video, which contains 40K high-quality videos spanning a wide range of categories, and the resulting captions encompass rich world knowledge, object attributes, camera movements, and crucially, detailed and precise temporal descriptions of events. Based on ShareGPT4Video, we further develop ShareCaptioner-Video, a superior captioner capable of efficiently generating high-quality captions for arbitrary videos...

[377]  arXiv:2406.04327 [pdf, other]
Title: Causal Estimation of Memorisation Profiles
Comments: Published at the ACL 2024 Conference (main)
Subjects: Machine Learning (cs.LG)

Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of this counterfactual. Further, they often estimate memorisation for a model architecture rather than for a specific model instance. This paper fills an important gap in the literature, proposing a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics. Using this method, we characterise a model's memorisation profile--its memorisation trends across training--by only observing its behaviour on a small set of instances throughout training. In experiments with the Pythia model suite, we find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) has stable trends across model sizes, thus making memorisation in larger models predictable from smaller ones.
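
The difference-in-differences design mentioned in this abstract can be illustrated with a toy calculation. The sketch below is a generic difference-in-differences estimate, not the paper's estimator: `diff_in_diff` is a hypothetical helper, "treated" stands for instances the model trained on between two checkpoints, "control" for instances it has not yet seen, and the log-likelihood values are made up for illustration.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Generic difference-in-differences estimate on group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    # Change observed in the treated group between the two checkpoints.
    treated_change = mean(treated_after) - mean(treated_before)
    # Change in the control group, used as a proxy for the counterfactual trend.
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Made-up per-instance log-likelihoods at two checkpoints.
effect = diff_in_diff(
    treated_before=[-4.0, -3.0],
    treated_after=[-2.0, -1.0],
    control_before=[-4.0, -3.0],
    control_after=[-3.5, -2.5],
)
print(effect)  # 1.5
```

Subtracting the control group's change removes the improvement that would have happened anyway through general training progress, leaving the part attributable to having trained on the instances themselves.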

[378]  arXiv:2406.04328 [pdf, other]
Title: The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Comments: 10 pages, 4 figures, under review
Subjects: Machine Learning (cs.LG)

The past few years have produced a series of spectacular advances in the decoding of speech from brain activity. The engine of these advances has been the acquisition of labelled data, with increasingly large datasets acquired from single subjects. However, participants exhibit anatomical and other individual differences, and datasets use varied scanners and task designs. As a result, prior work has struggled to leverage data from multiple subjects, multiple datasets, multiple tasks, and unlabelled datasets. In turn, the field has not benefited from the rapidly growing number of open neural data repositories to exploit large-scale data and deep learning. To address this, we develop an initial set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous and unlabelled neural recordings. Experimental results show that representations learned with these objectives generalise across subjects, datasets, and tasks, and are also learned faster than using only labelled data. In addition, we set new benchmarks for two foundational speech decoding tasks. Taken together, these methods now unlock the potential for training speech decoding models with orders of magnitude more existing data.

[379]  arXiv:2406.04329 [pdf, other]
Title: Simplified and Generalized Masked Diffusion for Discrete Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.78 (CIFAR-10) and 3.42 (ImageNet 64$\times$64) bits per dimension that are comparable to or better than autoregressive models of similar sizes.
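
The phrase "weighted integral of cross-entropy losses" can be made concrete with a small Monte Carlo sketch. It assumes a linear masking schedule in which each token is masked independently with probability t at time t, so the weight on the masked-token cross-entropy is 1/t. The schedule, the function names, and the uniform stand-in model are illustrative assumptions, not the paper's configuration.

```python
import math
import random


def masked_diffusion_loss(tokens, log_prob_fn, num_samples, seed=0):
    """Monte Carlo estimate of the integral over t of (1/t) * CE on masked tokens."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
        # Linear schedule (assumption): mask each token independently w.p. t.
        masked = [rng.random() < t for _ in tokens]
        # Cross-entropy of the true tokens, summed over masked positions only.
        ce = -sum(log_prob_fn(tokens, masked, i)
                  for i, is_masked in enumerate(masked) if is_masked)
        total += ce / t  # weight 1/t under the linear schedule
    return total / num_samples


def uniform_log_prob(vocab_size):
    """Hypothetical stand-in model: uniform prediction over the vocabulary."""
    def log_prob(tokens, masked, i):
        return -math.log(vocab_size)
    return log_prob
```

As a sanity check under these assumptions, a uniform predictor over a vocabulary of size V on a sequence of N tokens should give an estimate near N log V, since the expected number of masked tokens at time t is N t and the 1/t weight cancels it. The estimator has heavy tails near t = 0, so the estimate converges slowly.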

[380]  arXiv:2406.04330 [pdf, other]
Title: Parameter-Inverted Image Pyramid Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.

[381]  arXiv:2406.04331 [pdf, other]
Title: PaCE: Parsimonious Concept Engineering for Large Language Models
Comments: 26 pages, 17 figures, 5 tables, dataset and code at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face several challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, failing alignment; some remove benign concepts, lowering the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework for alignment. First, to sufficiently model the concepts, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Then, given any alignment task, we instruct a concept partitioner to efficiently annotate the concepts as benign or undesirable. Finally, at inference time, we decompose the LLM activations along the concept dictionary via sparse coding, to accurately represent the activation as a linear combination of the benign and undesirable components. By removing the latter ones from the activation, we reorient the behavior of LLMs towards alignment goals. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.

[382]  arXiv:2406.04332 [pdf, other]
Title: Coarse-To-Fine Tensor Trains for Compact Visual Representations
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly compact tensor train representation, is still lacking. This has prevented practitioners from deploying the full potential of tensor networks for visual data. To this end, we propose 'Prolongation Upsampling Tensor Train (PuTT)', a novel method for learning tensor train representations in a coarse-to-fine manner. Our method involves the prolonging or 'upsampling' of a learned tensor train representation, creating a sequence of 'coarse-to-fine' tensor trains that are incrementally refined. We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. For full results see our project webpage: https://sebulo.github.io/PuTT_website/

[383]  arXiv:2406.04333 [pdf, other]
Title: BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.

[384]  arXiv:2406.04334 [pdf, other]
Title: DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has to handle a large number of additional tokens in its input layer. This paper presents a new architecture DeepStack for LMMs. Considering $N$ layers in the language and vision transformer of LMMs, we stack the visual tokens into $N$ groups and feed each group to its aligned transformer layer from bottom to top. Surprisingly, this simple method greatly enhances the power of LMMs to model interactions among visual tokens across layers but with minimal additional cost. We apply DeepStack to both language and vision transformer in LMMs, and validate the effectiveness of DeepStack LMMs with extensive empirical results. Using the same context length, our DeepStack 7B and 13B parameters surpass their counterparts by 2.7 and 2.9 on average across 9 benchmarks, respectively. Using only one-fifth of the context length, DeepStack closely rivals the counterparts that use the full context length. These gains are particularly pronounced on high-resolution tasks, e.g., 4.2, 11.0, and 4.0 improvements on TextVQA, DocVQA, and InfoVQA compared to LLaVA-1.5-7B, respectively. We further apply DeepStack to vision transformer layers, which brings us a similar amount of improvements, 3.8 on average compared with LLaVA-1.5-7B.

[385]  arXiv:2406.04336 [pdf, other]
Title: On the Expressive Power of Spectral Invariant Graph Neural Networks
Comments: 31 pages; 3 figures; to appear in ICML 2024
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Spectral Theory (math.SP)

Incorporating spectral information to enhance Graph Neural Networks (GNNs) has shown promising results but raises a fundamental challenge due to the inherent ambiguity of eigenvectors. Various architectures have been proposed to address this ambiguity, referred to as spectral invariant architectures. Notable examples include GNNs and Graph Transformers that use spectral distances, spectral projection matrices, or other invariant spectral features. However, the potential expressive power of these spectral invariant architectures remains largely unclear. The goal of this work is to gain a deep theoretical understanding of the expressive power obtainable when using spectral features. We first introduce a unified message-passing framework for designing spectral invariant GNNs, called Eigenspace Projection GNN (EPNN). A comprehensive analysis shows that EPNN essentially unifies all prior spectral invariant architectures, in that they are either strictly less expressive or equivalent to EPNN. A fine-grained expressiveness hierarchy among different architectures is also established. On the other hand, we prove that EPNN itself is bounded by a recently proposed class of Subgraph GNNs, implying that all these spectral invariant architectures are strictly less expressive than 3-WL. Finally, we discuss whether using spectral features can gain additional expressiveness when combined with more expressive GNNs.

[386]  arXiv:2406.04337 [pdf, other]
Title: Coherent Zero-Shot Visual Instruction Generation
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions.

[387]  arXiv:2406.04338 [pdf, other]
Title: Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.

[388]  arXiv:2406.04339 [pdf, other]
Title: RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities, while maintaining efficient fine-tuning and inference. Specifically, we first integrate the vision encoder with Mamba, aligning visual data with language embedding through co-training, empowering our model with visual common sense and robot-related reasoning. To further equip RoboMamba with action pose prediction abilities, we explore an efficient fine-tuning strategy with a simple policy head. We find that once RoboMamba possesses sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model showcases impressive pose prediction results in both simulation and real-world experiments, achieving inference speeds 7 times faster than existing robot MLLMs. Our project web page: https://sites.google.com/view/robomamba-web

[389]  arXiv:2406.04340 [pdf, other]
Title: GLACE: Global Local Accelerated Coordinate Encoding
Comments: Large-scale visual localization with a single optimizable MLP. CVPR 2024. Code: this https URL Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and needs to implicitly triangulate the points. The challenges stem from a fundamental dilemma: The network has to be invariant to observations of the same landmark at different viewpoints and lighting conditions, etc., but at the same time discriminate unrelated but similar observations. The latter becomes more relevant and severe in larger scenes. In this work, we tackle this problem by introducing the concept of co-visibility to the network. We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network. Specifically, we propose a novel feature diffusion technique that implicitly groups the reprojection constraints with co-visibility and avoids overfitting to trivial solutions. Additionally, our position decoder parameterizes the output positions for large-scale scenes more effectively. Without using 3D models or depth maps for supervision, our method achieves state-of-the-art results on large-scale scenes with a low-map-size model. On Cambridge landmarks, with a single model, we achieve 17% lower median position error than Poker, the ensemble variant of the state-of-the-art SCR method ACE. Code is available at: https://github.com/cvg/glace.

[390]  arXiv:2406.04341 [pdf, other]
Title: Interpreting the Second-Order Effects of Neurons in CLIP
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. We find that these effects are highly selective: for each neuron, the effect is significant for <2% of the images. Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. The sets reveal polysemantic behavior - each neuron corresponds to multiple, often unrelated, concepts (e.g. ships and cars). Exploiting this neuron polysemy, we mass-produce "semantic" adversarial examples by generating images with concepts spuriously correlated to the incorrect class. Additionally, we use the second-order effects for zero-shot segmentation and attribute discovery in images. Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.

[391]  arXiv:2406.04342 [pdf, other]
Title: Learning 1D Causal Visual Representation with De-focus Attention Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-focus" issue in existing 1D causal vision models, where attention overly concentrates on a small proportion of visual tokens. The issue of "over-focus" hinders the model's ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. During training, large and scheduled drop path rates, and an auxiliary loss on globally pooled features for global understanding tasks are introduced. These two strategies encourage the model to attend to a broader range of tokens and enhance network optimization. Extensive experiments validate the efficacy of our approach, demonstrating that 1D causal visual representation can perform comparably to 2D non-causal representation in tasks such as global perception, dense prediction, and multi-modal understanding. Code is released at https://github.com/OpenGVLab/De-focus-Attention-Networks.

[392]  arXiv:2406.04343 [pdf, other]
Title: Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.

[393]  arXiv:2406.04344 [pdf, other]
Title: Verbalized Machine Learning: Revisiting Machine Learning with Language Models
Comments: Technical Report v1 (92 pages, 15 figures)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical machine learning problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed. We conduct several studies to empirically evaluate the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability and trustworthiness in ML.

[394]  arXiv:2406.04345 [pdf, other]
Title: Stereo-Depth Fusion through Virtual Pattern Projection
Comments: extended version of ICCV 2023: "Active Stereo Without Pattern Projector"
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel general-purpose stereo and depth data fusion paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, to facilitate the visual correspondence. Purposely, any depth sensing device can be seamlessly plugged into our framework, enabling the deployment of a virtual active stereo setup in any possible environment and overcoming the severe limitations of physical pattern projection, such as the limited working range and environmental conditions. Exhaustive experiments on indoor and outdoor datasets featuring both long and close range, including those providing raw, unfiltered depth hints from off-the-shelf depth sensors, highlight the effectiveness of our approach in notably boosting the robustness and accuracy of algorithms and deep stereo without any code modification and even without re-training. Additionally, we assess the performance of our strategy on active stereo evaluation datasets with conventional pattern projection. Indeed, in all these scenarios, our virtual pattern projection paradigm achieves state-of-the-art performance. The source code is available at: https://github.com/bartn8/vppstereo.

Cross-lists for Fri, 7 Jun 24

[395]  arXiv:2405.08005 (cross-list from math.OC) [pdf, other]
Title: Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm
Comments: Published as a conference paper at ICML 2024
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a discrete-time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium with mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite-player games on networks, which are challenging to analyze and solve due to the curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.

[396]  arXiv:2406.03499 (cross-list from physics.plasm-ph) [pdf, ps, other]
Title: Estimated electric conductivities of thermal plasma for air-fuel combustion and oxy-fuel combustion with potassium or cesium seeding
Authors: Osama A. Marzouk
Comments: 28 pages, 16 figures, 14 tables
Journal-ref: Heliyon, volume 10, issue 11, article number e31697, 2024
Subjects: Plasma Physics (physics.plasm-ph); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

A complete model for estimating the electric conductivity of combustion product gases, with added cesium (Cs) or potassium (K) vapor for ionization, is presented. Neutral carrier gases serve as the bulk fluid that carries the seed material, as well as the electrons generated by the partial thermal (equilibrium) ionization of the seed alkali metal. The model accounts for electron-neutral scattering, as well as electron-ion and electron-electron scattering. The model is tested through comparison with published data. The model is aimed at being utilized for the plasma within magnetohydrodynamic (MHD) channels, where direct power extraction from passing electrically conducting plasma gas enables electric power generation. The thermal ionization model is then used to estimate the electric conductivity of seeded combustion gases under complete combustion of three selected fuels, namely: hydrogen (H2), methane (CH4), and carbon (C). For each of these three fuels, two options for the oxidizer were applied, namely: air (21 % molecular oxygen, 79 % molecular nitrogen by mole), and pure oxygen (oxy-combustion). Two types of seeds (with 1 % mole fraction, based on the composition before ionization) were also applied for each of the six combinations of (fuel-oxidizer), leading to a total of 12 different MHD plasma cases. For each of these cases, the electric conductivity was computed for a range of temperatures from 2000 K to 3000 K. The smallest estimated electric conductivity was 0.35 S/m for oxy-hydrogen combustion at 2000 K, with potassium seeding. The largest estimated electric conductivity was 180.30 S/m for oxy-carbon combustion at 3000 K, with cesium seeding. At 2000 K, replacing potassium with cesium causes a gain in the electric conductivity by a multiplicative gain factor of about 3.6 regardless of the fuel and oxidizer. This gain factor declines to between 1.77 and 2.07 at 3000 K.

[397]  arXiv:2406.03504 (cross-list from math.OC) [pdf, ps, other]
Title: A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and check, through "pruning tests", whether those regions can be shown to contain no solutions. In standard implementations, evaluating a pruning test requires solving a convex optimization problem, which may result in computational bottlenecks. In this paper, we present an alternative way to implement pruning tests for a generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with a negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude for typical problems encountered in machine-learning applications.

[398]  arXiv:2406.03587 (cross-list from physics.soc-ph) [pdf, other]
Title: Subsuming Complex Networks by Node Walks
Comments: 14 pages and 7 figures
Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)

The concept of node walk in graphs and complex networks has been addressed, consisting of one or more nodes that move into adjacent nodes, henceforth incorporating the respective connections. This type of dynamics is then applied to subsume complex networks. Three types of networks (Erdős-Rényi, Barabási-Albert, as well as a geometric model) are considered, while three node-walk heuristics (uniformly random, largest degree, and smallest degree) are taken into account. Several interesting results are obtained and described, including the identification that the subsuming dynamics depend strongly on both the specific topology of the networks and the criteria controlling the node walks. The use of node walks as a model for studying the relationship between network topology and dynamics is motivated by this result. In addition, relatively high correlations between the initial node degree and the accumulated strength of the walking node were observed for some combinations of network types and dynamic rules, allowing some of the properties of the subsumption to be roughly predicted from the initial topology around the walking node, which has been found, however, not to be enough for full determination of the subsumption dynamics. Another interesting result regards the quite distinct signatures (along the iterations) of walking node strengths obtained for the several considered combinations of network type and subsumption rules.

[399]  arXiv:2406.03616 (cross-list from stat.ML) [pdf, other]
Title: BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments. Systematically obtaining diverse outcomes is a key component in many real-world design problems such as material and drug discovery, neural architecture search, reinforcement learning, and robot navigation. Since the relationship between the inputs and outputs (i.e., behaviors) of these complex systems is typically not available in closed form, NS requires a black-box perspective. Consequently, popular NS algorithms rely on evolutionary optimization and other meta-heuristics that require intensive sampling of the input space, which is impractical when the system is expensive to evaluate. We propose a Bayesian optimization inspired algorithm for sample-efficient NS that is specifically designed for such expensive black-box systems. Our approach models the input-to-behavior mapping with multi-output Gaussian processes (MOGP) and selects the next point to evaluate by maximizing a novelty metric that depends on a posterior sample drawn from the MOGP that promotes both exploration and exploitation. By leveraging advances in efficient posterior sampling and high-dimensional Gaussian process modeling, we discuss how our approach can be made scalable with respect to both amount of data and number of inputs. We test our approach on ten synthetic benchmark problems and eight real-world problems (with up to 2133 inputs) including new applications such as discovery of diverse metal organic frameworks for use in clean energy technology. We show that our approach greatly outperforms existing NS algorithms by finding substantially larger sets of diverse behaviors under limited sample budgets.
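
The abstract above does not define its novelty metric. As a rough illustration only, the sketch below uses the score that is common in the novelty-search literature: the mean distance from a candidate behavior to its k nearest neighbors among the behaviors already observed. The function name `novelty`, the parameter `k`, and the toy archive are assumptions made for this example and are not taken from BEACON, which additionally draws the candidate behaviors from a Gaussian-process posterior sample.

```python
import math

def novelty(candidate, archive, k=3):
    """Mean Euclidean distance from `candidate` to its k nearest behaviors.

    `archive` holds behaviors that were already observed. This is the
    generic k-nearest-neighbor novelty score, used here as an assumption;
    it is not claimed to be the metric of the paper above.
    """
    if not archive:
        return math.inf  # nothing observed yet, so everything is novel
    dists = sorted(math.dist(candidate, b) for b in archive)
    k = min(k, len(dists))
    return sum(dists[:k]) / k

# Hypothetical two-dimensional behavior archive.
archive = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
near = novelty((0.1, 0.1), archive, k=2)
far = novelty((3.0, 4.0), archive, k=2)
```

A candidate far from every archived behavior receives a higher score than one close to the archive, so maximizing the score steers the search toward unexplored behaviors.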

[400]  arXiv:2406.03628 (cross-list from stat.ML) [pdf, other]
Title: Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance
Comments: 59 pages, 7 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (OversamPling with Artificial LLM-generated data), a systematic oversampling approach that leverages the capabilities of large language models (LLMs) to generate high-quality synthetic data for minority groups. Recent studies on synthetic data generation using deep generative models mostly target prediction tasks. Our proposal differs in that we focus on handling imbalanced data and spurious correlations. More importantly, we develop a novel theory that rigorously characterizes the benefits of using the synthetic data, and shows the capacity of transformers in generating high-quality synthetic data for both labels and covariates. We further conduct intensive numerical experiments to demonstrate the efficacy of our proposed approach compared to some representative alternative solutions.

[401]  arXiv:2406.03637 (cross-list from eess.AS) [pdf, other]
Title: Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.
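
The routing idea described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the gate is a dense softmax over linear scores, each expert is a single linear map, and the names `moe_style_embedding`, `gate_W`, and `experts` are hypothetical. The abstract does not say whether StyleMoE uses dense or sparse routing, nor what the experts look like internally.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def moe_style_embedding(ref, gate_W, experts):
    """Mix expert outputs using gate weights computed from the reference.

    ref:     feature vector of the reference speech (assumed given)
    gate_W:  gating matrix, one row of scores per expert (assumption)
    experts: list of linear maps, one per style expert (assumption)
    """
    weights = softmax(matvec(gate_W, ref))
    outputs = [matvec(W, ref) for W in experts]
    dim = len(outputs[0])
    mixed = [sum(weights[e] * outputs[e][k] for e in range(len(experts)))
             for k in range(dim)]
    return mixed, weights

# Toy setup: 3-dimensional reference features, two experts, 2-dimensional style.
ref = [1.0, 2.0, 3.0]
experts = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
           [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]]
uniform_gate = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mixed, weights = moe_style_embedding(ref, uniform_gate, experts)
```

With an all-zero gate the weights are uniform and the result is the plain average of the expert outputs; a trained gate would instead concentrate weight on the experts suited to a given reference.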

[402]  arXiv:2406.03652 (cross-list from q-fin.PM) [pdf, other]
Title: Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms
Authors: Duy Khanh Lam
Comments: 25 pages, 12 figures, 3 tables, working paper
Subjects: Portfolio Management (q-fin.PM); Information Theory (cs.IT); Machine Learning (cs.LG); Computational Finance (q-fin.CP)

This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. Due to the uncertainty of strategies' performances in the future market, which are often based on specific models and statistical assumptions, investors often mitigate risk and enhance robustness by combining multiple strategies, akin to common approaches in collective learning prediction. However, the absence of a distribution-free and consistent preference framework complicates decisions of combination due to the ambiguous objective. To address this gap, we introduce a novel framework for decision-making in combining strategies, irrespective of market conditions, by establishing the investor's preference between decisions and then forming a clear objective. Through this framework, we propose a combinatorial strategy construction, free from statistical assumptions, for any scale of component strategies, even infinite, such that it meets the determined criterion. Finally, we test the proposed strategy along with its accelerated variant and some other multi-strategies. The numerical experiments show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios, in which their cumulative wealths eventually exceed those of the best component strategies while the accelerated strategy significantly improves performance.

[403]  arXiv:2406.03653 (cross-list from stat.ML) [pdf, other]
Title: Equivalence Set Restricted Latent Class Models (ESRLCM)
Comments: 43 pages, 10 tables, 1 figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Latent Class Models (LCMs) are used to cluster multivariate categorical data, commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters that have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the identifiability of ESRLCMs, and demonstrate their effectiveness in both simulations and real-world applications.

[404]  arXiv:2406.03657 (cross-list from eess.AS) [pdf, other]
Title: UrBAN: Urban Beehive Acoustics and PheNotyping Dataset
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022. This apiary comprised 10 beehives, with microphones recording more than 2000 hours of high quality raw audio, and also sensors capturing temperature and humidity. Periodic hive inspections involved monitoring colony honey bee population changes, assessing queen-related conditions, and documenting overall hive health. Additionally, health metrics, such as Varroa mite infestation rates and winter mortality assessments, were recorded, offering valuable insights into factors affecting hive health status and resilience. In this study, we first outline the data collection process, sensor data description, and dataset structure. Furthermore, we demonstrate a practical application of this dataset by extracting various features from the raw audio to predict colony population using the number of frames of bees as a proxy.

[405]  arXiv:2406.03663 (cross-list from eess.IV) [pdf, ps, other]
Title: A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy Regions
Comments: 12 pages
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate if a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study of 106 normal subjects and 164 perimetric glaucoma (PG) patients. Peripapillary NFL reflectance map, NFL thickness map, optic head analysis of disc, and macular ganglion cell complex thickness were obtained using spectral domain OCT. A hybrid deep learning model combined a fully connected network (FCN) and a convolution neural network (CNN) to develop and combine those OCT maps and parameters to distinguish normal and PG eyes. Two deep learning models were compared based on whether the NFL reflectance map was used as part of the input or not. Results: The hybrid deep learning model with reflectance achieved 0.909 sensitivity at 99% specificity and 0.926 at 95%. The overall accuracy was 0.948 with 0.893 sensitivity and 1.000 specificity, and the AROC was 0.979, which is significantly better than the logistic regression models (p < 0.001). The second best model is the hybrid deep learning model without reflectance, which also had significantly higher AROC than logistic regression models (p < 0.001). The logistic regression model with reflectance had slightly higher AROC or sensitivity than the logistic regression model without reflectance (p = 0.024). Conclusions: The hybrid deep learning model significantly improved the diagnostic accuracy, with or without NFL reflectance. The hybrid deep learning model, combining reflectance/NFL thickness/GCC thickness/ONH parameter, may be a practical model for glaucoma screening purposes.
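
For readers unfamiliar with figures such as "sensitivity at 99% specificity", the sketch below shows one common way to compute them from classifier scores. The thresholding convention (a case is positive when its score is strictly above the threshold) and the toy scores are assumptions for illustration; the abstract does not state how the study chose its operating points, and no study data is reproduced here.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (sensitivity, specificity, threshold) at the lowest threshold
    whose specificity reaches the target.

    scores: higher means more likely glaucoma (assumed convention)
    labels: 1 for a glaucoma eye, 0 for a normal eye
    """
    negatives = sorted(s for s, l in zip(scores, labels) if l == 0)
    positives = [s for s, l in zip(scores, labels) if l == 1]
    # Specificity is non-decreasing as the threshold moves up through the
    # sorted normal scores, so the first hit is the lowest valid threshold.
    for t in negatives:
        spec = sum(1 for s in negatives if s <= t) / len(negatives)
        if spec >= target_specificity:
            sens = sum(1 for s in positives if s > t) / len(positives)
            return sens, spec, t
    raise ValueError("target specificity cannot be reached")

# Hypothetical toy scores: five normal eyes followed by four glaucoma eyes.
scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.35, 0.6, 0.8, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
sens_80, spec_80, thr_80 = sensitivity_at_specificity(scores, labels, 0.8)
sens_100, spec_100, thr_100 = sensitivity_at_specificity(scores, labels, 1.0)
```

Raising the required specificity pushes the threshold up and can only lower the sensitivity, which is why the two operating points quoted in the abstract differ.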

[406]  arXiv:2406.03688 (cross-list from eess.IV) [pdf, other]
Title: Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation, it facilitates the inclusion of lateral view images and images from any desired viewing position. This opens up avenues for research into novel multimodal applications involving paired CT, X-ray images from various views, text, and binary labels. We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and CheXnet model. Experiments demonstrate that CheXnet, when trained and tested on the DRR-RATE dataset, achieves sufficient to high AUC scores for the six common pathologies cited in common literature: Atelectasis, Cardiomegaly, Consolidation, Lung Lesion, Lung Opacity, and Pleural Effusion. Additionally, CheXnet trained on the CheXpert dataset can accurately identify several pathologies, even when operating out of distribution. This confirms that the generated DRR images effectively capture the essential pathology features from CT images. The dataset and labels are publicly accessible at https://huggingface.co/datasets/farrell236/DRR-RATE.

[407]  arXiv:2406.03690 (cross-list from math.OC) [pdf, other]
Title: AMPIC: Adaptive Model Predictive Ising Controller for large-scale urban traffic signals
Comments: 17 pages, 8 figures
Subjects: Optimization and Control (math.OC); Emerging Technologies (cs.ET); Systems and Control (eess.SY); Quantum Physics (quant-ph)

Realizing smooth traffic flow is important for achieving carbon neutrality. Adaptive traffic signal control, which considers traffic conditions, has thus attracted attention. However, it is difficult to ensure optimal vehicle flow throughout a large city using existing control methods because of their heavy computational load. Here, we propose a control method called AMPIC (Adaptive Model Predictive Ising Controller) that guarantees both scalability and optimality. The proposed method employs model predictive control to solve an optimal control problem at each control interval with explicit consideration of a predictive model of vehicle flow. This optimal control problem is transformed into a combinatorial optimization problem with binary variables that is equivalent to the so-called Ising problem. This transformation allows us to use an Ising solver, which has been widely studied and is expected to have fast and efficient optimization performance. We performed numerical experiments using a microscopic traffic simulator for a realistic city road network. The results show that AMPIC enables faster vehicle cruising speed with less waiting time than that achieved by classical control methods, resulting in lower CO2 emissions. The model predictive approach with a long prediction horizon thus effectively improves control performance. Systematic parametric studies on model cities indicate that the proposed method realizes smoother traffic flows for large city road networks. Among Ising solvers, D-Wave's quantum annealing is shown to find near-optimal solutions at a reasonable computational cost.
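
The step from a binary optimization problem to an Ising problem rests on the substitution x = (1 + s) / 2, which maps a binary variable x in {0, 1} to a spin s in {-1, +1}. The sketch below applies it to a generic quadratic binary objective and checks by brute force that both forms have the same minimizer. The matrix `Q` is a hypothetical toy and the function names are illustrative; none of this reproduces AMPIC's actual traffic model or objective.

```python
from itertools import product

def qubo_to_ising(Q):
    """Rewrite sum_ij Q[i][j] x_i x_j (x binary) as an Ising energy
    sum_ij J[i][j] s_i s_j + sum_i h[i] s_i + offset (s = +/-1),
    using x_i = (1 + s_i) / 2.
    """
    n = len(Q)
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    offset = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                # x_i * x_i = x_i = (1 + s_i) / 2
                h[i] += Q[i][i] / 2
                offset += Q[i][i] / 2
            else:
                # x_i * x_j = (1 + s_i + s_j + s_i s_j) / 4
                J[i][j] += Q[i][j] / 4
                h[i] += Q[i][j] / 4
                h[j] += Q[i][j] / 4
                offset += Q[i][j] / 4
    return J, h, offset

def qubo_energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def ising_energy(J, h, offset, s):
    n = len(h)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return pair + sum(h[i] * s[i] for i in range(n)) + offset

# Hypothetical toy objective: reward each active variable, penalize neighbors.
Q = [[-1.0, 2.0, 0.0],
     [0.0, -1.0, 2.0],
     [0.0, 0.0, -1.0]]
J, h, offset = qubo_to_ising(Q)
best_x = min(product([0, 1], repeat=3), key=lambda x: qubo_energy(Q, x))
best_s = min(product([-1, 1], repeat=3),
             key=lambda s: ising_energy(J, h, offset, s))
```

Brute force is only feasible for a handful of variables; the point of the transformation is that the same energy can be handed to a dedicated Ising solver when the network is large.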

[408]  arXiv:2406.03696 (cross-list from stat.ML) [pdf, other]
Title: Discrete error dynamics of mini-batch gradient descent for least squares regression
Comments: 26 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

We study the discrete dynamics of mini-batch gradient descent for least squares regression when sampling without replacement. We show that the dynamics and generalization error of mini-batch gradient descent depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$, in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we rigorously establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. We also study discretization effects that a continuous-time gradient flow analysis cannot detect, and show that mini-batch gradient descent converges to a step-size dependent solution, in contrast with full-batch gradient descent. Finally, we investigate the effects of batching, assuming a random matrix model, by using tools from free probability theory to numerically compute the spectrum of $Z$.
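
The leading-order agreement claimed above can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's analysis: it compares one epoch of mini-batch gradient descent over B equal-size batches (step size eta, no replacement) with a single full-batch step of size B * eta, which is one reading of the linear scaling rule. The data, the batch split, and all function names are hypothetical.

```python
import random

def batch_gradient(X, y, w, idx):
    """Gradient of (1 / (2m)) * sum_{i in idx} (x_i . w - y_i)^2, m = len(idx)."""
    d = len(w)
    g = [0.0] * d
    for i in idx:
        r = sum(X[i][k] * w[k] for k in range(d)) - y[i]
        for k in range(d):
            g[k] += r * X[i][k] / len(idx)
    return g

def minibatch_epoch(X, y, w, batches, eta):
    """One pass over all batches, each sample used exactly once."""
    w = list(w)
    for idx in batches:
        g = batch_gradient(X, y, w, idx)
        w = [wk - eta * gk for wk, gk in zip(w, g)]
    return w

def fullbatch_step(X, y, w, eta):
    g = batch_gradient(X, y, w, range(len(y)))
    return [wk - eta * gk for wk, gk in zip(w, g)]

def epoch_gap(X, y, w0, batches, eta):
    """Largest coordinate difference between the two updates."""
    wm = minibatch_epoch(X, y, w0, batches, eta)
    wf = fullbatch_step(X, y, w0, len(batches) * eta)
    return max(abs(a - b) for a, b in zip(wm, wf))

rng = random.Random(0)
n, d, B = 12, 2, 3
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
order = list(range(n))
rng.shuffle(order)
size = n // B
batches = [order[b * size:(b + 1) * size] for b in range(B)]
w0 = [0.0] * d

gap_big = epoch_gap(X, y, w0, batches, 1e-3)
gap_small = epoch_gap(X, y, w0, batches, 5e-4)
ratio = gap_big / gap_small
```

Because the first-order terms in eta cancel, the gap between the two updates is second order, so halving eta should shrink it by roughly a factor of four. The remaining second-order gap is the kind of discretization effect that a continuous-time gradient flow analysis would not capture.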

[409]  arXiv:2406.03711 (cross-list from physics.flu-dyn) [pdf, other]
Title: Pi-fusion: Physics-informed diffusion model for learning fluid dynamics
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)

Physics-informed deep learning has recently been developed as a novel paradigm for learning physical dynamics. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize to arbitrary time instants in real-world scenarios, where the fluid motion can be considered as a time-variant trajectory involving large-scale particles. Inspired by the advantage of diffusion models in learning the distribution of data, we first propose Pi-fusion, a physics-informed diffusion model for predicting the temporal evolution of velocity and pressure fields in fluid dynamics. Physics-informed guidance sampling is proposed in the inference procedure of Pi-fusion to improve the accuracy and interpretability of learning fluid dynamics. Furthermore, we introduce a training strategy based on reciprocal learning to learn the quasiperiodical pattern of fluid motion and thus improve the generalizability of the model. The proposed approach is then evaluated on both synthetic and real-world datasets, by comparing it with state-of-the-art physics-informed deep learning methods. Experimental results show that the proposed approach significantly outperforms existing methods for predicting the temporal evolution of velocity and pressure fields, confirming its strong generalization by drawing probabilistic inference of the forward process and physics-informed guidance sampling. The proposed Pi-fusion can also be generalized to learning other physical dynamics governed by partial differential equations.

[410]  arXiv:2406.03715 (cross-list from math.PR) [pdf, other]
Title: Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus
Subjects: Probability (math.PR); Numerical Analysis (math.NA)

In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on the 2D torus. The approximations are implemented by a tamed exponential Euler discretization in time and a spectral Galerkin method in space. We finally obtain the convergence rates with the spatial order of $\alpha-\delta$ and the temporal order of ${\alpha}/{6}-\delta$ in $\mathcal C^{-\alpha}$ for $\alpha\in(0,1/3)$ and $\delta>0$ arbitrarily small.

[411]  arXiv:2406.03734 (cross-list from math.OC) [pdf, other]
Title: Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In safety-critical applications, reinforcement learning (RL) needs to consider safety constraints. However, theoretical understandings of constrained RL for continuous control are largely absent. As a case study, this paper presents a cost-constrained LQR formulation, where a number of LQR costs with user-defined penalty matrices are subject to constraints. To solve it, we propose a policy gradient primal-dual method to find an optimal state feedback gain. Despite the non-convexity of the cost-constrained LQR problem, we provide a constructive proof for strong duality and a geometric interpretation of an optimal multiplier set. By proving that the concave dual function is Lipschitz smooth, we further provide convergence guarantees for the PG primal-dual method. Finally, we perform simulations to validate our theoretical findings.

[412]  arXiv:2406.03766 (cross-list from eess.SP) [pdf, other]
Title: Privacy Preserving Semi-Decentralized Mean Estimation over Intermittently-Connected Networks
Comments: 14 pages, 6 figures. arXiv admin note: text overlap with arXiv:2303.00035
Subjects: Signal Processing (eess.SP); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Machine Learning (cs.LG); Systems and Control (eess.SY)

We consider the problem of privately estimating the mean of vectors distributed across different nodes of an unreliable wireless network, where communications between nodes can fail intermittently. We adopt a semi-decentralized setup, wherein to mitigate the impact of intermittently connected links, nodes can collaborate with their neighbors to compute a local consensus, which they relay to a central server. In such a setting, the communications between any pair of nodes must ensure that the privacy of the nodes is rigorously maintained to prevent unauthorized information leakage. We study the tradeoff between collaborative relaying and privacy leakage due to the data sharing among nodes and, subsequently, propose PriCER: Private Collaborative Estimation via Relaying -- a differentially private collaborative algorithm for mean estimation to optimize this tradeoff. The privacy guarantees of PriCER arise (i) implicitly, by exploiting the inherent stochasticity of the flaky network connections, and (ii) explicitly, by adding Gaussian perturbations to the estimates exchanged by the nodes. Local and central privacy guarantees are provided against eavesdroppers who can observe different signals, such as the communications amongst nodes during local consensus and (possibly multiple) transmissions from the relays to the central server. We substantiate our theoretical findings with numerical simulations. Our implementation is available at https://github.com/rajarshisaha95/private-collaborative-relaying.

[413]  arXiv:2406.03783 (cross-list from math.CO) [pdf, other]
Title: Flips in colorful triangulations
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

The associahedron is the graph $\mathcal{G}_N$ that has as nodes all triangulations of a convex $N$-gon, and an edge between any two triangulations that differ in a flip operation, which consists of removing an edge shared by two triangles and replacing it by the other diagonal of the resulting 4-gon. In this paper, we consider a large collection of induced subgraphs of $\mathcal{G}_N$ obtained by Ramsey-type colorability properties. Specifically, coloring the points of the $N$-gon red and blue alternatingly, we consider only colorful triangulations, namely triangulations in which every triangle has points in both colors, i.e., monochromatic triangles are forbidden. The resulting induced subgraph of $\mathcal{G}_N$ on colorful triangulations is denoted by $\mathcal{F}_N$. We prove that $\mathcal{F}_N$ has a Hamilton cycle for all $N\geq 8$, resolving a problem raised by Sagan, i.e., all colorful triangulations on $N$ points can be listed so that any two cyclically consecutive triangulations differ in a flip. In fact, we prove that for an arbitrary fixed coloring pattern of the $N$ points with at least 10 changes of color, the resulting subgraph of $\mathcal{G}_N$ on colorful triangulations (for that coloring pattern) admits a Hamilton cycle. We also provide an efficient algorithm for computing a Hamilton path in $\mathcal{F}_N$ that runs in time $\mathcal{O}(1)$ on average per generated node. This algorithm is based on a new and algorithmic construction of a tree rotation Gray code for listing all $n$-vertex $k$-ary trees that runs in time $\mathcal{O}(k)$ on average per generated tree.
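The flip operation and the colorful condition defined above can be made concrete on a small polygon. The sketch below is only an illustration of the definitions, not the paper's Gray code algorithm. It assumes vertices labeled 0..N-1 around the polygon, the alternating coloring given by the parity of the label (so N is taken even), and a triangulation stored as a set of diagonals; all function names are hypothetical. It uses N = 6 for readability, below the N >= 8 range of the Hamilton cycle result.

```python
from itertools import combinations


def edge_set(n, diagonals):
    """Polygon sides plus diagonals, each stored as a frozenset of two vertices."""
    sides = {frozenset((i, (i + 1) % n)) for i in range(n)}
    return sides | set(diagonals)


def triangles(n, diagonals):
    """Vertex triples whose three pairs are all edges.

    For a triangulation of a convex polygon these are exactly its triangles.
    """
    edges = edge_set(n, diagonals)
    return [t for t in combinations(range(n), 3)
            if all(frozenset(pair) in edges for pair in combinations(t, 2))]


def is_colorful(n, diagonals):
    """True if no triangle is monochromatic under the coloring vertex % 2."""
    return all(len({v % 2 for v in t}) == 2 for t in triangles(n, diagonals))


def flip(n, diagonals, diagonal):
    """Replace `diagonal` by the other diagonal of the 4-gon around it."""
    a, c = diagonal
    edges = edge_set(n, diagonals)
    # The two apexes are the vertices joined to both endpoints of the diagonal.
    b, d = [v for v in range(n)
            if v not in (a, c)
            and frozenset((a, v)) in edges and frozenset((c, v)) in edges]
    return (set(diagonals) - {frozenset((a, c))}) | {frozenset((b, d))}


# Fan triangulation of a hexagon from vertex 0.
fan = {frozenset(p) for p in [(0, 2), (0, 3), (0, 4)]}
print(is_colorful(6, fan))                   # True
print(is_colorful(6, flip(6, fan, (0, 2))))  # True: stays among colorful triangulations
print(is_colorful(6, flip(6, fan, (0, 3))))  # False: creates the triangle (0, 2, 4)
```

The last line shows why the colorful triangulations form a proper induced subgraph: some flips of a colorful triangulation produce a monochromatic triangle and are therefore not available as edges.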

[414]  arXiv:2406.03787 (cross-list from math.OC) [pdf, other]
Title: Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.

[415]  arXiv:2406.03810 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: Spherinator and HiPSter: Representation Learning for Unbiased Knowledge Discovery from Simulations
Comments: 4 pages, 1 figure
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

Simulations are the best approximation to experimental laboratories in astrophysics and cosmology. However, the complexity, richness, and large size of their outputs severely limit the interpretability of their predictions. We describe a new, unbiased, and machine learning based approach to obtaining useful scientific insights from a broad range of simulations. The method can be used on today's largest simulations and will be essential to solve the extreme data exploration and analysis challenges posed by the Exascale era. Furthermore, this concept is so flexible that it will also enable explorative access to observed data. Our concept is based on applying nonlinear dimensionality reduction to learn compact representations of the data in a low-dimensional space. The simulation data is projected onto this space for interactive inspection, visual interpretation, sample selection, and local analysis. We present a prototype using a rotationally invariant hyperspherical variational convolutional autoencoder, utilizing a power distribution in the latent space, and trained on galaxies from the IllustrisTNG simulation. Thereby, we obtain a natural Hubble tuning fork-like similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in Aladin Lite.

[416]  arXiv:2406.03832 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: UltraPINK -- New possibilities to explore Self-Organizing Kohonen Maps
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Human-Computer Interaction (cs.HC)

Unsupervised learning algorithms like self-organizing Kohonen maps are a promising approach to gaining an overview of massive datasets. With UltraPINK, researchers can train, inspect, and explore self-organizing maps, whereby the toolbox of interaction possibilities grows continually. A key feature of UltraPINK is the consideration of versatility in astronomical data. By keeping the operations as abstract as possible and using design patterns meant for abstract usage, we ensure that data is compatible with UltraPINK, regardless of its type, formatting, or origin. Future work on the application will keep extending the catalogue of exploration tools and the interfaces towards other established applications to process astronomical data. Ultimately, we aim towards a solid infrastructure for data analysis in astronomy.

[417]  arXiv:2406.03867 (cross-list from quant-ph) [pdf, other]
Title: A Comprehensive Study of Quantum Arithmetic Circuits
Comments: Under review at the Royal Society's Philosophical Transactions A
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on developing novel designs and improving existing ones.
In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions.

[418]  arXiv:2406.03896 (cross-list from cond-mat.soft) [pdf, other]
Title: Data-driven discovery of self-similarity using neural networks
Comments: 21 pages, 15 figures, 5 tables
Subjects: Soft Condensed Matter (cond-mat.soft); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

Finding self-similarity is a key step for understanding the governing law behind complex physical phenomena. Traditional methods for identifying self-similarity often rely on specific models, which can introduce significant bias. In this paper, we present a novel neural network-based approach that discovers self-similarity directly from observed data, without presupposing any models. The presence of self-similar solutions in a physical problem signals that the governing law contains a function whose arguments are given by power-law monomials of physical parameters, which are characterized by power-law exponents. The basic idea is to enforce such particular forms structurally in a neural network in a parametrized way. We train the neural network model using the observed data, and when the training is successful, we can extract the power exponents that characterize scale-transformation symmetries of the physical problem. We demonstrate the effectiveness of our method with both synthetic and experimental data, validating its potential as a robust, model-independent tool for exploring self-similarity in complex systems.

[419]  arXiv:2406.03901 (cross-list from eess.IV) [pdf, other]
Title: Polyp and Surgical Instrument Segmentation with Double Encoder-Decoder Networks
Authors: Adrian Galdran
Journal-ref: NMI, Vol. 1 No. 1 (2021): MedAI: Transparency in Medical Image Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper describes a solution for the MedAI competition, in which participants were required to segment both polyps and surgical instruments from endoscopic images. Our approach relies on a double encoder-decoder neural network which we have previously applied for polyp segmentation, but with a series of enhancements: a more powerful encoder architecture, an improved optimization procedure, and the post-processing of segmentations based on tempered model ensembling. Experimental results show that our method produces segmentations that show a good agreement with manual delineations provided by medical experts.

[420]  arXiv:2406.03902 (cross-list from eess.IV) [pdf, other]
Title: C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Comments: Accepted to CVPR 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams. As a 2D-to-3D reconstruction problem, although implicit neural representations have been introduced to enable efficient training, only local features are considered and different views are processed equally in previous works, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space. Additionally, the scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that our C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.

[421]  arXiv:2406.03903 (cross-list from eess.IV) [pdf, other]
Title: Data-Centric Label Smoothing for Explainable Glaucoma Screening from Eye Fundus Images
Comments: Accepted to ISBI 2024 (Challenges), 2nd position in the JustRAIGS challenge (this https URL)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

As current computing capabilities increase, modern machine learning and computer vision systems tend to increase in complexity, mostly by means of larger models and advanced optimization strategies. Although often neglected, in many problems there is also much to be gained by considering potential improvements in understanding and better leveraging already-available training data, including annotations. This so-called data-centric approach can lead to substantial performance increases, sometimes beyond what can be achieved by larger models. In this paper we adopt such an approach for the task of justifiable glaucoma screening from retinal images. In particular, we focus on how to combine information from multiple annotators of different skills into a tailored label smoothing scheme that allows us to better employ a large collection of fundus images, instead of discarding samples suffering from inter-rater variability. Internal validation results indicate that our bespoke label smoothing approach surpasses the performance of a standard resnet50 model and also the same model trained with conventional label smoothing techniques, in particular for the multi-label scenario of predicting clinical reasons of glaucoma likelihood in a highly imbalanced screening context. Our code is made available at github.com/agaldran/justraigs .
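To make the contrast with conventional label smoothing concrete, here is a minimal sketch. The first function is the standard uniform smoothing of a one-hot target. The second is only one hypothetical way to turn several annotators' binary votes into a soft label, using a skill-weighted average; the abstract does not specify the actual scheme, so the weights and both function names are assumptions for illustration.

```python
def conventional_smoothing(one_hot, eps):
    """Standard label smoothing: mix a one-hot target with the uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * p + eps / k for p in one_hot]


def annotator_soft_label(votes, weights):
    """Hypothetical annotator-aware soft label for a binary task.

    `votes` are 0/1 labels from several annotators and `weights` are assumed
    skill weights; the result is the weighted fraction of positive votes.
    """
    total = sum(weights)
    return sum(w * v for v, w in zip(votes, weights)) / total


# Conventional smoothing ignores who labeled the sample.
print(conventional_smoothing([0.0, 1.0], 0.1))

# Two of three annotators vote positive; the first is weighted twice as much.
print(annotator_soft_label([1, 0, 1], [2.0, 1.0, 1.0]))  # 0.75
```

A target built this way keeps samples with disagreeing annotators in the training set, with the disagreement reflected in a softer label rather than in the sample being discarded.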

[422]  arXiv:2406.03913 (cross-list from math.OC) [pdf, other]
Title: Recognizing weighted means in geodesic spaces
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

Geodesic metric spaces support a variety of averaging constructions for given finite sets. Computing such averages has generated extensive interest in diverse disciplines. Here we consider the inverse problem of recognizing computationally whether or not a given point is such an average, exactly or approximately. In nonpositively curved spaces, several averaging notions, including the usual weighted barycenter, produce the same "mean set". In such spaces, at points where the tangent cone is a Euclidean space, the recognition problem reduces to Euclidean projection onto a polytope. Hadamard manifolds comprise one example. Another consists of CAT(0) cubical complexes, at relative-interior points: the recognition problem is harder for general points, but we present an efficient semidefinite-programming-based algorithm.

[423]  arXiv:2406.03924 (cross-list from stat.ML) [pdf, other]
Title: Statistical Multicriteria Benchmarking via the GSD-Front
Authors: Christoph Jansen (1), Georg Schollmeyer (2), Julian Rodemann (2), Hannah Blocher (2), Thomas Augustin (2) ((1) Lancaster University Leipzig, (2) Ludwig-Maximilians-Universität München)
Comments: CJ, GS, JR and HB equally contributed to this work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of a set of state-of-the-art classifiers. For (3), we relax our proposed test using techniques from robust statistics and imprecise probabilities. We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML.

[424]  arXiv:2406.03938 (cross-list from q-bio.PE) [pdf, other]
Title: Diversity in Evolutionary Dynamics
Subjects: Populations and Evolution (q-bio.PE); Computational Engineering, Finance, and Science (cs.CE)

We consider the dynamics imposed by natural selection on the populations of two competing, sexually reproducing, haploid species. In this setting, the fitness of any genome varies over time due to the changing population mix of the competing species; crucially, this fitness variation arises naturally from the model itself, without the need for imposing it exogenously as is typically the case. Previous work on this model [14] showed that, in the special case where each of the two species exhibits just two phenotypes, genetic diversity is maintained at all times. This finding supported the tenet that sexual reproduction is advantageous because it promotes diversity, which increases the survivability of a species.
In the present paper we consider the more realistic case where there are more than two phenotypes available to each species. The conclusions about diversity in general turn out to be very different from the two-phenotype case.
Our first result is negative: namely, we show that sexual reproduction does not guarantee the maintenance of diversity at all times, i.e., the result of [14] does not generalize. Our counterexample consists of two competing species with just three phenotypes each. We show that, for any time~$t_0$ and any $\varepsilon>0$, there is a time $t\ge t_0$ at which the combined diversity of both species is smaller than~$\varepsilon$. Our main result is a complementary positive statement, which says that in any non-degenerate example, diversity is maintained in a weaker, ``infinitely often'' sense.
Thus, our results refute the supposition that sexual reproduction ensures diversity at all times, but affirm a weaker assertion that extended periods of high diversity are necessarily a recurrent event.

[425]  arXiv:2406.03961 (cross-list from eess.IV) [pdf, ps, other]
Title: LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of RS images by utilizing the generated distortion prior from an LDM. Our approach consists of two stages. In the first stage, a self-encoder learns a prior from the high-quality input image. In the second stage, the prior is generated through an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, to be used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate that the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC.

[426]  arXiv:2406.03972 (cross-list from quant-ph) [pdf, ps, other]
Title: Eigenpath traversal by Poisson-distributed phase randomisation
Comments: 19 pages
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

We present a framework for quantum computation, similar to Adiabatic Quantum Computation (AQC), that is based on the quantum Zeno effect. By performing randomised dephasing operations at intervals determined by a Poisson process, we are able to track the eigenspace associated to a particular eigenvalue.
We derive a simple differential equation for the fidelity, leading to general theorems bounding the time complexity of a whole class of algorithms. We also use eigenstate filtering to optimise the scaling of the complexity in the error tolerance $\epsilon$.
In many cases the bounds given by our general theorems are optimal, giving a time complexity of $O(1/\Delta_m)$ with $\Delta_m$ the minimum of the gap. This allows us to prove optimal results using very general features of problems, minimising the problem-specific insight necessary.
As two applications of our framework, we obtain optimal scaling for the Grover problem (i.e.\ $O(\sqrt{N})$ where $N$ is the database size) and the Quantum Linear System Problem (i.e.\ $O(\kappa\log(1/\epsilon))$ where $\kappa$ is the condition number and $\epsilon$ the error tolerance) by direct applications of our theorems.

[427]  arXiv:2406.04000 (cross-list from physics.optics) [pdf, other]
Title: Stochastic logic in biased coupled photonic probabilistic bits
Subjects: Optics (physics.optics); Emerging Technologies (cs.ET)

Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experimentally viable photonic approach to solve arbitrary probabilistic computing problems. Our method relies on the insight that coherent Ising machines composed of coupled and biased optical parametric oscillators can emulate stochastic logic. We demonstrate the feasibility of our approach by using numerical simulations equivalent to the full density matrix formulation of coupled optical parametric oscillators.

[428]  arXiv:2406.04001 (cross-list from math.OC) [pdf, other]
Title: Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Dynamical Systems (math.DS)

Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and convex reformulations, enabling convex analysis for nonconvex problems. Despite non-convexity and non-smoothness, the existence of an ECL not only reveals that minimizing the original function is equivalent to a convex problem but also certifies a class of first-order non-degenerate stationary points to be globally optimal. Therefore, no spurious stationarity exists in the set of non-degenerate policies. This ECL framework can cover many benchmark control problems, including state feedback linear quadratic regulator (LQR), dynamic output feedback linear quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ robust control. ECL can also handle a class of distributed control problems when the notion of quadratic invariance (QI) holds. We further show that all static stabilizing policies are non-degenerate for state feedback LQR and $\mathcal{H}_\infty$ control under standard assumptions. We believe that the new ECL framework may be of independent interest for analyzing nonconvex problems beyond control.

[429]  arXiv:2406.04004 (cross-list from quant-ph) [pdf, other]
Title: T-Count Optimizing Genetic Algorithm for Quantum State Preparation
Comments: To appear in IEEE QSW 2024 proceedings
Subjects: Quantum Physics (quant-ph); Neural and Evolutionary Computing (cs.NE)

Quantum state preparation is a crucial process within numerous quantum algorithms, and the need for efficient initialization of quantum registers is ever increasing as demand for useful quantum computing grows. The problem arises because, as the number of qubits to be initialized grows, the circuits required to implement the desired state increase exponentially in size, leading to loss of fidelity due to noise. This is mainly due to the susceptibility to environmental effects of the non-Clifford T gate, whose use should thus be reduced as much as possible. In this paper, we present and utilize a genetic algorithm for state preparation circuits consisting of gates from the Clifford + T gate set and optimize them in T-Count so as to reduce the impact of noise. Whilst the method presented here does not always produce the most accurate circuits in terms of fidelity, it can generate high-fidelity, non-trivial quantum states such as quantum Fourier transform states. In addition, our algorithm automatically generates fault-tolerantly implementable solutions where the number of the most error-prone components is reduced. We present an evaluation of the algorithm when trialed against preparing random, Poisson probability distribution, W, GHZ, and quantum Fourier transform states. We also experimentally demonstrate the scalability issues as qubit count increases, which highlights the need for further optimization of the search process.

[430]  arXiv:2406.04012 (cross-list from stat.ML) [pdf, other]
Title: Variational inference, Mixture of Gaussians, Bayesian Machine Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is the one of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a Mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the localization of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, that is, a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error, which upper bounds the objective between an optimal finite mixture and the target distribution.

[431]  arXiv:2406.04034 (cross-list from math.CO) [pdf, ps, other]
Title: The geometry of intersecting codes and applications to additive combinatorics and factorization theory
Comments: 31 pages
Subjects: Combinatorics (math.CO); Information Theory (cs.IT); Number Theory (math.NT)

Intersecting codes are linear codes where every two nonzero codewords have non-trivially intersecting support. In this article we expand on the theory of this family of codes, by showing that nondegenerate intersecting codes correspond to sets of points (with multiplicities) in a projective space that are not contained in two hyperplanes. This correspondence allows the use of geometric arguments to demonstrate properties and provide constructions of intersecting codes. We improve on existing bounds on their length and provide explicit constructions of short intersecting codes. Finally, generalizing a link between coding theory and the theory of the Davenport constant (a combinatorial invariant of finite abelian groups), we provide new asymptotic bounds on the weighted $2$-wise Davenport constant. These bounds then yield results on factorizations in rings of algebraic integers and related structures.

[432]  arXiv:2406.04047 (cross-list from stat.ML) [pdf, other]
Title: Slicing Mutual Information Generalization Bounds for Neural Networks
Comments: Accepted at ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e., the MI between the training data and the learned hypothesis. Yet, these bounds have limited practicality for modern ML applications (e.g., deep learning), due to the difficulty of evaluating MI in high dimensions. Motivated by recent findings on the compressibility of neural networks, we consider algorithms that operate by slicing the parameter space, i.e., trained on random lower-dimensional subspaces. We introduce new, tighter information-theoretic generalization bounds tailored for such algorithms, demonstrating that slicing improves generalization. Our bounds offer significant computational and statistical advantages over standard MI bounds, as they rely on scalable alternative measures of dependence, i.e., disintegrated mutual information and $k$-sliced mutual information. Then, we extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces, by leveraging rate-distortion theory. This strategy yields generalization bounds that incorporate a distortion term measuring model compressibility under slicing, thereby tightening existing bounds without compromising performance or requiring model compression. Building on this, we propose a regularization scheme enabling practitioners to control generalization through compressibility. Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach.

[433]  arXiv:2406.04071 (cross-list from stat.ML) [pdf, other]
Title: Dynamic angular synchronization under smoothness constraints
Comments: 40 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the classical angular synchronization problem consists of recovering unknown angles $\theta_1^*,\dots,\theta_n^*$ from a collection of noisy pairwise measurements of the form $(\theta_i^* - \theta_j^*) \mod 2\pi$, for all $\{i,j\} \in \mathcal{E}$. This problem arises in a variety of applications, including computer vision, time synchronization of distributed networks, and ranking from pairwise comparisons. In this paper, we consider a dynamic version of this problem where both the angles and the measurement graphs evolve over $T$ time points. Assuming a smoothness condition on the evolution of the latent angles, we derive three algorithms for joint estimation of the angles over all time points. Moreover, for one of the algorithms, we establish non-asymptotic recovery guarantees for the mean-squared error (MSE) under different statistical models. In particular, we show that the MSE converges to zero as $T$ increases under milder conditions than in the static setting. This includes the setting where the measurement graphs are highly sparse and disconnected, and also when the measurement noise is large and can potentially increase with $T$. We complement our theoretical results with experiments on synthetic data.
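
To make the classical (static) problem above concrete, here is a minimal sketch of a spectral relaxation for angular synchronization, a standard technique from the earlier literature. It is not one of the three dynamic algorithms of this paper, whose details the abstract does not give. The function name `spectral_angular_sync` and the shifted power iteration are illustrative choices.

```python
import cmath
import math
import random


def spectral_angular_sync(n, measurements, iters=200, seed=0):
    """Estimate n angles, up to a common shift, from pairwise offsets.

    measurements maps an edge (i, j) to a (possibly noisy) value of
    (theta_i - theta_j) mod 2*pi.
    """
    # Hermitian matrix with H[i][j] = exp(1j * (theta_i - theta_j)) on edges.
    H = [[0j] * n for _ in range(n)]
    for (i, j), delta in measurements.items():
        H[i][j] = cmath.exp(1j * delta)
        H[j][i] = cmath.exp(-1j * delta)

    rng = random.Random(seed)
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    for _ in range(iters):
        # Power iteration on H + n*I; the shift makes all eigenvalues
        # positive, so the iteration converges to the top eigenvector of H.
        w = [n * v[i] + sum(H[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]

    # The phases of the leading eigenvector are the angle estimates.
    return [cmath.phase(x) % (2 * math.pi) for x in v]
```

The estimates are only defined up to a global rotation, so they should be compared through differences such as `est[i] - est[0]` rather than entry by entry.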

[434]  arXiv:2406.04098 (cross-list from stat.ML) [pdf, other]
Title: A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data
Comments: 42 pages, 28 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.

[435]  arXiv:2406.04132 (cross-list from math.DS) [pdf, ps, other]
Title: Realizability of Subgroups by Subshifts of Finite Type
Authors: Nicolás Bitar
Comments: 26 pages, 2 figures. Comments welcome
Subjects: Dynamical Systems (math.DS); Discrete Mathematics (cs.DM); Group Theory (math.GR)

We study the problem of realizing families of subgroups as the set of stabilizers of configurations from a subshift of finite type (SFT). This problem generalizes both the existence of strongly and weakly aperiodic SFTs. We show that a finitely generated normal subgroup is realizable if and only if the quotient by the subgroup admits a strongly aperiodic SFT. We also show that if a subgroup is realizable, its subgroup membership problem must be decidable. The article also contains the introduction of periodically rigid groups, which are groups for which every weakly aperiodic subshift of finite type is strongly aperiodic. We conjecture that the only finitely generated periodically rigid groups are virtually $\mathbb{Z}$ groups and torsion-free virtually $\mathbb{Z}^2$ groups. Finally, we show virtually nilpotent and polycyclic groups satisfy the conjecture.

[436]  arXiv:2406.04142 (cross-list from math.OC) [pdf, other]
Title: Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Comments: 39 pages, 20 Figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: MomSPS$_{\max}$, MomDecSPS, and MomAdaSPS. For MomSPS$_{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using MomSPS$_{\max}$, SHB converges to the true solution at a fast rate matching the deterministic HB. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge of the problem parameters and without assuming interpolation. The convergence analysis of SHB is tight and obtains the convergence guarantees of SGD with stochastic Polyak step-sizes as a special case. We supplement our analysis with experiments that validate the theory and demonstrate the effectiveness and robustness of the new algorithms.
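
For background, the sketch below shows plain SGD with the capped stochastic Polyak step-size from earlier work, as commonly stated: $\gamma_t = \min\{(f_i(x_t) - f_i^*) / (c\,\|\nabla f_i(x_t)\|^2),\ \gamma_b\}$. It does not implement MomSPS$_{\max}$, MomDecSPS, or MomAdaSPS, whose formulas are not given in the abstract. The function name `sgd_sps_max`, the least-squares losses, and the values of `c` and `gamma_b` are assumptions for illustration.

```python
import random


def sgd_sps_max(A, b, x0, c=0.5, gamma_b=10.0, iters=2000, seed=0):
    """SGD with a capped Polyak step on f_i(x) = 0.5 * (a_i . x - b_i)^2.

    Assumes interpolation, so each per-sample optimum f_i^* is 0.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(len(A))
        residual = sum(a * xi for a, xi in zip(A[i], x)) - b[i]
        loss = 0.5 * residual ** 2               # f_i(x) - f_i^*, with f_i^* = 0
        grad = [residual * a for a in A[i]]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue                             # sample already fit exactly
        step = min(loss / (c * grad_sq), gamma_b)
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x
```

No step-size needs to be tuned by hand here: the step is computed from the current loss and gradient norm, which is the property the momentum variants in the abstract aim to carry over to SHB.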

[437]  arXiv:2406.04149 (cross-list from eess.IV) [pdf, ps, other]
Title: Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineation of detailed rock fragment size distributions was achieved through the analysis of drone-captured imagery, coupled with the application of an enhanced Unet semantic segmentation model integrated with an expansion-based post-processing technique. The quarry slope was stratified into four vertical sections, with the size distribution of each section quantified via ellipsoid shape approximations. Our results disclose pronounced vertical segregation patterns, with finer particles concentrated in the upper slope regions and coarser particles in the lower. Utilizing relative characteristic diameters, we offer insight into the degree of segregation, thereby illustrating the spatial heterogeneity in fragment size more clearly. The techniques outlined in this study deliver a scalable and accurate method for assessing fragment size distribution, with the potential to better inform resource management and operational decisions in quarry management.

[438]  arXiv:2406.04163 (cross-list from math.OC) [pdf, ps, other]
Title: Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes
Comments: 25 pages, 1 figure
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. Further, this correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of the Kakade gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use this to show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.
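
As a rough illustration of entropy regularization in a discounted MDP, the sketch below runs value iteration with the standard log-sum-exp ("soft") backup at temperature `tau`. It is an assumption-laden toy: the function name, the tabular inputs `P` and `R`, and the use of the regularized optimal value as the quantity compared are choices made here. The error studied in the paper may be defined differently, and this code does not reproduce its exponential estimates.

```python
import math


def soft_value_iteration(P, R, gamma, tau, iters=2000):
    """Value iteration with an entropy-regularized (log-sum-exp) backup.

    P[s][a] is a list of next-state probabilities and R[s][a] a reward.
    tau > 0 is the regularization strength; tau == 0 uses the usual max.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(iters):
        new_V = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
                 for a in range(len(P[s]))]
            m = max(q)
            if tau == 0:
                new_V.append(m)
            else:
                # Numerically stable tau * log(sum(exp(q / tau))).
                lse = math.log(sum(math.exp((x - m) / tau) for x in q))
                new_V.append(m + tau * lse)
        V = new_V
    return V
```

Comparing the output for decreasing `tau` against the `tau == 0` run shows the regularized values approaching the unregularized ones. The elementary bound for this gap is `tau * log(num_actions) / (1 - gamma)`, which is much cruder than the estimates the abstract announces.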

[439]  arXiv:2406.04179 (cross-list from math.PR) [pdf, ps, other]
Title: On the zeros of partition functions with multi-spin interactions
Comments: 16 pages
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph); Combinatorics (math.CO)

Let $X_1, \ldots, X_n$ be probability spaces, let $X$ be their direct product, let $\phi_1, \ldots, \phi_m: X \longrightarrow \mathbb{C}$ be random variables, each depending only on a few coordinates of a point $x=(x_1, \ldots, x_n)$, and let $f=\phi_1 + \ldots + \phi_m$. The expectation $E\thinspace e^{\lambda f}$, where $\lambda \in \mathbb{C}$, appears in statistical physics as the partition function of a system with multi-spin interactions, and also in combinatorics and computer science, where it is known as the partition function of edge-coloring models, tensor network contractions or a Holant polynomial. Assuming that each $\phi_i$ is 1-Lipschitz in the Hamming metric of $X$, that each $\phi_i(x)$ depends on at most $r \geq 2$ coordinates $x_1, \ldots, x_n$ of $x \in X$, and that for each $j$ there are at most $c \geq 1$ functions $\phi_i$ that depend on the coordinate $x_j$, we prove that $E\thinspace e^{\lambda f} \ne 0$ provided $| \lambda | \leq (3 c \sqrt{r-1})^{-1}$ and that the bound is sharp up to a factor logarithmic in $r$. As a corollary, the value of the expectation can be efficiently approximated, provided $\lambda$ lies in a slightly smaller disc.

[440]  arXiv:2406.04188 (cross-list from eess.SP) [pdf, other]
Title: Digital Twin Aided RIS Communication: Robust Beamforming and Interference Management
Comments: Dataset and code files will be available soon on the DeepMIMIO website: this https URL
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Reconfigurable intelligent surfaces (RISs) are envisioned to play a key role in future wireless communication networks. However, channel estimation in RIS-aided wireless networks is challenging due to their passive nature and the large number of reflective elements, leading to high channel estimation overhead. Additionally, conventional methods like beam sweeping, which do not rely on explicit channel state information, often struggle in managing interference in multi-user networks. In this paper, we propose a novel approach that leverages digital twins (DTs) of the physical environments to approximate channels using electromagnetic 3D models and ray tracing, thus relaxing the need for channel estimation and extensive over-the-air computations in RIS-aided wireless networks. To address the digital twins channel approximation errors, we further refine this approach with a DT-specific robust transmission design that reliably meets minimum desired rates. The results show that our method secures these rates over 90% of the time, significantly outperforming beam sweeping, which achieves these rates less than 8% of the time due to its poor management of transmitting power and interference.

[441]  arXiv:2406.04203 (cross-list from math.PR) [pdf, other]
Title: Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
Subjects: Probability (math.PR); Systems and Control (eess.SY); Optimization and Control (math.OC)

The weighted-workload-task-allocation (WWTA) load-balancing policy is known to be throughput optimal for parallel server systems with heterogeneous servers. This work concerns the heavy traffic approximation of steady-state performance for parallel server systems operating under the WWTA policy. Under a relaxed complete-resource-pooling condition, we prove that WWTA achieves a "strong form" of state-space collapse in heavy traffic and that the scaled workload for each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Various steady-state performance measures are shown to be approximated from this exponential random variable. Instead of proving a stochastic process limit followed by an interchange of limits (a method that dominates the literature), our method works directly with a pre-limit basic adjoint relationship (BAR) that characterizes the stationary distribution of each pre-limit system.

[442]  arXiv:2406.04212 (cross-list from eess.AS) [pdf, ps, other]
Title: Sound Event Bounding Boxes
Comments: Accepted for publication at Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame-level presence decisions, with the extent of individual events determined by merging consecutive positive frames. In this paper, we show that frame-level thresholding degrades the prediction of the event extent by coupling it with the system's sound presence confidence. We propose to decouple the prediction of event extent and confidence by introducing SEBBs, which format each sound event prediction as a tuple of a class type, extent, and overall confidence. We also propose a change-detection-based algorithm to convert legacy frame-level outputs into SEBBs. We find the algorithm significantly improves the performance of DCASE 2023 Challenge systems, boosting the state of the art from .644 to .686 PSDS1.
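
The conventional frame-level pipeline described in this abstract (threshold the per-frame confidences, then merge consecutive positive frames) can be sketched as follows. This is not the SEBB change-detection algorithm, which the abstract does not specify. The function name `frames_to_events`, the strict `>` comparison, and the example scores are illustrative assumptions.

```python
def frames_to_events(scores, threshold, frame_len):
    """Threshold frame-level confidences and merge consecutive positives.

    Returns a list of (onset_seconds, offset_seconds) tuples.
    """
    events = []
    onset = None
    for idx, score in enumerate(scores):
        active = score > threshold
        if active and onset is None:
            onset = idx                          # an event starts here
        elif not active and onset is not None:
            events.append((onset * frame_len, idx * frame_len))
            onset = None                         # the event ended before idx
    if onset is not None:                        # event runs to the clip end
        events.append((onset * frame_len, len(scores) * frame_len))
    return events


scores = [0.1, 0.4, 0.8, 0.9, 0.45, 0.7, 0.2]
low = frames_to_events(scores, threshold=0.3, frame_len=1.0)
high = frames_to_events(scores, threshold=0.5, frame_len=1.0)
```

With these scores, the lower threshold yields a single long event while the higher one splits it into two shorter events. That dependence of the event extent on the confidence threshold is the coupling the abstract argues against.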

[443]  arXiv:2406.04243 (cross-list from math.OC) [pdf, other]
Title: Policy Optimization in Control: Geometry and Algorithmic Implications
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Differential Geometry (math.DG)

This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various "complete parameterizations" of control design problems (the policy parameters together with their Riemannian geometry) influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.

[444]  arXiv:2406.04245 (cross-list from quant-ph) [pdf, ps, other]
Title: Online learning of a panoply of quantum objects
Comments: 34 pages. Comments welcome
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general subsets of positive semidefinite matrices via the regularized-follow-the-leader algorithm and apply it to various settings where one wishes to learn quantum objects. For concrete applications, we present a sublinear regret bound for learning quantum states, effects, channels, interactive measurements, strategies, co-strategies, and the collection of inner products of pure states. Our bound applies to many other quantum objects with compact, convex representations. In proving our regret bound, we establish various matrix analysis results useful in quantum information theory. This includes a generalization of Pinsker's inequality for arbitrary positive semidefinite operators with possibly different traces, which may be of independent interest and applicable to more general classes of divergences.

[445]  arXiv:2406.04250 (cross-list from quant-ph) [pdf, other]
Title: Online learning of quantum processes
Comments: 14 + 72 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

Among recent insights into learning quantum states, online learning and shadow tomography procedures are notable for their ability to accurately predict expectation values even of adaptively chosen observables. In contrast to the state case, quantum process learning tasks with a similarly adaptive nature have received little attention. In this work, we investigate online learning tasks for quantum processes. Whereas online learning is infeasible for general quantum channels, we show that channels of bounded gate complexity as well as Pauli channels can be online learned in the regret and mistake-bounded models of online learning. In fact, we can online learn probabilistic mixtures of any exponentially large set of known channels. We also provide a provably sample-efficient shadow tomography procedure for Pauli channels. Our results extend beyond quantum channels to non-Markovian multi-time processes, with favorable regret and mistake bounds, as well as a shadow tomography procedure. We complement our online learning upper bounds with mistake as well as computational lower bounds. On the technical side, we make use of the multiplicative weights update algorithm, classical adaptive data analysis, and Bell sampling, as well as tools from the theory of quantum combs for multi-time quantum processes. Our work initiates a study of online learning for classes of quantum channels and, more generally, non-Markovian quantum processes. Given the importance of online learning for state shadow tomography, this may serve as a step towards quantum channel variants of adaptive shadow tomography.

[446]  arXiv:2406.04259 (cross-list from math.AT) [pdf, other]
Title: Topological Stability and Latschev-type Reconstruction Theorems for $\boldsymbol{\mathrm{CAT}(\kappa)}$ Spaces
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Metric Geometry (math.MG)

We consider the problem of homotopy-type reconstruction of compact shapes $X\subset\mathbb{R}^N$ that are $\mathrm{CAT}(\kappa)$ in the intrinsic length metric. The reconstructed spaces are in the form of Vietoris--Rips complexes computed from a compact sample $S$, Hausdorff--close to the unknown shape $X$. Instead of the Euclidean metric on the sample, our reconstruction technique leverages a path-based metric to compute these complexes. As they emerge naturally in the framework of reconstruction, we also study the Gromov--Hausdorff topological stability and finiteness problems for general compact $\mathrm{CAT}(\kappa)$ spaces. Our techniques provide novel sampling conditions as an alternative to the existing and commonly used techniques based on weak feature size and $\mu$--reach. In particular, we introduce a new parameter, called the restricted distortion, which is a generalization of the well-known global distortion of embedding. We show examples of Euclidean subspaces for which the known parameters such as the reach, $\mu$--reach and weak feature size vanish, whereas the restricted distortion is finite, making our reconstruction results applicable to such spaces.

[447]  arXiv:2406.04269 (cross-list from eess.AS) [pdf, other]
Title: Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.

[448]  arXiv:2406.04282 (cross-list from eess.SP) [pdf, other]
Title: A Statistical Characterization of Wireless Channels Conditioned on Side Information
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Statistical prior channel knowledge, such as the wide-sense-stationary-uncorrelated-scattering (WSSUS) property, and additional side information both can be used to enhance physical layer applications in wireless communication. Generally, the wireless channel's strongly fluctuating path phases and WSSUS property characterize the channel by a zero mean and Toeplitz-structured covariance matrices in different domains. In this work, we derive a framework to comprehensively categorize side information based on whether it preserves or abandons these statistical features conditioned on the given side information. To accomplish this, we combine insights from a generic channel model with the representation of wireless channels as probabilistic graphs. Additionally, we exemplify several applications, ranging from channel modeling to estimation and clustering, which demonstrate how the proposed framework can practically enhance physical layer methods utilizing machine learning (ML).

Replacements for Fri, 7 Jun 24

[449]  arXiv:1708.09157 (replaced) [pdf, other]
Title: Cross-lingual, Character-Level Neural Morphological Tagging
Comments: Published as a conference paper at EMNLP 2017; Fixed minor typos and cleaned up formatting
Subjects: Computation and Language (cs.CL)
[450]  arXiv:1912.12095 (replaced) [pdf, other]
Title: One Point, One Object: Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation
Authors: Hongsen Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451]  arXiv:2008.05195 (replaced) [pdf, other]
Title: Competitive Demand Learning: A Non-cooperative Pricing Algorithm with Coordinated Price Experimentation
Journal-ref: Production and Operations Management 2024. Vol. 33(1)
Subjects: Computer Science and Game Theory (cs.GT)
[452]  arXiv:2009.04553 (replaced) [pdf, other]
Title: Threshold rates for properties of random codes
Comments: November 2021 version
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[453]  arXiv:2106.03354 (replaced) [pdf, other]
Title: AI without networks
Comments: 47 pages with 8 figures + 33 pages supplementary with 7 figures and one table (total 80 pages)
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Functional Analysis (math.FA); Machine Learning (stat.ML)
[454]  arXiv:2109.11725 (replaced) [pdf, other]
Title: Punctured Low-Bias Codes Behave Like Random Linear Codes
Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Combinatorics (math.CO)
[455]  arXiv:2112.14734 (replaced) [pdf, other]
Title: Sequential memory improves sample and memory efficiency in Episodic Control
Comments: 21 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
[456]  arXiv:2203.00387 (replaced) [pdf, other]
Title: Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457]  arXiv:2203.12082 (replaced) [pdf, other]
Title: PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
Comments: CVPR 2022; source code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[458]  arXiv:2205.08628 (replaced) [pdf, ps, other]
Title: Mechanized Analysis of Anselm's Modal Ontological Argument
Authors: John Rushby
Comments: This version includes a new postscript that considers alternative premises due to Andrzej Bilat (April 2021)
Journal-ref: International Journal for Philosophy of Religion, vol. 89, pp. 135-152, April 2021
Subjects: Logic in Computer Science (cs.LO)
[459]  arXiv:2205.10192 (replaced) [pdf, other]
Title: On the Trade-off between Redundancy and Local Coherence in Summarization
Comments: Accepted to JAIR
Journal-ref: Journal of Artificial Intelligence Research, 80, 273-326 (2024)
Subjects: Computation and Language (cs.CL)
[460]  arXiv:2206.06821 (replaced) [pdf, other]
Title: DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models
Journal-ref: Journal of Machine Learning Research 25(147), 2024
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[461]  arXiv:2206.07438 (replaced) [pdf, other]
Title: Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview
Comments: Published at ACM TELO
Journal-ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[462]  arXiv:2206.08465 (replaced) [pdf, other]
Title: Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks
Journal-ref: Journal of Machine Learning Research 25 (2024) 1-42
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[463]  arXiv:2207.12264 (replaced) [pdf, ps, other]
Title: Dynamics and triggers of misinformation on vaccines
Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[464]  arXiv:2208.10790 (replaced) [pdf, other]
Title: Event-Triggered Time-Varying Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[465]  arXiv:2209.00936 (replaced) [pdf, other]
Title: A Class-Aware Representation Refinement Framework for Graph Classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[466]  arXiv:2210.04288 (replaced) [pdf, other]
Title: CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[467]  arXiv:2210.17180 (replaced) [pdf, other]
Title: Automated Dominative Subspace Mining for Efficient Neural Architecture Search
Comments: Published in IEEE TCSVT
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[468]  arXiv:2212.01976 (replaced) [pdf, other]
Title: FedCC: Robust Federated Learning against Model Poisoning Attacks
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[469]  arXiv:2212.02459 (replaced) [pdf, ps, other]
Title: Resilient Distributed Optimization for Multi-Agent Cyberphysical Systems
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
[470]  arXiv:2212.10192 (replaced) [pdf, other]
Title: Adam: Dense Retrieval Distillation with Adaptive Dark Examples
Comments: 13 pages, 3 figures
Subjects: Computation and Language (cs.CL)
[471]  arXiv:2212.13462 (replaced) [pdf, other]
Title: MVTN: Learning Multi-View Transformations for 3D Understanding
Comments: under review journal extension for the ICCV 2021 paper arXiv:2011.13244
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[472]  arXiv:2301.02428 (replaced) [pdf, other]
Title: Sensitivity analysis using Physics-informed neural networks
Comments: 22 pages, 11 figures
Subjects: Numerical Analysis (math.NA)
[473]  arXiv:2301.06335 (replaced) [pdf, ps, other]
Title: Approximating the closest structured singular matrix polynomial
Comments: 28 pages
Subjects: Numerical Analysis (math.NA)
[474]  arXiv:2301.08146 (replaced) [pdf, other]
Title: What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News
Comments: 8 pages, 2 figures, 5 tables
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[475]  arXiv:2302.01713 (replaced) [pdf, other]
Title: Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations
Subjects: Artificial Intelligence (cs.AI)
[476]  arXiv:2302.02785 (replaced) [pdf, other]
Title: An intelligent tutor for planning in large partially observable environments
Subjects: Artificial Intelligence (cs.AI)
[477]  arXiv:2302.05372 (replaced) [pdf, ps, other]
Title: Towards Minimax Optimality of Model-based Robust Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[478]  arXiv:2302.08053 (replaced) [pdf, ps, other]
Title: Selective Noise Suppression Methods Using Random SVPWM to Shape the Noise Spectrum of PMSMs
Authors: Jian Wen (1 and 2), Xiaobin Cheng (1 and 2), Peifeng Ji (1), Jun Yang (1 and 2), Feng Zhao (3) ((1) Institute of Acoustics, Chinese Academy of Sciences, (2) University of Chinese Academy of Sciences, (3) Institute of Electrical Engineering, Chinese Academy of Sciences)
Comments: 8 pages, 15 figures
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[479]  arXiv:2302.12476 (replaced) [pdf, ps, other]
Title: Asymptotic behaviour of the semidiscrete FE approximations to weakly damped wave equations with minimal smoothness on initial data
Comments: 28 pages, 18 figures, 5 tables
Subjects: Numerical Analysis (math.NA)
[480]  arXiv:2303.00368 (replaced) [pdf, ps, other]
Title: Sufficient conditions for the surjectivity of radical curve parametrizations
Comments: 18 pages, no figures
Journal-ref: Journal of Algebra, Volume 640, 2024, Pages 129-146, ISSN 0021-8693
Subjects: Algebraic Geometry (math.AG); Symbolic Computation (cs.SC)
[481]  arXiv:2303.07139 (replaced) [pdf, other]
Title: Comparing statistical and machine learning methods for time series forecasting in data-driven logistics -- A simulation study
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[482]  arXiv:2304.07889 (replaced) [pdf, other]
Title: Ontology for Healthcare Artificial Intelligence Privacy in Brazil
Subjects: Artificial Intelligence (cs.AI)
[483]  arXiv:2304.08650 (replaced) [pdf, other]
Title: UAV-based Maritime Communications: Relaying to Enhance the Link Quality
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[484]  arXiv:2304.14545 (replaced) [pdf, other]
Title: Augmented balancing weights as linear regression
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
[485]  arXiv:2305.11915 (replaced) [pdf, other]
Title: PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces
Comments: 30 pages, 9 figures
Subjects: Functional Analysis (math.FA); Machine Learning (cs.LG); Numerical Analysis (math.NA)
[486]  arXiv:2305.12659 (replaced) [pdf, other]
Title: UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487]  arXiv:2305.12798 (replaced) [pdf, other]
Title: Word Embeddings Are Steers for Language Models
Comments: ACL 2024 Long Paper, 9 pages, 3 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[488]  arXiv:2305.14109 (replaced) [pdf, other]
Title: Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML
Comments: 14 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[489]  arXiv:2305.14592 (replaced) [pdf, other]
Title: Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
Comments: Accepted to ACL 2024 main conference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[490]  arXiv:2305.15577 (replaced) [pdf, other]
Title: Minimizing $f$-Divergences by Interpolating Velocity Fields
Comments: This manuscript is an extended version of the ICML2024 version. The code for reproducing our results can be found at this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[491]  arXiv:2305.16209 (replaced) [pdf, other]
Title: C-MCTS: Safe Planning with Monte Carlo Tree Search
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[492]  arXiv:2305.17139 (replaced) [pdf, other]
Title: A Measure-Theoretic Axiomatisation of Causality
Subjects: Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
[493]  arXiv:2305.17834 (replaced) [pdf, other]
Title: Streaming Audio Transformers for Online Audio Tagging
Comments: Interspeech2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[494]  arXiv:2306.01376 (replaced) [pdf, other]
Title: DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
[495]  arXiv:2306.03061 (replaced) [pdf, other]
Title: Structured Voronoi Sampling
Comments: Accepted at NeurIPS 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[496]  arXiv:2306.04815 (replaced) [pdf, other]
Title: Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[497]  arXiv:2306.05001 (replaced) [pdf, other]
Title: COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[498]  arXiv:2306.06209 (replaced) [pdf, other]
Title: Backdoor Attack with Sparse and Invisible Trigger
Comments: This paper was accepted by IEEE Transactions on Information Forensics and Security (TIFS). The first two authors contributed equally to this work. 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[499]  arXiv:2306.06844 (replaced) [pdf, other]
Title: Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation
Comments: 25 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[500]  arXiv:2306.07550 (replaced) [pdf, ps, other]
Title: Nested Sequents for Intermediate Logics: The Case of Gödel-Dummett Logics
Authors: Tim S. Lyon
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[501]  arXiv:2306.08141 (replaced) [pdf, other]
Title: ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Comments: 31 pages, 27 figures, ICML 2024
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[502]  arXiv:2306.09381 (replaced) [pdf, other]
Title: Spatiotemporal-Augmented Graph Neural Networks for Human Mobility Simulation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[503]  arXiv:2306.09782 (replaced) [pdf, other]
Title: Full Parameter Fine-tuning for Large Language Models with Limited Resources
Comments: ACL 2024
Subjects: Computation and Language (cs.CL)
[504]  arXiv:2306.13493 (replaced) [pdf, other]
Title: Smoothed Circulant Embedding with Applications to Multilevel Monte Carlo Methods for PDEs with Random Coefficients
Comments: 36 pages, 11 figures, submitted to IMA Journal of Numerical Analysis
Subjects: Numerical Analysis (math.NA)
[505]  arXiv:2306.14075 (replaced) [pdf, ps, other]
Title: Join Size Bounds using Lp-Norms on Degree Sequences
Subjects: Databases (cs.DB); Information Theory (cs.IT)
[506]  arXiv:2306.17193 (replaced) [pdf, other]
Title: Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[507]  arXiv:2307.02818 (replaced) [pdf, other]
Title: Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[508]  arXiv:2307.05141 (replaced) [pdf, other]
Title: Deep Probabilistic Movement Primitives with a Bayesian Aggregator
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[509]  arXiv:2307.15593 (replaced) [pdf, other]
Title: Robust Distortion-free Watermarks for Language Models
Comments: reformatting of camera-ready version accepted to TMLR, with minor edits to introduction
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[510]  arXiv:2307.16422 (replaced) [pdf, other]
Title: Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution
Comments: ICML 2024
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[511]  arXiv:2308.06020 (replaced) [pdf, other]
Title: A direct sampling method based on the Green's function for time-dependent inverse scattering problems
Comments: 18 pages, 12 figures, 2 tables
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
[512]  arXiv:2308.07876 (replaced) [pdf, other]
Title: Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political Relation Classification
Comments: ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[513]  arXiv:2308.08841 (replaced) [pdf, other]
Title: Machine Learning-Assisted Discovery of Flow Reactor Designs
Comments: 11 pages, 9 figures, as accepted by Nature Chemical Engineering
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[514]  arXiv:2308.08858 (replaced) [pdf, ps, other]
Title: Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
[515]  arXiv:2308.12568 (replaced) [pdf, other]
Title: A Small and Fast BERT for Chinese Medical Punctuation Restoration
Comments: 5 pages, 2 figures, Accepted by INTERSPEECH 2024
Subjects: Computation and Language (cs.CL)
[516]  arXiv:2308.14915 (replaced) [pdf, other]
Title: Information-driven Affordance Discovery for Efficient Robotic Manipulation
Subjects: Robotics (cs.RO)
[517]  arXiv:2309.00169 (replaced) [pdf, other]
Title: RepCodec: A Speech Representation Codec for Speech Tokenization
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[518]  arXiv:2309.00610 (replaced) [pdf, other]
Title: CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Comments: CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519]  arXiv:2309.06054 (replaced) [pdf, other]
Title: Breaking through the learning plateaus of in-context learning in Transformer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[520]  arXiv:2309.07287 (replaced) [pdf, other]
Title: Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
Comments: Accepted to Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[521]  arXiv:2309.08047 (replaced) [pdf, other]
Title: Bias in News Summarization: Measures, Pitfalls and Corpora
Comments: Findings of ACL 24 Camera Ready
Subjects: Computation and Language (cs.CL)
[522]  arXiv:2309.08511 (replaced) [pdf, other]
Title: Generalised Diffusion Probabilistic Scale-Spaces
Authors: Pascal Peter
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[523]  arXiv:2309.09524 (replaced) [pdf, other]
Title: Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Comments: Interspeech 2024 camera-ready
Subjects: Computation and Language (cs.CL)
[524]  arXiv:2309.09552 (replaced) [pdf, other]
Title: A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[525]  arXiv:2309.09836 (replaced) [pdf, other]
Title: RECAP: Retrieval-Augmented Audio Captioning
Comments: ICASSP 2024. Code and data: this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[526]  arXiv:2309.10740 (replaced) [pdf, other]
Title: ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[527]  arXiv:2309.11361 (replaced) [pdf, other]
Title: Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM
Comments: In 17th International Conference on Metadata and Semantics Research, October 2023
Subjects: Artificial Intelligence (cs.AI)
[528]  arXiv:2309.15402 (replaced) [pdf, other]
Title: Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Comments: Accepted to ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[529]  arXiv:2309.16002 (replaced) [pdf, other]
Title: Robust Blockwise Random Pivoting: Fast and Accurate Adaptive Interpolative Decomposition
Subjects: Numerical Analysis (math.NA)
[530]  arXiv:2309.17419 (replaced) [pdf, other]
Title: Enumerating minimal solution sets for metric graph problems
Comments: 26 pages, 4 figures
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[531]  arXiv:2310.00160 (replaced) [pdf, other]
Title: Self-Specialization: Uncovering Latent Expertise within Large Language Models
Comments: ACL 2024 (Findings; Long Paper)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[532]  arXiv:2310.00165 (replaced) [pdf, other]
Title: SCoRe: Submodular Combinatorial Representation Learning
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[533]  arXiv:2310.00530 (replaced) [pdf, ps, other]
Title: Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets
Comments: 9 figures
Journal-ref: The Photogrammetric Record, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534]  arXiv:2310.02442 (replaced) [pdf, other]
Title: GenCO: Generating Diverse Designs with Combinatorial Constraints
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG)
[535]  arXiv:2310.02721 (replaced) [pdf, other]
Title: Leveraging Temporal Graph Networks Using Module Decoupling
Subjects: Machine Learning (cs.LG)
[536]  arXiv:2310.03309 (replaced) [pdf, other]
Title: Concise and Organized Perception Facilitates Reasoning in Large Language Models
Comments: 26 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[537]  arXiv:2310.03938 (replaced) [pdf, other]
Title: EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
Comments: 5 pages, 2 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[538]  arXiv:2310.04022 (replaced) [pdf, other]
Title: Nonlinear Methods for Shape Optimization Problems in Liquid Crystal Tactoids
Subjects: Numerical Analysis (math.NA)
[539]  arXiv:2310.04400 (replaced) [pdf, other]
Title: On the Embedding Collapse when Scaling up Recommendation Models
Comments: ICML 2024 Accepted
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[540]  arXiv:2310.04406 (replaced) [pdf, other]
Title: Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Comments: Code at this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[541]  arXiv:2310.04764 (replaced) [pdf, other]
Title: Characterizations of Monadic Second Order Definable Context-Free Sets of Graphs
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
[542]  arXiv:2310.05141 (replaced) [pdf, other]
Title: Transferable Availability Poisoning Attacks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[543]  arXiv:2310.06430 (replaced) [pdf, other]
Title: Conformal Prediction for Deep Classifier via Label Ranking
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST)
[544]  arXiv:2310.07579 (replaced) [pdf, other]
Title: In-Context Unlearning: Language Models as Few Shot Unlearners
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[545]  arXiv:2310.09639 (replaced) [pdf, other]
Title: DPZero: Private Fine-Tuning of Language Models without Backpropagation
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
[546]  arXiv:2310.10195 (replaced) [pdf, other]
Title: AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Comments: ACL 2024 camera ready version
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[547]  arXiv:2310.11897 (replaced) [pdf, other]
Title: Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Comments: 69 pages, 17 figures
Subjects: Machine Learning (cs.LG)
[548]  arXiv:2310.12419 (replaced) [pdf, other]
Title: Toward Unbiased Multiple-Target Fuzzing with Path Diversity
Subjects: Cryptography and Security (cs.CR)
[549]  arXiv:2310.12956 (replaced) [pdf, other]
Title: Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[550]  arXiv:2310.13571 (replaced) [pdf, ps, other]
Title: Why Can Large Language Models Generate Correct Chain-of-Thoughts?
Subjects: Computation and Language (cs.CL)
[551]  arXiv:2310.13585 (replaced) [pdf, other]
Title: POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[552]  arXiv:2310.18924 (replaced) [pdf, other]
Title: Remaining useful life prediction of Lithium-ion batteries using spatio-temporal multimodal attention networks
Subjects: Machine Learning (cs.LG)
[553]  arXiv:2310.19220 (replaced) [pdf, other]
Title: From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals
Comments: Authors are alphabetically ordered
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[554]  arXiv:2311.02462 (replaced) [pdf, ps, other]
Title: Levels of AGI for Operationalizing Progress on the Path to AGI
Comments: version 4 - Position Paper accepted to ICML 2024. Note that due to ICML position paper titling format requirements, the title has changed slightly from that of the original arXiv pre-print. The original pre-print title was "Levels of AGI: Operationalizing Progress on the Path to AGI" but the official published title for ICML 2024 is "Levels of AGI for Operationalizing Progress on the Path to AGI"
Journal-ref: Proceedings of ICML 2024
Subjects: Artificial Intelligence (cs.AI)
[555]  arXiv:2311.02868 (replaced) [pdf, other]
Title: Sample Complexity Bounds for Estimating Probability Divergences under Invariances
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
[556]  arXiv:2311.03688 (replaced) [pdf, ps, other]
Title: Generalized Hamming weights and minimal shifts of Orlik-Terao algebras
Comments: 11 pages
Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)
[557]  arXiv:2311.05760 (replaced) [pdf, ps, other]
Title: Compressed and Sparse Models for Non-Convex Decentralized Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
[558]  arXiv:2311.08967 (replaced) [pdf, other]
Title: Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature
Comments: 16 pages, 1 figure
Subjects: Cryptography and Security (cs.CR)
[559]  arXiv:2311.09033 (replaced) [pdf, other]
Title: MELA: Multilingual Evaluation of Linguistic Acceptability
Comments: ACL 2024 camera-ready
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[560]  arXiv:2311.09048 (replaced) [pdf, other]
Title: GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Subjects: Computation and Language (cs.CL)
[561]  arXiv:2311.09109 (replaced) [pdf, other]
Title: Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Comments: Accepted at NAACL 2024 main oral, 15 pages, 10 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[562]  arXiv:2311.09213 (replaced) [pdf, other]
Title: GENEVA: GENErating and Visualizing branching narratives using LLMs
Comments: Accepted at IEEE Conference on Games 2024
Subjects: Computation and Language (cs.CL)
[563]  arXiv:2311.09562 (replaced) [pdf, other]
Title: TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
Comments: Paper accepted by ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[564]  arXiv:2311.09832 (replaced) [pdf, other]
Title: WatME: Towards Lossless Watermarking Through Lexical Redundancy
Comments: Accepted to ACL 2024 main conference
Subjects: Computation and Language (cs.CL)
[565]  arXiv:2311.10680 (replaced) [pdf, other]
Title: Optimal Embedding Dimension for Sparse Subspace Embeddings
Comments: STOC 2024
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[566]  arXiv:2311.14251 (replaced) [pdf, ps, other]
Title: Optimal 1-bit Error Exponent for 2-hop Relaying with Binary-Input Channels
Comments: IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
[567]  arXiv:2311.17451 (replaced) [pdf, other]
Title: Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[568]  arXiv:2311.18610 (replaced) [pdf, other]
Title: DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image
Comments: SIGGRAPH 2024, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2311.18717 (replaced) [pdf, other]
Title: NFT Wash Trading: Direct vs. Indirect Estimation
Subjects: General Economics (econ.GN); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
[570]  arXiv:2312.01616 (replaced) [pdf, other]
Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Comments: Accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[571]  arXiv:2312.03668 (replaced) [pdf, other]
Title: Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[572]  arXiv:2312.05601 (replaced) [pdf, other]
Title: A Meshless Solver for Blood Flow Simulations in Elastic Vessels Using Physics-Informed Neural Network
Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
[573]  arXiv:2312.07104 (replaced) [pdf, other]
Title: SGLang: Efficient Execution of Structured Language Model Programs
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
[574]  arXiv:2312.07364 (replaced) [pdf, other]
Title: Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval
Comments: Accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575]  arXiv:2312.07671 (replaced) [pdf, ps, other]
Title: Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability
Comments: 16 pages, 11 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[576]  arXiv:2312.08800 (replaced) [pdf, other]
Title: Evaluating Large Language Models for Health-related Queries with Presuppositions
Comments: Findings of ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[577]  arXiv:2312.10104 (replaced) [pdf, other]
Title: Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Comments: 17 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[578]  arXiv:2312.14591 (replaced) [pdf, other]
Title: Reasons to Reject? Aligning Language Models with Judgments
Comments: Accepted at ACL 2024 Findings. Our source codes and models are publicly available at this https URL
Subjects: Computation and Language (cs.CL)
[579]  arXiv:2312.14667 (replaced) [pdf, other]
Title: Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
Comments: Accepted by AAAI 2024 (Main Track, Long Paper)
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[580]  arXiv:2312.14792 (replaced) [pdf, ps, other]
Title: The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs
Comments: Paper accepted in IEEE Transactions on Signal Processing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Probability (math.PR)
[581]  arXiv:2312.14922 (replaced) [pdf, other]
Title: Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)
[582]  arXiv:2312.16752 (replaced) [pdf, other]
Title: Relationships Between Necessary Conditions for Feedback Stabilizability
Comments: 15 pages, 2 figures; v2 adds the 2 figures and 3 new examples, and fixes some errors
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Algebraic Topology (math.AT); Differential Geometry (math.DG)
[583]  arXiv:2312.17518 (replaced) [pdf, ps, other]
Title: An algebraic characterization of binary CSS-T codes and cyclic CSS-T codes for quantum fault tolerance
Journal-ref: Quantum Inf Process 23, 230 (2024)
Subjects: Information Theory (cs.IT)
[584]  arXiv:2401.00793 (replaced) [pdf, other]
Title: SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models
Comments: Accepted by ACL 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[585]  arXiv:2401.01017 (replaced) [pdf, other]
Title: A Survey of Computation Offloading with Task Type
Authors: Siqi Zhang, Na Yi, Yi Ma
Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[586]  arXiv:2401.02058 (replaced) [pdf, other]
Title: Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
Comments: 2024 International Conference on Machine Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[587]  arXiv:2401.04621 (replaced) [pdf, other]
Title: DebugBench: Evaluating Debugging Capability of Large Language Models
Comments: Accepted as Findings of ACL 2024
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[588]  arXiv:2401.05749 (replaced) [pdf, other]
Title: A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
Comments: Accepted at ACL Findings 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[589]  arXiv:2401.06568 (replaced) [pdf, other]
Title: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
Comments: Accepted by ACL2024 Findings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[590]  arXiv:2401.06688 (replaced) [pdf, other]
Title: Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation
Comments: Accepted at ACL 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[591]  arXiv:2401.07888 (replaced) [pdf, other]
Title: Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problems
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[592]  arXiv:2401.08295 (replaced) [pdf, other]
Title: SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models
Comments: To appear at ACL 2024
Subjects: Computation and Language (cs.CL)
[593]  arXiv:2401.09670 (replaced) [pdf, other]
Title: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Comments: OSDI 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[594]  arXiv:2401.10186 (replaced) [pdf, other]
Title: Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Comments: Accepted to ACL 2024 Main Conference
Subjects: Computation and Language (cs.CL)
[595]  arXiv:2401.10338 (replaced) [pdf, ps, other]
Title: MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time Series
Subjects: Machine Learning (cs.LG)
[596]  arXiv:2401.10774 (replaced) [pdf, other]
Title: Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Comments: The code for this implementation is available at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[597]  arXiv:2401.11382 (replaced) [pdf, other]
Title: Using Large Language Model for End-to-End Chinese ASR and NER
Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[598]  arXiv:2401.13388 (replaced) [pdf, other]
Title: UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
Comments: Accepted by ACL 2024, Main Conference, Long Paper
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599]  arXiv:2401.13649 (replaced) [pdf, other]
Title: VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Comments: Accepted to ACL 2024. 24 pages. Project page: this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[600]  arXiv:2401.14556 (replaced) [pdf, other]
Title: Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling
Comments: Accepted at ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[601]  arXiv:2401.16467 (replaced) [pdf, other]
Title: ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Comments: ICML 2024 Camera-Ready; First two authors contributed equally; Code: this https URL
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
[602]  arXiv:2401.17263 (replaced) [pdf, other]
Title: Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[603]  arXiv:2401.17264 (replaced) [pdf, other]
Title: Proactive Detection of Voice Cloning with Localized Watermarking
Comments: Published at ICML 2024. Code at this https URL - webpage at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[604]  arXiv:2401.18046 (replaced) [pdf, other]
Title: Multipath parsing in the brain
Comments: Accepted at ACL2024, main conference. 15 pages
Subjects: Computation and Language (cs.CL)
[605]  arXiv:2402.00258 (replaced) [pdf, other]
Title: Multi-group Learning for Hierarchical Groups
Comments: Accepted in International Conference on Machine Learning 2024 (ICML 2024)
Subjects: Machine Learning (cs.LG)
[606]  arXiv:2402.00759 (replaced) [pdf, other]
Title: Building Expressive and Tractable Probabilistic Generative Models: A Review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[607]  arXiv:2402.01156 (replaced) [pdf, other]
Title: An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support
Subjects: Software Engineering (cs.SE)
[608]  arXiv:2402.01287 (replaced) [pdf, other]
Title: Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection
Comments: 8 pages, 5 figures. Accepted at IJCNN 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[609]  arXiv:2402.01344 (replaced) [pdf, other]
Title: Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks
Comments: International Conference on Machine Learning, Vienna, Austria, July 21 -- 27, 2024
Subjects: Machine Learning (cs.LG)
[610]  arXiv:2402.01501 (replaced) [pdf, ps, other]
Title: Satisfiability Modulo Exponential Integer Arithmetic
Subjects: Logic in Computer Science (cs.LO)
[611]  arXiv:2402.02500 (replaced) [pdf, other]
Title: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[612]  arXiv:2402.03141 (replaced) [pdf, other]
Title: Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[613]  arXiv:2402.03169 (replaced) [pdf, ps, other]
Title: A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[614]  arXiv:2402.03412 (replaced) [pdf, other]
Title: See More Details: Efficient Image Super-Resolution by Experts Mining
Comments: Accepted at ICML 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[615]  arXiv:2402.03625 (replaced) [pdf, other]
Title: Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[616]  arXiv:2402.03903 (replaced) [pdf, other]
Title: Averaging $n$-step Returns Reduces Variance in Reinforcement Learning
Comments: ICML 2024. 27 pages, 7 figures, 3 tables
Subjects: Machine Learning (cs.LG)
[617]  arXiv:2402.04356 (replaced) [pdf, other]
Title: Bidirectional Autoregressive Diffusion Model for Dance Generation
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[618]  arXiv:2402.04407 (replaced) [pdf, ps, other]
Title: Sharp Lower Bounds on the Manifold Widths of Sobolev and Besov Spaces
Subjects: Numerical Analysis (math.NA)
[619]  arXiv:2402.04467 (replaced) [pdf, other]
Title: DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems
Comments: ICML 2024; Code to reproduce our experiments is available at this https URL
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
[620]  arXiv:2402.04610 (replaced) [pdf, other]
Title: Early Stopping of Untrained Convolutional Neural Networks
Authors: Tim Jahn, Bangti Jin
Subjects: Numerical Analysis (math.NA)
[621]  arXiv:2402.04621 (replaced) [pdf, other]
Title: Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective
Comments: Published in ICML 2024
Subjects: Machine Learning (cs.LG)
[622]  arXiv:2402.04788 (replaced) [pdf, other]
Title: MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Comments: ICML 2024 (Oral)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[623]  arXiv:2402.04997 (replaced) [pdf, other]
Title: Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
Comments: 60 pages, 11 figures, 6 tables; ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[624]  arXiv:2402.06031 (replaced) [pdf, other]
Title: An operator learning perspective on parameter-to-observable maps
Comments: 63 pages, 10 figures, 1 table
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[625]  arXiv:2402.06700 (replaced) [pdf, other]
Title: Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[626]  arXiv:2402.06733 (replaced) [pdf, other]
Title: NICE: To Optimize In-Context Examples or Not?
Comments: Accepted as a full paper (9 pages) at ACL 2024 (Main)
Journal-ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics 2024 (Volume 1: Long Papers)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[627]  arXiv:2402.06888 (replaced) [pdf, other]
Title: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Comments: Accepted to 2024 ICASSP Workshop of Self-supervision in Audio, Speech and Beyond (SASB)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[628]  arXiv:2402.07214 (replaced) [pdf, other]
Title: Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification
Subjects: Computation and Language (cs.CL)
[629]  arXiv:2402.07483 (replaced) [pdf, other]
Title: T-RAG: Lessons from the LLM Trenches
Comments: Added Needle in a Haystack analysis for T-RAG
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[630]  arXiv:2402.07640 (replaced) [pdf, other]
Title: CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[631]  arXiv:2402.07844 (replaced) [pdf, other]
Title: Mercury: A Code Efficiency Benchmark for LLM Code Synthesis
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
[632]  arXiv:2402.07891 (replaced) [pdf, other]
Title: Label-Efficient Model Selection for Text Generation
Comments: Accepted to ACL (main conference)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[633]  arXiv:2402.08595 (replaced) [pdf, other]
Title: Homomorphism Counts for Graph Neural Networks: All About That Basis
Comments: Proceedings of the Forty-First International Conference on Machine Learning (ICML 2024). Code available at: this https URL
Subjects: Machine Learning (cs.LG)
[634]  arXiv:2402.08876 (replaced) [pdf, other]
Title: DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[635]  arXiv:2402.09470 (replaced) [pdf, other]
Title: Rolling Diffusion Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[636]  arXiv:2402.10013 (replaced) [pdf, other]
Title: Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length
Comments: 9 pages, 5 figures, 3 appendix pages
Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
[637]  arXiv:2402.10073 (replaced) [pdf, other]
Title: Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence
Comments: To appear at Findings of ACL 2024
Subjects: Computation and Language (cs.CL)
[638]  arXiv:2402.10422 (replaced) [pdf, other]
Title: Pushing the Limits of Zero-shot End-to-End Speech Translation
Comments: ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL)
[639]  arXiv:2402.10450 (replaced) [pdf, other]
Title: PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control
Comments: Accepted at the Forty-first International Conference on Machine Learning (ICML 2024)
Subjects: Machine Learning (cs.LG)
[640]  arXiv:2402.10571 (replaced) [pdf, other]
Title: Direct Preference Optimization with an Offset
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[641]  arXiv:2402.10588 (replaced) [pdf, other]
Title: Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Comments: 12 pages. 28 with appendix
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[642]  arXiv:2402.10639 (replaced) [pdf, other]
Title: Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning
Authors: Tuc Nguyen, Thai Le
Comments: ACL Main 2024
Subjects: Computation and Language (cs.CL)
[643]  arXiv:2402.10727 (replaced) [pdf, other]
Title: Predictive Uncertainty Quantification via Risk Decompositions for Strictly Proper Scoring Rules
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[644]  arXiv:2402.10890 (replaced) [pdf, other]
Title: When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
Comments: ACL 2024 main
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[645]  arXiv:2402.11138 (replaced) [pdf, other]
Title: Contrastive Instruction Tuning
Comments: ACL 2024 Findings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[646]  arXiv:2402.11349 (replaced) [pdf, other]
Title: Language Models Don't Learn the Physical Manifestation of Language
Comments: ACL 2024 Main
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[647]  arXiv:2402.11463 (replaced) [pdf, other]
Title: Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective
Comments: arXiv admin note: text overlap with arXiv:nlin/0307015 by other authors
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD)
[648]  arXiv:2402.11485 (replaced) [pdf, other]
Title: LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
Comments: ACL Findings 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[649]  arXiv:2402.11517 (replaced) [pdf, other]
Title: Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM
Comments: Accepted to ACL2024 Findings
Subjects: Computation and Language (cs.CL)
[650]  arXiv:2402.11548 (replaced) [pdf, other]
Title: KMMLU: Measuring Massive Multitask Language Understanding in Korean
Comments: Under Review
Subjects: Computation and Language (cs.CL)
[651]  arXiv:2402.11597 (replaced) [pdf, other]
Title: Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Comments: ACL 2024 (main)
Subjects: Computation and Language (cs.CL)
[652]  arXiv:2402.11674 (replaced) [pdf, other]
Title: A Fast Algorithm to Simulate Nonlinear Resistive Networks
Comments: ICML 2024
Subjects: Emerging Technologies (cs.ET); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
[653]  arXiv:2402.11740 (replaced) [pdf, ps, other]
Title: Extraction of nonlinearity in neural networks with Koopman operator
Comments: 22 pages, 14 figures
Subjects: Machine Learning (cs.LG)
[654]  arXiv:2402.11894 (replaced) [pdf, other]
Title: Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
Subjects: Computation and Language (cs.CL)
[655]  arXiv:2402.12343 (replaced) [pdf, other]
Title: Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Comments: ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[656]  arXiv:2402.12424 (replaced) [pdf, other]
Title: Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
Comments: Accepted to ACL 2024 Findings
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[657]  arXiv:2402.12451 (replaced) [pdf, other]
Title: The Revolution of Multimodal Large Language Models: A Survey
Comments: ACL 2024 (Findings)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[658]  arXiv:2402.12621 (replaced) [pdf, other]
Title: Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
Comments: ACL 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[659]  arXiv:2402.12691 (replaced) [pdf, other]
Title: Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision
Comments: Accepted by ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL)
[660]  arXiv:2402.12991 (replaced) [pdf, other]
Title: TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
Comments: Accepted at ACL 2024 (findings)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[661]  arXiv:2402.13212 (replaced) [pdf, other]
Title: Soft Self-Consistency Improves Language Model Agents
Comments: ACL 2024 Camera-Ready, the first three authors contributed equally; Code: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[662]  arXiv:2402.13874 (replaced) [pdf, other]
Title: $Se^2$: Sequential Example Selection for In-Context Learning
Comments: Accepted by ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[663]  arXiv:2402.14008 (replaced) [pdf, other]
Title: OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Comments: Accepted by ACL 2024 (main), update
Subjects: Computation and Language (cs.CL)
[664]  arXiv:2402.14116 (replaced) [pdf, other]
Title: FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
Comments: 18 pages, 2 figures. ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[665]  arXiv:2402.14298 (replaced) [pdf, other]
Title: Multi-modal Stance Detection: New Datasets and Model
Comments: ACL'24 Findings
Subjects: Computation and Language (cs.CL)
[666]  arXiv:2402.14328 (replaced) [pdf, other]
Title: Understanding and Patching Compositional Reasoning in LLMs
Comments: Accepted by ACL'2024 Findings
Subjects: Computation and Language (cs.CL)
[667]  arXiv:2402.14490 (replaced) [pdf, other]
Title: Imbalanced Data Clustering using Equilibrium K-Means
Authors: Yudong He
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[668]  arXiv:2402.14569 (replaced) [pdf, other]
Title: Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement Learning
Comments: 22 pages, 9 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[669]  arXiv:2402.14979 (replaced) [pdf, other]
Title: Optimizing Language Models for Human Preferences is a Causal Inference Problem
Comments: UAI 2024
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)
[670]  arXiv:2402.15082 (replaced) [pdf, other]
Title: PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning
Comments: Accepted to Findings of the ACL 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[671]  arXiv:2402.15332 (replaced) [pdf, ps, other]
Title: Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
Comments: To appear in ICML 2024. Comments welcome. More info at categoricaldeeplearning.com
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Category Theory (math.CT); Rings and Algebras (math.RA); Machine Learning (stat.ML)
[672]  arXiv:2402.15392 (replaced) [pdf, ps, other]
Title: Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms
Comments: International Conference on Machine Learning 41 (ICML 2024)
Subjects: Machine Learning (cs.LG)
[673]  arXiv:2402.15637 (replaced) [pdf, other]
Title: Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models
Subjects: Computation and Language (cs.CL)
[674]  arXiv:2402.15838 (replaced) [pdf, other]
Title: ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval
Comments: Accepted to ACL 2024 main (long)
Subjects: Information Retrieval (cs.IR)
[675]  arXiv:2402.16438 (replaced) [pdf, other]
Title: Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Comments: Accepted by ACL 2024
Subjects: Computation and Language (cs.CL)
[676]  arXiv:2402.16775 (replaced) [pdf, other]
Title: A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Comments: ACL 2024 Findings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[677]  arXiv:2402.17120 (replaced) [pdf, other]
Title: LCEN: A Novel Feature Selection Algorithm for Nonlinear, Interpretable Machine Learning Models
Subjects: Machine Learning (cs.LG)
[678]  arXiv:2402.17316 (replaced) [pdf, other]
Title: Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation
Comments: Published in ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[679]  arXiv:2402.17447 (replaced) [pdf, other]
Title: Deep Learning Based Named Entity Recognition Models for Recipes
Comments: 13 pages, 6 main figures and 2 in appendices, and 3 main tables; Accepted for publication in LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[680]  arXiv:2402.17641 (replaced) [pdf, other]
Title: Variational Learning is Effective for Large Deep Networks
Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
[681]  arXiv:2402.18059 (replaced) [pdf, other]
Title: Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
Comments: 22 pages, 13 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[682]  arXiv:2402.18158 (replaced) [pdf, other]
Title: Evaluating Quantized Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[683]  arXiv:2402.18334 (replaced) [pdf, other]
Title: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation
Comments: ACL Findings 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[684]  arXiv:2403.00720 (replaced) [pdf, other]
Title: Subhomogeneous Deep Equilibrium Models
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[685]  arXiv:2403.01165 (replaced) [pdf, other]
Title: STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models
Comments: Accepted by ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[686]  arXiv:2403.01166 (replaced) [pdf, other]
Title: DINER: Debiasing Aspect-based Sentiment Analysis with Multi-variable Causal Inference
Comments: Accepted by ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[687]  arXiv:2403.01931 (replaced) [pdf, other]
Title: VariErr NLI: Separating Annotation Error from Human Label Variation
Comments: 14 pages, accepted at ACL 2024 main
Subjects: Computation and Language (cs.CL)
[688]  arXiv:2403.02271 (replaced) [pdf, other]
Title: RIFF: Learning to Rephrase Inputs for Few-shot Fine-tuning of Language Models
Comments: Final Version (Findings of ACL2024)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[689]  arXiv:2403.02354 (replaced) [pdf, other]
Title: Spatio-Temporal Field Neural Networks for Air Quality Inference
Comments: We want to recheck our model and experimental design
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[690]  arXiv:2403.02437 (replaced) [pdf, other]
Title: SoK: Challenges and Opportunities in Federated Unlearning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[691]  arXiv:2403.02451 (replaced) [pdf, other]
Title: Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground
Subjects: Computation and Language (cs.CL)
[692]  arXiv:2403.02660 (replaced) [pdf, other]
Title: A randomized lattice rule without component-by-component construction
Authors: Takashi Goda
Comments: revision, 21 pages, 3 figures
Subjects: Numerical Analysis (math.NA)
[693]  arXiv:2403.02977 (replaced) [pdf, other]
Title: Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space
Subjects: Robotics (cs.RO)
[694]  arXiv:2403.03129 (replaced) [pdf, other]
Title: CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following
Comments: Accepted to ACL 2024 (Main Conference)
Subjects: Computation and Language (cs.CL)
[695]  arXiv:2403.03167 (replaced) [pdf, other]
Title: PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset
Comments: 9 pages, ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[696]  arXiv:2403.03234 (replaced) [pdf, other]
Title: Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Comments: ICML 2024; Code to reproduce our experiments is available at this https URL
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
[697]  arXiv:2403.04346 (replaced) [pdf, ps, other]
Title: BrainKnow -- Extracting, Linking, and Synthesizing Neuroscience Knowledge
Comments: 22 pages, 7 figures
Subjects: Digital Libraries (cs.DL); Neurons and Cognition (q-bio.NC)
[698]  arXiv:2403.05535 (replaced) [pdf, other]
Title: Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
Comments: ICML 2024 Camera-Ready. Project Page and Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[699]  arXiv:2403.06189 (replaced) [pdf, other]
Title: Harmonious Group Choreography with Trajectory-Controllable Diffusion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[700]  arXiv:2403.06840 (replaced) [pdf, other]
Title: RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback
Comments: 20 pages, multiple figures. Providing second version RA-ISF
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[701]  arXiv:2403.06932 (replaced) [pdf, other]
Title: ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis
Comments: 15 pages, second version of ERA-CoT
Subjects: Computation and Language (cs.CL)
[702]  arXiv:2403.07245 (replaced) [pdf, other]
Title: Dataset Condensation for Time Series Classification via Dual Domain Matching
Comments: Accepted by KDD 2024 research track
Subjects: Machine Learning (cs.LG)
[703]  arXiv:2403.07723 (replaced) [pdf, ps, other]
Title: On the Last-Iterate Convergence of Shuffling Gradient Methods
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[704]  arXiv:2403.07746 (replaced) [pdf, other]
Title: Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception
Comments: 10 pages, 4 figures. Added eval on VoD
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[705]  arXiv:2403.07974 (replaced) [pdf, other]
Title: LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Comments: Website - this https URL
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
[706]  arXiv:2403.09347 (replaced) [pdf, other]
Title: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Comments: 13 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[707]  arXiv:2403.09871 (replaced) [pdf, other]
Title: ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images
Comments: 15 pages, 6 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[708]  arXiv:2403.10081 (replaced) [pdf, other]
Title: DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[709]  arXiv:2403.13169 (replaced) [pdf, other]
Title: Wav2Gloss: Generating Interlinear Glossed Text from Speech
Comments: ACL 2024 camera ready version
Subjects: Computation and Language (cs.CL)
[710]  arXiv:2403.13872 (replaced) [pdf, other]
Title: Spatial-Temporal Graph Representation Learning for Tactical Networks Future State Prediction
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[711]  arXiv:2403.15097 (replaced) [pdf, other]
Title: Argument-Aware Approach To Event Linking
Comments: Paper accepted by ACL-findings 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[712]  arXiv:2403.15191 (replaced) [pdf, other]
Title: VORTEX: Real-Time Off-Chain Payments and Cross-Chain Swaps for Cryptocurrencies
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[713]  arXiv:2403.17270 (replaced) [pdf, other]
Title: Human Stress Response and Perceived Safety during Encounters with Quadruped Robots
Comments: 8 pages, 7 figs, 5 tables
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
[714]  arXiv:2403.17673 (replaced) [pdf, other]
Title: How Private are DP-SGD Implementations?
Comments: Proceedings of ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
[715]  arXiv:2403.18680 (replaced) [pdf, other]
Title: Non-Linear Inference Time Intervention: Improving LLM Truthfulness
Comments: Accepted at the Interspeech 2024 Conference. Code is available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[716]  arXiv:2403.18953 (replaced) [pdf, ps, other]
Title: Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems
Comments: 12 pages, 7 figures
Journal-ref: Chaos 1 June 2024; 34 (6): 063114
Subjects: Machine Learning (cs.LG)
[717]  arXiv:2403.19223 (replaced) [pdf, ps, other]
Title: Computing large deviation rate functions of entropy production for diffusion processes by an interacting particle method
Subjects: Numerical Analysis (math.NA)
[718]  arXiv:2403.19260 (replaced) [pdf, other]
Title: NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data
Comments: ACL 2024 main conference. Data and models available at this https URL
Subjects: Computation and Language (cs.CL)
[719]  arXiv:2403.19589 (replaced) [pdf, other]
Title: TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Comments: Code, data, and models are publicly available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[720]  arXiv:2404.00929 (replaced) [pdf, other]
Title: A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[721]  arXiv:2404.05835 (replaced) [pdf, other]
Title: Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining
Comments: Accepted to L4DC 2024
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[722]  arXiv:2404.09889 (replaced) [pdf, other]
Title: Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval
Comments: ACL 2024 camera ready
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[723]  arXiv:2404.10496 (replaced) [pdf, other]
Title: Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering
Comments: Accepted to ACL2024
Subjects: Information Retrieval (cs.IR)
[724]  arXiv:2404.12464 (replaced) [pdf, other]
Title: NormAd: A Benchmark for Measuring the Cultural Adaptability of Large Language Models
Comments: Preprint. In Review
Subjects: Computation and Language (cs.CL)
[725]  arXiv:2404.13195 (replaced) [pdf, ps, other]
Title: Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[726]  arXiv:2404.13874 (replaced) [pdf, other]
Title: VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
Comments: ACL 2024 Findings
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[727]  arXiv:2404.13936 (replaced) [pdf, ps, other]
Title: A bound preserving cut discontinuous Galerkin method for one dimensional hyperbolic conservation laws
Comments: 32
Subjects: Numerical Analysis (math.NA)
[728]  arXiv:2404.14461 (replaced) [pdf, other]
Title: Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
Comments: Competition Report
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[729]  arXiv:2404.14745 (replaced) [pdf, other]
Title: TAAT: Think and Act from Arbitrary Texts in Text2Motion
Comments: Corrected errors in author information
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[730]  arXiv:2404.14964 (replaced) [pdf, other]
Title: Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks
Comments: 25 pages, 7 figures + 3 supplementary figures
Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
[731]  arXiv:2404.15004 (replaced) [pdf, other]
Title: TAXI: Evaluating Categorical Knowledge Editing for Language Models
Comments: Accepted to ACL 2024 (Findings)
Subjects: Computation and Language (cs.CL)
[732]  arXiv:2404.15522 (replaced) [pdf, other]
Title: LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
Comments: Accepted at ACL(Main) 2024 | First version available @ this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[733]  arXiv:2404.15611 (replaced) [pdf, other]
Title: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
Subjects: Cryptography and Security (cs.CR)
[734]  arXiv:2404.16363 (replaced) [pdf, other]
Title: Byzantine Attacks Exploiting Penalties in Ethereum PoS
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[735]  arXiv:2404.16966 (replaced) [pdf, other]
Title: Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Subjects: Computation and Language (cs.CL)
[736]  arXiv:2404.17140 (replaced) [pdf, other]
Title: Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Comments: ACL Findings 2024 - Camera Ready
Subjects: Computation and Language (cs.CL)
[737]  arXiv:2405.00301 (replaced) [pdf, other]
Title: Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression
Comments: 13 pages, 5 figures
Subjects: Computation and Language (cs.CL)
[738]  arXiv:2405.00892 (replaced) [pdf, other]
Title: Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[739]  arXiv:2405.00899 (replaced) [pdf, other]
Title: Characterising the Creative Process in Humans and Large Language Models
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
[740]  arXiv:2405.02492 (replaced) [pdf, other]
Title: Investigating the Generalizability of Assistive Robots Models over Various Tasks
Comments: Accepted to 2024 21st International Conference on Ubiquitous Robots (UR)
Subjects: Robotics (cs.RO)
[741]  arXiv:2405.02664 (replaced) [pdf, other]
Title: MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering
Comments: 4 pages, 3 figures, pre-print submitted to CIKM 2024
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[742]  arXiv:2405.03035 (replaced) [pdf, other]
Title: Probabilistic Finite Automaton Emptiness is undecidable
Authors: Günter Rote
Comments: 63 pages, 14 figures, 2 tables, 53 footnotes, 11 sections plus 1 appendix. Added another proof and more history, which had been overlooked before
Subjects: Formal Languages and Automata Theory (cs.FL)
[743]  arXiv:2405.03064 (replaced) [pdf, other]
Title: RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[744]  arXiv:2405.04061 (replaced) [pdf, other]
Title: Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[745]  arXiv:2405.04776 (replaced) [pdf, other]
Title: Chain of Thoughtlessness? An Analysis of CoT in Planning
Subjects: Artificial Intelligence (cs.AI)
[746]  arXiv:2405.05847 (replaced) [pdf, other]
Title: Learned feature representations are biased by complexity, learning order, position, and more
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[747]  arXiv:2405.07460 (replaced) [pdf, other]
Title: HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
[748]  arXiv:2405.07536 (replaced) [pdf, other]
Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[749]  arXiv:2405.09005 (replaced) [pdf, other]
Title: Cons-training tensor networks
Comments: v2: mostly improved Fig 1 and 13 for clarity, improved exposition of ideas, and fixed a couple of transcription bugs in the pseudo algo. 3
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Quantum Physics (quant-ph)
[750]  arXiv:2405.09482 (replaced) [pdf, other]
Title: Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
Subjects: Computation and Language (cs.CL)
[751]  arXiv:2405.10150 (replaced) [pdf, other]
Title: Speaker Verification in Agent-Generated Conversations
Subjects: Computation and Language (cs.CL)
[752]  arXiv:2405.10467 (replaced) [pdf, other]
Title: Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
[753]  arXiv:2405.10517 (replaced) [pdf, other]
Title: Towards Better Question Generation in QA-based Event Extraction
Authors: Zijin Hong, Jian Liu
Comments: Accepted to ACL2024 Findings
Subjects: Computation and Language (cs.CL)
[754]  arXiv:2405.11684 (replaced) [pdf, other]
Title: Learning Regularities from Data using Spiking Functions: A Theory
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
[755]  arXiv:2405.11876 (replaced) [pdf, other]
Title: Understanding crypter-as-a-service in a popular underground marketplace
Comments: A short version of this paper was accepted at the 6th Workshop on Attackers and Cyber-Crime Operations (WACCO)
Subjects: Cryptography and Security (cs.CR)
[756]  arXiv:2405.11968 (replaced) [pdf, other]
Title: Conditional Shift-Robust Conformal Prediction for Graph Neural Network
Authors: S. Akansha
Comments: 14 pages, 2 figures, 3 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[757]  arXiv:2405.12684 (replaced) [pdf, other]
Title: Model Free Prediction with Uncertainty Assessment
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[758]  arXiv:2405.13034 (replaced) [pdf, other]
Title: Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality
Comments: Accepted by ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[759]  arXiv:2405.13753 (replaced) [pdf, other]
Title: A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence
Comments: 9 Pages and appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); General Economics (econ.GN)
[760]  arXiv:2405.13902 (replaced) [pdf, other]
Title: LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[761]  arXiv:2405.14108 (replaced) [pdf, other]
Title: Deep Learning for Protein-Ligand Docking: Are We There Yet?
Comments: 30 pages, 1 table, 27 figures. Under review. Code, data, tutorials, and benchmark results are available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
[762]  arXiv:2405.14156 (replaced) [pdf, other]
Title: Unveiling the Tapestry of Consistency in Large Vision-Language Models
Comments: This project is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[763]  arXiv:2405.15671 (replaced) [pdf, other]
Title: The Undecidability of Quantified Announcements
Comments: This paper contains a correction to the 2016 article, The Undecidability of Quantified Announcements, published in Studia Logica
Journal-ref: The undecidability of quantified announcements. Studia Logica, 104(4) pages 597-640, 2016
Subjects: Logic in Computer Science (cs.LO)
[764]  arXiv:2405.15769 (replaced) [pdf, other]
Title: FastDrag: Manipulate Anything in One Step
Comments: 13 pages, 13 figures, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[765]  arXiv:2405.16225 (replaced) [pdf, ps, other]
Title: Local Causal Structure Learning in the Presence of Latent Variables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[766]  arXiv:2405.16488 (replaced) [pdf, ps, other]
Title: Partial train and isolate, mitigate backdoor attack
Authors: Yong Li, Han Gao
Comments: 9 pages, 2 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[767]  arXiv:2405.16526 (replaced) [pdf, other]
Title: Past, Present, and Future of Citation Practices in HCI
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Digital Libraries (cs.DL)
[768]  arXiv:2405.16849 (replaced) [pdf, other]
Title: Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation
Comments: Our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[769]  arXiv:2405.17234 (replaced) [pdf, other]
Title: Benchmarking General Purpose In-Context Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[770]  arXiv:2405.17272 (replaced) [pdf, other]
Title: DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[771]  arXiv:2405.17345 (replaced) [pdf, other]
Title: Exploring and steering the moral compass of Large Language Models
Authors: Alejandro Tlaie
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[772]  arXiv:2405.17398 (replaced) [pdf, other]
Title: Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Comments: Code and model: this https URL, video demos: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[773]  arXiv:2405.17814 (replaced) [pdf, other]
Title: FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[774]  arXiv:2405.18353 (replaced) [pdf, other]
Title: Simulating infinite-dimensional nonlinear diffusion bridges
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[775]  arXiv:2405.18457 (replaced) [pdf, other]
Title: Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes
Comments: Preprint. arXiv admin note: text overlap with arXiv:2405.18328
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[776]  arXiv:2405.18860 (replaced) [pdf, other]
Title: Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks
Subjects: Robotics (cs.RO)
[777]  arXiv:2405.18942 (replaced) [pdf, other]
Title: Verifiably Robust Conformal Prediction
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[778]  arXiv:2405.19732 (replaced) [pdf, other]
Title: Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[779]  arXiv:2405.19944 (replaced) [pdf, ps, other]
Title: Discrete-Time I&I Adaptive Interconnection and Damping Passivity-Based Control for Nonlinearly Parameterized Port-Controlled Hamiltonian Systems
Comments: 31 pages, 9 figures
Subjects: Systems and Control (eess.SY)
[780]  arXiv:2405.20172 (replaced) [pdf, other]
Title: Iterative Feature Boosting for Explainable Speech Emotion Recognition
Comments: Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)
Journal-ref: 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[781]  arXiv:2405.20250 (replaced) [pdf, ps, other]
Title: Entropy annealing for policy mirror descent in continuous time and space
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Probability (math.PR)
[782]  arXiv:2405.20267 (replaced) [pdf, other]
Title: Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Subjects: Computation and Language (cs.CL)
[783]  arXiv:2405.20607 (replaced) [pdf, other]
Title: Textual Inversion and Self-supervised Refinement for Radiology Report Generation
Comments: This paper has been early accepted by MICCAI 2024!
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[784]  arXiv:2405.20703 (replaced) [pdf, other]
Title: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance
Comments: Accepted to ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[785]  arXiv:2405.20988 (replaced) [pdf, other]
Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[786]  arXiv:2406.00083 (replaced) [pdf, other]
Title: BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[787]  arXiv:2406.00199 (replaced) [pdf, ps, other]
Title: Exfiltration of personal information from ChatGPT via prompt injection
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Emerging Technologies (cs.ET)
[788]  arXiv:2406.00252 (replaced) [pdf, other]
Title: Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[789]  arXiv:2406.00307 (replaced) [pdf, other]
Title: HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
Comments: under submission
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[790]  arXiv:2406.00329 (replaced) [pdf, other]
Title: Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[791]  arXiv:2406.00670 (replaced) [pdf, other]
Title: Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Comments: Accepted by ICML 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[792]  arXiv:2406.00702 (replaced) [pdf, ps, other]
Title: Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier Strategies
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[793]  arXiv:2406.00773 (replaced) [pdf, other]
Title: Diffusion Tuning: Transferring Diffusion Models via Chain of Forgetting
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[794]  arXiv:2406.00907 (replaced) [pdf, other]
Title: DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic Surgery
Comments: 29 pages, 16 figures; MIDL 2024 - Medical Imaging with Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[795]  arXiv:2406.01026 (replaced) [pdf, other]
Title: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Comments: Accepted at ACL 2024 Main
Journal-ref: ACL 2024
Subjects: Computation and Language (cs.CL)
[796]  arXiv:2406.01057 (replaced) [pdf, other]
Title: Knapsack with Vertex Cover, Set Cover, and Hitting Set
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[797]  arXiv:2406.01133 (replaced) [pdf, ps, other]
Title: Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations
Authors: Valentin Rychkov (EDF R&D), Claudia Picoco (EDF R&D), Emilie Caleca (EDF R&D)
Subjects: Performance (cs.PF)
[798]  arXiv:2406.01349 (replaced) [pdf, other]
Title: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
Comments: Project Page: this https URL, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[799]  arXiv:2406.01392 (replaced) [pdf, other]
Title: Sparsity-Accelerated Training for Large Language Models
Comments: Accepted to ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
[800]  arXiv:2406.01425 (replaced) [pdf, other]
Title: Sensitivity-Informed Augmentation for Robust Segmentation
Comments: 10 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[801]  arXiv:2406.01514 (replaced) [pdf, other]
Title: Decoupled Alignment for Robust Plug-and-Play Adaptation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[802]  arXiv:2406.01548 (replaced) [pdf, other]
Title: How to discretize continuous state-action spaces in Q-learning: A symbolic control approach
Comments: Q-learning, Symbolic control, Abstraction
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
[803]  arXiv:2406.01624 (replaced) [pdf, other]
Title: Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
Comments: Published in: Springer Nature International Journal of Applied Intelligence (2024)
Journal-ref: Applied Intelligence (2024), 1-24
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[804]  arXiv:2406.01799 (replaced) [pdf, other]
Title: Online Control in Population Dynamics
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[805]  arXiv:2406.01852 (replaced) [pdf, other]
Title: Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[806]  arXiv:2406.01900 (replaced) [pdf, other]
Title: Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[807]  arXiv:2406.01908 (replaced) [pdf, other]
Title: PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[808]  arXiv:2406.02004 (replaced) [pdf, ps, other]
Title: Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping
Comments: Accepted to Interspeech'24
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[809]  arXiv:2406.02061 (replaced) [pdf, other]
Title: Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Comments: v1.1
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[810]  arXiv:2406.02126 (replaced) [pdf, other]
Title: CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[811]  arXiv:2406.02169 (replaced) [src]
Title: A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languages
Comments: The experimental result was erroneously reported and we also omitted other authors
Subjects: Computation and Language (cs.CL)
[812]  arXiv:2406.02265 (replaced) [pdf, other]
Title: Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Comments: 9 pages, long paper at ACL 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[813]  arXiv:2406.02290 (replaced) [pdf, other]
Title: A Study of Optimizations for Fine-tuning Large Language Models
Comments: 10 pages, 4 figures. Revised text for clarity, updated references
Subjects: Machine Learning (cs.LG)
[814]  arXiv:2406.02343 (replaced) [pdf, other]
Title: Cluster-Aware Similarity Diffusion for Instance Retrieval
Comments: This paper has been accepted by ICML2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[815]  arXiv:2406.02347 (replaced) [pdf, other]
Title: Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Comments: 16 pages + 16 pages appendices
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[816]  arXiv:2406.02381 (replaced) [pdf, other]
Title: Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction
Comments: Updated authorship and acknowledgements
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
[817]  arXiv:2406.02541 (replaced) [pdf, other]
Title: Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[818]  arXiv:2406.02614 (replaced) [pdf, other]
Title: Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Comments: Accepted by ECMLPKDD 2024 (Research Track)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[819]  arXiv:2406.02616 (replaced) [pdf, other]
Title: Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[820]  arXiv:2406.02624 (replaced) [pdf, other]
Title: Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[821]  arXiv:2406.02749 (replaced) [pdf, other]
Title: Efficient Leverage Score Sampling for Tensor Train Decomposition
Subjects: Data Structures and Algorithms (cs.DS)
[822]  arXiv:2406.02778 (replaced) [pdf, other]
Title: MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning
Subjects: Machine Learning (cs.LG)
[823]  arXiv:2406.02847 (replaced) [pdf, other]
Title: Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[824]  arXiv:2406.02875 (replaced) [pdf, other]
Title: Leveraging KANs For Enhanced Deep Koopman Operator Discovery
Comments: 6 pages, 4 figures, 2 tables
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
[825]  arXiv:2406.02876 (replaced) [pdf, other]
Title: LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
Comments: ACL2024 Findings, Codes are at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[826]  arXiv:2406.02881 (replaced) [pdf, other]
Title: Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
Comments: technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[827]  arXiv:2406.02882 (replaced) [pdf, other]
Title: Outdated Issue Aware Decoding for Factual Knowledge Editing
Comments: ACL2024 Findings, Codes are at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[828]  arXiv:2406.02886 (replaced) [pdf, other]
Title: PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Comments: Findings of ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[829]  arXiv:2406.02887 (replaced) [pdf, other]
Title: USM RNN-T model weights binarization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[830]  arXiv:2406.02918 (replaced) [pdf, other]
Title: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[831]  arXiv:2406.02966 (replaced) [pdf, ps, other]
Title: Generative AI and Digital Neocolonialism in Global Education: Towards an Equitable Framework
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[832]  arXiv:2406.03051 (replaced) [pdf, other]
Title: Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[833]  arXiv:2406.03095 (replaced) [pdf, other]
Title: EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[834]  arXiv:2406.03099 (replaced) [pdf, other]
Title: Graph Convolutional Branch and Bound
Comments: Submitted to European Journal of Operational Research
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[835]  arXiv:2406.03145 (replaced) [pdf, other]
Title: E(n) Equivariant Message Passing Cellular Networks
Subjects: Machine Learning (cs.LG)
[836]  arXiv:2406.03151 (replaced) [pdf, other]
Title: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
Comments: Published in ACL 2024 Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[837]  arXiv:2406.03154 (replaced) [pdf, other]
Title: Detecting Model Misspecification in Amortized Bayesian Inference with Neural Networks: An Extended Investigation
Comments: Extended version of the conference paper this https URL arXiv admin note: text overlap with arXiv:2112.08866
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[838]  arXiv:2406.03170 (replaced) [pdf, other]
Title: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language
Comments: This work is accepted at ACL Findings 2024
Subjects: Computation and Language (cs.CL)
[839]  arXiv:2406.03248 (replaced) [pdf, other]
Title: Large Language Models as Evaluators for Recommendation Explanations
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[840]  arXiv:2406.03253 (replaced) [pdf, other]
Title: Generating Explanations for Cellular Neural Networks
Subjects: Machine Learning (cs.LG)
[841]  arXiv:2406.03262 (replaced) [pdf, other]
Title: ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[842]  arXiv:2406.03337 (replaced) [pdf, other]
Title: Identifying latent state transition in non-linear dynamical systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[843]  arXiv:2406.03345 (replaced) [pdf, other]
Title: Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[844]  arXiv:2406.03437 (replaced) [pdf, other]
Title: Transfer Learning for Latent Variable Network Models
Subjects: Machine Learning (cs.LG)
[845]  arXiv:2406.03452 (replaced) [pdf, other]
Title: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types
Subjects: Computation and Language (cs.CL)
[846]  arXiv:2406.03488 (replaced) [pdf, other]
Title: Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Comments: 12 pages, 4 figures, 6 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)