Computer Science
New submissions, skipping first 1000
New submissions for Fri, 7 Jun 24 (continued, showing last 48 of 394 entries)
- [347] arXiv:2406.04280 [pdf, other]
Title: xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology
Authors: Julius Hense, Mina Jamshidi Idaji, Oliver Eberle, Thomas Schnake, Jonas Dippel, Laure Ciernik, Oliver Buchstab, Andreas Mock, Frederick Klauschen, Klaus-Robert Müller
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Multiple instance learning (MIL) is an effective and widely used approach for weakly supervised machine learning. In histopathology, MIL models have achieved remarkable success in tasks like tumor detection, biomarker prediction, and outcome prognostication. However, MIL explanation methods are still lagging behind, as they are limited to small bag sizes or disregard instance interactions. We revisit MIL through the lens of explainable AI (XAI) and introduce xMIL, a refined framework with more general assumptions. We demonstrate how to obtain improved MIL explanations using layer-wise relevance propagation (LRP) and conduct extensive evaluation experiments on three toy settings and four real-world histopathology datasets. Our approach consistently outperforms previous explanation attempts with particularly improved faithfulness scores on challenging biomarker prediction tasks. Finally, we showcase how xMIL explanations enable pathologists to extract insights from MIL models, representing a significant advance for knowledge discovery and model debugging in digital histopathology.
- [348] arXiv:2406.04284 [pdf, other]
Title: What is Dataset Distillation Learning?
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high-performing models, little is understood about how the information is stored. In this study, we posit and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. We reveal that distilled data cannot serve as a substitute for real data during training outside the standard evaluation setting for dataset distillation. Additionally, the distillation process retains high task performance by compressing information related to the early training dynamics of real models. Finally, we provide a framework for interpreting distilled data and reveal that individual distilled data points contain meaningful semantic information. This investigation sheds light on the intricate nature of distilled data, providing a better understanding of how they can be effectively utilized.
- [349] arXiv:2406.04286 [pdf, other]
Title: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
Authors: Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, C. K. Evuru, S Ramaneswaran, S Sakshi, Dinesh Manocha
Comments: ACL 2024 Main Conference. Code and data: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document -- we first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction. To learn the task of expanding abstract descriptions, we first train BART on a large-scale synthetic dataset with abstract-document pairs. Next, to generate abstract descriptions for a document, we propose a simple, controllable, and training-free method based on editing AMR graphs. ABEX brings the best of both worlds: by expanding from abstract representations, it preserves the original semantic properties of the documents, like style and meaning, thereby maintaining alignment with the original label and data distribution. At the same time, the fundamental process of elaborating on abstract descriptions facilitates diverse generations. We demonstrate the effectiveness of ABEX on 4 NLU tasks spanning 12 datasets and 4 low-resource settings. ABEX outperforms all our baselines quantitatively with improvements of 0.04% - 38.8%. Qualitatively, ABEX outperforms all prior methods from the literature in terms of context and length diversity.
- [350] arXiv:2406.04287 [pdf, other]
Title: SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, the challenge of hyperspectral and multispectral imaging is its large data footprint. We propose both a novel camera design and a vision transformer-based (ViT) algorithm that reduce both the captured data footprint and the computational load for hyperspectral segmentation. Our camera is able to adaptively sample image regions or patches at different resolutions, instead of capturing the entire hyperspectral cube at one high resolution. Our segmentation algorithm works in concert with the camera, applying ViT-based segmentation only to adaptively selected patches. We show results both in simulation and on a real hardware platform, demonstrating both accurate segmentation results and reduced computational burden.
- [351] arXiv:2406.04289 [pdf, other]
Title: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Authors: Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell
Comments: Accepted to ACL 2024
Subjects: Computation and Language (cs.CL)
What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we seek to understand the empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their home turf -- learning probabilistic languages -- rather than as classifiers of formal languages. In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.
- [352] arXiv:2406.04290 [pdf, other]
Title: Providing High-Performance Execution with a Sequential Contract for Cryptographic Programs
Comments: 17 pages, 7 figures, 4 tables
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
Constant-time programming is a widely deployed approach to harden cryptographic programs against side channel attacks. However, modern processors violate the underlying assumptions of constant-time policies by speculatively executing unintended paths of the program.
In this work, we propose Cassandra, a novel hardware-software mechanism to protect constant-time cryptographic code against speculative control flow based attacks. Cassandra explores the radical design point of disabling the branch predictor and recording-and-replaying sequential control flow of the program. Two key insights that enable our design are that (1) the sequential control flow of a constant-time program is constant over different runs, and (2) cryptographic programs are highly looped and their control flow patterns repeat in a highly compressible way. These insights allow us to perform an offline branch analysis that significantly compresses control flow traces. We add a small component to a typical processor design, the Branch Trace Unit, to store compressed traces and determine fetch redirections according to the sequential model of the program. Moreover, we provide a formal security analysis and prove that our methodology adheres to a strong security contract by design. Despite providing a higher security guarantee, Cassandra counter-intuitively improves performance by 1.77% by eliminating branch misprediction penalties.
- [353] arXiv:2406.04291 [pdf, other]
Title: Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
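As a rough illustration of the idea in this abstract, the sketch below computes a PPI-style point estimate of a mean (autorater average on unlabeled data plus a bias correction measured on the human-labeled data) and then combines per-stratum estimates with stratum weights. It is a minimal sketch, not the paper's algorithm: the function names, the dictionary layout, and the use of fixed stratum weights are assumptions made here, and the confidence intervals and sample allocation that StratPPI derives are not reproduced.

```python
from statistics import mean


def ppi_mean(labeled_y, labeled_pred, unlabeled_pred):
    """PPI-style point estimate of a mean (illustrative form).

    Autorater mean on the unlabeled data, corrected by the average
    human-minus-autorater gap observed on the labeled data.
    """
    rectifier = mean([y - p for y, p in zip(labeled_y, labeled_pred)])
    return mean(unlabeled_pred) + rectifier


def stratified_ppi_mean(strata):
    """Weighted sum of per-stratum PPI estimates.

    `strata` is a list of dicts with the (hypothetical) keys
    'weight', 'labeled_y', 'labeled_pred', 'unlabeled_pred'.
    Weights are normalized so they sum to one.
    """
    total_weight = sum(s["weight"] for s in strata)
    return sum(
        s["weight"] / total_weight
        * ppi_mean(s["labeled_y"], s["labeled_pred"], s["unlabeled_pred"])
        for s in strata
    )


# Toy data: the autorater is over-optimistic in the first stratum only.
example_strata = [
    {"weight": 0.5, "labeled_y": [1, 0], "labeled_pred": [1, 1],
     "unlabeled_pred": [1, 1, 1, 1]},
    {"weight": 0.5, "labeled_y": [0, 0], "labeled_pred": [0, 0],
     "unlabeled_pred": [0, 0]},
]
estimate = stratified_ppi_mean(example_strata)
```

Estimating the correction separately per stratum is what lets the estimate adapt when autorater quality differs across subsets of the data, which is the case the abstract highlights.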
- [354] arXiv:2406.04292 [pdf, other]
Title: VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Comments: Accepted to ACL 2024 main conference
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Multi-modal retrieval is becoming increasingly popular in practice. However, existing retrievers are mostly text-oriented and lack the capability to process visual information. Despite the presence of vision-language models like CLIP, current methods are severely limited in representing text-only and image-only data. In this work, we present a new embedding model, VISTA, for universal multi-modal retrieval. Our work makes three technical contributions. Firstly, we introduce a flexible architecture which extends a powerful text encoder with image understanding capability by introducing visual token embeddings. Secondly, we develop two data generation strategies, which produce high-quality composed image-text data to facilitate the training of the embedding model. Thirdly, we introduce a multi-stage training algorithm, which first aligns the visual token embedding with the text encoder using massive weakly labeled data, and then develops multi-modal representation capability using the generated composed image-text data. In our experiments, VISTA achieves superior performance across a variety of multi-modal retrieval tasks in both zero-shot and supervised settings. Our model, data, and source code are available at https://github.com/FlagOpen/FlagEmbedding.
- [355] arXiv:2406.04295 [pdf, other]
Title: Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Authors: Jiayi Guo, Junhao Zhao, Chunjiang Ge, Chaoqun Du, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang
Comments: GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Test-time adaptation (TTA) aims to enhance the performance of source-domain pretrained models when tested on unknown shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. Recently, diffusion-driven TTA methods have demonstrated strong performance by using an unconditional diffusion model, which is also trained on the source domain to transform target data into synthetic data as a source domain projection. This allows the source model to make predictions without weight adaptation. In this paper, we argue that the domains of the source model and the synthetic data in diffusion-driven TTA methods are not aligned. To adapt the source model to the synthetic domain of the unconditional diffusion model, we introduce a Synthetic-Domain Alignment (SDA) framework to fine-tune the source model with synthetic data. Specifically, we first employ a conditional diffusion model to generate labeled samples, creating a synthetic dataset. Subsequently, we use the aforementioned unconditional diffusion model to add noise to and denoise each sample before fine-tuning. This process mitigates the potential domain gap between the conditional and unconditional models. Extensive experiments across various models and benchmarks demonstrate that SDA achieves superior domain alignment and consistently outperforms existing diffusion-driven TTA methods. Our code is available at https://github.com/SHI-Labs/Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment.
- [356] arXiv:2406.04298 [pdf, other]
Title: Measuring and Addressing Indexical Bias in Information Retrieval
Comments: ACL 2024
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people's opinions, voting patterns, and other behaviors, these issues remain understudied as the field lacks reliable metrics and procedures for automatically measuring indexical bias. To this end, we introduce the PAIR framework, which supports automatic bias audits for ranked documents or entire IR systems. After introducing DUO, the first general-purpose automatic bias metric, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents, with 4k queries spanning 1.4k controversial issue topics. A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader's opinion.
- [357] arXiv:2406.04299 [pdf, other]
Title: NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise
Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, due to variations in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy labeled graph data across various datasets, with unified experimental settings and interface. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark can be found at https://github.com/eaglelab-zju/NoisyGL.
- [358] arXiv:2406.04300 [pdf, other]
Title: Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models
Comments: 14 pages, 7 figures
Subjects: Robotics (cs.RO)
Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems, such as autonomous vehicles. Yet, the task of modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Adopting language descriptions to generate driving behaviors emerges as a promising strategy, offering a scalable and intuitive method for human operators to simulate a wide range of driving interactions. However, the scarcity of large-scale annotated language-trajectory data makes this approach challenging.
To address this gap, we propose Text-to-Drive (T2D) to synthesize diverse driving behaviors via Large Language Models (LLMs). We introduce a knowledge-driven approach that operates in two stages. In the first stage, we employ the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors for a scene. Then, we leverage the LLM's reasoning capabilities to synthesize these behaviors in simulation. At its core, T2D employs an LLM to construct a state chart that maps low-level states to high-level abstractions. This strategy aids in downstream tasks such as summarizing low-level observations, assessing policy alignment with behavior description, and shaping the auxiliary reward, all without needing human supervision. With our knowledge-driven approach, we demonstrate that T2D generates more diverse trajectories compared to other baselines and offers a natural language interface that allows for interactive incorporation of human preference. Please check our website for more examples: https://text-to-drive.github.io/
- [359] arXiv:2406.04301 [pdf, other]
Title: Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry
Authors: Kaichen Zhou
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates Epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse information from the cost volume into Epipolar features extracted from multiple source views, enabling the generation of fine-grained Signed Distance Function (SDF)-aware features. Additionally, we employ an attention mechanism along the line dimension to facilitate feature fusion based on the SDF feature. Furthermore, to address the information gaps in sparse conditions, we integrate depth information from monocular depth estimation using global and local regularization techniques. The global regularization utilizes a triplet loss function, while the local regularization employs a derivative loss function. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods, especially in cases with sparse and generalizable conditions.
- [360] arXiv:2406.04302 [pdf, other]
Title: Representational Alignment Supports Effective Machine Teaching
Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths
Comments: Preprint
Subjects: Machine Learning (cs.LG)
A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too.
- [361] arXiv:2406.04303 [pdf, other]
Title: Vision-LSTM: xLSTM as Generic Vision Backbone
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Transformers are widely used as generic backbones in computer vision, despite initially being introduced for natural language processing. Recently, the Long Short-Term Memory (LSTM) has been extended to a scalable and performant architecture - the xLSTM - which overcomes long-standing LSTM limitations via exponential gating and a parallelizable matrix memory structure. In this report, we introduce Vision-LSTM (ViL), an adaptation of the xLSTM building blocks to computer vision. ViL comprises a stack of xLSTM blocks where odd blocks process the sequence of patch tokens from top to bottom while even blocks go from bottom to top. Experiments show that ViL holds promise to be further deployed as a new generic backbone for computer vision architectures.
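The alternating traversal described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: `running_mean_block` is a hypothetical stand-in for an xLSTM block (a causal prefix average, chosen so that scan direction visibly matters), tokens are plain floats rather than patch embeddings, and flipping the sequence back after each even block is a choice made here, not something the abstract specifies.

```python
def running_mean_block(tokens):
    """Hypothetical stand-in for a sequence block: a causal prefix average.

    Output i depends only on inputs 0..i, so scan direction matters.
    """
    out, total = [], 0.0
    for i, t in enumerate(tokens, start=1):
        total += t
        out.append(total / i)
    return out


def alternating_stack(tokens, blocks):
    """Apply blocks in order; odd blocks (1st, 3rd, ...) scan forward,
    even blocks scan the reversed sequence and the result is flipped back."""
    x = list(tokens)
    for idx, block in enumerate(blocks, start=1):
        if idx % 2 == 0:
            x = block(x[::-1])[::-1]
        else:
            x = block(x)
    return x


patch_tokens = [1.0, 0.0, 0.0, 0.0]
one_block = alternating_stack(patch_tokens, [running_mean_block])
two_blocks = alternating_stack(patch_tokens, [running_mean_block] * 2)
```

With a single forward block, information from the first token only flows toward later positions; adding the reversed second block lets later positions influence earlier ones as well.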
- [362] arXiv:2406.04306 [pdf, other]
Title: Semantically Diverse Language Generation for Uncertainty Estimation in Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that hallucinations stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.
- [363] arXiv:2406.04308 [pdf, other]
Title: Approximation-Aware Bayesian Optimization
Authors: Natalie Maus, Kyurae Kim, Geoff Pleiss, David Eriksson, John P. Cunningham, Jacob R. Gardner
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.
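For readers unfamiliar with the expected improvement acquisition function mentioned in this abstract, the sketch below gives its standard closed form for a Gaussian posterior at a single candidate point, written for maximization. It is background only: the function name and argument names are chosen here, and the paper's joint SVGP and acquisition objective is not reproduced.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Closed-form expected improvement for a Gaussian posterior N(mu, sigma^2).

    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma,
    where Phi and phi are the standard normal CDF and PDF.
    """
    improvement = mu - best_so_far
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


# A candidate whose posterior mean equals the incumbent still has positive EI
# because of its posterior uncertainty.
ei_at_incumbent = expected_improvement(mu=0.0, sigma=1.0, best_so_far=0.0)
```

Because the value depends on both `mu` and `sigma`, errors in an approximate posterior translate directly into different acquisition decisions, which is the coupling the abstract targets.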
- [364] arXiv:2406.04309 [pdf, other]
Title: ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
Comments: SIGGRAPH 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
The common trade-offs of state-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) involve trading modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical formulation that exploits object self-similarity, leading to a highly compressed and efficient shape latent space. Thanks to the recursive formulation, our method supports spatial and global-to-local latent feature fusion without needing to initialize and maintain auxiliary data structures, while still allowing for continuous field queries to enable applications such as raytracing. In experiments on a set of diverse datasets, we provide compelling qualitative results and demonstrate state-of-the-art multi-scene reconstruction and compression results with a single network per dataset.
- [365] arXiv:2406.04312 [pdf, other]
Title: ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Comments: Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-$\alpha$, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time. Code is available at https://github.com/ExplainableML/ReNO.
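The core loop described in this abstract, gradient ascent on the initial noise while the generator stays fixed, can be illustrated with a toy example. Everything concrete below is an assumption made for illustration: `generator` is a hypothetical one-step model (an elementwise tanh), `reward` is a hypothetical scalar reward (negative squared distance to a target), and gradients are taken by finite differences rather than backpropagation. Only the 50-iteration budget is taken from the abstract.

```python
import math


def generator(noise):
    """Hypothetical frozen one-step generator: noise -> 'image'."""
    return [math.tanh(z) for z in noise]


def reward(image, target=(0.5, -0.5, 0.25)):
    """Hypothetical reward model: higher is better, maximal at the target."""
    return -sum((p - t) ** 2 for p, t in zip(image, target))


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def optimize_noise(noise, steps=50, lr=0.2):
    """Gradient ascent on the noise; generator weights are never touched."""
    def objective(z):
        return reward(generator(z))

    z = list(noise)
    for _ in range(steps):
        g = numerical_grad(objective, z)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


initial_noise = [1.5, 1.0, -2.0]
optimized_noise = optimize_noise(initial_noise)
reward_before = reward(generator(initial_noise))
reward_after = reward(generator(optimized_noise))
```

The design point this mirrors is that only the input noise changes at inference time, so no model weights are fine-tuned.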
- [366] arXiv:2406.04313 [pdf, other]
Title: Improving Alignment and Robustness with Short Circuiting
Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that "short-circuits" models as they respond with harmful outputs. Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, short-circuiting directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, short-circuiting allows the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.
- [367] arXiv:2406.04314 [pdf, other]
Title: Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, Direct Preference Optimization (DPO) has extended its success from aligning large language models (LLMs) to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods that assume all diffusion steps share a consistent preference order with the final generated images, we argue that this assumption neglects step-specific denoising performance and that preference labels should be tailored to each step's contribution. To address this limitation, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. Specifically, at each denoising step, we sample a pool of images, find a suitable win-lose pair, and, most importantly, randomly select a single image from the pool to initialize the next denoising step. This step-wise resampler process ensures the next win-lose image pair comes from the same image, making the win-lose comparison independent of the previous step. To assess the preferences at each step, we train a separate step-aware preference model that can be applied to both noisy and clean images. Our experiments with Stable Diffusion v1.5 and SDXL demonstrate that SPO significantly outperforms the latest Diffusion-DPO in aligning generated images with complex, detailed prompts and enhancing aesthetics, while also being more than 20x faster in training. Code and model: https://rockeycoss.github.io/spo.github.io/
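The step-wise resampling procedure in this abstract (sample a pool, pick a win-lose pair, then continue from one randomly chosen pool member) can be sketched with scalar stand-ins for images. The sampler, the scoring function, and the rule of taking the best and worst pool members as the pair are all hypothetical choices made for this sketch; the abstract only says a "suitable" pair is found.

```python
import random


def sample_pool(x, step, pool_size, rng):
    """Hypothetical stochastic denoising step: several candidates from one parent."""
    return [0.9 * x + rng.gauss(0.0, 0.1) for _ in range(pool_size)]


def preference_score(x, step):
    """Hypothetical step-aware preference model: prefers values near zero."""
    return -abs(x)


def collect_step_pairs(x0, num_steps=5, pool_size=4, seed=0):
    """Return one (step, parent, win, lose) tuple per denoising step."""
    rng = random.Random(seed)
    x, pairs = x0, []
    for step in range(num_steps):
        pool = sample_pool(x, step, pool_size, rng)
        ranked = sorted(pool, key=lambda c: preference_score(c, step))
        # Assumption: best and worst candidates form the win-lose pair.
        pairs.append((step, x, ranked[-1], ranked[0]))
        # Step-wise resampling: continue from a random pool member, so the
        # next pair shares a single parent regardless of which sample won.
        x = rng.choice(pool)
    return pairs


step_pairs = collect_step_pairs(x0=1.0)
```

Each tuple pairs a winner and a loser that were generated from the same parent, which is what makes the comparison at one step independent of the previous step's outcome.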
- [368] arXiv:2406.04316 [pdf, other]
Title: Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE(Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to the substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: Semantic-aware feature extraction and Clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.
- [369] arXiv:2406.04317 [pdf, other]
-
Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bayesian neural networks (BNN) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinite for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.
- [370] arXiv:2406.04318 [pdf, other]
-
Title: Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction
Comments: ICML 2024. Project website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Magnetic Resonance (MR) imaging, despite its proven diagnostic utility, remains an inaccessible imaging modality for disease surveillance at the population level. A major factor rendering MR inaccessible is lengthy scan times. An MR scanner collects measurements associated with the underlying anatomy in the Fourier space, also known as the k-space. Creating a high-fidelity image requires collecting large quantities of such measurements, increasing the scan time. Traditionally, to accelerate an MR scan, image reconstruction from under-sampled k-space data is the method of choice. However, recent works show the feasibility of bypassing image reconstruction and learning to detect disease directly from a sparser learned subset of the k-space measurements. In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. On 6 out of 8 pathology classification tasks spanning the Knee, Brain, and Prostate MR scans, ASMR reaches within 2% of the performance of a fully sampled classifier while using only 8% of the k-space, as well as outperforming prior state-of-the-art work in k-space sampling such as EMRT, LOUPE, and DPS.
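To make "using only 8% of the k-space" concrete, here is a minimal sketch of k-space undersampling. It assumes a 2D image, a Cartesian column mask, and a uniformly random choice of columns; the random choice is a stand-in for the learned, sequential policy that the abstract describes, and the function name `undersample_kspace` is hypothetical.

```python
import numpy as np

def undersample_kspace(image, fraction, seed=0):
    """Keep a random subset of k-space columns and zero out the rest."""
    rng = np.random.default_rng(seed)
    # The 2D Fourier transform of the image is its k-space representation.
    kspace = np.fft.fftshift(np.fft.fft2(image))
    num_cols = image.shape[1]
    num_keep = max(1, int(round(fraction * num_cols)))
    keep = rng.choice(num_cols, size=num_keep, replace=False)
    mask = np.zeros(num_cols, dtype=bool)
    mask[keep] = True
    # Broadcasting applies the column mask to every row of k-space.
    return kspace * mask[np.newaxis, :], mask

# Synthetic stand-in for an MR slice (random values, not real data).
image = np.random.default_rng(1).random((64, 100))
sparse_kspace, mask = undersample_kspace(image, fraction=0.08)
```

A classifier in the setting of the abstract would take something like `sparse_kspace` as input directly, without first reconstructing an image.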
- [371] arXiv:2406.04320 [pdf, other]
-
Title: Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Modeling multivariate time series is a well-established problem with a wide range of applications from healthcare to financial markets. Traditional State Space Models (SSMs) are classical approaches for univariate time series modeling due to their simplicity and expressive power to represent linear dependencies. They, however, have fundamentally limited expressive power to capture non-linear dependencies, are slow in practice, and fail to model the inter-variate information flow. Despite recent attempts to improve the expressive power of SSMs by using deep structured SSMs, the existing methods are either limited to univariate time series, fail to model complex patterns (e.g., seasonal patterns), fail to dynamically model the dependencies of variate and time dimensions, and/or are input-independent. We present Chimera, which uses two input-dependent 2-D SSM heads with different discretization processes to learn long-term progression and seasonal patterns. To improve the efficiency of complex 2D recurrence, we present a fast training procedure using a new 2-dimensional parallel selective scan. We further present and discuss 2-dimensional Mamba and Mamba-2 as the special cases of our 2D SSM. Our experimental evaluation shows the superior performance of Chimera on extensive and diverse benchmarks, including ECG and speech time series classification, long-term and short-term time series forecasting, and time series anomaly detection.
- [372] arXiv:2406.04321 [pdf, other]
-
Title: VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo
Comments: The code and datasets will be available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music that is both acoustically and semantically aligned with the video. By incorporating local and global visual cues, VidMuse enables the creation of musically coherent audio tracks that consistently match the video content through Long-Short-Term modeling. Through extensive experiments, VidMuse outperforms existing models in terms of audio quality, diversity, and audio-visual alignment. The code and datasets will be available at https://github.com/ZeyueT/VidMuse/.
- [373] arXiv:2406.04322 [pdf, other]
-
Title: DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation. In particular, DIRECT-3D is a tri-plane diffusion model that integrates two innovations: 1) A novel learning framework where noisy data are filtered and aligned automatically during the training process. Specifically, after an initial warm-up phase using a small set of clean data, an iterative optimization is introduced in the diffusion process to explicitly estimate the 3D pose of objects and select beneficial data based on conditional density. 2) An efficient 3D representation that is achieved by disentangling object geometry and color features with two separate conditional diffusion models that are optimized hierarchically. Given a prompt input, our model generates high-quality, high-resolution, realistic, and complex 3D objects with accurate geometric details in seconds. We achieve state-of-the-art performance in both single-class generation and text-to-3D generation. We also demonstrate that DIRECT-3D can serve as a useful 3D geometric prior of objects, for example to alleviate the well-known Janus problem in 2D-lifting methods such as DreamFusion. The code and models are available for research purposes at: https://github.com/qihao067/direct3d.
- [374] arXiv:2406.04323 [pdf, other]
-
Title: ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories
Comments: ICML 2024 Accepted
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, termed Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io.
- [375] arXiv:2406.04324 [pdf, other]
-
Title: SF-V: Single Forward Video Generation Model
Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around 23× speedup compared with SVD and 6× speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.
- [376] arXiv:2406.04325 [pdf, other]
-
Title: ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating strategy. 2) ShareCaptioner-Video, an efficient and capable captioning model for arbitrary videos, with 4.8M high-quality aesthetic videos annotated by it. 3) ShareGPT4Video-8B, a simple yet superb LVLM that reached SOTA performance on three advancing video benchmarks. To achieve this, setting aside costly, non-scalable human annotators, we find that using GPT4V to caption video with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporally confused results. We argue the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) Inter-frame precise temporal change understanding. 2) Intra-frame detailed content description. 3) Frame-number scalability for arbitrary-length videos. To this end, we meticulously designed a differential video captioning strategy, which is stable, scalable, and efficient for generating captions for videos with arbitrary resolution, aspect ratios, and length. Based on it, we construct ShareGPT4Video, which contains 40K high-quality videos spanning a wide range of categories, and the resulting captions encompass rich world knowledge, object attributes, camera movements, and crucially, detailed and precise temporal descriptions of events. Based on ShareGPT4Video, we further develop ShareCaptioner-Video, a superior captioner capable of efficiently generating high-quality captions for arbitrary videos...
- [377] arXiv:2406.04327 [pdf, other]
-
Title: Causal Estimation of Memorisation Profiles
Comments: Published at the ACL 2024 Conference (main)
Subjects: Machine Learning (cs.LG)
Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of this counterfactual. Further, they often estimate memorisation for a model architecture rather than for a specific model instance. This paper fills an important gap in the literature, proposing a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics. Using this method, we characterise a model's memorisation profile--its memorisation trends across training--by only observing its behaviour on a small set of instances throughout training. In experiments with the Pythia model suite, we find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) has stable trends across model sizes, thus making memorisation in larger models predictable from smaller ones.
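The difference-in-differences design mentioned in this abstract can be illustrated with a small sketch. The mapping below is one possible reading, not the paper's estimator: "treated" instances are assumed to be those the model trained on between two checkpoints, "control" instances are assumed to be ones it had not yet seen, and all numbers are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Made-up per-instance log-likelihoods at two checkpoints (illustrative only).
treated_pre = [-3.2, -2.8]    # before the model trained on these instances
treated_post = [-1.6, -1.4]   # after it trained on them
control_pre = [-3.0, -3.2]    # instances not seen in between
control_post = [-2.5, -2.7]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
```

Subtracting the control group's change removes the general improvement that every instance gets from further training, leaving an estimate of the extra gain attributable to having seen the instance. Here the treated group improves by 1.5 and the control group by 0.5, so the estimate is about 1.0.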
- [378] arXiv:2406.04328 [pdf, other]
-
Title: The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
Comments: 10 pages, 4 figures, under review
Subjects: Machine Learning (cs.LG)
The past few years have produced a series of spectacular advances in the decoding of speech from brain activity. The engine of these advances has been the acquisition of labelled data, with increasingly large datasets acquired from single subjects. However, participants exhibit anatomical and other individual differences, and datasets use varied scanners and task designs. As a result, prior work has struggled to leverage data from multiple subjects, multiple datasets, multiple tasks, and unlabelled datasets. In turn, the field has not benefited from the rapidly growing number of open neural data repositories to exploit large-scale data and deep learning. To address this, we develop an initial set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous and unlabelled neural recordings. Experimental results show that representations learned with these objectives generalise across subjects, datasets, and tasks, and are also learned faster than using only labelled data. In addition, we set new benchmarks for two foundational speech decoding tasks. Taken together, these methods now unlock the potential for training speech decoding models with orders of magnitude more existing data.
- [379] arXiv:2406.04329 [pdf, other]
-
Title: Simplified and Generalized Masked Diffusion for Discrete Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.78 (CIFAR-10) and 3.42 (ImageNet 64×64) bits per dimension, which are comparable to or better than autoregressive models of similar sizes.
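As a rough illustration of the ingredients named in this abstract, the sketch below shows a masking (absorbing) forward process and a cross-entropy term restricted to masked positions. It is a simplified sketch, not the paper's objective: `MASK_ID`, the keep-probability `alpha_t`, and the function names are assumptions, and the time-dependent weight that multiplies the cross-entropy inside the integral is deliberately left out because the abstract does not state it.

```python
import numpy as np

MASK_ID = -1  # hypothetical id reserved for the mask (absorbing) state

def mask_tokens(tokens, alpha_t, rng):
    """Forward process: keep each token with probability alpha_t, otherwise mask it."""
    keep = rng.random(tokens.shape) < alpha_t
    return np.where(keep, tokens, MASK_ID)

def masked_cross_entropy(log_probs, tokens, noisy_tokens):
    """Cross-entropy of the true tokens, summed over masked positions only."""
    masked = noisy_tokens == MASK_ID
    picked = log_probs[np.arange(len(tokens)), tokens]
    return -np.sum(picked[masked])

rng = np.random.default_rng(0)
vocab_size = 5
tokens = np.array([0, 3, 1, 4, 2, 2])
# Stand-in for a model's output: a uniform distribution over the vocabulary.
log_probs = np.full((len(tokens), vocab_size), -np.log(vocab_size))

noisy = mask_tokens(tokens, alpha_t=0.5, rng=rng)
loss = masked_cross_entropy(log_probs, tokens, noisy)
```

A training loop in this style would sample a time, mask the sequence accordingly, and scale the resulting cross-entropy by a schedule-dependent weight before averaging.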
- [380] arXiv:2406.04330 [pdf, other]
-
Title: Parameter-Inverted Image Pyramid Networks
Authors: Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.
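The "parameter-inverted" pairing can be sketched in a few lines. The resolutions and parameter counts below are hypothetical, and the cost model (compute proportional to parameters times pixel count) is a crude assumption used only to show why the inversion saves computation; it is not taken from the paper.

```python
def pair_inverted(resolutions, model_params):
    """Pair the highest resolution with the smallest model, and so on."""
    res_desc = sorted(resolutions, reverse=True)
    params_asc = sorted(model_params)
    return list(zip(res_desc, params_asc))

def rough_cost(pairs):
    """Assumed cost model: parameters times number of pixels, summed over branches."""
    return sum(params * res * res for res, params in pairs)

resolutions = [112, 224, 448]   # hypothetical pyramid levels (pixels per side)
model_params = [86, 22, 6]      # hypothetical model sizes (millions of parameters)

inverted = pair_inverted(resolutions, model_params)
# Baseline: the largest model processes every pyramid level.
baseline = [(res, max(model_params)) for res in resolutions]

inverted_cost = rough_cost(inverted)
baseline_cost = rough_cost(baseline)
```

Under this assumed cost model the inverted pairing is several times cheaper than running the largest model at every resolution, which is the trade-off the abstract describes qualitatively.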
- [381] arXiv:2406.04331 [pdf, other]
-
Title: PaCE: Parsimonious Concept Engineering for Large Language Models
Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal
Comments: 26 pages, 17 figures, 5 tables, dataset and code at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face several challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, failing alignment; some remove benign concepts, lowering the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework for alignment. First, to sufficiently model the concepts, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Then, given any alignment task, we instruct a concept partitioner to efficiently annotate the concepts as benign or undesirable. Finally, at inference time, we decompose the LLM activations along the concept dictionary via sparse coding, to accurately represent the activation as a linear combination of the benign and undesirable components. By removing the latter ones from the activation, we reorient the behavior of LLMs towards alignment goals. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.
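The decompose-and-remove step at inference time can be sketched as below. Note the substitution: the abstract describes sparse coding over a large concept dictionary, whereas this sketch uses ordinary least squares on a tiny dictionary with linearly independent atoms, purely to show the mechanics. The function name and the toy data are hypothetical.

```python
import numpy as np

def remove_undesirable(activation, dictionary, undesirable):
    """Decompose an activation over dictionary atoms (columns) and subtract the undesirable part.

    Least squares stands in for sparse coding here; it is only adequate for a
    small dictionary whose atoms are linearly independent.
    """
    coeffs, *_ = np.linalg.lstsq(dictionary, activation, rcond=None)
    undesirable = np.asarray(undesirable, dtype=bool)
    unwanted_part = dictionary[:, undesirable] @ coeffs[undesirable]
    return activation - unwanted_part, coeffs

rng = np.random.default_rng(0)
dictionary = rng.standard_normal((4, 3))       # 3 toy concept atoms in a 4-dim space
undesirable = [False, True, False]             # atom 1 is labeled undesirable
activation = 2.0 * dictionary[:, 0] + 3.0 * dictionary[:, 1]

cleaned, coeffs = remove_undesirable(activation, dictionary, undesirable)
```

In this toy case the cleaned activation retains only the contribution of the benign atom, `2.0 * dictionary[:, 0]`. With an overcomplete dictionary the decomposition is no longer unique, which is where a sparsity-promoting solver becomes necessary.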
- [382] arXiv:2406.04332 [pdf, other]
-
Title: Coarse-To-Fine Tensor Trains for Compact Visual Representations
Authors: Sebastian Loeschcke, Dan Wang, Christian Leth-Espensen, Serge Belongie, Michael J. Kastoryano, Sagie Benaim
Comments: Project webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly compact tensor train representation, is still lacking. This has prevented practitioners from deploying the full potential of tensor networks for visual data. To this end, we propose 'Prolongation Upsampling Tensor Train (PuTT)', a novel method for learning tensor train representations in a coarse-to-fine manner. Our method involves the prolonging or 'upsampling' of a learned tensor train representation, creating a sequence of 'coarse-to-fine' tensor trains that are incrementally refined. We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. For full results see our project webpage: https://sebulo.github.io/PuTT_website/
- [383] arXiv:2406.04333 [pdf, other]
-
Title: BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
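A fractional figure such as 1.99 bits arises naturally when different layers get different bit widths, since the reported number is then a parameter-weighted average. The sketch below illustrates that arithmetic with made-up layer sizes and bit widths. The 16-bit baseline in `compression_ratio` is an assumption, because the abstract does not state the precision of the original model.

```python
def average_bits(layer_sizes, layer_bits):
    """Parameter-weighted average bit width across layers."""
    total_params = sum(layer_sizes)
    total_bits = sum(n * b for n, b in zip(layer_sizes, layer_bits))
    return total_bits / total_params

def compression_ratio(avg_bits, baseline_bits=16):
    """Size reduction relative to an assumed baseline precision, ignoring any storage overhead."""
    return baseline_bits / avg_bits

# Made-up example: three layers quantized to 1, 2, and 3 bits.
layer_sizes = [100, 700, 200]
layer_bits = [1, 2, 3]

avg = average_bits(layer_sizes, layer_bits)   # 2.1 bits per weight on average
ratio = compression_ratio(1.99)               # about 8.04 under the 16-bit assumption
```

Under the 16-bit assumption, 1.99 bits gives a ratio of roughly 8.04, close to the 7.9X quoted in the abstract. The abstract does not explain the remaining difference.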
- [384] arXiv:2406.04334 [pdf, other]
-
Title: DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has to handle a large number of additional tokens in its input layer. This paper presents a new architecture DeepStack for LMMs. Considering N layers in the language and vision transformer of LMMs, we stack the visual tokens into N groups and feed each group to its aligned transformer layer from bottom to top. Surprisingly, this simple method greatly enhances the power of LMMs to model interactions among visual tokens across layers but with minimal additional cost. We apply DeepStack to both language and vision transformer in LMMs, and validate the effectiveness of DeepStack LMMs with extensive empirical results. Using the same context length, our DeepStack 7B and 13B parameters surpass their counterparts by 2.7 and 2.9 on average across 9 benchmarks, respectively. Using only one-fifth of the context length, DeepStack closely rivals the counterparts that use the full context length. These gains are particularly pronounced on high-resolution tasks, e.g., 4.2, 11.0, and 4.0 improvements on TextVQA, DocVQA, and InfoVQA compared to LLaVA-1.5-7B, respectively. We further apply DeepStack to vision transformer layers, which brings us a similar amount of improvements, 3.8 on average compared with LLaVA-1.5-7B.
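One way to picture "stack the visual tokens into N groups and feed each group to its aligned transformer layer" is the toy sketch below. It rests on assumptions the abstract does not confirm: groups are taken as equal-sized contiguous chunks, each group is added to the hidden states entering its layer, and the "layers" are trivial stand-in functions rather than transformer blocks.

```python
import numpy as np

def deepstack_forward(visual_tokens, layers):
    """Feed one group of visual tokens into each layer, bottom to top.

    Assumes the number of tokens is divisible by the number of layers, so all
    groups share one shape and can be added to the running hidden states.
    """
    groups = np.array_split(visual_tokens, len(layers), axis=0)
    hidden = np.zeros_like(groups[0])
    for layer, group in zip(layers, groups):
        # Assumed injection rule: add the group to the layer's input.
        hidden = layer(hidden + group)
    return hidden

# 12 toy visual tokens of dimension 8, processed by 3 stand-in layers.
visual_tokens = np.ones((12, 8))
layers = [lambda h: 0.5 * h for _ in range(3)]

output = deepstack_forward(visual_tokens, layers)
```

The point of the sketch is the sequence length: the layers only ever see 4 token positions instead of 12, which mirrors the abstract's claim about using a fraction of the context length.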
- [385] arXiv:2406.04336 [pdf, other]
-
Title: On the Expressive Power of Spectral Invariant Graph Neural Networks
Comments: 31 pages; 3 figures; to appear in ICML 2024
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Spectral Theory (math.SP)
Incorporating spectral information to enhance Graph Neural Networks (GNNs) has shown promising results but raises a fundamental challenge due to the inherent ambiguity of eigenvectors. Various architectures have been proposed to address this ambiguity, referred to as spectral invariant architectures. Notable examples include GNNs and Graph Transformers that use spectral distances, spectral projection matrices, or other invariant spectral features. However, the potential expressive power of these spectral invariant architectures remains largely unclear. The goal of this work is to gain a deep theoretical understanding of the expressive power obtainable when using spectral features. We first introduce a unified message-passing framework for designing spectral invariant GNNs, called Eigenspace Projection GNN (EPNN). A comprehensive analysis shows that EPNN essentially unifies all prior spectral invariant architectures, in that they are either strictly less expressive or equivalent to EPNN. A fine-grained expressiveness hierarchy among different architectures is also established. On the other hand, we prove that EPNN itself is bounded by a recently proposed class of Subgraph GNNs, implying that all these spectral invariant architectures are strictly less expressive than 3-WL. Finally, we discuss whether using spectral features can gain additional expressiveness when combined with more expressive GNNs.
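The eigenvector ambiguity and the invariance of spectral projection matrices can be checked numerically. The sketch below is an illustration under simple assumptions (a 4-node cycle graph, eigenvalues grouped by rounding); the helper `eigenspace_projections` is hypothetical and not part of any library or of the paper.

```python
import numpy as np

def eigenspace_projections(matrix, decimals=8):
    """Map each distinct eigenvalue to the projection matrix onto its eigenspace."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    projections = {}
    for value, vec in zip(np.round(eigvals, decimals), eigvecs.T):
        # Summing outer products over an eigenspace gives a basis-independent projector.
        projections[value] = projections.get(value, 0) + np.outer(vec, vec)
    return projections

# Graph Laplacian of a 4-node cycle; its eigenvalues are 0, 2, 2, 4.
adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

projections = eigenspace_projections(laplacian)

# Flipping an eigenvector's sign changes the vector but not its outer product.
v = np.linalg.eigh(laplacian)[1][:, 0]
sign_invariant = np.allclose(np.outer(v, v), np.outer(-v, -v))
```

The repeated eigenvalue 2 is the more interesting case: any orthonormal basis of that two-dimensional eigenspace is a valid solver output, yet the summed projector is the same for all of them, which is why such features are usable as invariant inputs.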
- [386] arXiv:2406.04337 [pdf, other]
-
Title: Coherent Zero-Shot Visual Instruction Generation
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions.
- [387] arXiv:2406.04338 [pdf, other]
-
Title: Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.
- [388] arXiv:2406.04339 [pdf, other]
-
Title: RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and ManipulationAuthors: Jiaming Liu, Mengzhen Liu, Zhenyu Wang, Lily Lee, Kaichen Zhou, Pengju An, Senqiao Yang, Renrui Zhang, Yandong Guo, Shanghang ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV)
A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities, while maintaining efficient fine-tuning and inference. Specifically, we first integrate the vision encoder with Mamba, aligning visual data with language embedding through co-training, empowering our model with visual common sense and robot-related reasoning. To further equip RoboMamba with action pose prediction abilities, we explore an efficient fine-tuning strategy with a simple policy head. We find that once RoboMamba possesses sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model showcases impressive pose prediction results in both simulation and real-world experiments, achieving inference speeds 7 times faster than existing robot MLLMs. Our project web page: https://sites.google.com/view/robomamba-web
- [389] arXiv:2406.04340 [pdf, other]
-
Title: GLACE: Global Local Accelerated Coordinate EncodingComments: Large-scale visual localization with a single optimizable MLP. CVPR 2024. Code: this https URL Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and needs to implicitly triangulate the points. The challenges stem from a fundamental dilemma: The network has to be invariant to observations of the same landmark at different viewpoints and lighting conditions, etc., but at the same time discriminate unrelated but similar observations. The latter becomes more relevant and severe in larger scenes. In this work, we tackle this problem by introducing the concept of co-visibility to the network. We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network. Specifically, we propose a novel feature diffusion technique that implicitly groups the reprojection constraints with co-visibility and avoids overfitting to trivial solutions. Additionally, our position decoder parameterizes the output positions for large-scale scenes more effectively. Without using 3D models or depth maps for supervision, our method achieves state-of-the-art results on large-scale scenes with a low-map-size model. On Cambridge landmarks, with a single model, we achieve 17% lower median position error than Poker, the ensemble variant of the state-of-the-art SCR method ACE. Code is available at: https://github.com/cvg/glace.
- [390] arXiv:2406.04341 [pdf, other]
-
Title: Interpreting the Second-Order Effects of Neurons in CLIPComments: project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. We find that these effects are highly selective: for each neuron, the effect is significant for <2% of the images. Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. The sets reveal polysemantic behavior - each neuron corresponds to multiple, often unrelated, concepts (e.g. ships and cars). Exploiting this neuron polysemy, we mass-produce "semantic" adversarial examples by generating images with concepts spuriously correlated to the incorrect class. Additionally, we use the second-order effects for zero-shot segmentation and attribute discovery in images. Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
- [391] arXiv:2406.04342 [pdf, other]
-
Title: Learning 1D Causal Visual Representation with De-focus Attention NetworksAuthors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng DaiSubjects: Computer Vision and Pattern Recognition (cs.CV)
Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-focus" issue in existing 1D causal vision models, where attention overly concentrates on a small proportion of visual tokens. The issue of "over-focus" hinders the model's ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. During training, large and scheduled drop path rates, and an auxiliary loss on globally pooled features for global understanding tasks are introduced. These two strategies encourage the model to attend to a broader range of tokens and enhance network optimization. Extensive experiments validate the efficacy of our approach, demonstrating that 1D causal visual representation can perform comparably to 2D non-causal representation in tasks such as global perception, dense prediction, and multi-modal understanding. Code is released at https://github.com/OpenGVLab/De-focus-Attention-Networks.
- [392] arXiv:2406.04343 [pdf, other]
-
Title: Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single ImageAuthors: Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea VedaldiComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.
- [393] arXiv:2406.04344 [pdf, other]
-
Title: Verbalized Machine Learning: Revisiting Machine Learning with Language ModelsComments: Technical Report v1 (92 pages, 15 figures)Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical machine learning problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed. We conduct several studies to empirically evaluate the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability and trustworthiness in ML.
- [394] arXiv:2406.04345 [pdf, other]
-
Title: Stereo-Depth Fusion through Virtual Pattern ProjectionComments: extended version of ICCV 2023: "Active Stereo Without Pattern Projector"Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper presents a novel general-purpose stereo and depth data fusion paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, to facilitate the visual correspondence. Purposely, any depth sensing device can be seamlessly plugged into our framework, enabling the deployment of a virtual active stereo setup in any possible environment and overcoming the severe limitations of physical pattern projection, such as the limited working range and environmental conditions. Exhaustive experiments on indoor and outdoor datasets featuring both long and close range, including those providing raw, unfiltered depth hints from off-the-shelf depth sensors, highlight the effectiveness of our approach in notably boosting the robustness and accuracy of algorithms and deep stereo without any code modification and even without re-training. Additionally, we assess the performance of our strategy on active stereo evaluation datasets with conventional pattern projection. Indeed, in all these scenarios, our virtual pattern projection paradigm achieves state-of-the-art performance. The source code is available at: https://github.com/bartn8/vppstereo.
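The core idea of painting matching virtual patterns into both views can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes rectified grayscale images, integer disparities, single-pixel random patterns, and a hypothetical hint format `(row, col_left, disparity)` with `col_right = col_left - disparity`.

```python
import numpy as np

def project_virtual_pattern(left, right, hints, seed=0):
    """Paint the same random intensity at corresponding pixels of both views.

    hints: iterable of (row, col_left, disparity) tuples (hypothetical format),
    where the matching right-image column is col_left - disparity.
    """
    rng = np.random.default_rng(seed)
    left, right = left.copy(), right.copy()  # leave the inputs untouched
    width = left.shape[1]
    for row, col_left, disparity in hints:
        col_right = col_left - disparity
        if 0 <= col_right < width:  # skip hints that fall outside the right image
            value = rng.integers(0, 256)
            left[row, col_left] = value
            right[row, col_right] = value
    return left, right
```

Because both views receive identical values at geometrically consistent locations, a stereo matcher sees an unambiguous correspondence at each hinted pixel, which is the intuition behind needing no changes to the matcher itself.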
Cross-lists for Fri, 7 Jun 24
- [395] arXiv:2405.08005 (cross-list from math.OC) [pdf, other]
-
Title: Graphon Mean Field Games with a Representative Player: Analysis and Learning AlgorithmComments: Published as a conference paper at ICML 2024Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a discrete time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium with mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite player game on networks, which is challenging to analyze and solve due to curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.
- [396] arXiv:2406.03499 (cross-list from physics.plasm-ph) [pdf, ps, other]
-
Title: Estimated electric conductivities of thermal plasma for air-fuel combustion and oxy-fuel combustion with potassium or cesium seedingAuthors: Osama A. MarzoukComments: 28 pages, 16 figures, 14 tablesJournal-ref: Heliyon, volume 10, issues 11, article number e31697, 2024Subjects: Plasma Physics (physics.plasm-ph); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
A complete model for estimating the electric conductivity of combustion product gases, with added cesium (Cs) or potassium (K) vapor for ionization, is presented. Neutral carrier gases serve as the bulk fluid that carries the seed material, as well as the electrons generated by the partial thermal (equilibrium) ionization of the seed alkali metal. The model accounts for electron-neutral scattering, as well as electron-ion and electron-electron scattering. The model is tested through comparison with published data. The model is aimed at being utilized for the plasma within magnetohydrodynamic (MHD) channels, where direct power extraction from passing electrically conducting plasma gas enables electric power generation. The thermal ionization model is then used to estimate the electric conductivity of seeded combustion gases under complete combustion of three selected fuels, namely: hydrogen (H2), methane (CH4), and carbon (C). For each of these three fuels, two options for the oxidizer were applied, namely: air (21 % molecular oxygen, 79 % molecular nitrogen by mole), and pure oxygen (oxy-combustion). Two types of seeds (with 1 % mole fraction, based on the composition before ionization) were also applied for each of the six combinations of (fuel-oxidizer), leading to a total of 12 different MHD plasma cases. For each of these cases, the electric conductivity was computed for a range of temperatures from 2000 K to 3000 K. The smallest estimated electric conductivity was 0.35 S/m for oxy-hydrogen combustion at 2000 K, with potassium seeding. The largest estimated electric conductivity was 180.30 S/m for oxy-carbon combustion at 3000 K, with cesium seeding. At 2000 K, replacing potassium with cesium causes a gain in the electric conductivity by a multiplicative gain factor of about 3.6 regardless of the fuel and oxidizer. This gain factor declines to between 1.77 and 2.07 at 3000 K.
- [397] arXiv:2406.03504 (cross-list from math.OC) [pdf, ps, other]
-
Title: A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized ProblemsSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and check whether they do not contain solutions through "pruning tests". In standard implementations, evaluating a pruning test requires solving a convex optimization problem, which may result in computational bottlenecks. In this paper, we present an alternative way to implement pruning tests for a generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with a negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude for typical problems encountered in machine-learning applications.
- [398] arXiv:2406.03587 (cross-list from physics.soc-ph) [pdf, other]
-
Title: Subsuming Complex Networks by Node WalksComments: 14 pages and 7 figuresSubjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
The concept of node walk in graphs and complex networks has been addressed, consisting of one or more nodes that move into adjacent nodes, henceforth incorporating the respective connections. This type of dynamics is then applied to subsume complex networks. Three types of networks (Erdős-Rényi, Barabási-Albert, as well as a geometric model) are considered, while three node-walk heuristics (uniformly random, largest degree, and smallest degree) are taken into account. Several interesting results are obtained and described, including the identification that the subsuming dynamics depend strongly on both the specific topology of the networks and the criteria controlling the node walks. The use of node walks as a model for studying the relationship between network topology and dynamics is motivated by this result. In addition, relatively high correlations between the initial node degree and the accumulated strength of the walking node were observed for some combinations of network types and dynamic rules, allowing some of the properties of the subsumption to be roughly predicted from the initial topology around the walking node, which has been found, however, not to be enough for full determination of the subsumption dynamics. Another interesting result regards the quite distinct signatures (along the iterations) of walking node strengths obtained for the several considered combinations of network type and subsumption rules.
- [399] arXiv:2406.03616 (cross-list from stat.ML) [pdf, other]
-
Title: BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box SystemsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments. Systematically obtaining diverse outcomes is a key component in many real-world design problems such as material and drug discovery, neural architecture search, reinforcement learning, and robot navigation. Since the relationship between the inputs and outputs (i.e., behaviors) of these complex systems is typically not available in closed form, NS requires a black-box perspective. Consequently, popular NS algorithms rely on evolutionary optimization and other meta-heuristics that require intensive sampling of the input space, which is impractical when the system is expensive to evaluate. We propose a Bayesian optimization inspired algorithm for sample-efficient NS that is specifically designed for such expensive black-box systems. Our approach models the input-to-behavior mapping with multi-output Gaussian processes (MOGP) and selects the next point to evaluate by maximizing a novelty metric that depends on a posterior sample drawn from the MOGP that promotes both exploration and exploitation. By leveraging advances in efficient posterior sampling and high-dimensional Gaussian process modeling, we discuss how our approach can be made scalable with respect to both amount of data and number of inputs. We test our approach on ten synthetic benchmark problems and eight real-world problems (with up to 2133 inputs) including new applications such as discovery of diverse metal organic frameworks for use in clean energy technology. We show that our approach greatly outperforms existing NS algorithms by finding substantially larger sets of diverse behaviors under limited sample budgets.
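The selection step described above (score candidates by the novelty of a sampled behavior, then pick the maximizer) can be sketched as follows. This is an illustrative assumption, not the paper's metric: novelty is taken to be the mean distance to the k nearest behaviors already observed, and `sampled_behaviors` stands in for a posterior sample of the surrogate model evaluated at the candidate inputs.

```python
import numpy as np

def knn_novelty(behaviors, archive, k=3):
    """Mean Euclidean distance from each behavior to its k nearest archive entries."""
    dists = np.linalg.norm(behaviors[:, None, :] - archive[None, :, :], axis=-1)
    k = min(k, archive.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

def select_next(candidate_inputs, sampled_behaviors, archive, k=3):
    """Return the candidate whose sampled behavior is most novel, with its score.

    sampled_behaviors[i] is a hypothetical posterior-sample prediction of the
    behavior at candidate_inputs[i]; the surrogate model itself is not shown.
    """
    scores = knn_novelty(sampled_behaviors, archive, k)
    best = int(np.argmax(scores))
    return candidate_inputs[best], float(scores[best])
```

Using a random posterior sample rather than the posterior mean means that uncertain regions occasionally produce novel-looking predictions, which is one way a single score can trade off exploration and exploitation.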
- [400] arXiv:2406.03628 (cross-list from stat.ML) [pdf, other]
-
Title: Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data ImbalanceComments: 59 pages, 7 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (OversamPling with Artificial LLM-generated data), a systematic oversampling approach that leverages the capabilities of large language models (LLMs) to generate high-quality synthetic data for minority groups. Recent studies on synthetic data generation using deep generative models mostly target prediction tasks. Our proposal differs in that we focus on handling imbalanced data and spurious correlations. More importantly, we develop a novel theory that rigorously characterizes the benefits of using the synthetic data, and shows the capacity of transformers in generating high-quality synthetic data for both labels and covariates. We further conduct intensive numerical experiments to demonstrate the efficacy of our proposed approach compared to some representative alternative solutions.
- [401] arXiv:2406.03637 (cross-list from eess.AS) [pdf, other]
-
Title: Style Mixture of Experts for Expressive Text-To-Speech SynthesisSubjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.
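A gated mixture of style experts of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the StyleMoE architecture: experts are plain linear maps, gating is a dense softmax over all experts (the paper may use sparse routing), and the class name and dimensions are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MoEStyleEncoderSketch:
    """Dense softmax-gated mixture of linear 'style experts' (illustrative only)."""

    def __init__(self, in_dim, style_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.gate = rng.normal(scale=0.1, size=(in_dim, num_experts))
        self.experts = rng.normal(scale=0.1, size=(num_experts, in_dim, style_dim))

    def __call__(self, ref):
        # ref: (batch, in_dim) features of the reference speech
        weights = softmax(ref @ self.gate)                      # (batch, num_experts)
        outputs = np.einsum("bi,eio->beo", ref, self.experts)   # per-expert embeddings
        style = np.einsum("be,beo->bo", weights, outputs)       # gate-weighted mixture
        return style, weights
```

The gate weights show which experts a given reference utterance is routed to, so inspecting them is a simple way to check whether experts specialize in different regions of the style space.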
- [402] arXiv:2406.03652 (cross-list from q-fin.PM) [pdf, other]
-
Title: Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and AlgorithmsAuthors: Duy Khanh LamComments: 25 pages, 12 figures, 3 tables, working paperSubjects: Portfolio Management (q-fin.PM); Information Theory (cs.IT); Machine Learning (cs.LG); Computational Finance (q-fin.CP)
This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. Due to the uncertainty of strategies' performances in the future market, which are often based on specific models and statistical assumptions, investors often mitigate risk and enhance robustness by combining multiple strategies, akin to common approaches in collective learning prediction. However, the absence of a distribution-free and consistent preference framework complicates decisions of combination due to the ambiguous objective. To address this gap, we introduce a novel framework for decision-making in combining strategies, irrespective of market conditions, by establishing the investor's preference between decisions and then forming a clear objective. Through this framework, we propose a combinatorial strategy construction, free from statistical assumptions, for any scale of component strategies, even infinite, such that it meets the determined criterion. Finally, we test the proposed strategy along with its accelerated variant and some other multi-strategies. The numerical experiments show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios, in which their cumulative wealths eventually exceed those of the best component strategies while the accelerated strategy significantly improves performance.
- [403] arXiv:2406.03653 (cross-list from stat.ML) [pdf, other]
-
Title: Equivalence Set Restricted Latent Class Models (ESRLCM)Comments: 43 pages, 10 tables, 1 figureSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Latent Class Models (LCMs) are used to cluster multivariate categorical data and are commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters that have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the identifiability of ESRLCMs, and demonstrate their effectiveness in both simulations and real-world applications.
- [404] arXiv:2406.03657 (cross-list from eess.AS) [pdf, other]
-
Title: UrBAN: Urban Beehive Acoustics and PheNotyping DatasetAuthors: Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. FalkSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022. This apiary comprised 10 beehives, with microphones recording more than 2000 hours of high quality raw audio, and also sensors capturing temperature and humidity. Periodic hive inspections involved monitoring colony honey bee population changes, assessing queen-related conditions, and documenting overall hive health. Additionally, health metrics, such as Varroa mite infestation rates and winter mortality assessments, were recorded, offering valuable insights into factors affecting hive health status and resilience. In this study, we first outline the data collection process, sensor data description, and dataset structure. Furthermore, we demonstrate a practical application of this dataset by extracting various features from the raw audio to predict colony population using the number of frames of bees as a proxy.
- [405] arXiv:2406.03663 (cross-list from eess.IV) [pdf, ps, other]
-
Title: A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy RegionsAuthors: Ou Tan, David S. Greenfield, Brian A. Francis, Rohit Varma, Joel S. Schuman, David Huang, Dongseok ChoiComments: 12 pagesSubjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate if a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study of 106 normal subjects and 164 perimetric glaucoma (PG) patients. Peripapillary NFL reflectance map, NFL thickness map, optic head analysis of disc, and macular ganglion cell complex thickness were obtained using spectral domain OCT. A hybrid deep learning model combined a fully connected network (FCN) and a convolution neural network (CNN) to develop and combine those OCT maps and parameters to distinguish normal and PG eyes. Two deep learning models were compared based on whether the NFL reflectance map was used as part of the input or not. Results: The hybrid deep learning model with reflectance achieved 0.909 sensitivity at 99% specificity and 0.926 at 95%. The overall accuracy was 0.948 with 0.893 sensitivity and 1.000 specificity, and the AROC was 0.979, which is significantly better than the logistic regression models (p < 0.001). The second best model is the hybrid deep learning model without reflectance, which also had significantly higher AROC than logistic regression models (p < 0.001). The logistic regression model with reflectance had slightly higher AROC or sensitivity than the logistic regression model without reflectance (p = 0.024). Conclusions: The hybrid deep learning model significantly improved the diagnostic accuracy, with or without NFL reflectance. The hybrid deep learning model, combining reflectance/NFL thickness/GCC thickness/ONH parameter, may be a practical model for glaucoma screening purposes.
- [406] arXiv:2406.03688 (cross-list from eess.IV) [pdf, other]
-
Title: Shadow and Light: Digitally Reconstructed Radiographs for Disease ClassificationAuthors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. SummersSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation, it facilitates the inclusion of lateral view images and images from any desired viewing position. This opens up avenues for research into novel multimodal applications involving paired CT, X-ray images from various views, text, and binary labels. We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and CheXnet model. Experiments demonstrate that CheXnet, when trained and tested on the DRR-RATE dataset, achieves sufficient to high AUC scores for the six common pathologies cited in common literature: Atelectasis, Cardiomegaly, Consolidation, Lung Lesion, Lung Opacity, and Pleural Effusion. Additionally, CheXnet trained on the CheXpert dataset can accurately identify several pathologies, even when operating out of distribution. This confirms that the generated DRR images effectively capture the essential pathology features from CT images. The dataset and labels are publicly accessible at https://huggingface.co/datasets/farrell236/DRR-RATE.
- [407] arXiv:2406.03690 (cross-list from math.OC) [pdf, other]
-
Title: AMPIC: Adaptive Model Predictive Ising Controller for large-scale urban traffic signalsComments: 17 pages, 8 figuresSubjects: Optimization and Control (math.OC); Emerging Technologies (cs.ET); Systems and Control (eess.SY); Quantum Physics (quant-ph)
Realizing smooth traffic flow is important for achieving carbon neutrality. Adaptive traffic signal control, which considers traffic conditions, has thus attracted attention. However, it is difficult to ensure optimal vehicle flow throughout a large city using existing control methods because of their heavy computational load. Here, we propose a control method called AMPIC (Adaptive Model Predictive Ising Controller) that guarantees both scalability and optimality. The proposed method employs model predictive control to solve an optimal control problem at each control interval with explicit consideration of a predictive model of vehicle flow. This optimal control problem is transformed into a combinatorial optimization problem with binary variables that is equivalent to the so-called Ising problem. This transformation allows us to use an Ising solver, which has been widely studied and is expected to have fast and efficient optimization performance. We performed numerical experiments using a microscopic traffic simulator for a realistic city road network. The results show that AMPIC enables faster vehicle cruising speed with less waiting time than that achieved by classical control methods, resulting in lower CO2 emissions. The model predictive approach with a long prediction horizon thus effectively improves control performance. Systematic parametric studies on model cities indicate that the proposed method realizes smoother traffic flows for large city road networks. Among Ising solvers, D-Wave's quantum annealing is shown to find near-optimal solutions at a reasonable computational cost.
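To make the target of such a transformation concrete, the sketch below defines an Ising energy and finds its ground state by exhaustive search on a tiny instance. It is illustrative only: the sign convention $H(s) = -\tfrac{1}{2} s^\top J s - h^\top s$ with symmetric, zero-diagonal $J$ is an assumption, the mapping from signal phases to spins is not shown, and brute force stands in for the Ising solvers the abstract refers to.

```python
from itertools import product

import numpy as np

def ising_energy(spins, J, h):
    """H(s) = -1/2 s^T J s - h^T s for spins in {-1, +1} (assumed convention)."""
    s = np.asarray(spins, dtype=float)
    return float(-0.5 * s @ J @ s - h @ s)

def brute_force_ground_state(J, h):
    """Enumerate all 2^n spin configurations; feasible only for very small n."""
    n = len(h)
    best = min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, J, h))
    return np.array(best), ising_energy(best, J, h)
```

The exponential cost of this enumeration is exactly why dedicated Ising solvers are of interest once the number of binary decision variables grows with the size of the road network.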
- [408] arXiv:2406.03696 (cross-list from stat.ML) [pdf, other]
-
Title: Discrete error dynamics of mini-batch gradient descent for least squares regression
Comments: 26 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
We study the discrete dynamics of mini-batch gradient descent for least squares regression when sampling without replacement. We show that the dynamics and generalization error of mini-batch gradient descent depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$, in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we rigorously establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. We also study discretization effects that a continuous-time gradient flow analysis cannot detect, and show that mini-batch gradient descent converges to a step-size dependent solution, in contrast with full-batch gradient descent. Finally, we investigate the effects of batching, assuming a random matrix model, by using tools from free probability theory to numerically compute the spectrum of $Z$.
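The leading-order agreement under the linear scaling rule can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's analysis: it uses the loss (1/2n)||Xw - y||^2, one fixed batch order as a single realization of sampling without replacement, equal batch sizes, and a full-batch step size equal to the number of batches times the mini-batch step size. All function names are hypothetical.

```python
import numpy as np


def minibatch_epoch(w, X, y, batch_size, step):
    """One epoch of mini-batch gradient descent; every sample is used exactly once."""
    n = X.shape[0]
    for start in range(0, n, batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        grad = Xb.T @ (Xb @ w - yb) / Xb.shape[0]  # gradient of the batch loss
        w = w - step * grad
    return w


def full_batch_step(w, X, y, step):
    """One full-batch gradient step on (1/2n) * ||Xw - y||^2."""
    grad = X.T @ (X @ w - y) / X.shape[0]
    return w - step * grad


def epoch_gap(X, y, w0, batch_size, step):
    """Distance between one mini-batch epoch and one linearly scaled full-batch step.

    Assumes the number of samples is divisible by batch_size.
    """
    num_batches = X.shape[0] // batch_size
    w_mini = minibatch_epoch(w0, X, y, batch_size, step)
    w_full = full_batch_step(w0, X, y, num_batches * step)
    return np.linalg.norm(w_mini - w_full)
```

Under these assumptions the first-order terms of the two updates cancel, so for small step sizes the gap should shrink roughly quadratically: halving `step` is expected to reduce `epoch_gap` by a factor of about four.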
- [409] arXiv:2406.03711 (cross-list from physics.flu-dyn) [pdf, other]
-
Title: Pi-fusion: Physics-informed diffusion model for learning fluid dynamics
Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)
Physics-informed deep learning has recently been developed as a novel paradigm for learning physical dynamics. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize to arbitrary time instants in real-world scenarios, where the fluid motion can be considered a time-variant trajectory involving large-scale particles. Inspired by the advantage of diffusion models in learning the distribution of data, we first propose Pi-fusion, a physics-informed diffusion model for predicting the temporal evolution of velocity and pressure fields in fluid dynamics. Physics-informed guidance sampling is proposed in the inference procedure of Pi-fusion to improve the accuracy and interpretability of learning fluid dynamics. Furthermore, we introduce a training strategy based on reciprocal learning to learn the quasiperiodical pattern of fluid motion and thus improve the generalizability of the model. The proposed approach is then evaluated on both synthetic and real-world datasets by comparing it with state-of-the-art physics-informed deep learning methods. Experimental results show that the proposed approach significantly outperforms existing methods for predicting the temporal evolution of velocity and pressure fields, confirming its strong generalization by drawing probabilistic inference of the forward process and physics-informed guidance sampling. The proposed Pi-fusion can also be generalized to learning other physical dynamics governed by partial differential equations.
- [410] arXiv:2406.03715 (cross-list from math.PR) [pdf, other]
-
Title: Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on the 2D torus. The approximations are implemented by a tamed exponential Euler discretization in time and a spectral Galerkin method in space. We obtain convergence rates with spatial order $\alpha-\delta$ and temporal order ${\alpha}/{6}-\delta$ in $\mathcal C^{-\alpha}$ for $\alpha\in(0,1/3)$ and $\delta>0$ arbitrarily small.
- [411] arXiv:2406.03734 (cross-list from math.OC) [pdf, other]
-
Title: Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
In safety-critical applications, reinforcement learning (RL) needs to consider safety constraints. However, a theoretical understanding of constrained RL for continuous control is largely absent. As a case study, this paper presents a cost-constrained LQR formulation, where a number of LQR costs with user-defined penalty matrices are subject to constraints. To solve it, we propose a policy gradient primal-dual method to find an optimal state feedback gain. Despite the non-convexity of the cost-constrained LQR problem, we provide a constructive proof for strong duality and a geometric interpretation of an optimal multiplier set. By proving that the concave dual function is Lipschitz smooth, we further provide convergence guarantees for the PG primal-dual method. Finally, we perform simulations to validate our theoretical findings.
- [412] arXiv:2406.03766 (cross-list from eess.SP) [pdf, other]
-
Title: Privacy Preserving Semi-Decentralized Mean Estimation over Intermittently-Connected Networks
Comments: 14 pages, 6 figures. arXiv admin note: text overlap with arXiv:2303.00035
Subjects: Signal Processing (eess.SP); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Machine Learning (cs.LG); Systems and Control (eess.SY)
We consider the problem of privately estimating the mean of vectors distributed across different nodes of an unreliable wireless network, where communications between nodes can fail intermittently. We adopt a semi-decentralized setup, wherein to mitigate the impact of intermittently connected links, nodes can collaborate with their neighbors to compute a local consensus, which they relay to a central server. In such a setting, the communications between any pair of nodes must ensure that the privacy of the nodes is rigorously maintained to prevent unauthorized information leakage. We study the tradeoff between collaborative relaying and privacy leakage due to the data sharing among nodes and, subsequently, propose PriCER: Private Collaborative Estimation via Relaying -- a differentially private collaborative algorithm for mean estimation to optimize this tradeoff. The privacy guarantees of PriCER arise (i) implicitly, by exploiting the inherent stochasticity of the flaky network connections, and (ii) explicitly, by adding Gaussian perturbations to the estimates exchanged by the nodes. Local and central privacy guarantees are provided against eavesdroppers who can observe different signals, such as the communications amongst nodes during local consensus and (possibly multiple) transmissions from the relays to the central server. We substantiate our theoretical findings with numerical simulations. Our implementation is available at https://github.com/rajarshisaha95/private-collaborative-relaying.
- [413] arXiv:2406.03783 (cross-list from math.CO) [pdf, other]
-
Title: Flips in colorful triangulations
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)
The associahedron is the graph $\mathcal{G}_N$ that has as nodes all triangulations of a convex $N$-gon, and an edge between any two triangulations that differ in a flip operation, which consists of removing an edge shared by two triangles and replacing it by the other diagonal of the resulting 4-gon. In this paper, we consider a large collection of induced subgraphs of $\mathcal{G}_N$ obtained by Ramsey-type colorability properties. Specifically, coloring the points of the $N$-gon red and blue alternatingly, we consider only colorful triangulations, namely triangulations in which every triangle has points in both colors, i.e., monochromatic triangles are forbidden. The resulting induced subgraph of $\mathcal{G}_N$ on colorful triangulations is denoted by $\mathcal{F}_N$. We prove that $\mathcal{F}_N$ has a Hamilton cycle for all $N\geq 8$, resolving a problem raised by Sagan, i.e., all colorful triangulations on $N$ points can be listed so that any two cyclically consecutive triangulations differ in a flip. In fact, we prove that for an arbitrary fixed coloring pattern of the $N$ points with at least 10 changes of color, the resulting subgraph of $\mathcal{G}_N$ on colorful triangulations (for that coloring pattern) admits a Hamilton cycle. We also provide an efficient algorithm for computing a Hamilton path in $\mathcal{F}_N$ that runs in time $\mathcal{O}(1)$ on average per generated node. This algorithm is based on a new and algorithmic construction of a tree rotation Gray code for listing all $n$-vertex $k$-ary trees that runs in time $\mathcal{O}(k)$ on average per generated tree.
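The flip operation and the colorful condition described above can be made concrete with a small sketch. It assumes a triangulation of a convex polygon is stored as a set of sorted vertex triples, that vertices are numbered 0 to N-1 around the polygon with N even, and that vertex i receives color i % 2 (the alternating pattern). The function names are hypothetical, and this is not the Hamilton-cycle algorithm from the paper.

```python
def flip(triangles, a, b):
    """Flip the diagonal {a, b}: replace it by the other diagonal of its 4-gon.

    `triangles` is a set of sorted vertex triples. The edge {a, b} must be shared
    by exactly two triangles; boundary edges of the polygon cannot be flipped.
    """
    adjacent = [t for t in triangles if a in t and b in t]
    if len(adjacent) != 2:
        raise ValueError("edge is not an interior diagonal of the triangulation")
    (c,) = set(adjacent[0]) - {a, b}  # apex of the first triangle
    (d,) = set(adjacent[1]) - {a, b}  # apex of the second triangle
    new = {tuple(sorted((a, c, d))), tuple(sorted((b, c, d)))}
    return (set(triangles) - set(adjacent)) | new


def is_colorful(triangles):
    """True if no triangle is monochromatic under the coloring i -> i % 2."""
    return all(len({v % 2 for v in t}) == 2 for t in triangles)
```

Two colorful triangulations are adjacent in $\mathcal{F}_N$ exactly when one is obtained from the other by a single call to `flip`, so a Hamilton cycle corresponds to a cyclic sequence of such calls that visits every triangulation accepted by `is_colorful` once.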
- [414] arXiv:2406.03787 (cross-list from math.OC) [pdf, other]
-
Title: Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.
- [415] arXiv:2406.03810 (cross-list from astro-ph.IM) [pdf, ps, other]
-
Title: Spherinator and HiPSter: Representation Learning for Unbiased Knowledge Discovery from Simulations
Comments: 4 pages, 1 figure
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Simulations are the best approximation to experimental laboratories in astrophysics and cosmology. However, the complexity, richness, and large size of their outputs severely limit the interpretability of their predictions. We describe a new, unbiased, machine-learning-based approach to obtaining useful scientific insights from a broad range of simulations. The method can be used on today's largest simulations and will be essential to solve the extreme data exploration and analysis challenges posed by the Exascale era. Furthermore, this concept is so flexible that it will also enable explorative access to observed data. Our concept is based on applying nonlinear dimensionality reduction to learn compact representations of the data in a low-dimensional space. The simulation data is projected onto this space for interactive inspection, visual interpretation, sample selection, and local analysis. We present a prototype using a rotationally invariant hyperspherical variational convolutional autoencoder, utilizing a power distribution in the latent space, and trained on galaxies from the IllustrisTNG simulation. Thereby, we obtain a natural Hubble tuning fork-like similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in Aladin Lite.
- [416] arXiv:2406.03832 (cross-list from astro-ph.IM) [pdf, ps, other]
-
Title: UltraPINK -- New possibilities to explore Self-Organizing Kohonen Maps
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Human-Computer Interaction (cs.HC)
Unsupervised learning algorithms like self-organizing Kohonen maps are a promising approach to gaining an overview of massive datasets. With UltraPINK, researchers can train, inspect, and explore self-organizing maps, and the toolbox of interaction possibilities grows continually. A key feature of UltraPINK is its consideration of the versatility of astronomical data. By keeping the operations as abstract as possible and using design patterns meant for abstract usage, we ensure that data is compatible with UltraPINK, regardless of its type, formatting, or origin. Future work on the application will keep extending the catalogue of exploration tools and the interfaces to other established applications for processing astronomical data. Ultimately, we aim for a solid infrastructure for data analysis in astronomy.
- [417] arXiv:2406.03867 (cross-list from quant-ph) [pdf, other]
-
Title: A Comprehensive Study of Quantum Arithmetic Circuits
Comments: Under review at the Royal Society's Philosophical Transactions A
Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on developing novel designs and improving existing ones.
In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions.
- [418] arXiv:2406.03896 (cross-list from cond-mat.soft) [pdf, other]
-
Title: Data-driven discovery of self-similarity using neural networks
Comments: 21 pages, 15 figures, 5 tables
Subjects: Soft Condensed Matter (cond-mat.soft); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)
Finding self-similarity is a key step for understanding the governing law behind complex physical phenomena. Traditional methods for identifying self-similarity often rely on specific models, which can introduce significant bias. In this paper, we present a novel neural network-based approach that discovers self-similarity directly from observed data, without presupposing any models. The presence of self-similar solutions in a physical problem signals that the governing law contains a function whose arguments are given by power-law monomials of physical parameters, which are characterized by power-law exponents. The basic idea is to enforce such particular forms structurally in a neural network in a parametrized way. We train the neural network model using the observed data, and when the training is successful, we can extract the power exponents that characterize scale-transformation symmetries of the physical problem. We demonstrate the effectiveness of our method with both synthetic and experimental data, validating its potential as a robust, model-independent tool for exploring self-similarity in complex systems.
- [419] arXiv:2406.03901 (cross-list from eess.IV) [pdf, other]
-
Title: Polyp and Surgical Instrument Segmentation with Double Encoder-Decoder Networks
Authors: Adrian Galdran
Journal-ref: NMI, Vol. 1 No. 1 (2021): MedAI: Transparency in Medical Image Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper describes a solution for the MedAI competition, in which participants were required to segment both polyps and surgical instruments from endoscopic images. Our approach relies on a double encoder-decoder neural network that we previously applied to polyp segmentation, with a series of enhancements: a more powerful encoder architecture, an improved optimization procedure, and post-processing of segmentations based on tempered model ensembling. Experimental results show that our method produces segmentations in good agreement with manual delineations provided by medical experts.
- [420] arXiv:2406.03902 (cross-list from eess.IV) [pdf, other]
-
Title: C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Comments: Accepted to CVPR 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams. Although implicit neural representations have been introduced to enable efficient training for this 2D-to-3D reconstruction problem, previous works consider only local features and process different views equally, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV, which leverages explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space. Additionally, a scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that our C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.
- [421] arXiv:2406.03903 (cross-list from eess.IV) [pdf, other]
-
Title: Data-Centric Label Smoothing for Explainable Glaucoma Screening from Eye Fundus Images
Comments: Accepted to ISBI 2024 (Challenges), 2nd position in the JustRAIGS challenge (this https URL)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
As current computing capabilities increase, modern machine learning and computer vision systems tend to increase in complexity, mostly by means of larger models and advanced optimization strategies. Although often neglected, in many problems there is also much to be gained by better understanding and leveraging already-available training data, including annotations. This so-called data-centric approach can lead to substantial performance increases, sometimes beyond what can be achieved by larger models. In this paper we adopt such an approach for the task of justifiable glaucoma screening from retinal images. In particular, we focus on how to combine information from multiple annotators with different skill levels into a tailored label smoothing scheme that allows us to better employ a large collection of fundus images, instead of discarding samples suffering from inter-rater variability. Internal validation results indicate that our bespoke label smoothing approach surpasses the performance of a standard resnet50 model and also the same model trained with conventional label smoothing techniques, in particular for the multi-label scenario of predicting clinical reasons of glaucoma likelihood in a highly imbalanced screening context. Our code is made available at github.com/agaldran/justraigs .
- [422] arXiv:2406.03913 (cross-list from math.OC) [pdf, other]
-
Title: Recognizing weighted means in geodesic spaces
Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
Geodesic metric spaces support a variety of averaging constructions for given finite sets. Computing such averages has generated extensive interest in diverse disciplines. Here we consider the inverse problem of recognizing computationally whether or not a given point is such an average, exactly or approximately. In nonpositively curved spaces, several averaging notions, including the usual weighted barycenter, produce the same "mean set". In such spaces, at points where the tangent cone is a Euclidean space, the recognition problem reduces to Euclidean projection onto a polytope. Hadamard manifolds comprise one example. Another consists of CAT(0) cubical complexes, at relative-interior points: the recognition problem is harder for general points, but we present an efficient semidefinite-programming-based algorithm.
- [423] arXiv:2406.03924 (cross-list from stat.ML) [pdf, other]
-
Title: Statistical Multicriteria Benchmarking via the GSD-Front
Authors: Christoph Jansen (1), Georg Schollmeyer (2), Julian Rodemann (2), Hannah Blocher (2), Thomas Augustin (2) ((1) Lancaster University Leipzig, (2) Ludwig-Maximilians-Universität München)
Comments: CJ, GS, JR and HB contributed equally to this work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of a set of state-of-the-art classifiers. For (3), we relax our proposed test using techniques from robust statistics and imprecise probabilities. We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML.
- [424] arXiv:2406.03938 (cross-list from q-bio.PE) [pdf, other]
-
Title: Diversity in Evolutionary Dynamics
Subjects: Populations and Evolution (q-bio.PE); Computational Engineering, Finance, and Science (cs.CE)
We consider the dynamics imposed by natural selection on the populations of two competing, sexually reproducing, haploid species. In this setting, the fitness of any genome varies over time due to the changing population mix of the competing species; crucially, this fitness variation arises naturally from the model itself, without the need for imposing it exogenously as is typically the case. Previous work on this model [14] showed that, in the special case where each of the two species exhibits just two phenotypes, genetic diversity is maintained at all times. This finding supported the tenet that sexual reproduction is advantageous because it promotes diversity, which increases the survivability of a species.
In the present paper we consider the more realistic case where there are more than two phenotypes available to each species. The conclusions about diversity in general turn out to be very different from the two-phenotype case.
Our first result is negative: namely, we show that sexual reproduction does not guarantee the maintenance of diversity at all times, i.e., the result of [14] does not generalize. Our counterexample consists of two competing species with just three phenotypes each. We show that, for any time $t_0$ and any $\varepsilon>0$, there is a time $t\ge t_0$ at which the combined diversity of both species is smaller than $\varepsilon$. Our main result is a complementary positive statement, which says that in any non-degenerate example, diversity is maintained in a weaker, "infinitely often" sense.
Thus, our results refute the supposition that sexual reproduction ensures diversity at all times, but affirm a weaker assertion that extended periods of high diversity are necessarily a recurrent event.
- [425] arXiv:2406.03961 (cross-list from eess.IV) [pdf, ps, other]
-
Title: LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of RS images by utilizing the distortion prior generated by an LDM. Our approach consists of two stages. In the first stage, a self-encoder learns a prior from the high-quality input image. In the second stage, the prior is generated through an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, to be used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate that the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC.
- [426] arXiv:2406.03972 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Eigenpath traversal by Poisson-distributed phase randomisation
Comments: 19 pages
Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
We present a framework for quantum computation, similar to Adiabatic Quantum Computation (AQC), that is based on the quantum Zeno effect. By performing randomised dephasing operations at intervals determined by a Poisson process, we are able to track the eigenspace associated to a particular eigenvalue.
We derive a simple differential equation for the fidelity, leading to general theorems bounding the time complexity of a whole class of algorithms. We also use eigenstate filtering to optimise the scaling of the complexity in the error tolerance $\epsilon$.
In many cases the bounds given by our general theorems are optimal, giving a time complexity of $O(1/\Delta_m)$ with $\Delta_m$ the minimum of the gap. This allows us to prove optimal results using very general features of problems, minimising the problem-specific insight necessary.
As two applications of our framework, we obtain optimal scaling for the Grover problem (i.e. $O(\sqrt{N})$ where $N$ is the database size) and the Quantum Linear System Problem (i.e. $O(\kappa\log(1/\epsilon))$ where $\kappa$ is the condition number and $\epsilon$ the error tolerance) by direct applications of our theorems.
- [427] arXiv:2406.04000 (cross-list from physics.optics) [pdf, other]
-
Title: Stochastic logic in biased coupled photonic probabilistic bits
Authors: Michael Horodynski, Charles Roques-Carmes, Yannick Salamin, Seou Choi, Jamison Sloan, Di Luo, Marin Soljačić
Subjects: Optics (physics.optics); Emerging Technologies (cs.ET)
Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experimentally viable photonic approach to solve arbitrary probabilistic computing problems. Our method relies on the insight that coherent Ising machines composed of coupled and biased optical parametric oscillators can emulate stochastic logic. We demonstrate the feasibility of our approach by using numerical simulations equivalent to the full density matrix formulation of coupled optical parametric oscillators.
- [428] arXiv:2406.04001 (cross-list from math.OC) [pdf, other]
-
Title: Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Dynamical Systems (math.DS)
Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and convex reformulations, enabling convex analysis for nonconvex problems. Despite non-convexity and non-smoothness, the existence of an ECL not only reveals that minimizing the original function is equivalent to a convex problem but also certifies a class of first-order non-degenerate stationary points to be globally optimal. Therefore, no spurious stationarity exists in the set of non-degenerate policies. This ECL framework can cover many benchmark control problems, including state feedback linear quadratic regulator (LQR), dynamic output feedback linear quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ robust control. ECL can also handle a class of distributed control problems when the notion of quadratic invariance (QI) holds. We further show that all static stabilizing policies are non-degenerate for state feedback LQR and $\mathcal{H}_\infty$ control under standard assumptions. We believe that the new ECL framework may be of independent interest for analyzing nonconvex problems beyond control.
- [429] arXiv:2406.04004 (cross-list from quant-ph) [pdf, other]
-
Title: T-Count Optimizing Genetic Algorithm for Quantum State Preparation
Comments: To appear in IEEE QSW 2024 proceedings
Subjects: Quantum Physics (quant-ph); Neural and Evolutionary Computing (cs.NE)
Quantum state preparation is a crucial process within numerous quantum algorithms, and the need for efficient initialization of quantum registers is ever increasing as demand for useful quantum computing grows. The problem is that, as the number of qubits to be initialized grows, the circuits required to implement the desired state increase exponentially in size, leading to loss of fidelity due to noise. This is mainly due to the susceptibility to environmental effects of the non-Clifford T gate, whose use should thus be reduced as much as possible. In this paper, we present and utilize a genetic algorithm for state preparation circuits consisting of gates from the Clifford + T gate set and optimize them in T-Count so as to reduce the impact of noise. Whilst the method presented here does not always produce the most accurate circuits in terms of fidelity, it can generate high-fidelity, non-trivial quantum states such as quantum Fourier transform states. In addition, our algorithm automatically generates fault-tolerantly implementable solutions in which the number of the most error-prone components is reduced. We present an evaluation of the algorithm when trialed on preparing random, Poisson probability distribution, W, GHZ, and quantum Fourier transform states. We also experimentally demonstrate the scalability issues as qubit count increases, which highlights the need for further optimization of the search process.
- [430] arXiv:2406.04012 (cross-list from stat.ML) [pdf, other]
-
Title: Variational inference, Mixture of Gaussians, Bayesian Machine Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is that of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, namely a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error that upper bounds the objective between an optimal finite mixture and the target distribution.
- [431] arXiv:2406.04034 (cross-list from math.CO) [pdf, ps, other]
-
Title: The geometry of intersecting codes and applications to additive combinatorics and factorization theory
Comments: 31 pages
Subjects: Combinatorics (math.CO); Information Theory (cs.IT); Number Theory (math.NT)
Intersecting codes are linear codes where every two nonzero codewords have non-trivially intersecting support. In this article we expand on the theory of this family of codes, by showing that nondegenerate intersecting codes correspond to sets of points (with multiplicities) in a projective space that are not contained in two hyperplanes. This correspondence allows the use of geometric arguments to demonstrate properties and provide constructions of intersecting codes. We improve on existing bounds on their length and provide explicit constructions of short intersecting codes. Finally, generalizing a link between coding theory and the theory of the Davenport constant (a combinatorial invariant of finite abelian groups), we provide new asymptotic bounds on the weighted $2$-wise Davenport constant. These bounds then yield results on factorizations in rings of algebraic integers and related structures.
- [432] arXiv:2406.04047 (cross-list from stat.ML) [pdf, other]
-
Title: Slicing Mutual Information Generalization Bounds for Neural Networks
Comments: Accepted at ICML 2024
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e., the MI between the training data and the learned hypothesis. Yet, these bounds have limited practicality for modern ML applications (e.g., deep learning), due to the difficulty of evaluating MI in high dimensions. Motivated by recent findings on the compressibility of neural networks, we consider algorithms that operate by slicing the parameter space, i.e., trained on random lower-dimensional subspaces. We introduce new, tighter information-theoretic generalization bounds tailored for such algorithms, demonstrating that slicing improves generalization. Our bounds offer significant computational and statistical advantages over standard MI bounds, as they rely on scalable alternative measures of dependence, i.e., disintegrated mutual information and $k$-sliced mutual information. Then, we extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces, by leveraging rate-distortion theory. This strategy yields generalization bounds that incorporate a distortion term measuring model compressibility under slicing, thereby tightening existing bounds without compromising performance or requiring model compression. Building on this, we propose a regularization scheme enabling practitioners to control generalization through compressibility. Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach.
- [433] arXiv:2406.04071 (cross-list from stat.ML) [pdf, other]
-
Title: Dynamic angular synchronization under smoothness constraints
Comments: 40 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the classical angular synchronization problem consists of recovering unknown angles $\theta_1^*,\dots,\theta_n^*$ from a collection of noisy pairwise measurements of the form $(\theta_i^* - \theta_j^*) \mod 2\pi$, for all $\{i,j\} \in \mathcal{E}$. This problem arises in a variety of applications, including computer vision, time synchronization of distributed networks, and ranking from pairwise comparisons. In this paper, we consider a dynamic version of this problem where the angles, and also the measurement graphs, evolve over $T$ time points. Assuming a smoothness condition on the evolution of the latent angles, we derive three algorithms for joint estimation of the angles over all time points. Moreover, for one of the algorithms, we establish non-asymptotic recovery guarantees for the mean-squared error (MSE) under different statistical models. In particular, we show that the MSE converges to zero as $T$ increases under milder conditions than in the static setting. This includes the setting where the measurement graphs are highly sparse and disconnected, and also when the measurement noise is large and can potentially increase with $T$. We complement our theoretical results with experiments on synthetic data.
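The static measurement model in this abstract can be made concrete with a small sketch. The code below is illustrative only and is not one of the paper's three algorithms: it generates offsets of the form $(\theta_i - \theta_j) \mod 2\pi$ and recovers the angles by naive propagation from a root node, which is exact only for noiseless data on a connected graph and only up to a global shift. The names `measure` and `propagate` are hypothetical.

```python
import math
import random

TWO_PI = 2 * math.pi

def measure(theta, edges, noise_std=0.0, rng=None):
    """Return (theta_i - theta_j) mod 2*pi, plus Gaussian noise, for each edge (i, j)."""
    rng = rng or random.Random(0)
    return {
        (i, j): (theta[i] - theta[j] + rng.gauss(0.0, noise_std)) % TWO_PI
        for (i, j) in edges
    }

def propagate(n, offsets, root=0):
    """Naive baseline: fix theta_root = 0 and walk the graph from the root.

    Nodes not reachable from the root are returned as None.
    """
    neighbors = {i: [] for i in range(n)}
    for (i, j), d in offsets.items():
        neighbors[j].append((i, d))    # theta_i = theta_j + d
        neighbors[i].append((j, -d))   # theta_j = theta_i - d
    est = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, d in neighbors[u]:
            if v not in est:
                est[v] = (est[u] + d) % TWO_PI
                stack.append(v)
    return [est.get(i) for i in range(n)]

theta_true = [0.3, 1.2, 5.9, 2.5]
path_edges = [(0, 1), (1, 2), (2, 3)]
theta_hat = propagate(4, measure(theta_true, path_edges))
```

With noise or a disconnected graph this baseline breaks down (errors accumulate along paths, and unreachable nodes stay undetermined), which is the regime where joint estimation over time, as described in the abstract, becomes relevant.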
- [434] arXiv:2406.04098 (cross-list from stat.ML) [pdf, other]
-
Title: A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data
Comments: 42 pages, 28 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.
- [435] arXiv:2406.04132 (cross-list from math.DS) [pdf, ps, other]
-
Title: Realizability of Subgroups by Subshifts of Finite Type
Authors: Nicolás Bitar
Comments: 26 pages, 2 figures. Comments welcome
Subjects: Dynamical Systems (math.DS); Discrete Mathematics (cs.DM); Group Theory (math.GR)
We study the problem of realizing families of subgroups as the set of stabilizers of configurations from a subshift of finite type (SFT). This problem generalizes both the existence of strongly and weakly aperiodic SFTs. We show that a finitely generated normal subgroup is realizable if and only if the quotient by the subgroup admits a strongly aperiodic SFT. We also show that if a subgroup is realizable, its subgroup membership problem must be decidable. The article also contains the introduction of periodically rigid groups, which are groups for which every weakly aperiodic subshift of finite type is strongly aperiodic. We conjecture that the only finitely generated periodically rigid groups are virtually $\mathbb{Z}$ groups and torsion-free virtually $\mathbb{Z}^2$ groups. Finally, we show virtually nilpotent and polycyclic groups satisfy the conjecture.
- [436] arXiv:2406.04142 (cross-list from math.OC) [pdf, other]
-
Title: Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Comments: 39 pages, 20 Figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: MomSPS$_{\max}$, MomDecSPS, and MomAdaSPS. For MomSPS$_{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using MomSPS$_{\max}$, SHB converges to the true solution at a fast rate matching the deterministic HB. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge of the problem parameters and without assuming interpolation. The convergence analysis of SHB is tight and obtains the convergence guarantees of SGD with stochastic Polyak step-sizes as a special case. We supplement our analysis with experiments that validate the theory and demonstrate the effectiveness and robustness of the new algorithms.
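For readers unfamiliar with the stochastic Polyak step-size that motivates this work, here is a minimal sketch of plain SGD (no momentum) with a capped Polyak step, $\gamma = \min\{(f_i(x) - f_i^*)/(c\,\|\nabla f_i(x)\|^2),\ \gamma_b\}$. It is not the paper's MomSPS$_{\max}$, MomDecSPS, or MomAdaSPS rule, whose exact forms are not given in the abstract. The function name, the least-squares loss, and the defaults `c=0.5` and `gamma_b=10.0` are assumptions chosen for illustration.

```python
import random

def sgd_sps_max(samples, x0, steps=200, c=0.5, gamma_b=10.0, seed=0):
    """SGD on f_i(x) = 0.5 * (a_i . x - b_i)^2 with a capped Polyak step.

    Assumes each f_i attains the minimum value f_i^* = 0, which holds for
    this loss whenever a_i is nonzero.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        a, b = rng.choice(samples)
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        grad = [residual * ai for ai in a]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue  # this sample is already fit exactly
        loss = 0.5 * residual * residual
        gamma = min(loss / (c * grad_sq), gamma_b)  # Polyak step, capped at gamma_b
        x = [xi - gamma * g for xi, g in zip(x, grad)]
    return x

# Consistent linear system (interpolation holds): every row is satisfied by (1, -2).
samples = [((1.0, 0.0), 1.0), ((0.0, 1.0), -2.0), ((1.0, 1.0), -1.0), ((2.0, -1.0), 4.0)]
x_hat = sgd_sps_max(samples, x0=[0.0, 0.0])
```

No step-size tuning is needed here because the step adapts to the current loss and gradient; in this interpolating example with `c=0.5` each update is a projection onto one equation's solution set.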
- [437] arXiv:2406.04149 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)
Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineation of detailed rock fragment size distributions was achieved through the analysis of drone-captured imagery, coupled with the application of an enhanced U-Net semantic segmentation model integrated with an expansion-based post-processing technique. The quarry slope was stratified into four vertical sections, with the size distribution of each section quantified via ellipsoid shape approximations. Our results reveal pronounced vertical segregation patterns, with finer particles concentrated in the upper slope regions and coarser particles in the lower. Utilizing relative characteristic diameters, we offer insight into the degree of segregation, thereby illustrating the spatial heterogeneity in fragment size more clearly. The techniques outlined in this study deliver a scalable and accurate method for assessing fragment size distribution, with the potential to better inform resource management and operational decisions in quarry management.
- [438] arXiv:2406.04163 (cross-list from math.OC) [pdf, ps, other]
-
Title: Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes
Comments: 25 pages, 1 figure
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)
We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. Further, this correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of the Kakade gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use this to show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.
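The exponential dependence on the inverse regularization strength can be seen in the simplest special case, a single-state problem (a bandit), where the entropy-regularized optimal policy is the softmax of the rewards at temperature $\tau$. The sketch below covers only that toy case and does not reproduce the paper's general bounds or exponents; the function names are hypothetical.

```python
import math

def softmax_policy(rewards, tau):
    """Maximizer of sum_a pi(a) * r(a) + tau * entropy(pi) over distributions pi."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / tau) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

def regularization_error(rewards, tau):
    """Unregularized reward lost by playing the regularized optimum."""
    pi = softmax_policy(rewards, tau)
    return max(rewards) - sum(p * r for p, r in zip(pi, rewards))

def gap_bound(rewards, tau):
    """Elementary bound: sum over actions of gap_a * exp(-gap_a / tau)."""
    m = max(rewards)
    return sum((m - r) * math.exp(-(m - r) / tau) for r in rewards)

rewards = [1.0, 0.5, 0.0]
taus = [1.0, 0.5, 0.25, 0.1]
errors = [regularization_error(rewards, t) for t in taus]
```

In this toy case the error is at most `gap_bound(rewards, tau)`, because each suboptimal action has probability at most $e^{-\Delta_a/\tau}$, where $\Delta_a$ is its reward gap. The error therefore decays exponentially in $1/\tau$.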
- [439] arXiv:2406.04179 (cross-list from math.PR) [pdf, ps, other]
-
Title: On the zeros of partition functions with multi-spin interactions
Authors: Alexander Barvinok
Comments: 16 pages
Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph); Combinatorics (math.CO)
Let $X_1, \ldots, X_n$ be probability spaces, let $X$ be their direct product, let $\phi_1, \ldots, \phi_m: X \longrightarrow \mathbb{C}$ be random variables, each depending only on a few coordinates of a point $x=(x_1, \ldots, x_n)$, and let $f=\phi_1 + \ldots + \phi_m$. The expectation $E\thinspace e^{\lambda f}$, where $\lambda \in \mathbb{C}$, appears in statistical physics as the partition function of a system with multi-spin interactions, and also in combinatorics and computer science, where it is known as the partition function of edge-coloring models, tensor network contractions or a Holant polynomial. Assuming that each $\phi_i$ is 1-Lipschitz in the Hamming metric of $X$, that each $\phi_i(x)$ depends on at most $r \geq 2$ coordinates $x_1, \ldots, x_n$ of $x \in X$, and that for each $j$ there are at most $c \geq 1$ functions $\phi_i$ that depend on the coordinate $x_j$, we prove that $E\thinspace e^{\lambda f} \ne 0$ provided $| \lambda | \leq (3 c \sqrt{r-1})^{-1}$ and that the bound is sharp up to a factor logarithmic in $r$. As a corollary, the value of the expectation can be efficiently approximated, provided $\lambda$ lies in a slightly smaller disc.
- [440] arXiv:2406.04188 (cross-list from eess.SP) [pdf, other]
-
Title: Digital Twin Aided RIS Communication: Robust Beamforming and Interference Management
Comments: Dataset and code files will be available soon on the DeepMIMIO website: this https URL
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Reconfigurable intelligent surfaces (RISs) are envisioned to play a key role in future wireless communication networks. However, channel estimation in RIS-aided wireless networks is challenging due to their passive nature and the large number of reflective elements, leading to high channel estimation overhead. Additionally, conventional methods like beam sweeping, which do not rely on explicit channel state information, often struggle to manage interference in multi-user networks. In this paper, we propose a novel approach that leverages digital twins (DTs) of the physical environments to approximate channels using electromagnetic 3D models and ray tracing, thus relaxing the need for channel estimation and extensive over-the-air computations in RIS-aided wireless networks. To address the channel approximation errors of the digital twin, we further refine this approach with a DT-specific robust transmission design that reliably meets minimum desired rates. The results show that our method secures these rates over 90% of the time, significantly outperforming beam sweeping, which achieves these rates less than 8% of the time due to its poor management of transmit power and interference.
- [441] arXiv:2406.04203 (cross-list from math.PR) [pdf, other]
-
Title: Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers
Subjects: Probability (math.PR); Systems and Control (eess.SY); Optimization and Control (math.OC)
The weighted-workload-task-allocation (WWTA) load-balancing policy is known to be throughput optimal for parallel server systems with heterogeneous servers. This work concerns the heavy traffic approximation of steady-state performance for parallel server systems operating under the WWTA policy. Under a relaxed complete-resource-pooling condition, we prove that WWTA achieves a "strong form" of state-space collapse in heavy traffic and that the scaled workload for each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Various steady-state performance measures are shown to be approximated from this exponential random variable. Instead of proving a stochastic process limit followed by an interchange of limits (a method that dominates the literature), our method works directly with a pre-limit basic adjoint relationship (BAR) that characterizes the stationary distribution of each pre-limit system.
- [442] arXiv:2406.04212 (cross-list from eess.AS) [pdf, ps, other]
-
Title: Sound Event Bounding Boxes
Comments: Accepted for publication at Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame-level presence decisions, with the extent of individual events determined by merging consecutive positive frames. In this paper, we show that frame-level thresholding degrades the prediction of the event extent by coupling it with the system's sound presence confidence. We propose to decouple the prediction of event extent and confidence by introducing SEBBs, which format each sound event prediction as a tuple of a class type, extent, and overall confidence. We also propose a change-detection-based algorithm to convert legacy frame-level outputs into SEBBs. We find the algorithm significantly improves the performance of DCASE 2023 Challenge systems, boosting the state of the art from .644 to .686 PSDS1.
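The legacy pipeline described in this abstract (threshold frame-level confidences, then merge consecutive positive frames) can be sketched in a few lines. This is a generic illustration of that baseline, not the paper's change-detection algorithm for building SEBBs. The function name `frames_to_events` and the example scores are made up for illustration.

```python
def frames_to_events(scores, threshold, frame_len=1.0):
    """Threshold frame-level confidences and merge consecutive positive frames.

    Returns a list of (onset, offset) pairs in the same unit as `frame_len`.
    """
    events = []
    start = None
    for k, s in enumerate(scores):
        if s >= threshold and start is None:
            start = k                                  # event begins at frame k
        elif s < threshold and start is not None:
            events.append((start * frame_len, k * frame_len))
            start = None                               # event ended before frame k
    if start is not None:                              # event runs to the end of the clip
        events.append((start * frame_len, len(scores) * frame_len))
    return events

scores = [0.1, 0.4, 0.8, 0.9, 0.6, 0.3, 0.1]
low = frames_to_events(scores, threshold=0.3)
high = frames_to_events(scores, threshold=0.7)
```

Running both thresholds on the same scores gives one event with two different extents, which is the coupling between extent and confidence that the abstract identifies as the weakness of frame-level thresholding.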
- [443] arXiv:2406.04243 (cross-list from math.OC) [pdf, other]
-
Title: Policy Optimization in Control: Geometry and Algorithmic Implications
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Differential Geometry (math.DG)
This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various ``complete parameterizations'' of control design problems (that is, the policy parameters together with their Riemannian geometry) influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.
- [444] arXiv:2406.04245 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Online learning of a panoply of quantum objects
Comments: 34 pages. Comments welcome
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general subsets of positive semidefinite matrices via the regularized-follow-the-leader algorithm and apply it to various settings where one wishes to learn quantum objects. For concrete applications, we present a sublinear regret bound for learning quantum states, effects, channels, interactive measurements, strategies, co-strategies, and the collection of inner products of pure states. Our bound applies to many other quantum objects with compact, convex representations. In proving our regret bound, we establish various matrix analysis results useful in quantum information theory. This includes a generalization of Pinsker's inequality for arbitrary positive semidefinite operators with possibly different traces, which may be of independent interest and applicable to more general classes of divergences.
- [445] arXiv:2406.04250 (cross-list from quant-ph) [pdf, other]
-
Title: Online learning of quantum processes
Comments: 14 + 72 pages, 6 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
Among recent insights into learning quantum states, online learning and shadow tomography procedures are notable for their ability to accurately predict expectation values even of adaptively chosen observables. In contrast to the state case, quantum process learning tasks with a similarly adaptive nature have received little attention. In this work, we investigate online learning tasks for quantum processes. Whereas online learning is infeasible for general quantum channels, we show that channels of bounded gate complexity as well as Pauli channels can be online learned in the regret and mistake-bounded models of online learning. In fact, we can online learn probabilistic mixtures of any exponentially large set of known channels. We also provide a provably sample-efficient shadow tomography procedure for Pauli channels. Our results extend beyond quantum channels to non-Markovian multi-time processes, with favorable regret and mistake bounds, as well as a shadow tomography procedure. We complement our online learning upper bounds with mistake as well as computational lower bounds. On the technical side, we make use of the multiplicative weights update algorithm, classical adaptive data analysis, and Bell sampling, as well as tools from the theory of quantum combs for multi-time quantum processes. Our work initiates a study of online learning for classes of quantum channels and, more generally, non-Markovian quantum processes. Given the importance of online learning for state shadow tomography, this may serve as a step towards quantum channel variants of adaptive shadow tomography.
- [446] arXiv:2406.04259 (cross-list from math.AT) [pdf, other]
-
Title: Topological Stability and Latschev-type Reconstruction Theorems for $\boldsymbol{\mathrm{CAT}(\kappa)}$ Spaces
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Metric Geometry (math.MG)
We consider the problem of homotopy-type reconstruction of compact shapes $X\subset\mathbb{R}^N$ that are $\mathrm{CAT}(\kappa)$ in the intrinsic length metric. The reconstructed spaces are in the form of Vietoris--Rips complexes computed from a compact sample $S$ that is Hausdorff-close to the unknown shape $X$. Instead of the Euclidean metric on the sample, our reconstruction technique leverages a path-based metric to compute these complexes. Since it arises naturally in the framework of reconstruction, we also study the Gromov--Hausdorff topological stability and finiteness problem for general compact $\mathrm{CAT}(\kappa)$ spaces. Our techniques provide novel sampling conditions as an alternative to the existing and commonly used techniques based on weak feature size and $\mu$-reach. In particular, we introduce a new parameter, called the {\em restricted distortion}, which is a generalization of the well-known global distortion of embedding. We show examples of Euclidean subspaces for which the known parameters such as the reach, $\mu$-reach and weak feature size vanish, whereas the restricted distortion is finite, making our reconstruction results applicable to such spaces.
- [447] arXiv:2406.04269 (cross-list from eess.AS) [pdf, other]
-
Title: Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
- [448] arXiv:2406.04282 (cross-list from eess.SP) [pdf, other]
-
Title: A Statistical Characterization of Wireless Channels Conditioned on Side Information
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Statistical prior channel knowledge, such as the wide-sense-stationary-uncorrelated-scattering (WSSUS) property, and additional side information can both be used to enhance physical layer applications in wireless communication. Generally, the wireless channel's strongly fluctuating path phases and WSSUS property characterize the channel by a zero mean and Toeplitz-structured covariance matrices in different domains. In this work, we derive a framework to comprehensively categorize side information based on whether it preserves or abandons these statistical features conditioned on the given side information. To accomplish this, we combine insights from a generic channel model with the representation of wireless channels as probabilistic graphs. Additionally, we exemplify several applications, ranging from channel modeling to estimation and clustering, which demonstrate how the proposed framework can practically enhance physical layer methods utilizing machine learning (ML).
Replacements for Fri, 7 Jun 24
- [449] arXiv:1708.09157 (replaced) [pdf, other]
-
Title: Cross-lingual, Character-Level Neural Morphological Tagging
Comments: Published as a conference paper at EMNLP 2017; Fixed minor typos and cleaned up formatting
Subjects: Computation and Language (cs.CL)
- [450] arXiv:1912.12095 (replaced) [pdf, other]
-
Title: One Point, One Object: Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation
Authors: Hongsen Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [451] arXiv:2008.05195 (replaced) [pdf, other]
-
Title: Competitive Demand Learning: A Non-cooperative Pricing Algorithm with Coordinated Price Experimentation
Journal-ref: Production and Operations Management 2024. Vol. 33(1)
Subjects: Computer Science and Game Theory (cs.GT)
- [452] arXiv:2009.04553 (replaced) [pdf, other]
-
Title: Threshold rates for properties of random codes
Comments: November 2021 version
Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
- [453] arXiv:2106.03354 (replaced) [pdf, other]
-
Title: AI without networks
Comments: 47 pages with 8 figures + 33 pages supplementary with 7 figures and one table (total 80 pages)
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Functional Analysis (math.FA); Machine Learning (stat.ML)
- [454] arXiv:2109.11725 (replaced) [pdf, other]
-
Title: Punctured Low-Bias Codes Behave Like Random Linear Codes
Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Combinatorics (math.CO)
- [455] arXiv:2112.14734 (replaced) [pdf, other]
-
Title: Sequential memory improves sample and memory efficiency in Episodic Control
Comments: 21 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
- [456] arXiv:2203.00387 (replaced) [pdf, other]
-
Title: Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [457] arXiv:2203.12082 (replaced) [pdf, other]
-
Title: PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
Comments: CVPR 2022; source code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [458] arXiv:2205.08628 (replaced) [pdf, ps, other]
-
Title: Mechanized Analysis of Anselm's Modal Ontological Argument
Authors: John Rushby
Comments: This version includes a new postscript that considers alternative premises due to Andrzej Bilat (April 2021)
Journal-ref: International Journal for Philosophy of Religion, vol. 89, pp. 135-152, April 2021
Subjects: Logic in Computer Science (cs.LO)
- [459] arXiv:2205.10192 (replaced) [pdf, other]
-
Title: On the Trade-off between Redundancy and Local Coherence in Summarization
Comments: Accepted to JAIR
Journal-ref: Journal of Artificial Intelligence Research, 80, 273-326 (2024)
Subjects: Computation and Language (cs.CL)
- [460] arXiv:2206.06821 (replaced) [pdf, other]
-
Title: DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models
Journal-ref: Journal of Machine Learning Research 25(147), 2024
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- [461] arXiv:2206.07438 (replaced) [pdf, other]
-
Title: Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview
Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl
Comments: Published at ACM TELO
Journal-ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [462] arXiv:2206.08465 (replaced) [pdf, other]
-
Title: Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks
Journal-ref: Journal of Machine Learning Research 25 (2024) 1-42
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [463] arXiv:2207.12264 (replaced) [pdf, ps, other]
-
Title: Dynamics and triggers of misinformation on vaccines
Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
- [464] arXiv:2208.10790 (replaced) [pdf, other]
-
Title: Event-Triggered Time-Varying Bayesian Optimization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [465] arXiv:2209.00936 (replaced) [pdf, other]
-
Title: A Class-Aware Representation Refinement Framework for Graph Classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [466] arXiv:2210.04288 (replaced) [pdf, other]
-
Title: CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [467] arXiv:2210.17180 (replaced) [pdf, other]
-
Title: Automated Dominative Subspace Mining for Efficient Neural Architecture Search
Authors: Yaofo Chen, Yong Guo, Daihai Liao, Fanbing Lv, Hengjie Song, James Tin-Yau Kwok, Mingkui Tan
Comments: Published in IEEE TCSVT
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [468] arXiv:2212.01976 (replaced) [pdf, other]
-
Title: FedCC: Robust Federated Learning against Model Poisoning Attacks
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- [469] arXiv:2212.02459 (replaced) [pdf, ps, other]
-
Title: Resilient Distributed Optimization for Multi-Agent Cyberphysical Systems
Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
- [470] arXiv:2212.10192 (replaced) [pdf, other]
-
Title: Adam: Dense Retrieval Distillation with Adaptive Dark Examples
Comments: 13 pages, 3 figures
Subjects: Computation and Language (cs.CL)
- [471] arXiv:2212.13462 (replaced) [pdf, other]
-
Title: MVTN: Learning Multi-View Transformations for 3D Understanding
Comments: under review journal extension for the ICCV 2021 paper arXiv:2011.13244
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [472] arXiv:2301.02428 (replaced) [pdf, other]
-
Title: Sensitivity analysis using Physics-informed neural networks
Authors: John M. Hanna, José V. Aguado, Sebastien Comas-Cardona, Ramzi Askri, Domenico Borzacchiello
Comments: 22 pages, 11 figures
Subjects: Numerical Analysis (math.NA)
- [473] arXiv:2301.06335 (replaced) [pdf, ps, other]
-
Title: Approximating the closest structured singular matrix polynomial
Comments: 28 pages
Subjects: Numerical Analysis (math.NA)
- [474] arXiv:2301.08146 (replaced) [pdf, other]
-
Title: What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News
Comments: 8 pages, 2 figures, 5 tables
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [475] arXiv:2302.01713 (replaced) [pdf, other]
-
Title: Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations
Subjects: Artificial Intelligence (cs.AI)
- [476] arXiv:2302.02785 (replaced) [pdf, other]
-
Title: An intelligent tutor for planning in large partially observable environments
Subjects: Artificial Intelligence (cs.AI)
- [477] arXiv:2302.05372 (replaced) [pdf, ps, other]
-
Title: Towards Minimax Optimality of Model-based Robust Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [478] arXiv:2302.08053 (replaced) [pdf, ps, other]
-
Title: Selective Noise Suppression Methods Using Random SVPWM to Shape the Noise Spectrum of PMSMs
Authors: Jian Wen (1 and 2), Xiaobin Cheng (1 and 2), Peifeng Ji (1), Jun Yang (1 and 2), Feng Zhao (3) ((1) Institute of Acoustics, Chinese Academy of Sciences, (2) University of Chinese Academy of Sciences, (3) Institute of Electrical Engineering, Chinese Academy of Sciences)
Comments: 8 pages, 15 figures
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
- [479] arXiv:2302.12476 (replaced) [pdf, ps, other]
-
Title: Asymptotic behaviour of the semidiscrete FE approximations to weakly damped wave equations with minimal smoothness on initial data
Comments: 28 pages, 18 figures, 5 tables
Subjects: Numerical Analysis (math.NA)
- [480] arXiv:2303.00368 (replaced) [pdf, ps, other]
-
Title: Sufficient conditions for the surjectivity of radical curve parametrizations
Comments: 18 pages, no figures
Journal-ref: Journal of Algebra, Volume 640, 2024, Pages 129-146, ISSN 0021-8693
Subjects: Algebraic Geometry (math.AG); Symbolic Computation (cs.SC)
- [481] arXiv:2303.07139 (replaced) [pdf, other]
-
Title: Comparing statistical and machine learning methods for time series forecasting in data-driven logistics -- A simulation study
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [482] arXiv:2304.07889 (replaced) [pdf, other]
-
Title: Ontology for Healthcare Artificial Intelligence Privacy in Brazil
Subjects: Artificial Intelligence (cs.AI)
- [483] arXiv:2304.08650 (replaced) [pdf, other]
-
Title: UAV-based Maritime Communications: Relaying to Enhance the Link Quality
Authors: Abdullah Taha Çağan, Görkem Berkay Koç, Handan Yakın, Berk Çiloğlu, Muhammad Zeeshan Ashgar, Özgün Ersoy, Jyri Hämäläinen, Metin Öztürk
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
- [484] arXiv:2304.14545 (replaced) [pdf, other]
-
Title: Augmented balancing weights as linear regression
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
- [485] arXiv:2305.11915 (replaced) [pdf, other]
-
Title: PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces
Comments: 30 pages, 9 figures
Subjects: Functional Analysis (math.FA); Machine Learning (cs.LG); Numerical Analysis (math.NA)
- [486] arXiv:2305.12659 (replaced) [pdf, other]
-
Title: UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [487] arXiv:2305.12798 (replaced) [pdf, other]
-
Title: Word Embeddings Are Steers for Language Models
Authors: Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji
Comments: ACL 2024 Long Paper, 9 pages, 3 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [488] arXiv:2305.14109 (replaced) [pdf, other]
-
Title: Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML
Comments: 14 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [489] arXiv:2305.14592 (replaced) [pdf, other]
-
Title: Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding
Comments: Accepted to ACL 2024 main conference
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [490] arXiv:2305.15577 (replaced) [pdf, other]
-
Title: Minimizing $f$-Divergences by Interpolating Velocity Fields
Comments: This manuscript is an extended version of the ICML2024 version. The code for reproducing our results can be found at this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [491] arXiv:2305.16209 (replaced) [pdf, other]
-
Title: C-MCTS: Safe Planning with Monte Carlo Tree Search
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [492] arXiv:2305.17139 (replaced) [pdf, other]
-
Title: A Measure-Theoretic Axiomatisation of Causality
Subjects: Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
- [493] arXiv:2305.17834 (replaced) [pdf, other]
-
Title: Streaming Audio Transformers for Online Audio Tagging
Comments: Interspeech2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [494] arXiv:2306.01376 (replaced) [pdf, other]
-
Title: DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
- [495] arXiv:2306.03061 (replaced) [pdf, other]
-
Title: Structured Voronoi Sampling
Comments: Accepted at NeurIPS 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [496] arXiv:2306.04815 (replaced) [pdf, other]
-
Title: Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [497] arXiv:2306.05001 (replaced) [pdf, other]
-
Title: COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation
Authors: Jia-Qi Yang, Chenglei Dai, Dan OU, Dongshuai Li, Ju Huang, De-Chuan Zhan, Xiaoyi Zeng, Yang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [498] arXiv:2306.06209 (replaced) [pdf, other]
-
Title: Backdoor Attack with Sparse and Invisible Trigger
Comments: This paper was accepted by IEEE Transactions on Information Forensics and Security (TIFS). The first two authors contributed equally to this work. 14 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [499] arXiv:2306.06844 (replaced) [pdf, other]
-
Title: Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation
Comments: 25 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [500] arXiv:2306.07550 (replaced) [pdf, ps, other]
-
Title: Nested Sequents for Intermediate Logics: The Case of Gödel-Dummett Logics
Authors: Tim S. Lyon
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
- [501] arXiv:2306.08141 (replaced) [pdf, other]
-
Title: ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations
Comments: 31 pages, 27 figures, ICML 2024
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [502] arXiv:2306.09381 (replaced) [pdf, other]
-
Title: Spatiotemporal-Augmented Graph Neural Networks for Human Mobility Simulation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [503] arXiv:2306.09782 (replaced) [pdf, other]
-
Title: Full Parameter Fine-tuning for Large Language Models with Limited Resources
Comments: ACL 2024
Subjects: Computation and Language (cs.CL)
- [504] arXiv:2306.13493 (replaced) [pdf, other]
-
Title: Smoothed Circulant Embedding with Applications to Multilevel Monte Carlo Methods for PDEs with Random Coefficients
Comments: 36 pages, 11 figures, submitted to IMA Journal of Numerical Analysis
Subjects: Numerical Analysis (math.NA)
- [505] arXiv:2306.14075 (replaced) [pdf, ps, other]
-
Title: Join Size Bounds using Lp-Norms on Degree Sequences
Subjects: Databases (cs.DB); Information Theory (cs.IT)
- [506] arXiv:2306.17193 (replaced) [pdf, other]
-
Title: Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [507] arXiv:2307.02818 (replaced) [pdf, other]
-
Title: Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
- [508] arXiv:2307.05141 (replaced) [pdf, other]
-
Title: Deep Probabilistic Movement Primitives with a Bayesian Aggregator
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
- [509] arXiv:2307.15593 (replaced) [pdf, other]
-
Title: Robust Distortion-free Watermarks for Language Models
Comments: reformatting of camera-ready version accepted to TMLR, with minor edits to introduction
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- [510] arXiv:2307.16422 (replaced) [pdf, other]
-
Title: Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution
Comments: ICML 2024
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [511] arXiv:2308.06020 (replaced) [pdf, other]
-
Title: A direct sampling method based on the Green's function for time-dependent inverse scattering problems
Comments: 18 pages, 12 figures, 2 tables
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
- [512] arXiv:2308.07876 (replaced) [pdf, other]
-
Title: Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political Relation Classification
Authors: Yibo Hu, Erick Skorupa Parolin, Latifur Khan, Patrick T. Brandt, Javier Osorio, Vito J. D'Orazio
Comments: ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [513] arXiv:2308.08841 (replaced) [pdf, other]
-
Title: Machine Learning-Assisted Discovery of Flow Reactor Designs
Authors: Tom Savage, Nausheen Basha, Jonathan McDonough, James Krassowski, Omar K Matar, Ehecatl Antonio del Rio Chanona
Comments: 11 pages, 9 figures, as accepted Nature Chemical Engineering
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
- [514] arXiv:2308.08858 (replaced) [pdf, ps, other]
-
Title: Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
- [515] arXiv:2308.12568 (replaced) [pdf, other]
-
Title: A Small and Fast BERT for Chinese Medical Punctuation Restoration
Comments: 5 pages, 2 figures, Accepted by INTERSPEECH 2024
Subjects: Computation and Language (cs.CL)
- [516] arXiv:2308.14915 (replaced) [pdf, other]
-
Title: Information-driven Affordance Discovery for Efficient Robotic Manipulation
Subjects: Robotics (cs.RO)
- [517] arXiv:2309.00169 (replaced) [pdf, other]
-
Title: RepCodec: A Speech Representation Codec for Speech Tokenization
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [518] arXiv:2309.00610 (replaced) [pdf, other]
-
Title: CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Comments: CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [519] arXiv:2309.06054 (replaced) [pdf, other]
-
Title: Breaking through the learning plateaus of in-context learning in Transformer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [520] arXiv:2309.07287 (replaced) [pdf, other]
-
Title: Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
Comments: Accepted to Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [521] arXiv:2309.08047 (replaced) [pdf, other]
-
Title: Bias in News Summarization: Measures, Pitfalls and Corpora
Comments: Findings of ACL 24 Camera Ready
Subjects: Computation and Language (cs.CL)
- [522] arXiv:2309.08511 (replaced) [pdf, other]
-
Title: Generalised Diffusion Probabilistic Scale-Spaces
Authors: Pascal Peter
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [523] arXiv:2309.09524 (replaced) [pdf, other]
-
Title: Improved Factorized Neural Transducer Model For text-only Domain Adaptation
Comments: Interspeech 2024 camera-ready
Subjects: Computation and Language (cs.CL)
- [524] arXiv:2309.09552 (replaced) [pdf, other]
-
Title: A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Authors: Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang
Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [525] arXiv:2309.09836 (replaced) [pdf, other]
-
Title: RECAP: Retrieval-Augmented Audio Captioning
Comments: ICASSP 2024. Code and data: this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
- [526] arXiv:2309.10740 (replaced) [pdf, other]
-
Title: ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [527] arXiv:2309.11361 (replaced) [pdf, other]
-
Title: Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM
Authors: Yuan An, Jane Greenberg, Alex Kalinowski, Xintong Zhao, Xiaohua Hu, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gómez-Gualdrón
Comments: In 17th International Conference on Metadata and Semantics Research, October 2023
Subjects: Artificial Intelligence (cs.AI)
- [528] arXiv:2309.15402 (replaced) [pdf, other]
-
Title: Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu
Comments: Accepted to ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [529] arXiv:2309.16002 (replaced) [pdf, other]
-
Title: Robust Blockwise Random Pivoting: Fast and Accurate Adaptive Interpolative Decomposition
Subjects: Numerical Analysis (math.NA)
- [530] arXiv:2309.17419 (replaced) [pdf, other]
-
Title: Enumerating minimal solution sets for metric graph problems
Comments: 26 pages, 4 figures
Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
- [531] arXiv:2310.00160 (replaced) [pdf, other]
-
Title: Self-Specialization: Uncovering Latent Expertise within Large Language Models
Authors: Junmo Kang, Hongyin Luo, Yada Zhu, Jacob Hansen, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky
Comments: ACL 2024 (Findings; Long Paper)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [532] arXiv:2310.00165 (replaced) [pdf, other]
-
Title: SCoRe: Submodular Combinatorial Representation Learning
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [533] arXiv:2310.00530 (replaced) [pdf, ps, other]
-
Title: Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets
Comments: 9 Figure
Journal-ref: The Photogrammetric Record, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [534] arXiv:2310.02442 (replaced) [pdf, other]
-
Title: GenCO: Generating Diverse Designs with Combinatorial Constraints
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG)
- [535] arXiv:2310.02721 (replaced) [pdf, other]
-
Title: Leveraging Temporal Graph Networks Using Module Decoupling
Subjects: Machine Learning (cs.LG)
- [536] arXiv:2310.03309 (replaced) [pdf, other]
-
Title: Concise and Organized Perception Facilitates Reasoning in Large Language Models
Comments: 26 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [537] arXiv:2310.03938 (replaced) [pdf, other]
-
Title: EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
Comments: 5 pages, 2 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [538] arXiv:2310.04022 (replaced) [pdf, other]
-
Title: Nonlinear Methods for Shape Optimization Problems in Liquid Crystal Tactoids
Subjects: Numerical Analysis (math.NA)
- [539] arXiv:2310.04400 (replaced) [pdf, other]
-
Title: On the Embedding Collapse when Scaling up Recommendation Models
Comments: ICML 2024 Accepted
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
- [540] arXiv:2310.04406 (replaced) [pdf, other]
-
Title: Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Comments: Code at this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [541] arXiv:2310.04764 (replaced) [pdf, other]
-
Title: Characterizations of Monadic Second Order Definable Context-Free Sets of Graphs
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
- [542] arXiv:2310.05141 (replaced) [pdf, other]
-
Title: Transferable Availability Poisoning Attacks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [543] arXiv:2310.06430 (replaced) [pdf, other]
-
Title: Conformal Prediction for Deep Classifier via Label Ranking
Comments: Accepted by ICML 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST)
- [544] arXiv:2310.07579 (replaced) [pdf, other]
-
Title: In-Context Unlearning: Language Models as Few Shot Unlearners
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [545] arXiv:2310.09639 (replaced) [pdf, other]
-
Title: DPZero: Private Fine-Tuning of Language Models without Backpropagation
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [546] arXiv:2310.10195 (replaced) [pdf, other]
-
Title: AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Comments: ACL 2024 camera ready version
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [547] arXiv:2310.11897 (replaced) [pdf, other]
-
Title: Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
Comments: 69 pages, 17 figures
Subjects: Machine Learning (cs.LG)
- [548] arXiv:2310.12419 (replaced) [pdf, other]
-
Title: Toward Unbiased Multiple-Target Fuzzing with Path Diversity
Subjects: Cryptography and Security (cs.CR)
- [549] arXiv:2310.12956 (replaced) [pdf, other]
-
Title: Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
Authors: David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox
Comments: Accepted at ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [550] arXiv:2310.13571 (replaced) [pdf, ps, other]
-
Title: Why Can Large Language Models Generate Correct Chain-of-Thoughts?
Subjects: Computation and Language (cs.CL)
- [551] arXiv:2310.13585 (replaced) [pdf, other]
-
Title: POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [552] arXiv:2310.18924 (replaced) [pdf, other]
-
Title: Remaining useful life prediction of Lithium-ion batteries using spatio-temporal multimodal attention networks
Subjects: Machine Learning (cs.LG)
- [553] arXiv:2310.19220 (replaced) [pdf, other]
-
Title: From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals
Comments: Authors are alphabetically ordered
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
- [554] arXiv:2311.02462 (replaced) [pdf, ps, other]
-
Title: Levels of AGI for Operationalizing Progress on the Path to AGI
Authors: Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg
Comments: version 4 - Position Paper accepted to ICML 2024. Note that due to ICML position paper titling format requirements, the title has changed slightly from that of the original arXiv pre-print. The original pre-print title was "Levels of AGI: Operationalizing Progress on the Path to AGI" but the official published title for ICML 2024 is "Levels of AGI for Operationalizing Progress on the Path to AGI"
Journal-ref: Proceedings of ICML 2024
Subjects: Artificial Intelligence (cs.AI)
- [555] arXiv:2311.02868 (replaced) [pdf, other]
-
Title: Sample Complexity Bounds for Estimating Probability Divergences under Invariances
Comments: ICML 2024
Subjects: Machine Learning (cs.LG)
- [556] arXiv:2311.03688 (replaced) [pdf, ps, other]
-
Title: Generalized Hamming weights and minimal shifts of Orlik-Terao algebras
Authors: Stefan O. Tohaneanu
Comments: 11 pages
Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)
- [557] arXiv:2311.05760 (replaced) [pdf, ps, other]
-
Title: Compressed and Sparse Models for Non-Convex Decentralized Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
- [558] arXiv:2311.08967 (replaced) [pdf, other]
-
Title: Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature
Comments: 16 pages, 1 figure
Subjects: Cryptography and Security (cs.CR)
- [559] arXiv:2311.09033 (replaced) [pdf, other]
-
Title: MELA: Multilingual Evaluation of Linguistic Acceptability
Comments: ACL 2024 camera-ready
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [560] arXiv:2311.09048 (replaced) [pdf, other]
-
Title: GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
Subjects: Computation and Language (cs.CL)
- [561] arXiv:2311.09109 (replaced) [pdf, other]
-
Title: Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
Comments: Accepted at NAACL 2024 main oral, 15 pages, 10 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [562] arXiv:2311.09213 (replaced) [pdf, other]
-
Title: GENEVA: GENErating and Visualizing branching narratives using LLMs
Comments: Accepted at IEEE Conference on Games 2024
Subjects: Computation and Language (cs.CL)
- [563] arXiv:2311.09562 (replaced) [pdf, other]
-
Title: TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
Authors: Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji
Comments: Paper accepted by ACL 2024 Findings
Subjects: Computation and Language (cs.CL)
- [564] arXiv:2311.09832 (replaced) [pdf, other]
-
Title: WatME: Towards Lossless Watermarking Through Lexical Redundancy
Comments: Accepted to ACL 2024 main conference
Subjects: Computation and Language (cs.CL)
- [565] arXiv:2311.10680 (replaced) [pdf, other]
-
Title: Optimal Embedding Dimension for Sparse Subspace Embeddings
Comments: STOC 2024
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
- [566] arXiv:2311.14251 (replaced) [pdf, ps, other]
-
Title: Optimal 1-bit Error Exponent for 2-hop Relaying with Binary-Input Channels
Comments: IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT)
- [567] arXiv:2311.17451 (replaced) [pdf, other]
-
Title: Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
- [568] arXiv:2311.18610 (replaced) [pdf, other]
-
Title: DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image
Comments: SIGGRAPH 2024, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [569] arXiv:2311.18717 (replaced) [pdf, other]
-
Title: NFT Wash Trading: Direct vs. Indirect Estimation
Subjects: General Economics (econ.GN); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
- [570] arXiv:2312.01616 (replaced) [pdf, other]
-
Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Comments: Accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [571] arXiv:2312.03668 (replaced) [pdf, other]
-
Title: Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [572] arXiv:2312.05601 (replaced) [pdf, other]
-
Title: A Meshless Solver for Blood Flow Simulations in Elastic Vessels Using Physics-Informed Neural Network
Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
- [573] arXiv:2312.07104 (replaced) [pdf, other]
-
Title: SGLang: Efficient Execution of Structured Language Model Programs
Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
- [574] arXiv:2312.07364 (replaced) [pdf, other]
-
Title: Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval
Comments: Accepted by ICML2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [575] arXiv:2312.07671 (replaced) [pdf, ps, other]
-
Title: Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability
Authors: Ali Ghadami, Mohammadreza Taghimohammadi, Mohammad Mohammadzadeh, Mohammad Hosseinipour, Alireza Taheri
Comments: 16 pages, 11 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [576] arXiv:2312.08800 (replaced) [pdf, other]
-
Title: Evaluating Large Language Models for Health-related Queries with Presuppositions
Comments: Findings of ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [577] arXiv:2312.10104 (replaced) [pdf, other]
-
Title: Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Comments: 17 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [578] arXiv:2312.14591 (replaced) [pdf, other]
-
Title: Reasons to Reject? Aligning Language Models with Judgments
Comments: Accepted at ACL 2024 Findings. Our source codes and models are publicly available at this https URL
Subjects: Computation and Language (cs.CL)
- [579] arXiv:2312.14667 (replaced) [pdf, other]
-
Title: Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
Comments: Accepted by AAAI 2024 (Main Track, Long Paper)
Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
- [580] arXiv:2312.14792 (replaced) [pdf, ps, other]
-
Title: The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs
Comments: Paper accepted in IEEE Transactions on Signal Processing
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Probability (math.PR)
- [581] arXiv:2312.14922 (replaced) [pdf, other]
-
Title: Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networksSubjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)
- [582] arXiv:2312.16752 (replaced) [pdf, other]
-
Title: Relationships Between Necessary Conditions for Feedback StabilizabilityAuthors: Matthew D. KvalheimComments: 15 pages, 2 figures; v2 adds the 2 figures and 3 new examples, and fixes some errorsSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Algebraic Topology (math.AT); Differential Geometry (math.DG)
- [583] arXiv:2312.17518 (replaced) [pdf, ps, other]
-
Title: An algebraic characterization of binary CSS-T codes and cyclic CSS-T codes for quantum fault toleranceAuthors: Eduardo Camps-Moreno, Hiram H. López, Gretchen L. Matthews, Diego Ruano, Rodrigo San-José, Ivan SoprunovJournal-ref: Quantum Inf Process 23, 230 (2024)Subjects: Information Theory (cs.IT)
- [584] arXiv:2401.00793 (replaced) [pdf, other]
-
Title: SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language ModelsComments: Accepted by ACL 2024Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- [585] arXiv:2401.01017 (replaced) [pdf, other]
-
Title: A Survey of Computation Offloading with Task TypeComments: Accepted by IEEE Transactions on Intelligent Transportation SystemsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [586] arXiv:2401.02058 (replaced) [pdf, other]
-
Title: Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature ModelComments: 2024 International Conference on Machine LearningSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [587] arXiv:2401.04621 (replaced) [pdf, other]
-
Title: DebugBench: Evaluating Debugging Capability of Large Language ModelsAuthors: Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong SunComments: Accepted as Findings of ACL 2024Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [588] arXiv:2401.05749 (replaced) [pdf, other]
-
Title: A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way ParallelismComments: Accepted at ACL Findings 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [589] arXiv:2401.06568 (replaced) [pdf, other]
-
Title: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine TranslationComments: Accepted by ACL2024 FindingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [590] arXiv:2401.06688 (replaced) [pdf, other]
-
Title: Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality EstimationComments: Accepted at ACL 2024Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [591] arXiv:2401.07888 (replaced) [pdf, other]
-
Title: Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problemsSubjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
- [592] arXiv:2401.08295 (replaced) [pdf, other]
-
Title: SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language ModelsAuthors: Weixiang Zhao, Shilong Wang, Yulin Hu, Yanyan Zhao, Bing Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang CheComments: To appear at ACL 2024Subjects: Computation and Language (cs.CL)
- [593] arXiv:2401.09670 (replaced) [pdf, other]
-
Title: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model ServingAuthors: Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao ZhangComments: OSDI 2024Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [594] arXiv:2401.10186 (replaced) [pdf, other]
-
Title: Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text GenerationComments: Accepted to ACL 2024 Main ConferenceSubjects: Computation and Language (cs.CL)
- [595] arXiv:2401.10338 (replaced) [pdf, ps, other]
-
Title: MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time SeriesSubjects: Machine Learning (cs.LG)
- [596] arXiv:2401.10774 (replaced) [pdf, other]
-
Title: Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding HeadsComments: The code for this implementation is available at this https URLSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [597] arXiv:2401.11382 (replaced) [pdf, other]
-
Title: Using Large Language Model for End-to-End Chinese ASR and NERAuthors: Yuang Li, Jiawei Yu, Min Zhang, Mengxin Ren, Yanqing Zhao, Xiaofeng Zhao, Shimin Tao, Jinsong Su, Hao YangComments: 5 pages, 2 figures, Accepted to InterSpeech 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [598] arXiv:2401.13388 (replaced) [pdf, other]
-
Title: UNIMO-G: Unified Image Generation through Multimodal Conditional DiffusionComments: Accepted by ACL 2024, Main Conference, Long PaperSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [599] arXiv:2401.13649 (replaced) [pdf, other]
-
Title: VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web TasksAuthors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel FriedComments: Accepted to ACL 2024. 24 pages. Project page: this https URLSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [600] arXiv:2401.14556 (replaced) [pdf, other]
-
Title: Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence LabelingComments: Accepted at ACL 2024 FindingsSubjects: Computation and Language (cs.CL)
- [601] arXiv:2401.16467 (replaced) [pdf, other]
-
Title: ReGAL: Refactoring Programs to Discover Generalizable AbstractionsComments: ICML 2024 Camera-Ready; First two authors contributed equally; Code: this https URLSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
- [602] arXiv:2401.17263 (replaced) [pdf, other]
-
Title: Robust Prompt Optimization for Defending Language Models Against Jailbreaking AttacksComments: Code available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [603] arXiv:2401.17264 (replaced) [pdf, other]
-
Title: Proactive Detection of Voice Cloning with Localized WatermarkingAuthors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady ElsaharSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [604] arXiv:2401.18046 (replaced) [pdf, other]
-
Title: Multipath parsing in the brainComments: Accepted at ACL2024, main conference. 15 pagesSubjects: Computation and Language (cs.CL)
- [605] arXiv:2402.00258 (replaced) [pdf, other]
-
Title: Multi-group Learning for Hierarchical GroupsComments: Accepted in International Conference on Machine Learning 2024 (ICML 2024)Subjects: Machine Learning (cs.LG)
- [606] arXiv:2402.00759 (replaced) [pdf, other]
-
Title: Building Expressive and Tractable Probabilistic Generative Models: A ReviewSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [607] arXiv:2402.01156 (replaced) [pdf, other]
-
Title: An Empirical Study on Low Code Programming using Traditional vs Large Language Model SupportAuthors: Yongkun Liu, Jiachi Chen, Tingting Bi, John Grundy, Yanlin Wang, Jianxing Yu, Ting Chen, Yutian Tang, Zibin ZhengSubjects: Software Engineering (cs.SE)
- [608] arXiv:2402.01287 (replaced) [pdf, other]
-
Title: Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object DetectionComments: 8 pages, 5 figures. Accepted at IJCNN 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- [609] arXiv:2402.01344 (replaced) [pdf, other]
-
Title: Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz NetworksComments: International Conference on Machine Learning, Vienna, Austria, July 21 -- 27, 2024Subjects: Machine Learning (cs.LG)
- [610] arXiv:2402.01501 (replaced) [pdf, ps, other]
-
Title: Satisfiability Modulo Exponential Integer ArithmeticSubjects: Logic in Computer Science (cs.LO)
- [611] arXiv:2402.02500 (replaced) [pdf, other]
-
Title: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot LearningSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [612] arXiv:2402.03141 (replaced) [pdf, other]
-
Title: Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short DelaysAuthors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao HuangComments: ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
- [613] arXiv:2402.03169 (replaced) [pdf, ps, other]
-
Title: A Random Matrix Approach to Low-Multilinear-Rank Tensor ApproximationSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
- [614] arXiv:2402.03412 (replaced) [pdf, other]
-
Title: See More Details: Efficient Image Super-Resolution by Experts MiningComments: Accepted at ICML 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [615] arXiv:2402.03625 (replaced) [pdf, other]
-
Title: Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial TimeSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- [616] arXiv:2402.03903 (replaced) [pdf, other]
-
Title: Averaging $n$-step Returns Reduces Variance in Reinforcement LearningComments: ICML 2024. 27 pages, 7 figures, 3 tablesSubjects: Machine Learning (cs.LG)
- [617] arXiv:2402.04356 (replaced) [pdf, other]
-
Title: Bidirectional Autoregressive Diffusion Model for Dance GenerationSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
- [618] arXiv:2402.04407 (replaced) [pdf, ps, other]
-
Title: Sharp Lower Bounds on the Manifold Widths of Sobolev and Besov SpacesAuthors: Jonathan W. SiegelSubjects: Numerical Analysis (math.NA)
- [619] arXiv:2402.04467 (replaced) [pdf, other]
-
Title: DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic SystemsAuthors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-NúñezComments: ICML 2024; Code to reproduce our experiments is available at this https URLSubjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
- [620] arXiv:2402.04610 (replaced) [pdf, other]
-
Title: Early Stopping of Untrained Convolutional Neural NetworksSubjects: Numerical Analysis (math.NA)
- [621] arXiv:2402.04621 (replaced) [pdf, other]
-
Title: Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily PerspectiveComments: published in ICML 2024Subjects: Machine Learning (cs.LG)
- [622] arXiv:2402.04788 (replaced) [pdf, other]
-
Title: MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language BenchmarkAuthors: Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan, Lichao SunComments: ICML 2024 (Oral)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [623] arXiv:2402.04997 (replaced) [pdf, other]
-
Title: Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-DesignComments: 60 pages, 11 figures, 6 tables; ICML 2024Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
- [624] arXiv:2402.06031 (replaced) [pdf, other]
-
Title: An operator learning perspective on parameter-to-observable mapsComments: 63 pages, 10 figures, 1 tableSubjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
- [625] arXiv:2402.06700 (replaced) [pdf, other]
-
Title: Entropy-Regularized Token-Level Policy Optimization for Language Agent ReinforcementSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [626] arXiv:2402.06733 (replaced) [pdf, other]
-
Title: NICE: To Optimize In-Context Examples or Not?Comments: Accepted as a full paper (9 pages) at ACL 2024 (Main)Journal-ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics 2024 (Volume 1: Long Papers)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [627] arXiv:2402.06888 (replaced) [pdf, other]
-
Title: Analysis of Self-Supervised Speech Models on Children's Speech and Infant VocalizationsComments: Accepted to 2024 ICASSP Workshop of Self-supervision in Audio, Speech and Beyond (SASB)Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [628] arXiv:2402.07214 (replaced) [pdf, other]
-
Title: Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome ClassificationSubjects: Computation and Language (cs.CL)
- [629] arXiv:2402.07483 (replaced) [pdf, other]
-
Title: T-RAG: Lessons from the LLM TrenchesComments: Added Needle in a Haystack analysis for T-RAGSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [630] arXiv:2402.07640 (replaced) [pdf, other]
-
Title: CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback SynthesisSubjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
- [631] arXiv:2402.07844 (replaced) [pdf, other]
-
Title: Mercury: A Code Efficiency Benchmark for LLM Code SynthesisSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
- [632] arXiv:2402.07891 (replaced) [pdf, other]
-
Title: Label-Efficient Model Selection for Text GenerationAuthors: Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal ShnarchComments: Accepted to ACL (main conference)Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [633] arXiv:2402.08595 (replaced) [pdf, other]
-
Title: Homomorphism Counts for Graph Neural Networks: All About That BasisComments: Proceedings of the Forty-First International Conference on Machine Learning (ICML 2024). Code available at: this https URLSubjects: Machine Learning (cs.LG)
- [634] arXiv:2402.08876 (replaced) [pdf, other]
-
Title: DUDF: Differentiable Unsigned Distance Fields with Hyperbolic ScalingSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
- [635] arXiv:2402.09470 (replaced) [pdf, other]
-
Title: Rolling Diffusion ModelsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [636] arXiv:2402.10013 (replaced) [pdf, other]
-
Title: Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description LengthComments: 9 pages, 5 figures, 3 appendix pagesSubjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
- [637] arXiv:2402.10073 (replaced) [pdf, other]
-
Title: Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General IntelligenceAuthors: Weixiang Zhao, Zhuojun Li, Shilong Wang, Yang Wang, Yulin Hu, Yanyan Zhao, Chen Wei, Bing QinComments: To appear at Findings of ACL 2024Subjects: Computation and Language (cs.CL)
- [638] arXiv:2402.10422 (replaced) [pdf, other]
-
Title: Pushing the Limits of Zero-shot End-to-End Speech TranslationComments: ACL 2024 (Findings)Subjects: Computation and Language (cs.CL)
- [639] arXiv:2402.10450 (replaced) [pdf, other]
-
Title: PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in ControlComments: Accepted at the Forty-first International Conference on Machine Learning (ICML 2024)Subjects: Machine Learning (cs.LG)
- [640] arXiv:2402.10571 (replaced) [pdf, other]
-
Title: Direct Preference Optimization with an OffsetSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [641] arXiv:2402.10588 (replaced) [pdf, other]
-
Title: Do Llamas Work in English? On the Latent Language of Multilingual TransformersComments: 12 pages. 28 with appendixSubjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
- [642] arXiv:2402.10639 (replaced) [pdf, other]
-
Title: Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model PruningComments: ACL Main 2024Subjects: Computation and Language (cs.CL)
- [643] arXiv:2402.10727 (replaced) [pdf, other]
-
Title: Predictive Uncertainty Quantification via Risk Decompositions for Strictly Proper Scoring RulesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [644] arXiv:2402.10890 (replaced) [pdf, other]
-
Title: When is Tree Search Useful for LLM Planning? It Depends on the DiscriminatorComments: ACL 2024 mainSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [645] arXiv:2402.11138 (replaced) [pdf, other]
-
Title: Contrastive Instruction TuningAuthors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao ChenComments: ACL 2024 FindingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [646] arXiv:2402.11349 (replaced) [pdf, other]
-
Title: Language Models Don't Learn the Physical Manifestation of LanguageComments: ACL 2024 MainSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [647] arXiv:2402.11463 (replaced) [pdf, other]
-
Title: Attractor Memory for Long-Term Time Series Forecasting: A Chaos PerspectiveComments: arXiv admin note: text overlap with arXiv:nlin/0307015 by other authorsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD)
- [648] arXiv:2402.11485 (replaced) [pdf, other]
-
Title: LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data AugmentationComments: ACL Findings 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [649] arXiv:2402.11517 (replaced) [pdf, other]
-
Title: Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLMComments: Accepted to ACL2024 FindingsSubjects: Computation and Language (cs.CL)
- [650] arXiv:2402.11548 (replaced) [pdf, other]
-
Title: KMMLU: Measuring Massive Multitask Language Understanding in KoreanAuthors: Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella BidermanComments: Under ReviewSubjects: Computation and Language (cs.CL)
- [651] arXiv:2402.11597 (replaced) [pdf, other]
-
Title: Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?Comments: ACL 2024 (main)Subjects: Computation and Language (cs.CL)
- [652] arXiv:2402.11674 (replaced) [pdf, other]
-
Title: A Fast Algorithm to Simulate Nonlinear Resistive NetworksAuthors: Benjamin ScellierComments: ICML 2024Subjects: Emerging Technologies (cs.ET); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
- [653] arXiv:2402.11740 (replaced) [pdf, ps, other]
-
Title: Extraction of nonlinearity in neural networks with Koopman operatorComments: 22 pages, 14 figuresSubjects: Machine Learning (cs.LG)
- [654] arXiv:2402.11894 (replaced) [pdf, other]
-
Title: Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language ModelsAuthors: Jiahao Ying, Yixin Cao, Yushi Bai, Qianru Sun, Bo Wang, Wei Tang, Zhaojun Ding, Yizhe Yang, Xuanjing Huang, Shuicheng YanSubjects: Computation and Language (cs.CL)
- [655] arXiv:2402.12343 (replaced) [pdf, other]
-
Title: Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!Comments: ACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [656] arXiv:2402.12424 (replaced) [pdf, other]
-
Title: Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMsAuthors: Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada MihalceaComments: Accepted to ACL 2024 FindingsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [657] arXiv:2402.12451 (replaced) [pdf, other]
-
Title: The Revolution of Multimodal Large Language Models: A SurveyAuthors: Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita CucchiaraComments: ACL 2024 (Findings)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
- [658] arXiv:2402.12621 (replaced) [pdf, other]
-
Title: Reflect-RL: Two-Player Online RL Fine-Tuning for LMsComments: ACL 2024Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [659] arXiv:2402.12691 (replaced) [pdf, other]
-
Title: Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic SupervisionComments: Accepted by ACL 2024 (Findings)Subjects: Computation and Language (cs.CL)
- [660] arXiv:2402.12991 (replaced) [pdf, other]
-
Title: TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box IdentificationComments: Accepted at ACL 2024 (findings)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- [661] arXiv:2402.13212 (replaced) [pdf, other]
-
Title: Soft Self-Consistency Improves Language Model AgentsComments: ACL 2024 Camera-Ready, the first three authors contributed equally; Code: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [662] arXiv:2402.13874 (replaced) [pdf, other]
-
Title: $Se^2$: Sequential Example Selection for In-Context LearningAuthors: Haoyu Liu, Jianfeng Liu, Shaohan Huang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Furu Wei, Qi ZhangComments: Accepted by ACL 2024 FindingsSubjects: Computation and Language (cs.CL)
- [663] arXiv:2402.14008 (replaced) [pdf, other]
-
Title: OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific ProblemsAuthors: Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong SunComments: Accepted by ACL 2024 (main), updateSubjects: Computation and Language (cs.CL)
- [664] arXiv:2402.14116 (replaced) [pdf, other]
-
Title: FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language ModelsComments: 18 pages, 2 figures. ACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [665] arXiv:2402.14298 (replaced) [pdf, other]
-
Title: Multi-modal Stance Detection: New Datasets and ModelComments: ACL'24 FindingsSubjects: Computation and Language (cs.CL)
- [666] arXiv:2402.14328 (replaced) [pdf, other]
-
Title: Understanding and Patching Compositional Reasoning in LLMsComments: Accepted by ACL'2024 FindingsSubjects: Computation and Language (cs.CL)
- [667] arXiv:2402.14490 (replaced) [pdf, other]
-
Title: Imbalanced Data Clustering using Equilibrium K-MeansAuthors: Yudong HeSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [668] arXiv:2402.14569 (replaced) [pdf, other]
-
Title: Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement LearningComments: 22 pages, 9 figuresSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- [669] arXiv:2402.14979 (replaced) [pdf, other]
-
Title: Optimizing Language Models for Human Preferences is a Causal Inference ProblemComments: UAI 2024Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)
- [670] arXiv:2402.15082 (replaced) [pdf, other]
-
Title: PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer LearningComments: Accepted to Findings of the ACL 2024Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [671] arXiv:2402.15332 (replaced) [pdf, ps, other]
-
Title: Position: Categorical Deep Learning is an Algebraic Theory of All ArchitecturesAuthors: Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar VeličkovićComments: To appear in ICML 2024. Comments welcome. More info at categoricaldeeplearning.comSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Category Theory (math.CT); Rings and Algebras (math.RA); Machine Learning (stat.ML)
- [672] arXiv:2402.15392 (replaced) [pdf, ps, other]
-
Title: Offline Inverse RL: New Solution Concepts and Provably Efficient AlgorithmsComments: International Conference on Machine Learning 41 (ICML 2024)Subjects: Machine Learning (cs.LG)
- [673] arXiv:2402.15637 (replaced) [pdf, other]
-
Title: Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language ModelsSubjects: Computation and Language (cs.CL)
- [674] arXiv:2402.15838 (replaced) [pdf, other]
-
Title: ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot RetrievalComments: Accepted to ACL 2024 main (long)Subjects: Information Retrieval (cs.IR)
- [675] arXiv:2402.16438 (replaced) [pdf, other]
-
Title: Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language ModelsAuthors: Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong WenComments: Accepted by ACL 2024Subjects: Computation and Language (cs.CL)
- [676] arXiv:2402.16775 (replaced) [pdf, other]
-
Title: A Comprehensive Evaluation of Quantization Strategies for Large Language ModelsComments: ACL 2024 FindingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [677] arXiv:2402.17120 (replaced) [pdf, other]
-
Title: LCEN: A Novel Feature Selection Algorithm for Nonlinear, Interpretable Machine Learning ModelsSubjects: Machine Learning (cs.LG)
- [678] arXiv:2402.17316 (replaced) [pdf, other]
-
Title: Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy DistillationComments: Published in ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [679] arXiv:2402.17447 (replaced) [pdf, other]
-
Title: Deep Learning Based Named Entity Recognition Models for RecipesAuthors: Mansi Goel, Ayush Agarwal, Shubham Agrawal, Janak Kapuriya, Akhil Vamshi Konam, Rishabh Gupta, Shrey Rastogi, Niharika, Ganesh BaglerComments: 13 pages, 6 main figures and 2 in appendices, and 3 main tables; Accepted for publication in LREC-COLING 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [680] arXiv:2402.17641 (replaced) [pdf, other]
-
Title: Variational Learning is Effective for Large Deep NetworksAuthors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas MöllenhoffComments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [681] arXiv:2402.18059 (replaced) [pdf, other]
-
Title: Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language ModelsAuthors: Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, Pengtao XieComments: 22 pages, 13 figures, 5 tablesSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
- [682] arXiv:2402.18158 (replaced) [pdf, other]
-
Title: Evaluating Quantized Large Language ModelsAuthors: Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu WangSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [683] arXiv:2402.18334 (replaced) [pdf, other]
-
Title: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task AdaptationComments: ACL Findings 2024Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [684] arXiv:2403.00720 (replaced) [pdf, other]
-
Title: Subhomogeneous Deep Equilibrium ModelsSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
- [685] arXiv:2403.01165 (replaced) [pdf, other]
-
Title: STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language ModelsComments: Accepted by ACL2024(Findings)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [686] arXiv:2403.01166 (replaced) [pdf, other]
-
Title: DINER: Debiasing Aspect-based Sentiment Analysis with Multi-variable Causal InferenceComments: Accepted by ACL2024(Findings)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [687] arXiv:2403.01931 (replaced) [pdf, other]
-
Title: VariErr NLI: Separating Annotation Error from Human Label VariationComments: 14 pages, accepted at ACL 2024 mainSubjects: Computation and Language (cs.CL)
- [688] arXiv:2403.02271 (replaced) [pdf, other]
-
Title: RIFF: Learning to Rephrase Inputs for Few-shot Fine-tuning of Language ModelsComments: Final Version (Findings of ACL2024)Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [689] arXiv:2403.02354 (replaced) [pdf, other]
-
Title: Spatio-Temporal Field Neural Networks for Air Quality InferenceComments: We want to recheck our model and experimental designSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [690] arXiv:2403.02437 (replaced) [pdf, other]
-
Title: SoK: Challenges and Opportunities in Federated UnlearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
- [691] arXiv:2403.02451 (replaced) [pdf, other]
-
Title: Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common GroundAuthors: Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen RambowSubjects: Computation and Language (cs.CL)
- [692] arXiv:2403.02660 (replaced) [pdf, other]
-
Title: A randomized lattice rule without component-by-component constructionAuthors: Takashi GodaComments: revision, 21 pages, 3 figuresSubjects: Numerical Analysis (math.NA)
- [693] arXiv:2403.02977 (replaced) [pdf, other]
-
Title: Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free SpaceAuthors: Qianhao Wang, Zhepei Wang, Mingyang Wang, Jialin Ji, Zhichao Han, Tianyue Wu, Rui Jin, Yuman Gao, Chao Xu, Fei GaoSubjects: Robotics (cs.RO)
- [694] arXiv:2403.03129 (replaced) [pdf, other]
-
Title: CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction FollowingComments: Accepted to ACL 2024 (Main Conference)Subjects: Computation and Language (cs.CL)
- [695] arXiv:2403.03167 (replaced) [pdf, other]
-
Title: PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips DatasetComments: 9 pages, ACL 2024 FindingsSubjects: Computation and Language (cs.CL)
- [696] arXiv:2403.03234 (replaced) [pdf, other]
-
Title: Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence ModelingComments: ICML 2024; Code to reproduce our experiments is available at this https URLSubjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
- [697] arXiv:2403.04346 (replaced) [pdf, ps, other]
-
Title: BrainKnow -- Extracting, Linking, and Synthesizing Neuroscience KnowledgeComments: 22 pages, 7 figuresSubjects: Digital Libraries (cs.DL); Neurons and Cognition (q-bio.NC)
- [698] arXiv:2403.05535 (replaced) [pdf, other]
-
Title: Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and VideosComments: ICML 2024 Camera-Ready. Project Page and Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [699] arXiv:2403.06189 (replaced) [pdf, other]
-
Title: Harmonious Group Choreography with Trajectory-Controllable DiffusionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [700] arXiv:2403.06840 (replaced) [pdf, other]
-
Title: RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-FeedbackComments: 20 pages, multiple figures. Providing second version RA-ISFSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [701] arXiv:2403.06932 (replaced) [pdf, other]
-
Title: ERA-CoT: Improving Chain-of-Thought through Entity Relationship AnalysisComments: 15 pages, second version of ERA-CoTSubjects: Computation and Language (cs.CL)
- [702] arXiv:2403.07245 (replaced) [pdf, other]
-
Title: Dataset Condensation for Time Series Classification via Dual Domain MatchingComments: Accepted by KDD 2024 research trackSubjects: Machine Learning (cs.LG)
- [703] arXiv:2403.07723 (replaced) [pdf, ps, other]
-
Title: On the Last-Iterate Convergence of Shuffling Gradient MethodsComments: ICML 2024Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [704] arXiv:2403.07746 (replaced) [pdf, other]
-
Title: Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D PerceptionAuthors: Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard RigollComments: 10 pages, 4 figures. Added eval on VoDSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [705] arXiv:2403.07974 (replaced) [pdf, other]
-
Title: LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for CodeAuthors: Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion StoicaComments: Website - this https URLSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [706] arXiv:2403.09347 (replaced) [pdf, other]
-
Title: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long SequencesComments: 13 pages, 7 figuresSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- [707] arXiv:2403.09871 (replaced) [pdf, other]
-
Title: ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal ImagesComments: 15 pages, 6 figures, 4 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [708] arXiv:2403.10081 (replaced) [pdf, other]
-
Title: DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language ModelsSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
- [709] arXiv:2403.13169 (replaced) [pdf, other]
-
Title: Wav2Gloss: Generating Interlinear Glossed Text from SpeechAuthors: Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori LevinComments: ACL 2024 camera ready versionSubjects: Computation and Language (cs.CL)
- [710] arXiv:2403.13872 (replaced) [pdf, other]
-
Title: Spatial-Temporal Graph Representation Learning for Tactical Networks Future State PredictionSubjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
- [711] arXiv:2403.15097 (replaced) [pdf, other]
-
Title: Argument-Aware Approach To Event LinkingAuthors: I-Hung Hsu, Zihan Xue, Nilay Pochh, Sahil Bansal, Premkumar Natarajan, Jayanth Srinivasa, Nanyun PengComments: Paper accepted by ACL-findings 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [712] arXiv:2403.15191 (replaced) [pdf, other]
-
Title: VORTEX: Real-Time Off-Chain Payments and Cross-Chain Swaps for CryptocurrenciesSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
- [713] arXiv:2403.17270 (replaced) [pdf, other]
-
Title: Human Stress Response and Perceived Safety during Encounters with Quadruped RobotsComments: 8 pages, 7 figs, 5 tablesSubjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
- [714] arXiv:2403.17673 (replaced) [pdf, other]
-
Title: How Private are DP-SGD Implementations?Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan ZhangComments: Proceedings of ICML 2024Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
- [715] arXiv:2403.18680 (replaced) [pdf, other]
-
Title: Non-Linear Inference Time Intervention: Improving LLM TruthfulnessAuthors: Jakub Hoscilowicz, Adam Wiacek, Jan Chojnacki, Adam Cieslak, Leszek Michon, Vitalii Urbanevych, Artur JanickiComments: Accepted at the Interspeech 2024 Conference. Code is available at this https URLSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [716] arXiv:2403.18953 (replaced) [pdf, ps, other]
-
Title: Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical SystemsComments: 12 pages, 7 figuresJournal-ref: Chaos 1 June 2024; 34 (6): 063114Subjects: Machine Learning (cs.LG)
- [717] arXiv:2403.19223 (replaced) [pdf, ps, other]
-
Title: Computing large deviation rate functions of entropy production for diffusion processes by an interacting particle methodSubjects: Numerical Analysis (math.NA)
- [718] arXiv:2403.19260 (replaced) [pdf, other]
-
Title: NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative DataAuthors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. FraibergerComments: ACL 2024 main conference. Data and models available at this https URLSubjects: Computation and Language (cs.CL)
- [719] arXiv:2403.19589 (replaced) [pdf, other]
-
Title: TOD3Cap: Towards 3D Dense Captioning in Outdoor ScenesAuthors: Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao ZhaoComments: Code, data, and models are publicly available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [720] arXiv:2404.00929 (replaced) [pdf, other]
- [721] arXiv:2404.05835 (replaced) [pdf, other]
-
Title: Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without RetrainingComments: Accepted to L4DC 2024Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
- [722] arXiv:2404.09889 (replaced) [pdf, other]
-
Title: Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table RetrievalComments: ACL 2024 camera readySubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [723] arXiv:2404.10496 (replaced) [pdf, other]
-
Title: Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question AnsweringAuthors: Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei SunComments: Accepted to ACL2024Subjects: Information Retrieval (cs.IR)
- [724] arXiv:2404.12464 (replaced) [pdf, other]
-
Title: NormAd: A Benchmark for Measuring the Cultural Adaptability of Large Language ModelsComments: Preprint. In ReviewSubjects: Computation and Language (cs.CL)
- [725] arXiv:2404.13195 (replaced) [pdf, ps, other]
-
Title: Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-HopperSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [726] arXiv:2404.13874 (replaced) [pdf, other]
-
Title: VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language ModelsComments: ACL 2024 FindingsSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [727] arXiv:2404.13936 (replaced) [pdf, ps, other]
-
Title: A bound preserving cut discontinuous Galerkin method for one dimensional hyperbolic conservation lawsComments: 32Subjects: Numerical Analysis (math.NA)
- [728] arXiv:2404.14461 (replaced) [pdf, other]
-
Title: Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMsAuthors: Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian TramèrComments: Competition ReportSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [729] arXiv:2404.14745 (replaced) [pdf, other]
-
Title: TAAT: Think and Act from Arbitrary Texts in Text2MotionComments: Updated errors in author informationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [730] arXiv:2404.14964 (replaced) [pdf, other]
-
Title: Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networksComments: 25 pages, 7 figures + 3 supplementary figuresSubjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
- [731] arXiv:2404.15004 (replaced) [pdf, other]
-
Title: TAXI: Evaluating Categorical Knowledge Editing for Language ModelsComments: Accepted to ACL 2024 (Findings)Subjects: Computation and Language (cs.CL)
- [732] arXiv:2404.15522 (replaced) [pdf, other]
-
Title: LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language ModelsAuthors: Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta BaralComments: Accepted at ACL(Main) 2024 | First version available @ this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [733] arXiv:2404.15611 (replaced) [pdf, other]
-
Title: Model Poisoning Attacks to Federated Learning via Multi-Round ConsistencySubjects: Cryptography and Security (cs.CR)
- [734] arXiv:2404.16363 (replaced) [pdf, other]
-
Title: Byzantine Attacks Exploiting Penalties in Ethereum PoSSubjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
- [735] arXiv:2404.16966 (replaced) [pdf, other]
-
Title: Examining the robustness of LLM evaluation to the distributional assumptions of benchmarksSubjects: Computation and Language (cs.CL)
- [736] arXiv:2404.17140 (replaced) [pdf, other]
-
Title: Small Language Models Need Strong Verifiers to Self-Correct ReasoningAuthors: Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu WangComments: ACL Findings 2024 - Camera ReadySubjects: Computation and Language (cs.CL)
- [737] arXiv:2405.00301 (replaced) [pdf, other]
-
Title: Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty ExpressionComments: 13 pages, 5 figuresSubjects: Computation and Language (cs.CL)
- [738] arXiv:2405.00892 (replaced) [pdf, other]
-
Title: Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person DetectionAuthors: Colby Banbury, Emil Njor, Matthew Stewart, Pete Warden, Manjunath Kudlur, Nat Jeffries, Xenofon Fafoutis, Vijay Janapa ReddiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [739] arXiv:2405.00899 (replaced) [pdf, other]
-
Title: Characterising the Creative Process in Humans and Large Language ModelsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
- [740] arXiv:2405.02492 (replaced) [pdf, other]
-
Title: Investigating the Generalizability of Assistive Robots Models over Various TasksComments: Accepted to 2024 21st International Conference on Ubiquitous Robots (UR)Subjects: Robotics (cs.RO)
- [741] arXiv:2405.02664 (replaced) [pdf, other]
-
Title: MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineeringAuthors: Roomani Srivastava, Suraj Prasad, Lipika Bhat, Sarvesh Deshpande, Barnali Das, Kshitij JadhavComments: 4 pages, 3 figures, pre-print submitted to CIKM 2024Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [742] arXiv:2405.03035 (replaced) [pdf, other]
-
Title: Probabilistic Finite Automaton Emptiness is undecidableAuthors: Günter RoteComments: 63 pages, 14 figures, 2 tables, 53 footnotes, 11 sections plus 1 appendix. Added another proof and more history, which had been overlooked beforeSubjects: Formal Languages and Automata Theory (cs.FL)
- [743] arXiv:2405.03064 (replaced) [pdf, other]
-
Title: RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with ExplanationComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [744] arXiv:2405.04061 (replaced) [pdf, other]
-
Title: Generalized Cauchy-Schwarz Divergence and Its Deep Learning ApplicationsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [745] arXiv:2405.04776 (replaced) [pdf, other]
-
Title: Chain of Thoughtlessness? An Analysis of CoT in PlanningSubjects: Artificial Intelligence (cs.AI)
- [746] arXiv:2405.05847 (replaced) [pdf, other]
-
Title: Learned feature representations are biased by complexity, learning order, position, and moreSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [747] arXiv:2405.07460 (replaced) [pdf, other]
-
Title: HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
- [748] arXiv:2405.07536 (replaced) [pdf, other]
-
Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path GeneratorSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [749] arXiv:2405.09005 (replaced) [pdf, other]
-
Title: Cons-training tensor networksComments: v2: mostly improved Fig 1 and 13 for clarity, improved exposition of ideas, and fixed a couple of transcription bugs in the pseudo algo. 3Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Quantum Physics (quant-ph)
- [750] arXiv:2405.09482 (replaced) [pdf, other]
-
Title: Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational TextsSubjects: Computation and Language (cs.CL)
- [751] arXiv:2405.10150 (replaced) [pdf, other]
-
Title: Speaker Verification in Agent-Generated ConversationsSubjects: Computation and Language (cs.CL)
- [752] arXiv:2405.10467 (replaced) [pdf, other]
-
Title: Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based AgentsAuthors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon WhittleSubjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
- [753] arXiv:2405.10517 (replaced) [pdf, other]
-
Title: Towards Better Question Generation in QA-based Event ExtractionComments: Accepted to ACL2024 FindingsSubjects: Computation and Language (cs.CL)
- [754] arXiv:2405.11684 (replaced) [pdf, other]
-
Title: Learning Regularities from Data using Spiking Functions: A TheorySubjects: Machine Learning (cs.LG); Information Theory (cs.IT)
- [755] arXiv:2405.11876 (replaced) [pdf, other]
-
Title: Understanding crypter-as-a-service in a popular underground marketplaceComments: A short version of this paper was accepted at the 6th Workshop on Attackers and Cyber-Crime Operations (WACCO)Subjects: Cryptography and Security (cs.CR)
- [756] arXiv:2405.11968 (replaced) [pdf, other]
-
Title: Conditional Shift-Robust Conformal Prediction for Graph Neural NetworkAuthors: S. AkanshaComments: 14 pages, 2 figures, 3 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [757] arXiv:2405.12684 (replaced) [pdf, other]
-
Title: Model Free Prediction with Uncertainty AssessmentSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [758] arXiv:2405.13034 (replaced) [pdf, other]
-
Title: Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed RealityAuthors: Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo CesarComments: Accepted by ACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [759] arXiv:2405.13753 (replaced) [pdf, other]
-
Title: A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical EvidenceComments: 9 Pages and appendixSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); General Economics (econ.GN)
- [760] arXiv:2405.13902 (replaced) [pdf, other]
-
Title: LOGIN: A Large Language Model Consulted Graph Neural Network Training FrameworkSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [761] arXiv:2405.14108 (replaced) [pdf, other]
-
Title: Deep Learning for Protein-Ligand Docking: Are We There Yet?Comments: 30 pages, 1 table, 27 figures. Under review. Code, data, tutorials, and benchmark results are available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
- [762] arXiv:2405.14156 (replaced) [pdf, other]
-
Title: Unveiling the Tapestry of Consistency in Large Vision-Language ModelsAuthors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan GuoComments: This project is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [763] arXiv:2405.15671 (replaced) [pdf, other]
-
Title: The Undecidability of Quantified AnnouncementsComments: This paper contains a correction to the 2016 article, The Undecidability of Quantified Announcements, published in Studia LogicaJournal-ref: The undecidability of quantified announcements. Studia Logica, 104(4) pages 597-640, 2016Subjects: Logic in Computer Science (cs.LO)
- [764] arXiv:2405.15769 (replaced) [pdf, other]
-
Title: FastDrag: Manipulate Anything in One StepComments: 13 pages, 13 figures, Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [765] arXiv:2405.16225 (replaced) [pdf, ps, other]
- [766] arXiv:2405.16488 (replaced) [pdf, ps, other]
- [767] arXiv:2405.16526 (replaced) [pdf, other]
-
Title: Past, Present, and Future of Citation Practices in HCIAuthors: Jonas OppenlaenderSubjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Digital Libraries (cs.DL)
- [768] arXiv:2405.16849 (replaced) [pdf, other]
-
Title: Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D GenerationAuthors: Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng LinComments: Our project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [769] arXiv:2405.17234 (replaced) [pdf, other]
- [770] arXiv:2405.17272 (replaced) [pdf, other]
-
Title: DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing ProblemsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [771] arXiv:2405.17345 (replaced) [pdf, other]
-
Title: Exploring and steering the moral compass of Large Language ModelsAuthors: Alejandro TlaieSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [772] arXiv:2405.17398 (replaced) [pdf, other]
-
Title: Vista: A Generalizable Driving World Model with High Fidelity and Versatile ControllabilityAuthors: Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang LiSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [773] arXiv:2405.17814 (replaced) [pdf, other]
-
Title: FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [774] arXiv:2405.18353 (replaced) [pdf, other]
-
Title: Simulating infinite-dimensional nonlinear diffusion bridgesAuthors: Gefan Yang, Elizabeth Louise Baker, Michael L. Severinsen, Christy Anna Hipsley, Stefan SommerSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [775] arXiv:2405.18457 (replaced) [pdf, other]
-
Title: Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian ProcessesAuthors: Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-LobatoComments: Preprint. arXiv admin note: text overlap with arXiv:2405.18328Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [776] arXiv:2405.18860 (replaced) [pdf, other]
-
Title: Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household TasksAuthors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong HeSubjects: Robotics (cs.RO)
- [777] arXiv:2405.18942 (replaced) [pdf, other]
-
Title: Verifiably Robust Conformal PredictionSubjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [778] arXiv:2405.19732 (replaced) [pdf, other]
-
Title: Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt TuningSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [779] arXiv:2405.19944 (replaced) [pdf, ps, other]
-
Title: Discrete-Time I&I Adaptive Interconnection and Damping Passivity-Based Control for Nonlinearly Parameterized Port-Controlled Hamiltonian SystemsComments: 31 pages, 9 figuresSubjects: Systems and Control (eess.SY)
- [780] arXiv:2405.20172 (replaced) [pdf, other]
-
Title: Iterative Feature Boosting for Explainable Speech Emotion RecognitionComments: Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)Journal-ref: 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [781] arXiv:2405.20250 (replaced) [pdf, ps, other]
-
Title: Entropy annealing for policy mirror descent in continuous time and spaceSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Probability (math.PR)
- [782] arXiv:2405.20267 (replaced) [pdf, other]
-
Title: Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee DiscussionsSubjects: Computation and Language (cs.CL)
- [783] arXiv:2405.20607 (replaced) [pdf, other]
-
Title: Textual Inversion and Self-supervised Refinement for Radiology Report GenerationAuthors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi ZhangComments: This paper has been early accepted by MICCAI 2024!Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [784] arXiv:2405.20703 (replaced) [pdf, other]
-
Title: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis PerformanceComments: Accepted to ACL 2024 FindingsSubjects: Computation and Language (cs.CL)
- [785] arXiv:2405.20988 (replaced) [pdf, other]
-
Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic AveragingAuthors: Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios DeligiannakisSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- [786] arXiv:2406.00083 (replaced) [pdf, other]
-
Title: BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language ModelsSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [787] arXiv:2406.00199 (replaced) [pdf, ps, other]
-
Title: Exfiltration of personal information from ChatGPT via prompt injectionAuthors: Gregory SchwartzmanSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Emerging Technologies (cs.ET)
- [788] arXiv:2406.00252 (replaced) [pdf, other]
-
Title: Multi-Modal and Multi-Agent Systems Meet Rationality: A SurveySubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
- [789] arXiv:2406.00307 (replaced) [pdf, other]
-
Title: HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language ModelComments: under submissionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [790] arXiv:2406.00329 (replaced) [pdf, other]
-
Title: Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR ImagesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [791] arXiv:2406.00670 (replaced) [pdf, other]
-
Title: Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic SegmentationComments: Accepted by ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [792] arXiv:2406.00702 (replaced) [pdf, ps, other]
-
Title: Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier StrategiesAuthors: Amir Masoud Rahmani, Amir Haider, Parisa Khoshvaght, Mohammad Adeli, Entesar Gemeay, Yazeed Alkhrijah, Mokhtar Mohammadi, Mehdi HosseinzadehSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [793] arXiv:2406.00773 (replaced) [pdf, other]
-
Title: Diffusion Tuning: Transferring Diffusion Models via Chain of ForgettingSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [794] arXiv:2406.00907 (replaced) [pdf, other]
-
Title: DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic SurgeryComments: 29 pages, 16 figures; MIDL 2024 - Medical Imaging with Deep LearningSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [795] arXiv:2406.01026 (replaced) [pdf, other]
-
Title: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice SelectorsAuthors: Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo YinComments: Accepted at ACL 2024 MainJournal-ref: ACL 2024Subjects: Computation and Language (cs.CL)
- [796] arXiv:2406.01057 (replaced) [pdf, other]
-
Title: Knapsack with Vertex Cover, Set Cover, and Hitting SetSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
- [797] arXiv:2406.01133 (replaced) [pdf, ps, other]
-
Title: Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observationsSubjects: Performance (cs.PF)
- [798] arXiv:2406.01349 (replaced) [pdf, other]
-
Title: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video GenerationAuthors: Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng YuComments: Project Page: this https URL, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [799] arXiv:2406.01392 (replaced) [pdf, other]
-
Title: Sparsity-Accelerated Training for Large Language ModelsAuthors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai YuComments: Accepted to ACL 2024 FindingsSubjects: Computation and Language (cs.CL)
- [800] arXiv:2406.01425 (replaced) [pdf, other]
-
Title: Sensitivity-Informed Augmentation for Robust SegmentationAuthors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. LinComments: 10 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [801] arXiv:2406.01514 (replaced) [pdf, other]
-
Title: Decoupled Alignment for Robust Plug-and-Play AdaptationSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [802] arXiv:2406.01548 (replaced) [pdf, other]
-
Title: How to discretize continuous state-action spaces in Q-learning: A symbolic control approachComments: Q-learning, Symbolic control, AbstractionSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
- [803] arXiv:2406.01624 (replaced) [pdf, other]
-
Title: Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion RecognitionComments: Published in: Springer Nature International Journal of Applied Intelligence (2024)Journal-ref: Applied Intelligence (2024), 1-24Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [804] arXiv:2406.01799 (replaced) [pdf, other]
-
Title: Online Control in Population DynamicsSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [805] arXiv:2406.01852 (replaced) [pdf, other]
-
Title: Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHOSubjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [806] arXiv:2406.01900 (replaced) [pdf, other]
-
Title: Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait AnimationAuthors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng ChenComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [807] arXiv:2406.01908 (replaced) [pdf, other]
-
Title: PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear ProgrammingAuthors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu SunComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- [808] arXiv:2406.02004 (replaced) [pdf, ps, other]
-
Title: Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core ClippingComments: Accepted to Interspeech'24Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [809] arXiv:2406.02061 (replaced) [pdf, other]
-
Title: Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language ModelsComments: v1.1Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [810] arXiv:2406.02126 (replaced) [pdf, other]
-
Title: CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control CoordinationSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
- [811] arXiv:2406.02169 (replaced) [src]
-
Title: A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languagesComments: The experimental result was erroneously reported and we also omitted other authorsSubjects: Computation and Language (cs.CL)
- [812] arXiv:2406.02265 (replaced) [pdf, other]
-
Title: Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Comments: 9 pages, long paper at ACL 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [813] arXiv:2406.02290 (replaced) [pdf, other]
-
Title: A Study of Optimizations for Fine-tuning Large Language Models
Comments: 10 pages, 4 figures. Revised text for clarity, updated references
Subjects: Machine Learning (cs.LG)
- [814] arXiv:2406.02343 (replaced) [pdf, other]
-
Title: Cluster-Aware Similarity Diffusion for Instance Retrieval
Comments: This paper has been accepted by ICML2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [815] arXiv:2406.02347 (replaced) [pdf, other]
-
Title: Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Comments: 16 pages + 16 pages appendices
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [816] arXiv:2406.02381 (replaced) [pdf, other]
-
Title: Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction
Comments: -Updated authorship and acknowledgements
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
- [817] arXiv:2406.02541 (replaced) [pdf, other]
-
Title: Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
Comments: Project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [818] arXiv:2406.02614 (replaced) [pdf, other]
-
Title: Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Comments: Accepted by ECMLPKDD 2024 (Research Track)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [819] arXiv:2406.02616 (replaced) [pdf, other]
-
Title: Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [820] arXiv:2406.02624 (replaced) [pdf, other]
-
Title: Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Authors: Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
- [821] arXiv:2406.02749 (replaced) [pdf, other]
-
Title: Efficient Leverage Score Sampling for Tensor Train Decomposition
Subjects: Data Structures and Algorithms (cs.DS)
- [822] arXiv:2406.02778 (replaced) [pdf, other]
-
Title: MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning
Subjects: Machine Learning (cs.LG)
- [823] arXiv:2406.02847 (replaced) [pdf, other]
-
Title: Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Comments: Accepted to ICML 2024
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [824] arXiv:2406.02875 (replaced) [pdf, other]
-
Title: Leveraging KANs For Enhanced Deep Koopman Operator Discovery
Comments: 6 pages, 4 figures, 2 tables
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
- [825] arXiv:2406.02876 (replaced) [pdf, other]
-
Title: LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
Comments: ACL2024 Findings, Codes are at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [826] arXiv:2406.02881 (replaced) [pdf, other]
-
Title: Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
Comments: technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [827] arXiv:2406.02882 (replaced) [pdf, other]
-
Title: Outdated Issue Aware Decoding for Factual Knowledge Editing
Comments: ACL2024 Findings, Codes are at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [828] arXiv:2406.02886 (replaced) [pdf, other]
-
Title: PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang
Comments: Findings of ACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [829] arXiv:2406.02887 (replaced) [pdf, other]
-
Title: USM RNN-T model weights binarization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [830] arXiv:2406.02918 (replaced) [pdf, other]
-
Title: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [831] arXiv:2406.02966 (replaced) [pdf, ps, other]
-
Title: Generative AI and Digital Neocolonialism in Global Education: Towards an Equitable Framework
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
- [832] arXiv:2406.03051 (replaced) [pdf, other]
-
Title: Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [833] arXiv:2406.03095 (replaced) [pdf, other]
-
Title: EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [834] arXiv:2406.03099 (replaced) [pdf, other]
-
Title: Graph Convolutional Branch and Bound
Comments: Submitted to European Journal of Operational Research
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
- [835] arXiv:2406.03145 (replaced) [pdf, other]
-
Title: E(n) Equivariant Message Passing Cellular Networks
Subjects: Machine Learning (cs.LG)
- [836] arXiv:2406.03151 (replaced) [pdf, other]
-
Title: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
Authors: Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic
Comments: Published on ACL 2024 Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [837] arXiv:2406.03154 (replaced) [pdf, other]
-
Title: Detecting Model Misspecification in Amortized Bayesian Inference with Neural Networks: An Extended Investigation
Comments: Extended version of the conference paper this https URL arXiv admin note: text overlap with arXiv:2112.08866
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [838] arXiv:2406.03170 (replaced) [pdf, other]
-
Title: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language
Authors: Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger
Comments: This work is accepted at ACL Findings 2024
Subjects: Computation and Language (cs.CL)
- [839] arXiv:2406.03248 (replaced) [pdf, other]
-
Title: Large Language Models as Evaluators for Recommendation Explanations
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
- [840] arXiv:2406.03253 (replaced) [pdf, other]
-
Title: Generating Explanations for Cellular Neural Networks
Subjects: Machine Learning (cs.LG)
- [841] arXiv:2406.03262 (replaced) [pdf, other]
-
Title: ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection
Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [842] arXiv:2406.03337 (replaced) [pdf, other]
-
Title: Identifying latent state transition in non-linear dynamical systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [843] arXiv:2406.03345 (replaced) [pdf, other]
-
Title: Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
Comments: ICML 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [844] arXiv:2406.03437 (replaced) [pdf, other]
-
Title: Transfer Learning for Latent Variable Network Models
Subjects: Machine Learning (cs.LG)
- [845] arXiv:2406.03452 (replaced) [pdf, other]
-
Title: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types
Subjects: Computation and Language (cs.CL)
- [846] arXiv:2406.03488 (replaced) [pdf, other]
-
Title: Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Comments: 12 pages, 4 figures, 6 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)