Computer Science

Submissions received from Wed 5 Jun 24 to Thu 6 Jun 24, announced Fri, 7 Jun 24

New submissions for Fri, 7 Jun 24 (continued, showing last 48 of 394 entries)

[347] arXiv:2406.04280 [pdf, other]: Title: xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology

Authors: Julius Hense, Mina Jamshidi Idaji, Oliver Eberle, Thomas Schnake, Jonas Dippel, Laure Ciernik, Oliver Buchstab, Andreas Mock, Frederick Klauschen, Klaus-Robert Müller

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Multiple instance learning (MIL) is an effective and widely used approach for weakly supervised machine learning. In histopathology, MIL models have achieved remarkable success in tasks like tumor detection, biomarker prediction, and outcome prognostication. However, MIL explanation methods are still lagging behind, as they are limited to small bag sizes or disregard instance interactions. We revisit MIL through the lens of explainable AI (XAI) and introduce xMIL, a refined framework with more general assumptions. We demonstrate how to obtain improved MIL explanations using layer-wise relevance propagation (LRP) and conduct extensive evaluation experiments on three toy settings and four real-world histopathology datasets. Our approach consistently outperforms previous explanation attempts with particularly improved faithfulness scores on challenging biomarker prediction tasks. Finally, we showcase how xMIL explanations enable pathologists to extract insights from MIL models, representing a significant advance for knowledge discovery and model debugging in digital histopathology.
[348] arXiv:2406.04284 [pdf, other]: Title: What is Dataset Distillation Learning?

Authors: William Yang, Ye Zhu, Zhiwei Deng, Olga Russakovsky

Comments: ICML 2024

Subjects: Machine Learning (cs.LG)

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high-performing models, little is understood about how the information is stored. In this study, we posit and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. We reveal that distilled data cannot serve as a substitute for real data during training outside the standard evaluation setting for dataset distillation. Additionally, the distillation process retains high task performance by compressing information related to the early training dynamics of real models. Finally, we provide a framework for interpreting distilled data and reveal that individual distilled data points contain meaningful semantic information. This investigation sheds light on the intricate nature of distilled data, providing a better understanding of how they can be effectively utilized.
[349] arXiv:2406.04286 [pdf, other]: Title: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

Authors: Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, C. K. Evuru, S Ramaneswaran, S Sakshi, Dinesh Manocha

Comments: ACL 2024 Main Conference. Code and data: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document -- we first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction. To learn the task of expanding abstract descriptions, we first train BART on a large-scale synthetic dataset with abstract-document pairs. Next, to generate abstract descriptions for a document, we propose a simple, controllable, and training-free method based on editing AMR graphs. ABEX brings the best of both worlds: by expanding from abstract representations, it preserves the original semantic properties of the documents, like style and meaning, thereby maintaining alignment with the original label and data distribution. At the same time, the fundamental process of elaborating on abstract descriptions facilitates diverse generations. We demonstrate the effectiveness of ABEX on 4 NLU tasks spanning 12 datasets and 4 low-resource settings. Quantitatively, ABEX outperforms all our baselines with improvements of 0.04% - 38.8%. Qualitatively, ABEX outperforms all prior methods from the literature in terms of context and length diversity.
[350] arXiv:2406.04287 [pdf, other]: Title: SpectralZoom: Efficient Segmentation with an Adaptive Hyperspectral Camera

Authors: Jackson Arnold, Sophia Rossi, Chloe Petrosino, Ethan Mitchell, Sanjeev J. Koppal

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Hyperspectral image segmentation is crucial for many fields such as agriculture, remote sensing, biomedical imaging, battlefield sensing and astronomy. However, the challenge of hyperspectral and multispectral imaging is its large data footprint. We propose a novel camera design and a vision transformer-based (ViT) algorithm that together reduce both the captured data footprint and the computational load for hyperspectral segmentation. Our camera is able to adaptively sample image regions or patches at different resolutions, instead of capturing the entire hyperspectral cube at one high resolution. Our segmentation algorithm works in concert with the camera, applying ViT-based segmentation only to adaptively selected patches. We show results in simulation and on a real hardware platform, demonstrating both accurate segmentation and reduced computational burden.
[351] arXiv:2406.04289 [pdf, other]: Title: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

Authors: Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

Comments: Accepted to ACL 2024

Subjects: Computation and Language (cs.CL)

What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we seek to understand the empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their home turf -- learning probabilistic languages -- rather than as classifiers of formal languages. In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its conditional distributions, and the expected length of sampled strings are strong and significant predictors of learnability for both RNNs and Transformers. Several other predictors also reach significance, but with differing patterns between RNNs and Transformers.
[352] arXiv:2406.04290 [pdf, other]: Title: Providing High-Performance Execution with a Sequential Contract for Cryptographic Programs

Authors: Ali Hajiabadi, Trevor E. Carlson

Comments: 17 pages, 7 figures, 4 tables

Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

Constant-time programming is a widely deployed approach to harden cryptographic programs against side channel attacks. However, modern processors violate the underlying assumptions of constant-time policies by speculatively executing unintended paths of the program.
In this work, we propose Cassandra, a novel hardware-software mechanism to protect constant-time cryptographic code against speculative control flow based attacks. Cassandra explores the radical design point of disabling the branch predictor and recording-and-replaying sequential control flow of the program. Two key insights that enable our design are that (1) the sequential control flow of a constant-time program is constant over different runs, and (2) cryptographic programs are highly looped and their control flow patterns repeat in a highly compressible way. These insights allow us to perform an offline branch analysis that significantly compresses control flow traces. We add a small component to a typical processor design, the Branch Trace Unit, to store compressed traces and determine fetch redirections according to the sequential model of the program. Moreover, we provide a formal security analysis and prove that our methodology adheres to a strong security contract by design. Despite providing a higher security guarantee, Cassandra counter-intuitively improves performance by 1.77% by eliminating branch misprediction penalties.
[353] arXiv:2406.04291 [pdf, other]: Title: Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

Authors: Adam Fisch, Joshua Maynez, R. Alex Hofer, Bhuwan Dhingra, Amir Globerson, William W. Cohen

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
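The idea of combining a small human-labeled sample with a large autorater-labeled sample, stratum by stratum, can be illustrated with a minimal sketch for the mean. It assumes the standard PPI mean estimate (autorater mean on unlabeled data plus a bias correction from labeled data), known stratum weights, and a normal-approximation interval. The function name `stratified_ppi_mean` and the dictionary keys are hypothetical, and this is not the paper's algorithm, which also addresses sample allocation.

```python
import math
from statistics import mean, variance


def stratified_ppi_mean(strata, z=1.96):
    """Stratified PPI-style estimate of a population mean (illustrative only).

    Each stratum is a dict with hypothetical keys:
      'weight'  - population share of the stratum (weights sum to 1)
      'y'       - human labels on the small labeled sample
      'f_lab'   - autorater scores on that same labeled sample
      'f_unlab' - autorater scores on the large unlabeled sample
    """
    estimate, var = 0.0, 0.0
    for s in strata:
        # Rectifier: how far the autorater is from the human labels.
        rect = [y - f for y, f in zip(s["y"], s["f_lab"])]
        # Per-stratum PPI mean: autorater mean plus bias correction.
        theta_k = mean(s["f_unlab"]) + mean(rect)
        # Variance of the two independent sample means.
        var_k = (variance(s["f_unlab"]) / len(s["f_unlab"])
                 + variance(rect) / len(rect))
        estimate += s["weight"] * theta_k
        var += s["weight"] ** 2 * var_k
    half_width = z * math.sqrt(var)
    return estimate, (estimate - half_width, estimate + half_width)


# Autorater is exact in stratum A and biased by +0.5 in stratum B.
strata = [
    {"weight": 0.5, "y": [1, 0, 1, 0], "f_lab": [1, 0, 1, 0],
     "f_unlab": [1, 0, 1, 0, 1, 0]},
    {"weight": 0.5, "y": [0, 0, 1, 1], "f_lab": [0.5, 0.5, 1.5, 1.5],
     "f_unlab": [0.5, 1.5, 0.5, 1.5]},
]
est, (lo, hi) = stratified_ppi_mean(strata)
```

Because the rectifier is computed per stratum, a bias that is present in only one stratum is corrected there without inflating the variance of the others, which matches the abstract's remark that gains are expected when autorater quality varies across conditions.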
[354] arXiv:2406.04292 [pdf, other]: Title: VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

Authors: Junjie Zhou, Zheng Liu, Shitao Xiao, Bo Zhao, Yongping Xiong

Comments: Accepted to ACL 2024 main conference

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Multi-modal retrieval is becoming increasingly popular in practice. However, the existing retrievers are mostly text-oriented and lack the capability to process visual information. Despite the presence of vision-language models like CLIP, the current methods are severely limited in representing text-only and image-only data. In this work, we present a new embedding model, VISTA, for universal multi-modal retrieval. Our work makes three technical contributions. Firstly, we introduce a flexible architecture which extends a powerful text encoder with image understanding capability by introducing visual token embeddings. Secondly, we develop two data generation strategies, which produce high-quality composed image-text data to facilitate the training of the embedding model. Thirdly, we introduce a multi-stage training algorithm, which first aligns the visual token embedding with the text encoder using massive weakly labeled data, and then develops multi-modal representation capability using the generated composed image-text data. In our experiments, VISTA achieves superior performance across a variety of multi-modal retrieval tasks in both zero-shot and supervised settings. Our model, data, and source code are available at https://github.com/FlagOpen/FlagEmbedding.
[355] arXiv:2406.04295 [pdf, other]: Title: Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

Authors: Jiayi Guo, Junhao Zhao, Chunjiang Ge, Chaoqun Du, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang

Comments: GitHub: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Test-time adaptation (TTA) aims to enhance the performance of source-domain pretrained models when tested on unknown shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. Recently, diffusion-driven TTA methods have demonstrated strong performance by using an unconditional diffusion model, which is also trained on the source domain to transform target data into synthetic data as a source domain projection. This allows the source model to make predictions without weight adaptation. In this paper, we argue that the domains of the source model and the synthetic data in diffusion-driven TTA methods are not aligned. To adapt the source model to the synthetic domain of the unconditional diffusion model, we introduce a Synthetic-Domain Alignment (SDA) framework to fine-tune the source model with synthetic data. Specifically, we first employ a conditional diffusion model to generate labeled samples, creating a synthetic dataset. Subsequently, we use the aforementioned unconditional diffusion model to add noise to and denoise each sample before fine-tuning. This process mitigates the potential domain gap between the conditional and unconditional models. Extensive experiments across various models and benchmarks demonstrate that SDA achieves superior domain alignment and consistently outperforms existing diffusion-driven TTA methods. Our code is available at https://github.com/SHI-Labs/Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment.
[356] arXiv:2406.04298 [pdf, other]: Title: Measuring and Addressing Indexical Bias in Information Retrieval

Authors: Caleb Ziems, William Held, Jane Dwivedi-Yu, Diyi Yang

Comments: ACL 2024

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people's opinion, voting patterns, and other behaviors, these issues remain understudied as the field lacks reliable metrics and procedures for automatically measuring indexical bias. Towards this end, we introduce the PAIR framework, which supports automatic bias audits for ranked documents or entire IR systems. After introducing DUO, the first general-purpose automatic bias metric, we run an extensive evaluation of 8 IR systems on a new corpus of 32k synthetic and 4.7k natural documents, with 4k queries spanning 1.4k controversial issue topics. A human behavioral study validates our approach, showing that our bias metric can help predict when and how indexical bias will shift a reader's opinion.
[357] arXiv:2406.04299 [pdf, other]: Title: NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise

Authors: Zhonghao Wang, Danyu Sun, Sheng Zhou, Haobo Wang, Jiapei Fan, Longtao Huang, Jiajun Bu

Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph Neural Networks (GNNs) exhibit strong potential in node classification tasks through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, due to variations in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy labeled graph data across various datasets, with unified experimental settings and interface. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark can be found at https://github.com/eaglelab-zju/NoisyGL.
[358] arXiv:2406.04300 [pdf, other]: Title: Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models

Authors: Phat Nguyen, Tsun-Hsuan Wang, Zhang-Wei Hong, Sertac Karaman, Daniela Rus

Comments: 14 pages, 7 figures

Subjects: Robotics (cs.RO)

Generating varied scenarios through simulation is crucial for training and evaluating safety-critical systems, such as autonomous vehicles. Yet, the task of modeling the trajectories of other vehicles to simulate diverse and meaningful close interactions remains prohibitively costly. Adopting language descriptions to generate driving behaviors emerges as a promising strategy, offering a scalable and intuitive method for human operators to simulate a wide range of driving interactions. However, the scarcity of large-scale annotated language-trajectory data makes this approach challenging.
To address this gap, we propose Text-to-Drive (T2D) to synthesize diverse driving behaviors via Large Language Models (LLMs). We introduce a knowledge-driven approach that operates in two stages. In the first stage, we employ the embedded knowledge of LLMs to generate diverse language descriptions of driving behaviors for a scene. Then, we leverage LLM's reasoning capabilities to synthesize these behaviors in simulation. At its core, T2D employs an LLM to construct a state chart that maps low-level states to high-level abstractions. This strategy aids in downstream tasks such as summarizing low-level observations, assessing policy alignment with behavior description, and shaping the auxiliary reward, all without needing human supervision. With our knowledge-driven approach, we demonstrate that T2D generates more diverse trajectories compared to other baselines and offers a natural language interface that allows for interactive incorporation of human preference. Please check our website for more examples: https://text-to-drive.github.io/
[359] arXiv:2406.04301 [pdf, other]: Title: Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry

Authors: Kaichen Zhou

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the challenge of reconstructing surfaces from sparse view inputs, where ambiguity and occlusions due to missing information pose significant hurdles. We present a novel approach, named EpiS, that incorporates Epipolar information into the reconstruction process. Existing methods in sparse-view neural surface learning have mainly focused on mean and variance considerations using cost volumes for feature extraction. In contrast, our method aggregates coarse information from the cost volume into Epipolar features extracted from multiple source views, enabling the generation of fine-grained Signed Distance Function (SDF)-aware features. Additionally, we employ an attention mechanism along the line dimension to facilitate feature fusion based on the SDF feature. Furthermore, to address the information gaps in sparse conditions, we integrate depth information from monocular depth estimation using global and local regularization techniques. The global regularization utilizes a triplet loss function, while the local regularization employs a derivative loss function. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods, especially in cases with sparse and generalizable conditions.
[360] arXiv:2406.04302 [pdf, other]: Title: Representational Alignment Supports Effective Machine Teaching

Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

Comments: Preprint

Subjects: Machine Learning (cs.LG)

A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too.
[361] arXiv:2406.04303 [pdf, other]: Title: Vision-LSTM: xLSTM as Generic Vision Backbone

Authors: Benedikt Alkin, Maximilian Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Transformers are widely used as generic backbones in computer vision, despite being initially introduced for natural language processing. Recently, the Long Short-Term Memory (LSTM) has been extended to a scalable and performant architecture - the xLSTM - which overcomes long-standing LSTM limitations via exponential gating and a parallelizable matrix memory structure. In this report, we introduce Vision-LSTM (ViL), an adaptation of the xLSTM building blocks to computer vision. ViL comprises a stack of xLSTM blocks where odd blocks process the sequence of patch tokens from top to bottom while even blocks go from bottom to top. Experiments show that ViL holds promise to be further deployed as a new generic backbone for computer vision architectures.
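The alternating traversal described above (odd blocks top to bottom, even blocks bottom to top) can be sketched in a few lines. The block used here, `causal_mean_block`, is a hypothetical toy stand-in for an xLSTM block: it only has the property that each output position depends on earlier positions in the processing order. Nothing below reflects the actual ViL implementation.

```python
import numpy as np


def causal_mean_block(tokens):
    """Toy causal block: position i becomes the mean of tokens 0..i."""
    counts = np.arange(1, tokens.shape[0] + 1)[:, None]
    return np.cumsum(tokens, axis=0) / counts


def alternating_stack(tokens, blocks):
    """Apply blocks in order, flipping the sequence for even-numbered blocks."""
    x = tokens
    for i, block in enumerate(blocks, start=1):
        if i % 2 == 1:
            # Odd block: first patch token to last.
            x = block(x)
        else:
            # Even block: flip, process, flip back so positions stay aligned.
            x = block(x[::-1])[::-1]
    return x


patches = np.array([[1.0], [2.0], [3.0]])  # 3 patch tokens, 1 feature each
one_block = alternating_stack(patches, [causal_mean_block])
two_blocks = alternating_stack(patches, [causal_mean_block, causal_mean_block])
```

After a single block the first token still sees only itself; after the second, reversed block it has also received information from later tokens, which is the motivation for alternating directions with a causal sequence model.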
[362] arXiv:2406.04306 [pdf, other]: Title: Semantically Diverse Language Generation for Uncertainty Estimation in Language Models

Authors: Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi, Sepp Hochreiter

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large language models (LLMs) can suffer from hallucinations when generating text. These hallucinations impede various applications in society and industry by making LLMs untrustworthy. Current LLMs generate text in an autoregressive fashion by predicting and appending text tokens. When an LLM is uncertain about the semantic meaning of the next tokens to generate, it is likely to start hallucinating. Thus, it has been suggested that hallucinations stem from predictive uncertainty. We introduce Semantically Diverse Language Generation (SDLG) to quantify predictive uncertainty in LLMs. SDLG steers the LLM to generate semantically diverse yet likely alternatives for an initially generated text. This approach provides a precise measure of aleatoric semantic uncertainty, detecting whether the initial text is likely to be hallucinated. Experiments on question-answering tasks demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in LLMs.
[363] arXiv:2406.04308 [pdf, other]: Title: Approximation-Aware Bayesian Optimization

Authors: Natalie Maus, Kyurae Kim, Geoff Pleiss, David Eriksson, John P. Cunningham, Jacob R. Gardner

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

High-dimensional Bayesian optimization (BO) tasks such as molecular design often require 10,000 function evaluations before obtaining meaningful results. While methods like sparse variational Gaussian processes (SVGPs) reduce computational requirements in these settings, the underlying approximations result in suboptimal data acquisitions that slow the progress of optimization. In this paper we modify SVGPs to better align with the goals of BO: targeting informed data acquisition rather than global posterior fidelity. Using the framework of utility-calibrated variational inference, we unify GP approximation and data acquisition into a joint optimization problem, thereby ensuring optimal decisions under a limited computational budget. Our approach can be used with any decision-theoretic acquisition function and is compatible with trust region methods like TuRBO. We derive efficient joint objectives for the expected improvement and knowledge gradient acquisition functions in both the standard and batch BO settings. Our approach outperforms standard SVGPs on high-dimensional benchmark tasks in control and molecular design.
[364] arXiv:2406.04309 [pdf, other]: Title: ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation

Authors: Sergey Zakharov, Katherine Liu, Adrien Gaidon, Rares Ambrus

Comments: SIGGRAPH 2024. Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)

The common trade-offs of state-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) involve trading modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical formulation that exploits object self-similarity, leading to a highly compressed and efficient shape latent space. Thanks to the recursive formulation, our method supports spatial and global-to-local latent feature fusion without needing to initialize and maintain auxiliary data structures, while still allowing for continuous field queries to enable applications such as raytracing. In experiments on a set of diverse datasets, we provide compelling qualitative results and demonstrate state-of-the-art multi-scene reconstruction and compression results with a single network per dataset.
[365] arXiv:2406.04312 [pdf, other]: Title: ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

Authors: Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, Zeynep Akata

Comments: Preprint

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-to-Image (T2I) models have made significant advancements in recent years, but they still struggle to accurately capture intricate details specified in complex compositional prompts. While fine-tuning T2I models with reward objectives has shown promise, it suffers from "reward hacking" and may not generalize well to unseen prompt distributions. In this work, we propose Reward-based Noise Optimization (ReNO), a novel approach that enhances T2I models at inference by optimizing the initial noise based on the signal from one or multiple human preference reward models. Remarkably, solving this optimization problem with gradient ascent for 50 iterations yields impressive results on four different one-step models across two competitive benchmarks, T2I-CompBench and GenEval. Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models. Extensive user studies demonstrate that our model is preferred nearly twice as often compared to the popular SDXL model and is on par with the proprietary Stable Diffusion 3 with 8B parameters. Moreover, given the same computational resources, a ReNO-optimized one-step model outperforms widely-used open-source models such as SDXL and PixArt-$\alpha$, highlighting the efficiency and effectiveness of ReNO in enhancing T2I model performance at inference time. Code is available at https://github.com/ExplainableML/ReNO.
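The core loop, keeping the generator fixed and running gradient ascent on the initial noise with respect to a reward, can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the "generator" is a fixed linear map, the "reward" is the negative squared distance to a target, and the name `optimize_noise` is hypothetical. The real method uses one-step T2I models and human preference reward models.

```python
import numpy as np


def optimize_noise(generator_matrix, target, steps=50, lr=0.1, seed=0):
    """Gradient ascent on the initial noise z; the generator is never updated."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(generator_matrix.shape[1])

    def reward(noise):
        # Toy reward: higher when the generated output is closer to the target.
        return -np.sum((generator_matrix @ noise - target) ** 2)

    initial_reward = reward(z)
    for _ in range(steps):
        # Analytic gradient of the toy reward with respect to the noise.
        grad = -2.0 * generator_matrix.T @ (generator_matrix @ z - target)
        z = z + lr * grad  # ascent step on the noise only
    return z, initial_reward, reward(z)


G = np.array([[1.0, 0.0], [0.0, 0.5]])  # frozen toy generator
z_opt, r_before, r_after = optimize_noise(G, target=np.array([1.0, -1.0]))
```

The default of 50 steps mirrors the iteration count quoted in the abstract; the learning rate is chosen only so that this toy problem converges.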
[366] arXiv:2406.04313 [pdf, other]: Title: Improving Alignment and Robustness with Short Circuiting

Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that "short-circuits" models as they respond with harmful outputs. Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, short-circuiting directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, short-circuiting allows the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.
[367] arXiv:2406.04314 [pdf, other]: Title: Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Authors: Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Ji Li, Liang Zheng

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, Direct Preference Optimization (DPO) has extended its success from aligning large language models (LLMs) to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods that assume all diffusion steps share a consistent preference order with the final generated images, we argue that this assumption neglects step-specific denoising performance and that preference labels should be tailored to each step's contribution. To address this limitation, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision. Specifically, at each denoising step, we sample a pool of images, find a suitable win-lose pair, and, most importantly, randomly select a single image from the pool to initialize the next denoising step. This step-wise resampler process ensures the next win-lose image pair comes from the same image, making the win-lose comparison independent of the previous step. To assess the preferences at each step, we train a separate step-aware preference model that can be applied to both noisy and clean images. Our experiments with Stable Diffusion v1.5 and SDXL demonstrate that SPO significantly outperforms the latest Diffusion-DPO in aligning generated images with complex, detailed prompts and enhancing aesthetics, while also training more than 20x faster. Code and model: https://rockeycoss.github.io/spo.github.io/
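The step-wise resampler described above (sample a pool, pick a win-lose pair, continue from one randomly chosen pool member) can be sketched as follows. This is a toy under stated assumptions: "images" are plain floats, `denoise_step` and `preference_score` are hypothetical callables supplied by the caller, and taking the best- and worst-scoring pool members as the pair is one possible reading of "a suitable win-lose pair", not necessarily the paper's rule.

```python
import random


def stepwise_resample(start, denoise_step, preference_score,
                      num_steps, pool_size=4, seed=0):
    """Collect one (step, win, lose) pair per denoising step."""
    rng = random.Random(seed)
    current = start
    pairs = []
    for t in range(num_steps):
        # Every candidate in the pool is denoised from the same current sample.
        pool = [denoise_step(current, t, rng) for _ in range(pool_size)]
        ranked = sorted(pool, key=lambda img: preference_score(img, t))
        pairs.append((t, ranked[-1], ranked[0]))  # highest score wins
        # Resample: continue from a random pool member, not from the winner.
        current = rng.choice(pool)
    return pairs


def toy_denoise(x, t, rng):
    return x + rng.gauss(0.0, 1.0)  # hypothetical stochastic denoising step


def toy_score(x, t):
    return -abs(x)  # hypothetical step-aware preference: closer to 0 is better


pairs = stepwise_resample(5.0, toy_denoise, toy_score, num_steps=3)
```

Because both members of each pair descend from the same `current` sample, the comparison at step t isolates the quality of that single denoising step.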
[368] arXiv:2406.04316 [pdf, other]: Title: Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Authors: Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong

Subjects: Computer Vision and Pattern Recognition (cs.CV)

6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to the substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: Semantic-aware feature extraction and Clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.
[369] arXiv:2406.04317 [pdf, other]: Title: Regularized KL-Divergence for Well-Defined Function-Space Variational Inference in Bayesian neural networks

Authors: Tristan Cinquin, Robert Bamler

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian neural networks (BNN) promise to combine the predictive performance of neural networks with principled uncertainty modeling important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight-space has proven difficult. This has motivated variational inference (VI) methods that pose priors directly on the function generated by the BNN rather than on weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinite for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and provides competitive uncertainty estimates for regression, classification and out-of-distribution detection compared to BNN baselines with both function and weight-space priors.
[370] arXiv:2406.04318 [pdf, other]: Title: Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction

Authors: Chen-Yu Yen, Raghav Singhal, Umang Sharma, Rajesh Ranganath, Sumit Chopra, Lerrel Pinto

Comments: ICML 2024. Project website at this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Magnetic Resonance (MR) imaging, despite its proven diagnostic utility, remains an inaccessible imaging modality for disease surveillance at the population level. A major factor rendering MR inaccessible is lengthy scan times. An MR scanner collects measurements associated with the underlying anatomy in the Fourier space, also known as the k-space. Creating a high-fidelity image requires collecting large quantities of such measurements, increasing the scan time. Traditionally, to accelerate an MR scan, image reconstruction from under-sampled k-space data is the method of choice. However, recent works show the feasibility of bypassing image reconstruction and learning to detect disease directly from a sparser learned subset of the k-space measurements. In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. On 6 out of 8 pathology classification tasks spanning the Knee, Brain, and Prostate MR scans, ASMR reaches within 2% of the performance of a fully sampled classifier while using only 8% of the k-space, as well as outperforming prior state-of-the-art work in k-space sampling such as EMRT, LOUPE, and DPS.
[371] arXiv:2406.04320 [pdf, other]: Title: Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Authors: Ali Behrouz, Michele Santacatterina, Ramin Zabih

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modeling multivariate time series is a well-established problem with a wide range of applications from healthcare to financial markets. Traditional State Space Models (SSMs) are classical approaches for univariate time series modeling due to their simplicity and expressive power to represent linear dependencies. They, however, have fundamentally limited expressive power to capture non-linear dependencies, are slow in practice, and fail to model the inter-variate information flow. Despite recent attempts to improve the expressive power of SSMs by using deep structured SSMs, the existing methods are either limited to univariate time series, fail to model complex patterns (e.g., seasonal patterns), fail to dynamically model the dependencies of variate and time dimensions, and/or are input-independent. We present Chimera, which uses two input-dependent 2-D SSM heads with different discretization processes to learn long-term progression and seasonal patterns. To improve the efficiency of complex 2D recurrence, we present a fast training procedure using a new 2-dimensional parallel selective scan. We further present and discuss 2-dimensional Mamba and Mamba-2 as the special cases of our 2D SSM. Our experimental evaluation shows the superior performance of Chimera on extensive and diverse benchmarks, including ECG and speech time series classification, long-term and short-term time series forecasting, and time series anomaly detection.
[372] arXiv:2406.04321 [pdf, other]: Title: VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

Comments: The code and datasets will be available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)

In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music that is both acoustically and semantically aligned with the video. By incorporating local and global visual cues, VidMuse enables the creation of musically coherent audio tracks that consistently match the video content through Long-Short-Term modeling. Through extensive experiments, VidMuse outperforms existing models in terms of audio quality, diversity, and audio-visual alignment. The code and datasets will be available at https://github.com/ZeyueT/VidMuse/.
[373] arXiv:2406.04322 [pdf, other]: Title: DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

Authors: Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

Comments: Accepted to CVPR 2024; code: this https URL; project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge (i.e., data scarcity) in large-scale 3D generation. In particular, DIRECT-3D is a tri-plane diffusion model that integrates two innovations: 1) A novel learning framework where noisy data are filtered and aligned automatically during the training process. Specifically, after an initial warm-up phase using a small set of clean data, an iterative optimization is introduced in the diffusion process to explicitly estimate the 3D pose of objects and select beneficial data based on conditional density. 2) An efficient 3D representation that is achieved by disentangling object geometry and color features with two separate conditional diffusion models that are optimized hierarchically. Given a prompt input, our model generates high-quality, high-resolution, realistic, and complex 3D objects with accurate geometric details in seconds. We achieve state-of-the-art performance in both single-class generation and text-to-3D generation. We also demonstrate that DIRECT-3D can serve as a useful 3D geometric prior of objects, for example to alleviate the well-known Janus problem in 2D-lifting methods such as DreamFusion. The code and models are available for research purposes at: https://github.com/qihao067/direct3d.
[374] arXiv:2406.04323 [pdf, other]: Title: ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Authors: Qianlan Yang, Yu-Xiong Wang

Comments: ICML 2024 Accepted

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, coined as Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io .
[375] arXiv:2406.04324 [pdf, other]: Title: SF-V: Single Forward Video Generation Model

Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V.
[376] arXiv:2406.04325 [pdf, other]: Title: ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through a carefully designed data filtering and annotating strategy. 2) ShareCaptioner-Video, an efficient and capable captioning model for arbitrary videos, with 4.8M high-quality aesthetic videos annotated by it. 3) ShareGPT4Video-8B, a simple yet superb LVLM that reached SOTA performance on three advancing video benchmarks. To achieve this, setting aside costly, non-scalable human annotators, we find that using GPT4V to caption video with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporally confused results. We argue the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) Inter-frame precise temporal change understanding. 2) Intra-frame detailed content description. 3) Frame-number scalability for arbitrary-length videos. To this end, we meticulously designed a differential video captioning strategy, which is stable, scalable, and efficient for generating captions for videos with arbitrary resolution, aspect ratios, and length. Based on it, we construct ShareGPT4Video, which contains 40K high-quality videos spanning a wide range of categories, and the resulting captions encompass rich world knowledge, object attributes, camera movements, and crucially, detailed and precise temporal descriptions of events. Based on ShareGPT4Video, we further develop ShareCaptioner-Video, a superior captioner capable of efficiently generating high-quality captions for arbitrary videos...
[377] arXiv:2406.04327 [pdf, other]: Title: Causal Estimation of Memorisation Profiles

Authors: Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel

Comments: Published at the ACL 2024 Conference (main)

Subjects: Machine Learning (cs.LG)

Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of this counterfactual. Further, they often estimate memorisation for a model architecture rather than for a specific model instance. This paper fills an important gap in the literature, proposing a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics. Using this method, we characterise a model's memorisation profile--its memorisation trends across training--by only observing its behaviour on a small set of instances throughout training. In experiments with the Pythia model suite, we find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) has stable trends across model sizes, thus making memorisation in larger models predictable from smaller ones.
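As a reminder of the difference-in-differences idea this abstract builds on, here is a minimal sketch of the textbook two-group, two-period estimator. It is not the paper's estimator: the function name, the grouping, and the log-likelihood numbers are made up for illustration. The "treated" group stands for instances the model trained on between two checkpoints, and the "control" group for instances it has not yet seen.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Textbook difference-in-differences on group means.

    The change in the control group estimates the trend that would have
    occurred without treatment; subtracting it from the change in the
    treated group isolates the effect attributed to the treatment.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Made-up per-instance log-likelihoods at two checkpoints.
effect = did_estimate(
    treated_before=[-3.0, -2.8],
    treated_after=[-1.0, -1.2],
    control_before=[-3.1, -2.9],
    control_after=[-2.5, -2.7],
)
print(effect)  # about 1.4: treated improved by 1.8, control by 0.4
```

The estimate is only meaningful under the usual parallel-trends assumption, i.e. that both groups would have changed by the same amount had neither been trained on.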
[378] arXiv:2406.04328 [pdf, other]: Title: The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

Authors: Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones

Comments: 10 pages, 4 figures, under review

Subjects: Machine Learning (cs.LG)

The past few years have produced a series of spectacular advances in the decoding of speech from brain activity. The engine of these advances has been the acquisition of labelled data, with increasingly large datasets acquired from single subjects. However, participants exhibit anatomical and other individual differences, and datasets use varied scanners and task designs. As a result, prior work has struggled to leverage data from multiple subjects, multiple datasets, multiple tasks, and unlabelled datasets. In turn, the field has not benefited from the rapidly growing number of open neural data repositories to exploit large-scale data and deep learning. To address this, we develop an initial set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous and unlabelled neural recordings. Experimental results show that representations learned with these objectives generalise across subjects, datasets, and tasks, and are also learned faster than using only labelled data. In addition, we set new benchmarks for two foundational speech decoding tasks. Taken together, these methods now unlock the potential for training speech decoding models with orders of magnitude more existing data.
[379] arXiv:2406.04329 [pdf, other]: Title: Simplified and Generalized Masked Diffusion for Discrete Data

Authors: Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.78 (CIFAR-10) and 3.42 (ImageNet 64$\times$64) bits per dimension that are comparable to or better than autoregressive models of similar sizes.
[380] arXiv:2406.04330 [pdf, other]: Title: Parameter-Inverted Image Pyramid Networks

Authors: Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.
[381] arXiv:2406.04331 [pdf, other]: Title: PaCE: Parsimonious Concept Engineering for Large Language Models

Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal

Comments: 26 pages, 17 figures, 5 tables, dataset and code at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face several challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, failing alignment; some remove benign concepts, lowering the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework for alignment. First, to sufficiently model the concepts, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Then, given any alignment task, we instruct a concept partitioner to efficiently annotate the concepts as benign or undesirable. Finally, at inference time, we decompose the LLM activations along the concept dictionary via sparse coding, to accurately represent the activation as a linear combination of the benign and undesirable components. By removing the latter ones from the activation, we reorient the behavior of LLMs towards alignment goals. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.
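The inference-time step in this abstract (decompose an activation over a concept dictionary via sparse coding, then remove the undesirable components) can be sketched in a few lines. This is a toy illustration under stated assumptions, not PaCE itself: greedy matching pursuit is swapped in as a simple sparse coder (the abstract does not name a solver), the three-atom dictionary is a hypothetical orthonormal toy, and the choice of which atom is "undesirable" is hand-picked rather than produced by a concept partitioner.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, atoms, n_iter):
    """Greedy sparse coding: x ~ sum_k coeffs[k] * atoms[k].

    Assumes every atom has unit norm. Each iteration picks the atom most
    correlated with the residual and subtracts its contribution.
    """
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        scores = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

def remove_undesirable(x, atoms, undesirable, n_iter=10):
    """Subtract the components of x that lie along undesirable atoms."""
    coeffs, _ = matching_pursuit(x, atoms, n_iter)
    cleaned = list(x)
    for k in undesirable:
        cleaned = [c - coeffs[k] * a for c, a in zip(cleaned, atoms[k])]
    return cleaned

# Toy dictionary: three orthonormal "concept" atoms; atom 1 is labeled
# undesirable by hand for this example.
atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
activation = [2.0, 3.0, -1.0]
cleaned = remove_undesirable(activation, atoms, undesirable={1})
print(cleaned)  # [2.0, 0.0, -1.0]
```

With an orthonormal dictionary the removal reduces to zeroing one coordinate. With a large, overcomplete dictionary the sparse coder matters, because it decides how the activation is split between benign and undesirable atoms.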
[382] arXiv:2406.04332 [pdf, other]: Title: Coarse-To-Fine Tensor Trains for Compact Visual Representations

Authors: Sebastian Loeschcke, Dan Wang, Christian Leth-Espensen, Serge Belongie, Michael J. Kastoryano, Sagie Benaim

Comments: Project webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly compact tensor train representation, is still lacking. This has prevented practitioners from deploying the full potential of tensor networks for visual data. To this end, we propose 'Prolongation Upsampling Tensor Train (PuTT)', a novel method for learning tensor train representations in a coarse-to-fine manner. Our method involves the prolonging or 'upsampling' of a learned tensor train representation, creating a sequence of 'coarse-to-fine' tensor trains that are incrementally refined. We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. For full results see our project webpage: https://sebulo.github.io/PuTT_website/
[383] arXiv:2406.04333 [pdf, other]: Title: BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
[384] arXiv:2406.04334 [pdf, other]: Title: DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

Authors: Lingchen Meng, Jianwei Yang, Rui Tian, Xiyang Dai, Zuxuan Wu, Jianfeng Gao, Yu-Gang Jiang

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has to handle a large number of additional tokens in its input layer. This paper presents a new architecture DeepStack for LMMs. Considering $N$ layers in the language and vision transformer of LMMs, we stack the visual tokens into $N$ groups and feed each group to its aligned transformer layer from bottom to top. Surprisingly, this simple method greatly enhances the power of LMMs to model interactions among visual tokens across layers but with minimal additional cost. We apply DeepStack to both language and vision transformer in LMMs, and validate the effectiveness of DeepStack LMMs with extensive empirical results. Using the same context length, our DeepStack 7B and 13B parameters surpass their counterparts by 2.7 and 2.9 on average across 9 benchmarks, respectively. Using only one-fifth of the context length, DeepStack closely rivals the counterparts that use the full context length. These gains are particularly pronounced on high-resolution tasks, e.g., 4.2, 11.0, and 4.0 improvements on TextVQA, DocVQA, and InfoVQA compared to LLaVA-1.5-7B, respectively. We further apply DeepStack to vision transformer layers, which brings us a similar amount of improvements, 3.8 on average compared with LLaVA-1.5-7B.
[385] arXiv:2406.04336 [pdf, other]: Title: On the Expressive Power of Spectral Invariant Graph Neural Networks

Authors: Bohang Zhang, Lingxiao Zhao, Haggai Maron

Comments: 31 pages; 3 figures; to appear in ICML 2024

Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO); Spectral Theory (math.SP)

Incorporating spectral information to enhance Graph Neural Networks (GNNs) has shown promising results but raises a fundamental challenge due to the inherent ambiguity of eigenvectors. Various architectures have been proposed to address this ambiguity, referred to as spectral invariant architectures. Notable examples include GNNs and Graph Transformers that use spectral distances, spectral projection matrices, or other invariant spectral features. However, the potential expressive power of these spectral invariant architectures remains largely unclear. The goal of this work is to gain a deep theoretical understanding of the expressive power obtainable when using spectral features. We first introduce a unified message-passing framework for designing spectral invariant GNNs, called Eigenspace Projection GNN (EPNN). A comprehensive analysis shows that EPNN essentially unifies all prior spectral invariant architectures, in that they are either strictly less expressive than or equivalent to EPNN. A fine-grained expressiveness hierarchy among different architectures is also established. On the other hand, we prove that EPNN itself is bounded by a recently proposed class of Subgraph GNNs, implying that all these spectral invariant architectures are strictly less expressive than 3-WL. Finally, we discuss whether using spectral features can gain additional expressiveness when combined with more expressive GNNs.
[386] arXiv:2406.04337 [pdf, other]: Title: Coherent Zero-Shot Visual Instruction Generation

Authors: Quynh Phung, Songwei Ge, Jia-Bin Huang

Comments: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite the advances in text-to-image synthesis, particularly with diffusion models, generating visual instructions that require consistent representation and smooth state transitions of objects across sequential steps remains a formidable challenge. This paper introduces a simple, training-free framework to tackle the issues, capitalizing on the advancements in diffusion models and large language models (LLMs). Our approach systematically integrates text comprehension and image generation to ensure visual instructions are visually appealing and maintain consistency and accuracy throughout the instruction sequence. We validate the effectiveness by testing multi-step instructions and comparing the text alignment and consistency with several baselines. Our experiments show that our approach can visualize coherent and visually pleasing instructions.
[387] arXiv:2406.04338 [pdf, other]: Title: Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Authors: Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, Yueqi Duan

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)

In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.
[388] arXiv:2406.04339 [pdf, other]: Title: RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

Authors: Jiaming Liu, Mengzhen Liu, Zhenyu Wang, Lily Lee, Kaichen Zhou, Pengju An, Senqiao Yang, Renrui Zhang, Yandong Guo, Shanghang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two areas: 1) inadequate reasoning ability to tackle complex tasks, and 2) high computational costs for MLLM fine-tuning and inference. The recently proposed state space model (SSM) known as Mamba demonstrates promising capabilities in non-trivial sequence modeling with linear inference complexity. Inspired by this, we introduce RoboMamba, an end-to-end robotic MLLM that leverages the Mamba model to deliver both robotic reasoning and action capabilities, while maintaining efficient fine-tuning and inference. Specifically, we first integrate the vision encoder with Mamba, aligning visual data with language embedding through co-training, empowering our model with visual common sense and robot-related reasoning. To further equip RoboMamba with action pose prediction abilities, we explore an efficient fine-tuning strategy with a simple policy head. We find that once RoboMamba possesses sufficient reasoning capability, it can acquire manipulation skills with minimal fine-tuning parameters (0.1% of the model) and time (20 minutes). In experiments, RoboMamba demonstrates outstanding reasoning capabilities on general and robotic evaluation benchmarks. Meanwhile, our model showcases impressive pose prediction results in both simulation and real-world experiments, achieving inference speeds 7 times faster than existing robot MLLMs. Our project web page: https://sites.google.com/view/robomamba-web
[389] arXiv:2406.04340 [pdf, other]: Title: GLACE: Global Local Accelerated Coordinate Encoding

Authors: Fangjinhua Wang, Xudong Jiang, Silvano Galliani, Christoph Vogel, Marc Pollefeys

Comments: Large-scale visual localization with a single optimizable MLP. CVPR 2024. Code: this https URL Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Scene coordinate regression (SCR) methods are a family of visual localization methods that directly regress 2D-3D matches for camera pose estimation. They are effective in small-scale scenes but face significant challenges in large-scale scenes that are further amplified in the absence of ground truth 3D point clouds for supervision. Here, the model can only rely on reprojection constraints and needs to implicitly triangulate the points. The challenges stem from a fundamental dilemma: The network has to be invariant to observations of the same landmark at different viewpoints and lighting conditions, etc., but at the same time discriminate unrelated but similar observations. The latter becomes more relevant and severe in larger scenes. In this work, we tackle this problem by introducing the concept of co-visibility to the network. We propose GLACE, which integrates pre-trained global and local encodings and enables SCR to scale to large scenes with only a single small-sized network. Specifically, we propose a novel feature diffusion technique that implicitly groups the reprojection constraints with co-visibility and avoids overfitting to trivial solutions. Additionally, our position decoder parameterizes the output positions for large-scale scenes more effectively. Without using 3D models or depth maps for supervision, our method achieves state-of-the-art results on large-scale scenes with a low-map-size model. On Cambridge landmarks, with a single model, we achieve 17% lower median position error than Poker, the ensemble variant of the state-of-the-art SCR method ACE. Code is available at: https://github.com/cvg/glace.
[390] arXiv:2406.04341 [pdf, other]: Title: Interpreting the Second-Order Effects of Neurons in CLIP

Authors: Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt

Comments: project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We interpret the function of individual neurons in CLIP by automatically describing them using text. Analyzing the direct effects (i.e. the flow from a neuron through the residual stream to the output) or the indirect effects (overall contribution) fails to capture the neurons' function in CLIP. Therefore, we present the "second-order lens", analyzing the effect flowing from a neuron through the later attention heads, directly to the output. We find that these effects are highly selective: for each neuron, the effect is significant for <2% of the images. Moreover, each effect can be approximated by a single direction in the text-image space of CLIP. We describe neurons by decomposing these directions into sparse sets of text representations. The sets reveal polysemantic behavior - each neuron corresponds to multiple, often unrelated, concepts (e.g. ships and cars). Exploiting this neuron polysemy, we mass-produce "semantic" adversarial examples by generating images with concepts spuriously correlated to the incorrect class. Additionally, we use the second-order effects for zero-shot segmentation and attribute discovery in images. Our results indicate that a scalable understanding of neurons can be used for model deception and for introducing new model capabilities.
[391] arXiv:2406.04342 [pdf, other]: Title: Learning 1D Causal Visual Representation with De-focus Attention Networks

Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Modality differences have led to the development of heterogeneous architectures for vision and language models. While images typically require 2D non-causal modeling, texts utilize 1D causal modeling. This distinction poses significant challenges in constructing unified multi-modal models. This paper explores the feasibility of representing images using 1D causal modeling. We identify an "over-focus" issue in existing 1D causal vision models, where attention overly concentrates on a small proportion of visual tokens. The issue of "over-focus" hinders the model's ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. During training, large and scheduled drop path rates, and an auxiliary loss on globally pooled features for global understanding tasks are introduced. These two strategies encourage the model to attend to a broader range of tokens and enhance network optimization. Extensive experiments validate the efficacy of our approach, demonstrating that 1D causal visual representation can perform comparably to 2D non-causal representation in tasks such as global perception, dense prediction, and multi-modal understanding. Code is released at https://github.com/OpenGVLab/De-focus-Attention-Networks.
[392] arXiv:2406.04343 [pdf, other]: Title: Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

Authors: Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. Henriques, Christian Rupprecht, Andrea Vedaldi

Comments: Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.
[393] arXiv:2406.04344 [pdf, other]: Title: Verbalized Machine Learning: Revisiting Machine Learning with Language Models

Authors: Tim Z. Xiao, Robert Bamler, Bernhard Schölkopf, Weiyang Liu

Comments: Technical Report v1 (92 pages, 15 figures)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical machine learning problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed. We conduct several studies to empirically evaluate the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability and trustworthiness in ML.
[394] arXiv:2406.04345 [pdf, other]: Title: Stereo-Depth Fusion through Virtual Pattern Projection

Authors: Luca Bartolomei, Matteo Poggi, Fabio Tosi, Andrea Conti, Stefano Mattoccia

Comments: extended version of ICCV 2023: "Active Stereo Without Pattern Projector"

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a novel general-purpose stereo and depth data fusion paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, to facilitate the visual correspondence. Purposely, any depth sensing device can be seamlessly plugged into our framework, enabling the deployment of a virtual active stereo setup in any possible environment and overcoming the severe limitations of physical pattern projection, such as the limited working range and environmental conditions. Exhaustive experiments on indoor and outdoor datasets featuring both long and close range, including those providing raw, unfiltered depth hints from off-the-shelf depth sensors, highlight the effectiveness of our approach in notably boosting the robustness and accuracy of algorithms and deep stereo without any code modification and even without re-training. Additionally, we assess the performance of our strategy on active stereo evaluation datasets with conventional pattern projection. Indeed, in all these scenarios, our virtual pattern projection paradigm achieves state-of-the-art performance. The source code is available at: https://github.com/bartn8/vppstereo.

Cross-lists for Fri, 7 Jun 24

[395] arXiv:2405.08005 (cross-list from math.OC) [pdf, other]: Title: Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm

Authors: Fuzhong Zhou, Chenyu Zhang, Xu Chen, Xuan Di

Comments: Published as a conference paper at ICML 2024

Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a discrete-time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium under mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite-player games on networks, which are challenging to analyze and solve due to the curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.
[396] arXiv:2406.03499 (cross-list from physics.plasm-ph) [pdf, ps, other]: Title: Estimated electric conductivities of thermal plasma for air-fuel combustion and oxy-fuel combustion with potassium or cesium seeding

Authors: Osama A. Marzouk

Comments: 28 pages, 16 figures, 14 tables

Journal-ref: Heliyon, volume 10, issue 11, article number e31697, 2024

Subjects: Plasma Physics (physics.plasm-ph); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

A complete model for estimating the electric conductivity of combustion product gases, with added cesium (Cs) or potassium (K) vapor for ionization, is presented. Neutral carrier gases serve as the bulk fluid that carries the seed material, as well as the electrons generated by the partial thermal (equilibrium) ionization of the seed alkali metal. The model accounts for electron-neutral scattering, as well as electron-ion and electron-electron scattering. The model is tested through comparison with published data. The model is aimed at being utilized for the plasma within magnetohydrodynamic (MHD) channels, where direct power extraction from passing electrically conducting plasma gas enables electric power generation. The thermal ionization model is then used to estimate the electric conductivity of seeded combustion gases under complete combustion of three selected fuels, namely: hydrogen (H2), methane (CH4), and carbon (C). For each of these three fuels, two options for the oxidizer were applied, namely: air (21 % molecular oxygen, 79 % molecular nitrogen by mole), and pure oxygen (oxy-combustion). Two types of seeds (with 1 % mole fraction, based on the composition before ionization) were also applied for each of the six combinations of (fuel-oxidizer), leading to a total of 12 different MHD plasma cases. For each of these cases, the electric conductivity was computed for a range of temperatures from 2000 K to 3000 K. The smallest estimated electric conductivity was 0.35 S/m for oxy-hydrogen combustion at 2000 K, with potassium seeding. The largest estimated electric conductivity was 180.30 S/m for oxy-carbon combustion at 3000 K, with cesium seeding. At 2000 K, replacing potassium with cesium causes a gain in the electric conductivity by a multiplicative gain factor of about 3.6 regardless of the fuel and oxidizer. This gain factor declines to between 1.77 and 2.07 at 3000 K.
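The case count in this abstract (three fuels, two oxidizers, two seeds) can be checked with a short enumeration. This is only a sketch of the bookkeeping, not of the conductivity model; the string labels are taken from the abstract, and no conductivity values are computed here.

```python
from itertools import product

fuels = ["hydrogen (H2)", "methane (CH4)", "carbon (C)"]
oxidizers = ["air", "pure oxygen"]
seeds = ["potassium (K)", "cesium (Cs)"]

# Six fuel-oxidizer combinations, each evaluated with two seed types.
fuel_oxidizer_pairs = list(product(fuels, oxidizers))
cases = list(product(fuels, oxidizers, seeds))

for fuel, oxidizer, seed in cases:
    print(f"{fuel} + {oxidizer}, seeded with {seed}")
```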
[397] arXiv:2406.03504 (cross-list from math.OC) [pdf, ps, other]: Title: A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems

Authors: Theo Guyard, Cédric Herzet, Clément Elvira, Ayşe-Nur Arslan

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

We consider the resolution of learning problems involving $\ell_0$-regularization via Branch-and-Bound (BnB) algorithms. These methods explore regions of the feasible space of the problem and use "pruning tests" to check whether a region can be discarded because it contains no solutions. In standard implementations, evaluating a pruning test requires solving a convex optimization problem, which may result in computational bottlenecks. In this paper, we present an alternative way to implement pruning tests for a generic family of $\ell_0$-regularized problems. Our proposed procedure allows the simultaneous assessment of several regions and can be embedded in standard BnB implementations with a negligible computational overhead. We show through numerical simulations that our pruning strategy can improve the solving time of BnB procedures by several orders of magnitude for typical problems encountered in machine-learning applications.
[398] arXiv:2406.03587 (cross-list from physics.soc-ph) [pdf, other]: Title: Subsuming Complex Networks by Node Walks

Authors: Alexandre Benatti, Luciano da F. Costa

Comments: 14 pages and 7 figures

Subjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)

The concept of node walk in graphs and complex networks has been addressed, consisting of one or more nodes that move into adjacent nodes, henceforth incorporating the respective connections. This type of dynamics is then applied to subsume complex networks. Three types of networks (Erdős-Rényi, Barabási-Albert, as well as a geometric model) are considered, while three node-walk heuristics (uniformly random, largest degree, and smallest degree) are taken into account. Several interesting results are obtained and described, including the identification that the subsuming dynamics depend strongly on both the specific topology of the networks and the criteria controlling the node walks. The use of node walks as a model for studying the relationship between network topology and dynamics is motivated by this result. In addition, relatively high correlations between the initial node degree and the accumulated strength of the walking node were observed for some combinations of network types and dynamic rules, allowing some of the properties of the subsumption to be roughly predicted from the initial topology around the walking node, which has been found, however, not to be enough for full determination of the subsumption dynamics. Another interesting result regards the quite distinct signatures (along the iterations) of walking node strengths obtained for the several considered combinations of network type and subsumption rules.
[399] arXiv:2406.03616 (cross-list from stat.ML) [pdf, other]: Title: BEACON: A Bayesian Optimization Strategy for Novelty Search in Expensive Black-Box Systems

Authors: Wei-Ting Tang, Ankush Chakrabarty, Joel A. Paulson

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Novelty search (NS) refers to a class of exploration algorithms that automatically uncover diverse system behaviors through simulations or experiments. Systematically obtaining diverse outcomes is a key component in many real-world design problems such as material and drug discovery, neural architecture search, reinforcement learning, and robot navigation. Since the relationship between the inputs and outputs (i.e., behaviors) of these complex systems is typically not available in closed form, NS requires a black-box perspective. Consequently, popular NS algorithms rely on evolutionary optimization and other meta-heuristics that require intensive sampling of the input space, which is impractical when the system is expensive to evaluate. We propose a Bayesian optimization inspired algorithm for sample-efficient NS that is specifically designed for such expensive black-box systems. Our approach models the input-to-behavior mapping with multi-output Gaussian processes (MOGP) and selects the next point to evaluate by maximizing a novelty metric that depends on a posterior sample drawn from the MOGP that promotes both exploration and exploitation. By leveraging advances in efficient posterior sampling and high-dimensional Gaussian process modeling, we discuss how our approach can be made scalable with respect to both amount of data and number of inputs. We test our approach on ten synthetic benchmark problems and eight real-world problems (with up to 2133 inputs) including new applications such as discovery of diverse metal organic frameworks for use in clean energy technology. We show that our approach greatly outperforms existing NS algorithms by finding substantially larger sets of diverse behaviors under limited sample budgets.
[400] arXiv:2406.03628 (cross-list from stat.ML) [pdf, other]: Title: Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance

Authors: Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang

Comments: 59 pages, 7 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Imbalanced data and spurious correlations are common challenges in machine learning and data science. Oversampling, which artificially increases the number of instances in the underrepresented classes, has been widely adopted to tackle these challenges. In this article, we introduce OPAL (OversamPling with Artificial LLM-generated data), a systematic oversampling approach that leverages the capabilities of large language models (LLMs) to generate high-quality synthetic data for minority groups. Recent studies on synthetic data generation using deep generative models mostly target prediction tasks. Our proposal differs in that we focus on handling imbalanced data and spurious correlations. More importantly, we develop a novel theory that rigorously characterizes the benefits of using the synthetic data, and shows the capacity of transformers in generating high-quality synthetic data for both labels and covariates. We further conduct intensive numerical experiments to demonstrate the efficacy of our proposed approach compared to some representative alternative solutions.
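For readers unfamiliar with oversampling, the classical baseline can be sketched as follows. This is plain random oversampling by duplication, not OPAL: the abstract describes generating new synthetic samples with an LLM, whereas this sketch only resamples existing minority examples. The function name `random_oversample` and the toy data are assumptions for illustration.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement to fill the gap to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Toy imbalanced data set: six majority samples, two minority samples.
samples = ["a1", "a2", "a3", "a4", "a5", "a6", "b1", "b2"]
labels = ["majority"] * 6 + ["minority"] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

Because duplication adds no new information about the minority class, a generative approach that produces fresh samples addresses a limitation this baseline cannot.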
[401] arXiv:2406.03637 (cross-list from eess.AS) [pdf, other]: Title: Style Mixture of Experts for Expressive Text-To-Speech Synthesis

Authors: Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. Despite these advancements, encoding stylistic information from diverse and unseen reference speech remains challenging. This paper introduces StyleMoE, an approach that divides the embedding space, modeled by the style encoder, into tractable subsets handled by style experts. The proposed method replaces the style encoder in a TTS system with a Mixture of Experts (MoE) layer. By utilizing a gating network to route reference speeches to different style experts, each expert specializes in aspects of the style space during optimization. Our experiments objectively and subjectively demonstrate the effectiveness of our proposed method in increasing the coverage of the style space for diverse and unseen styles. This approach can enhance the performance of existing state-of-the-art style transfer TTS models, marking the first study of MoE in style transfer TTS to our knowledge.
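The routing idea in this abstract can be sketched with a minimal dense mixture-of-experts layer. This is a generic illustration under stated assumptions, not the StyleMoE implementation: the abstract does not specify whether routing is dense or sparse, and the linear experts, the weights, and the names `style_moe` and `linear` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, vector):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def style_moe(reference, gate_weights, expert_weights):
    """Mix expert outputs using gate probabilities computed from the reference features."""
    gate = softmax(linear(gate_weights, reference))
    expert_outputs = [linear(weights, reference) for weights in expert_weights]
    dim = len(expert_outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(gate, expert_outputs)) for d in range(dim)]
    return mixed, gate

# Hypothetical reference-speech features and two linear "style experts".
reference = [1.0, 0.0, 2.0]
gate_weights = [[2.0, 0.0, 0.0], [0.0, 0.0, -1.0]]
expert_weights = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
style_embedding, gate = style_moe(reference, gate_weights, expert_weights)
```

Here the gate strongly favours the first expert for this reference, so the resulting embedding stays close to that expert's output; a different reference would shift the weights toward the other expert.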
[402] arXiv:2406.03652 (cross-list from q-fin.PM) [pdf, other]: Title: Ensembling Portfolio Strategies for Long-Term Investments: A Distribution-Free Preference Framework for Decision-Making and Algorithms

Authors: Duy Khanh Lam

Comments: 25 pages, 12 figures, 3 tables, working paper

Subjects: Portfolio Management (q-fin.PM); Information Theory (cs.IT); Machine Learning (cs.LG); Computational Finance (q-fin.CP)

This paper investigates the problem of ensembling multiple strategies for sequential portfolios to outperform individual strategies in terms of long-term wealth. Due to the uncertainty of strategies' performances in the future market, which are often based on specific models and statistical assumptions, investors often mitigate risk and enhance robustness by combining multiple strategies, akin to common approaches in collective learning prediction. However, the absence of a distribution-free and consistent preference framework complicates decisions of combination due to the ambiguous objective. To address this gap, we introduce a novel framework for decision-making in combining strategies, irrespective of market conditions, by establishing the investor's preference between decisions and then forming a clear objective. Through this framework, we propose a combinatorial strategy construction, free from statistical assumptions, for any scale of component strategies, even infinite, such that it meets the determined criterion. Finally, we test the proposed strategy along with its accelerated variant and some other multi-strategies. The numerical experiments show results in favor of the proposed strategies, albeit with small tradeoffs in their Sharpe ratios, in which their cumulative wealths eventually exceed those of the best component strategies while the accelerated strategy significantly improves performance.
[403] arXiv:2406.03653 (cross-list from stat.ML) [pdf, other]: Title: Equivalence Set Restricted Latent Class Models (ESRLCM)

Authors: Jesse Bowers, Steve Culpepper

Comments: 43 pages, 10 tables, 1 figure

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Latent Class Models (LCMs) are used to cluster multivariate categorical data and are commonly used to interpret survey responses. We propose a novel Bayesian model called the Equivalence Set Restricted Latent Class Model (ESRLCM). This model identifies clusters that have common item response probabilities, and does so more generically than traditional restricted latent attribute models. We verify the identifiability of ESRLCMs, and demonstrate their effectiveness in both simulations and real-world applications.
[404] arXiv:2406.03657 (cross-list from eess.AS) [pdf, other]: Title: UrBAN: Urban Beehive Acoustics and PheNotyping Dataset

Authors: Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. Falk

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022. This apiary comprised 10 beehives, with microphones recording more than 2000 hours of high quality raw audio, and also sensors capturing temperature and humidity. Periodic hive inspections involved monitoring colony honey bee population changes, assessing queen-related conditions, and documenting overall hive health. Additionally, health metrics, such as Varroa mite infestation rates and winter mortality assessments, were recorded, offering valuable insights into factors affecting hive health status and resilience. In this study, we first outline the data collection process, sensor data description, and dataset structure. Furthermore, we demonstrate a practical application of this dataset by extracting various features from the raw audio to predict colony population using the number of frames of bees as a proxy.
[405] arXiv:2406.03663 (cross-list from eess.IV) [pdf, ps, other]: Title: A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy Regions

Authors: Ou Tan, David S. Greenfield, Brian A. Francis, Rohit Varma, Joel S. Schuman, David Huang, Dongseok Choi

Comments: 12 pages

Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate if a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study of 106 normal subjects and 164 perimetric glaucoma (PG) patients. Peripapillary NFL reflectance map, NFL thickness map, optic head analysis of disc, and macular ganglion cell complex thickness were obtained using spectral domain OCT. A hybrid deep learning model combined a fully connected network (FCN) and a convolutional neural network (CNN) to develop and combine those OCT maps and parameters to distinguish normal and PG eyes. Two deep learning models were compared based on whether the NFL reflectance map was used as part of the input or not. Results: The hybrid deep learning model with reflectance achieved 0.909 sensitivity at 99% specificity and 0.926 at 95%. The overall accuracy was 0.948 with 0.893 sensitivity and 1.000 specificity, and the AROC was 0.979, which is significantly better than the logistic regression models (p < 0.001). The second best model is the hybrid deep learning model without reflectance, which also had significantly higher AROC than logistic regression models (p < 0.001). The logistic regression model with reflectance had slightly higher AROC or sensitivity than the other logistic regression model without reflectance (p = 0.024). Conclusions: The hybrid deep learning model significantly improved the diagnostic accuracy, with or without NFL reflectance. A hybrid deep learning model, combining reflectance/NFL thickness/GCC thickness/ONH parameter, may be a practical model for glaucoma screening purposes.
[406] arXiv:2406.03688 (cross-list from eess.IV) [pdf, other]: Title: Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. Summers

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation, it facilitates the inclusion of lateral view images and images from any desired viewing position. This opens up avenues for research into novel multimodal applications involving paired CT, X-ray images from various views, text, and binary labels. We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and CheXnet model. Experiments demonstrate that CheXnet, when trained and tested on the DRR-RATE dataset, achieves sufficient to high AUC scores for the six common pathologies cited in common literature: Atelectasis, Cardiomegaly, Consolidation, Lung Lesion, Lung Opacity, and Pleural Effusion. Additionally, CheXnet trained on the CheXpert dataset can accurately identify several pathologies, even when operating out of distribution. This confirms that the generated DRR images effectively capture the essential pathology features from CT images. The dataset and labels are publicly accessible at https://huggingface.co/datasets/farrell236/DRR-RATE.
[407] arXiv:2406.03690 (cross-list from math.OC) [pdf, other]: Title: AMPIC: Adaptive Model Predictive Ising Controller for large-scale urban traffic signals

Authors: Daisuke Inoue, Hiroshi Yamashita, Kazuyuki Aihara, Hiroaki Yoshida

Comments: 17 pages, 8 figures

Subjects: Optimization and Control (math.OC); Emerging Technologies (cs.ET); Systems and Control (eess.SY); Quantum Physics (quant-ph)

Realizing smooth traffic flow is important for achieving carbon neutrality. Adaptive traffic signal control, which considers traffic conditions, has thus attracted attention. However, it is difficult to ensure optimal vehicle flow throughout a large city using existing control methods because of their heavy computational load. Here, we propose a control method called AMPIC (Adaptive Model Predictive Ising Controller) that guarantees both scalability and optimality. The proposed method employs model predictive control to solve an optimal control problem at each control interval with explicit consideration of a predictive model of vehicle flow. This optimal control problem is transformed into a combinatorial optimization problem with binary variables that is equivalent to the so-called Ising problem. This transformation allows us to use an Ising solver, which has been widely studied and is expected to have fast and efficient optimization performance. We performed numerical experiments using a microscopic traffic simulator for a realistic city road network. The results show that AMPIC enables faster vehicle cruising speed with less waiting time than that achieved by classical control methods, resulting in lower CO2 emissions. The model predictive approach with a long prediction horizon thus effectively improves control performance. Systematic parametric studies on model cities indicate that the proposed method realizes smoother traffic flows for large city road networks. Among Ising solvers, D-Wave's quantum annealing is shown to find near-optimal solutions at a reasonable computational cost.
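The step from a binary-variable optimization problem to an Ising problem rests on the standard substitution x = (1 + s) / 2 with spins s in {-1, +1}. The sketch below shows that generic change of variables for a small quadratic binary objective and checks it by brute force. It is not the AMPIC formulation; the function names, the toy coefficients, and the sign convention (energy = offset + sum h_i s_i + sum J_ij s_i s_j) are assumptions made for illustration.

```python
from itertools import product


def qubo_to_ising(Q):
    """Convert a quadratic binary objective into Ising form.

    Q maps (i, j) with i <= j to a weight; the objective is
    sum Q[i, j] * x_i * x_j over x in {0, 1}. Returns (h, J, offset) so that,
    with s = 2x - 1, the objective equals
    offset + sum h[i] * s_i + sum J[i, j] * s_i * s_j.
    """
    h, J, offset = {}, {}, 0.0
    for (i, j), q in Q.items():
        if i == j:
            # x_i * x_i = x_i = (1 + s_i) / 2
            h[i] = h.get(i, 0.0) + q / 2
            offset += q / 2
        else:
            # x_i * x_j = (1 + s_i + s_j + s_i * s_j) / 4
            J[(i, j)] = J.get((i, j), 0.0) + q / 4
            h[i] = h.get(i, 0.0) + q / 4
            h[j] = h.get(j, 0.0) + q / 4
            offset += q / 4
    return h, J, offset


def qubo_energy(Q, x):
    return sum(q * x[i] * x[j] for (i, j), q in Q.items())


def ising_energy(h, J, offset, s):
    linear = sum(v * s[i] for i, v in h.items())
    pairwise = sum(v * s[i] * s[j] for (i, j), v in J.items())
    return offset + linear + pairwise


# Toy three-variable objective (illustrative coefficients only).
Q = {(0, 0): -1.0, (1, 1): -1.0, (2, 2): 2.0, (0, 1): 3.0, (1, 2): -2.0}
h, J, offset = qubo_to_ising(Q)

# Largest mismatch between the two energies over all 2^3 assignments.
max_mismatch = max(
    abs(qubo_energy(Q, x) - ising_energy(h, J, offset, [2 * xi - 1 for xi in x]))
    for x in product([0, 1], repeat=3)
)
```

Because the two energies agree on every assignment, a minimizer found by an Ising solver maps back to a minimizer of the binary problem via x = (1 + s) / 2.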
[408] arXiv:2406.03696 (cross-list from stat.ML) [pdf, other]: Title: Discrete error dynamics of mini-batch gradient descent for least squares regression

Authors: Jackie Lok, Rishi Sonthalia, Elizaveta Rebrova

Comments: 26 pages

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

We study the discrete dynamics of mini-batch gradient descent for least squares regression when sampling without replacement. We show that the dynamics and generalization error of mini-batch gradient descent depend on a sample cross-covariance matrix $Z$ between the original features $X$ and a set of new features $\widetilde{X}$, in which each feature is modified by the mini-batches that appear before it during the learning process in an averaged way. Using this representation, we rigorously establish that the dynamics of mini-batch and full-batch gradient descent agree up to leading order with respect to the step size using the linear scaling rule. We also study discretization effects that a continuous-time gradient flow analysis cannot detect, and show that mini-batch gradient descent converges to a step-size dependent solution, in contrast with full-batch gradient descent. Finally, we investigate the effects of batching, assuming a random matrix model, by using tools from free probability theory to numerically compute the spectrum of $Z$.
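The leading-order agreement under the linear scaling rule can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's analysis: it compares one full-batch step with step size alpha against one epoch of mini-batch steps (each batch visited once, without replacement) with step size alpha / B for B equal-sized batches. The synthetic data, helper names, and choice of B = 4 are hypothetical.

```python
import random


def grad(w, X, y):
    """Gradient of the mean squared loss (1 / (2m)) * sum (x . w - y)^2 over m rows."""
    m = len(X)
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        residual = sum(a * b for a, b in zip(xi, w)) - yi
        for j, a in enumerate(xi):
            g[j] += residual * a / m
    return g


def step(w, g, lr):
    return [wj - lr * gj for wj, gj in zip(w, g)]


def full_batch_step(w, X, y, alpha):
    return step(w, grad(w, X, y), alpha)


def minibatch_epoch(w, X, y, alpha, num_batches, rng):
    """Shuffle once, then visit every batch exactly once (sampling without replacement)."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    size = len(X) // num_batches
    for b in range(num_batches):
        batch = idx[b * size:(b + 1) * size]
        Xb = [X[i] for i in batch]
        yb = [y[i] for i in batch]
        # Linear scaling rule: the step size shrinks in proportion to the batch size.
        w = step(w, grad(w, Xb, yb), alpha / num_batches)
    return w


data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
y = [2.0 * a - 1.0 * b + 0.1 * data_rng.gauss(0, 1) for a, b in X]
w0 = [0.0, 0.0]


def gap(alpha):
    """Largest coordinate difference between the full-batch and mini-batch iterates."""
    w_full = full_batch_step(w0, X, y, alpha)
    w_mini = minibatch_epoch(w0, X, y, alpha, num_batches=4, rng=random.Random(1))
    return max(abs(a - b) for a, b in zip(w_full, w_mini))
```

In this setup the first-order terms of the two updates coincide, so `gap(alpha)` should shrink roughly quadratically as `alpha` decreases; the remaining second-order difference is the kind of discretization effect the abstract says a gradient flow analysis cannot see.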
[409] arXiv:2406.03711 (cross-list from physics.flu-dyn) [pdf, other]: Title: Pi-fusion: Physics-informed diffusion model for learning fluid dynamics

Authors: Jing Qiu, Jiancheng Huang, Xiangdong Zhang, Zeng Lin, Minglei Pan, Zengding Liu, Fen Miao

Subjects: Fluid Dynamics (physics.flu-dyn); Artificial Intelligence (cs.AI)

Physics-informed deep learning has recently been developed as a novel paradigm for learning physical dynamics. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize to arbitrary time instants in real-world scenarios, where the fluid motion can be considered as a time-variant trajectory involving large-scale particles. Inspired by the advantage of diffusion models in learning the distribution of data, we first propose Pi-fusion, a physics-informed diffusion model for predicting the temporal evolution of velocity and pressure fields in fluid dynamics. Physics-informed guidance sampling is proposed in the inference procedure of Pi-fusion to improve the accuracy and interpretability of learning fluid dynamics. Furthermore, we introduce a training strategy based on reciprocal learning to learn the quasiperiodical pattern of fluid motion and thus improve the generalizability of the model. The proposed approach is then evaluated on both synthetic and real-world datasets by comparing it with state-of-the-art physics-informed deep learning methods. Experimental results show that the proposed approach significantly outperforms existing methods for predicting the temporal evolution of velocity and pressure fields, confirming its strong generalization by drawing probabilistic inference of forward process and physics-informed guidance sampling. The proposed Pi-fusion can also be generalized to learning other physical dynamics governed by partial differential equations.
[410] arXiv:2406.03715 (cross-list from math.PR) [pdf, other]: Title: Strong convergence rates for full-discrete approximations of the stochastic Allen-Cahn equations on 2D torus

Authors: Ting Ma, Lifei Wang, Huanyu Yang

Subjects: Probability (math.PR); Numerical Analysis (math.NA)

In this paper we construct space-time full discretizations of stochastic Allen-Cahn equations driven by space-time white noise on 2D torus. The approximations are implemented by tamed exponential Euler discretization in time and spectral Galerkin method in space. We finally obtain the convergence rates with the spatial order of $\alpha-\delta$ and the temporal order of ${\alpha}/{6}-\delta$ in $\mathcal C^{-\alpha}$ for $\alpha\in(0,1/3)$ and $\delta>0$ arbitrarily small.
[411] arXiv:2406.03734 (cross-list from math.OC) [pdf, other]: Title: Policy Gradient Methods for the Cost-Constrained LQR: Strong Duality and Global Convergence

Authors: Feiran Zhao, Keyou You

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In safety-critical applications, reinforcement learning (RL) needs to consider safety constraints. However, theoretical understandings of constrained RL for continuous control are largely absent. As a case study, this paper presents a cost-constrained LQR formulation, where a number of LQR costs with user-defined penalty matrices are subject to constraints. To solve it, we propose a policy gradient primal-dual method to find an optimal state feedback gain. Despite the non-convexity of the cost-constrained LQR problem, we provide a constructive proof for strong duality and a geometric interpretation of an optimal multiplier set. By proving that the concave dual function is Lipschitz smooth, we further provide convergence guarantees for the PG primal-dual method. Finally, we perform simulations to validate our theoretical findings.
[412] arXiv:2406.03766 (cross-list from eess.SP) [pdf, other]: Title: Privacy Preserving Semi-Decentralized Mean Estimation over Intermittently-Connected Networks

Authors: Rajarshi Saha, Mohamed Seif, Michal Yemini, Andrea J. Goldsmith, H. Vincent Poor

Comments: 14 pages, 6 figures. arXiv admin note: text overlap with arXiv:2303.00035

Subjects: Signal Processing (eess.SP); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Machine Learning (cs.LG); Systems and Control (eess.SY)

We consider the problem of privately estimating the mean of vectors distributed across different nodes of an unreliable wireless network, where communications between nodes can fail intermittently. We adopt a semi-decentralized setup, wherein to mitigate the impact of intermittently connected links, nodes can collaborate with their neighbors to compute a local consensus, which they relay to a central server. In such a setting, the communications between any pair of nodes must ensure that the privacy of the nodes is rigorously maintained to prevent unauthorized information leakage. We study the tradeoff between collaborative relaying and privacy leakage due to the data sharing among nodes and, subsequently, propose PriCER: Private Collaborative Estimation via Relaying -- a differentially private collaborative algorithm for mean estimation to optimize this tradeoff. The privacy guarantees of PriCER arise (i) implicitly, by exploiting the inherent stochasticity of the flaky network connections, and (ii) explicitly, by adding Gaussian perturbations to the estimates exchanged by the nodes. Local and central privacy guarantees are provided against eavesdroppers who can observe different signals, such as the communications amongst nodes during local consensus and (possibly multiple) transmissions from the relays to the central server. We substantiate our theoretical findings with numerical simulations. Our implementation is available at https://github.com/rajarshisaha95/private-collaborative-relaying.
[413] arXiv:2406.03783 (cross-list from math.CO) [pdf, other]: Title: Flips in colorful triangulations

Authors: Rohan Acharya, Torsten Mütze, Francesco Verciani

Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

The associahedron is the graph $\mathcal{G}_N$ that has as nodes all triangulations of a convex $N$-gon, and an edge between any two triangulations that differ in a flip operation, which consists of removing an edge shared by two triangles and replacing it by the other diagonal of the resulting 4-gon. In this paper, we consider a large collection of induced subgraphs of $\mathcal{G}_N$ obtained by Ramsey-type colorability properties. Specifically, coloring the points of the $N$-gon red and blue alternatingly, we consider only colorful triangulations, namely triangulations in which every triangle has points in both colors, i.e., monochromatic triangles are forbidden. The resulting induced subgraph of $\mathcal{G}_N$ on colorful triangulations is denoted by $\mathcal{F}_N$. We prove that $\mathcal{F}_N$ has a Hamilton cycle for all $N\geq 8$, resolving a problem raised by Sagan, i.e., all colorful triangulations on $N$ points can be listed so that any two cyclically consecutive triangulations differ in a flip. In fact, we prove that for an arbitrary fixed coloring pattern of the $N$ points with at least 10 changes of color, the resulting subgraph of $\mathcal{G}_N$ on colorful triangulations (for that coloring pattern) admits a Hamilton cycle. We also provide an efficient algorithm for computing a Hamilton path in $\mathcal{F}_N$ that runs in time $\mathcal{O}(1)$ on average per generated node. This algorithm is based on a new and algorithmic construction of a tree rotation Gray code for listing all $n$-vertex $k$-ary trees that runs in time $\mathcal{O}(k)$ on average per generated tree.
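The flip operation and the colorful condition are easy to make concrete. The sketch below assumes the polygon's points are labelled 0..N-1 in cyclic order and colored by index parity (an assumption standing in for "red and blue alternatingly"), and stores a triangulation as a set of sorted vertex triples. The function names are hypothetical, and the hexagon example only illustrates the definitions; it is below the N >= 8 range of the Hamilton cycle result.

```python
def flip(triangles, a, b):
    """Flip the diagonal (a, b).

    The two triangles sharing (a, b) form a 4-gon; they are replaced by the
    two triangles that share the 4-gon's other diagonal.
    """
    sharing = [t for t in triangles if a in t and b in t]
    if len(sharing) != 2:
        raise ValueError("(a, b) is not an inner diagonal of this triangulation")
    (c,) = set(sharing[0]) - {a, b}
    (d,) = set(sharing[1]) - {a, b}
    replaced = {tuple(sorted((a, c, d))), tuple(sorted((b, c, d)))}
    return frozenset((set(triangles) - set(sharing)) | replaced)


def is_colorful(triangles):
    """True if no triangle is monochromatic under the parity coloring."""
    return all(len({v % 2 for v in t}) == 2 for t in triangles)


# Fan triangulation of a hexagon from vertex 0.
fan = frozenset({(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)})

after_02 = flip(fan, 0, 2)   # creates diagonal (1, 3)
after_03 = flip(fan, 0, 3)   # creates diagonal (2, 4) and triangle (0, 2, 4)
back = flip(after_02, 1, 3)  # flipping the new diagonal undoes the first flip
```

Here `fan` and `after_02` are both colorful, so they would be adjacent in $\mathcal{F}_6$, while `after_03` contains the monochromatic triangle (0, 2, 4) and is therefore excluded from the induced subgraph.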
[414] arXiv:2406.03787 (cross-list from math.OC) [pdf, other]: Title: Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization

Authors: Wei Jiang, Sifan Yang, Wenhao Yang, Yibo Wang, Yuanyu Wan, Lijun Zhang

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For gradient mapping, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely-used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.
[415] arXiv:2406.03810 (cross-list from astro-ph.IM) [pdf, ps, other]: Title: Spherinator and HiPSter: Representation Learning for Unbiased Knowledge Discovery from Simulations

Authors: Kai L. Polsterer, Bernd Doser, Andreas Fehlner, Sebastian Trujillo-Gomez

Comments: 4 pages, 1 figure

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

Simulations are the best approximation to experimental laboratories in astrophysics and cosmology. However, the complexity, richness, and large size of their outputs severely limit the interpretability of their predictions. We describe a new, unbiased, and machine learning based approach to obtaining useful scientific insights from a broad range of simulations. The method can be used on today's largest simulations and will be essential to solve the extreme data exploration and analysis challenges posed by the Exascale era. Furthermore, this concept is so flexible that it will also enable explorative access to observed data. Our concept is based on applying nonlinear dimensionality reduction to learn compact representations of the data in a low-dimensional space. The simulation data is projected onto this space for interactive inspection, visual interpretation, sample selection, and local analysis. We present a prototype using a rotationally invariant hyperspherical variational convolutional autoencoder, utilizing a power distribution in the latent space, and trained on galaxies from the IllustrisTNG simulation. Thereby, we obtain a natural Hubble tuning fork-like similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in Aladin Lite.
[416] arXiv:2406.03832 (cross-list from astro-ph.IM) [pdf, ps, other]: Title: UltraPINK -- New possibilities to explore Self-Organizing Kohonen Maps

Authors: Fenja Kollasch, Kai Polsterer

Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Human-Computer Interaction (cs.HC)

Unsupervised learning algorithms like self-organizing Kohonen maps are a promising approach to gain an overview of massive datasets. With UltraPINK, researchers can train, inspect, and explore self-organizing maps, while the toolbox of interaction possibilities grows continually. A key feature of UltraPINK is its consideration of the versatility of astronomical data. By keeping the operations as abstract as possible and using design patterns meant for abstract usage, we ensure that data is compatible with UltraPINK, regardless of its type, formatting, or origin. Future work on the application will keep extending the catalogue of exploration tools and the interfaces towards other established applications to process astronomical data. Ultimately, we aim towards a solid infrastructure for data analysis in astronomy.
[417] arXiv:2406.03867 (cross-list from quant-ph) [pdf, other]: Title: A Comprehensive Study of Quantum Arithmetic Circuits

Authors: Siyi Wang, Xiufan Li, Wei Jie Bryan Lee, Suman Deb, Eugene Lim, Anupam Chattopadhyay

Comments: Under review at the Royal Society's Philosophical Transactions A

Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)

In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on developing novel designs and improving existing ones.
In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions.
[418] arXiv:2406.03896 (cross-list from cond-mat.soft) [pdf, other]: Title: Data-driven discovery of self-similarity using neural networks

Authors: Ryota Watanabe, Takanori Ishii, Yuji Hirono, Hirokazu Maruoka

Comments: 21 pages, 15 figures, 5 tables

Subjects: Soft Condensed Matter (cond-mat.soft); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)

Finding self-similarity is a key step for understanding the governing law behind complex physical phenomena. Traditional methods for identifying self-similarity often rely on specific models, which can introduce significant bias. In this paper, we present a novel neural network-based approach that discovers self-similarity directly from observed data, without presupposing any models. The presence of self-similar solutions in a physical problem signals that the governing law contains a function whose arguments are given by power-law monomials of physical parameters, which are characterized by power-law exponents. The basic idea is to enforce such particular forms structurally in a neural network in a parametrized way. We train the neural network model using the observed data, and when the training is successful, we can extract the power exponents that characterize scale-transformation symmetries of the physical problem. We demonstrate the effectiveness of our method with both synthetic and experimental data, validating its potential as a robust, model-independent tool for exploring self-similarity in complex systems.
[419] arXiv:2406.03901 (cross-list from eess.IV) [pdf, other]: Title: Polyp and Surgical Instrument Segmentation with Double Encoder-Decoder Networks

Authors: Adrian Galdran

Journal-ref: NMI, Vol. 1 No. 1 (2021): MedAI: Transparency in Medical Image Segmentation

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper describes a solution for the MedAI competition, in which participants were required to segment both polyps and surgical instruments from endoscopic images. Our approach relies on a double encoder-decoder neural network which we have previously applied for polyp segmentation, but with a series of enhancements: a more powerful encoder architecture, an improved optimization procedure, and the post-processing of segmentations based on tempered model ensembling. Experimental results show that our method produces segmentations that show a good agreement with manual delineations provided by medical experts.
[420] arXiv:2406.03902 (cross-list from eess.IV) [pdf, other]: Title: C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction

Authors: Yiqun Lin, Jiewen Yang, Hualiang Wang, Xinpeng Ding, Wei Zhao, Xiaomeng Li

Comments: Accepted to CVPR 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios, such as diagnosis and preoperative planning. Using fewer projection views to reconstruct CT, also known as sparse-view reconstruction, can reduce ionizing radiation and further benefit interventional radiology. Compared with sparse-view reconstruction for traditional parallel/fan-beam CT, CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams. As a 2D-to-3D reconstruction problem, although implicit neural representations have been introduced to enable efficient training, only local features are considered and different views are processed equally in previous works, resulting in spatial inconsistency and poor performance on complicated anatomies. To this end, we propose C^2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space. Additionally, the scale-view cross-attention module is introduced to adaptively aggregate multi-scale and multi-view features. Extensive experiments demonstrate that our C^2RV achieves consistent and significant improvement over previous state-of-the-art methods on datasets with diverse anatomy.
[421] arXiv:2406.03903 (cross-list from eess.IV) [pdf, other]: Title: Data-Centric Label Smoothing for Explainable Glaucoma Screening from Eye Fundus Images

Authors: Adrian Galdran, Miguel A. González Ballester

Comments: Accepted to ISBI 2024 (Challenges), 2nd position in the JustRAIGS challenge (this https URL)

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

As current computing capabilities increase, modern machine learning and computer vision systems tend to increase in complexity, mostly by means of larger models and advanced optimization strategies. Although often neglected, in many problems there is also much to be gained by considering potential improvements in understanding and better leveraging already-available training data, including annotations. This so-called data-centric approach can lead to substantial performance increases, sometimes beyond what can be achieved by larger models. In this paper we adopt such an approach for the task of justifiable glaucoma screening from retinal images. In particular, we focus on how to combine information from multiple annotators of different skills into a tailored label smoothing scheme that allows us to better employ a large collection of fundus images, instead of discarding samples suffering from inter-rater variability. Internal validation results indicate that our bespoke label smoothing approach surpasses the performance of a standard resnet50 model and also the same model trained with conventional label smoothing techniques, in particular for the multi-label scenario of predicting clinical reasons of glaucoma likelihood in a highly imbalanced screening context. Our code is made available at github.com/agaldran/justraigs .
[422] arXiv:2406.03913 (cross-list from math.OC) [pdf, other]: Title: Recognizing weighted means in geodesic spaces

Authors: Ariel Goodwin, Adrian S. Lewis, Genaro Lopez-Acedo, Adriana Nicolae

Subjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)

Geodesic metric spaces support a variety of averaging constructions for given finite sets. Computing such averages has generated extensive interest in diverse disciplines. Here we consider the inverse problem of recognizing computationally whether or not a given point is such an average, exactly or approximately. In nonpositively curved spaces, several averaging notions, including the usual weighted barycenter, produce the same "mean set". In such spaces, at points where the tangent cone is a Euclidean space, the recognition problem reduces to Euclidean projection onto a polytope. Hadamard manifolds comprise one example. Another consists of CAT(0) cubical complexes, at relative-interior points: the recognition problem is harder for general points, but we present an efficient semidefinite-programming-based algorithm.
[423] arXiv:2406.03924 (cross-list from stat.ML) [pdf, other]: Title: Statistical Multicriteria Benchmarking via the GSD-Front

Authors: Christoph Jansen (1), Georg Schollmeyer (2), Julian Rodemann (2), Hannah Blocher (2), Thomas Augustin (2) ((1) Lancaster University Leipzig, (2) Ludwig-Maximilians-Universität München)

Comments: CJ, GS,JR and HB equally contributed to this work

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Given the vast number of classifiers that have been (and continue to be) proposed, reliable methods for comparing them are becoming increasingly important. The desire for reliability is broken down into three main aspects: (1) Comparisons should allow for different quality metrics simultaneously. (2) Comparisons should take into account the statistical uncertainty induced by the choice of benchmark suite. (3) The robustness of the comparisons under small deviations in the underlying assumptions should be verifiable. To address (1), we propose to compare classifiers using a generalized stochastic dominance ordering (GSD) and present the GSD-front as an information-efficient alternative to the classical Pareto-front. For (2), we propose a consistent statistical estimator for the GSD-front and construct a statistical test for whether a (potentially new) classifier lies in the GSD-front of a set of state-of-the-art classifiers. For (3), we relax our proposed test using techniques from robust statistics and imprecise probabilities. We illustrate our concepts on the benchmark suite PMLB and on the platform OpenML.
[424] arXiv:2406.03938 (cross-list from q-bio.PE) [pdf, other]: Title: Diversity in Evolutionary Dynamics

Authors: Yuval Rabani, Leonard J. Schulman, Alistair Sinclair

Subjects: Populations and Evolution (q-bio.PE); Computational Engineering, Finance, and Science (cs.CE)

We consider the dynamics imposed by natural selection on the populations of two competing, sexually reproducing, haploid species. In this setting, the fitness of any genome varies over time due to the changing population mix of the competing species; crucially, this fitness variation arises naturally from the model itself, without the need for imposing it exogenously as is typically the case. Previous work on this model [14] showed that, in the special case where each of the two species exhibits just two phenotypes, genetic diversity is maintained at all times. This finding supported the tenet that sexual reproduction is advantageous because it promotes diversity, which increases the survivability of a species.
In the present paper we consider the more realistic case where there are more than two phenotypes available to each species. The conclusions about diversity in general turn out to be very different from the two-phenotype case.
Our first result is negative: namely, we show that sexual reproduction does not guarantee the maintenance of diversity at all times, i.e., the result of [14] does not generalize. Our counterexample consists of two competing species with just three phenotypes each. We show that, for any time~$t_0$ and any $\varepsilon>0$, there is a time $t\ge t_0$ at which the combined diversity of both species is smaller than~$\varepsilon$. Our main result is a complementary positive statement, which says that in any non-degenerate example, diversity is maintained in a weaker, ``infinitely often'' sense.
Thus, our results refute the supposition that sexual reproduction ensures diversity at all times, but affirm a weaker assertion that extended periods of high diversity are necessarily a recurrent event.
[425] arXiv:2406.03961 (cross-list from eess.IV) [pdf, ps, other]: Title: LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

Authors: Junhui Li, Jutao Li, Xingsong Hou, Huake Wang, Yutao Zhang, Yujie Dun, Wenke Sun

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of RS images by utilizing the generated distortion prior from an LDM. Our approach consists of two stages. In the first stage, a self-encoder learns a prior from the high-quality input image. In the second stage, the prior is generated through an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, to be used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate that the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC.
[426] arXiv:2406.03972 (cross-list from quant-ph) [pdf, ps, other]: Title: Eigenpath traversal by Poisson-distributed phase randomisation

Authors: Joseph Cunningham, Jérémie Roland

Comments: 19 pages

Subjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)

We present a framework for quantum computation, similar to Adiabatic Quantum Computation (AQC), that is based on the quantum Zeno effect. By performing randomised dephasing operations at intervals determined by a Poisson process, we are able to track the eigenspace associated to a particular eigenvalue.
We derive a simple differential equation for the fidelity, leading to general theorems bounding the time complexity of a whole class of algorithms. We also use eigenstate filtering to optimise the scaling of the complexity in the error tolerance $\epsilon$.
In many cases the bounds given by our general theorems are optimal, giving a time complexity of $O(1/\Delta_m)$ with $\Delta_m$ the minimum of the gap. This allows us to prove optimal results using very general features of problems, minimising the problem-specific insight necessary.
As two applications of our framework, we obtain optimal scaling for the Grover problem (i.e., $O(\sqrt{N})$ where $N$ is the database size) and the Quantum Linear System Problem (i.e., $O(\kappa\log(1/\epsilon))$ where $\kappa$ is the condition number and $\epsilon$ the error tolerance) by direct applications of our theorems.
[427] arXiv:2406.04000 (cross-list from physics.optics) [pdf, other]: Title: Stochastic logic in biased coupled photonic probabilistic bits

Authors: Michael Horodynski, Charles Roques-Carmes, Yannick Salamin, Seou Choi, Jamison Sloan, Di Luo, Marin Soljačić

Subjects: Optics (physics.optics); Emerging Technologies (cs.ET)

Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experimentally viable photonic approach to solve arbitrary probabilistic computing problems. Our method relies on the insight that coherent Ising machines composed of coupled and biased optical parametric oscillators can emulate stochastic logic. We demonstrate the feasibility of our approach by using numerical simulations equivalent to the full density matrix formulation of coupled optical parametric oscillators.
[428] arXiv:2406.04001 (cross-list from math.OC) [pdf, other]: Title: Benign Nonconvex Landscapes in Optimal and Robust Control, Part II: Extended Convex Lifting

Authors: Yang Zheng, Chih-Fan Pai, Yujie Tang

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Dynamical Systems (math.DS)

Many optimal and robust control problems are nonconvex and potentially nonsmooth in their policy optimization forms. In Part II of this paper, we introduce a new and unified Extended Convex Lifting (ECL) framework to reveal hidden convexity in classical optimal and robust control problems from a modern optimization perspective. Our ECL offers a bridge between nonconvex policy optimization and convex reformulations, enabling convex analysis for nonconvex problems. Despite non-convexity and non-smoothness, the existence of an ECL not only reveals that minimizing the original function is equivalent to a convex problem but also certifies a class of first-order non-degenerate stationary points to be globally optimal. Therefore, no spurious stationarity exists in the set of non-degenerate policies. This ECL framework can cover many benchmark control problems, including state feedback linear quadratic regulator (LQR), dynamic output feedback linear quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ robust control. ECL can also handle a class of distributed control problems when the notion of quadratic invariance (QI) holds. We further show that all static stabilizing policies are non-degenerate for state feedback LQR and $\mathcal{H}_\infty$ control under standard assumptions. We believe that the new ECL framework may be of independent interest for analyzing nonconvex problems beyond control.
[429] arXiv:2406.04004 (cross-list from quant-ph) [pdf, other]: Title: T-Count Optimizing Genetic Algorithm for Quantum State Preparation

Authors: Andrew Wright, Marco Lewis, Paolo Zuliani, Sadegh Soudjani

Comments: To appear in IEEE QSW 2024 proceedings

Subjects: Quantum Physics (quant-ph); Neural and Evolutionary Computing (cs.NE)

Quantum state preparation is a crucial process within numerous quantum algorithms, and the need for efficient initialization of quantum registers is ever increasing as demand for useful quantum computing grows. The problem is that, as the number of qubits to be initialized grows, the circuits required to implement the desired state grow exponentially in size, leading to loss of fidelity due to noise. This is mainly due to the susceptibility to environmental effects of the non-Clifford T gate, whose use should thus be reduced as much as possible. In this paper, we present and utilize a genetic algorithm for state preparation circuits consisting of gates from the Clifford + T gate set and optimize them in T-Count so as to reduce the impact of noise. Whilst the method presented here does not always produce the most accurate circuits in terms of fidelity, it can generate high-fidelity, non-trivial quantum states such as quantum Fourier transform states. In addition, our algorithm automatically generates fault-tolerantly implementable solutions in which the number of the most error-prone components is reduced. We present an evaluation of the algorithm when trialed on preparing random, Poisson probability distribution, W, GHZ, and quantum Fourier transform states. We also experimentally demonstrate the scalability issues as qubit count increases, which highlights the need for further optimization of the search process.
[430] arXiv:2406.04012 (cross-list from stat.ML) [pdf, other]: Title: Variational inference, Mixture of Gaussians, Bayesian Machine Learning

Authors: Tom Huix, Anna Korba, Alain Durmus, Eric Moulines

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Variational inference (VI) is a popular approach in Bayesian inference that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is that of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be cast as the minimization of a mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the locations of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, that is a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error, that upper bounds the objective between an optimal finite mixture and the target distribution.
[431] arXiv:2406.04034 (cross-list from math.CO) [pdf, ps, other]: Title: The geometry of intersecting codes and applications to additive combinatorics and factorization theory

Authors: Martino Borello, Wolfgang Schmid, Martin Scotti

Comments: 31 pages

Subjects: Combinatorics (math.CO); Information Theory (cs.IT); Number Theory (math.NT)

Intersecting codes are linear codes where every two nonzero codewords have non-trivially intersecting support. In this article we expand on the theory of this family of codes, by showing that nondegenerate intersecting codes correspond to sets of points (with multiplicities) in a projective space that are not contained in two hyperplanes. This correspondence allows the use of geometric arguments to demonstrate properties and provide constructions of intersecting codes. We improve on existing bounds on their length and provide explicit constructions of short intersecting codes. Finally, generalizing a link between coding theory and the theory of the Davenport constant (a combinatorial invariant of finite abelian groups), we provide new asymptotic bounds on the weighted $2$-wise Davenport constant. These bounds then yield results on factorizations in rings of algebraic integers and related structures.
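
The definition in the first sentence of this abstract can be checked by brute force for small codes. The following is a minimal sketch, assuming a binary code given by a generator matrix; the restriction to GF(2), the function names, and the two example matrices are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def codewords(generator):
    """Enumerate all codewords of the binary linear code spanned by the rows."""
    k, n = len(generator), len(generator[0])
    words = set()
    for coeffs in product((0, 1), repeat=k):
        # Linear combination of the generator rows over GF(2).
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator)) % 2
            for j in range(n)
        )
        words.add(word)
    return words


def is_intersecting(generator):
    """True if every two nonzero codewords share at least one support position."""
    nonzero = [w for w in codewords(generator) if any(w)]
    return all(
        any(a and b for a, b in zip(u, v))
        for u in nonzero
        for v in nonzero
    )


# Hypothetical examples: the length-3 even-weight code and the full space GF(2)^2.
even_weight = [[1, 1, 0], [0, 1, 1]]
full_space = [[1, 0], [0, 1]]
print(is_intersecting(even_weight))  # True
print(is_intersecting(full_space))   # False
```

The even-weight code passes because any two of its nonzero codewords 110, 011, 101 overlap in one position, while the full space fails because 10 and 01 have disjoint supports. The enumeration is exponential in the dimension, so this is only suitable for toy examples.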
[432] arXiv:2406.04047 (cross-list from stat.ML) [pdf, other]: Title: Slicing Mutual Information Generalization Bounds for Neural Networks

Authors: Kimia Nadjahi, Kristjan Greenewald, Rickard Brüel Gabrielsson, Justin Solomon

Comments: Accepted at ICML 2024

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e., the MI between the training data and the learned hypothesis. Yet, these bounds have limited practicality for modern ML applications (e.g., deep learning), due to the difficulty of evaluating MI in high dimensions. Motivated by recent findings on the compressibility of neural networks, we consider algorithms that operate by slicing the parameter space, i.e., trained on random lower-dimensional subspaces. We introduce new, tighter information-theoretic generalization bounds tailored for such algorithms, demonstrating that slicing improves generalization. Our bounds offer significant computational and statistical advantages over standard MI bounds, as they rely on scalable alternative measures of dependence, i.e., disintegrated mutual information and $k$-sliced mutual information. Then, we extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces, by leveraging rate-distortion theory. This strategy yields generalization bounds that incorporate a distortion term measuring model compressibility under slicing, thereby tightening existing bounds without compromising performance or requiring model compression. Building on this, we propose a regularization scheme enabling practitioners to control generalization through compressibility. Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach.
[433] arXiv:2406.04071 (cross-list from stat.ML) [pdf, other]: Title: Dynamic angular synchronization under smoothness constraints

Authors: Ernesto Araya, Mihai Cucuringu, Hemant Tyagi

Comments: 40 pages, 9 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the classical angular synchronization problem consists of recovering unknown angles $\theta_1^*,\dots,\theta_n^*$ from a collection of noisy pairwise measurements of the form $(\theta_i^* - \theta_j^*) \mod 2\pi$, for all $\{i,j\} \in \mathcal{E}$. This problem arises in a variety of applications, including computer vision, time synchronization of distributed networks, and ranking from pairwise comparisons. In this paper, we consider a dynamic version of this problem where both the angles and the measurement graphs evolve over $T$ time points. Assuming a smoothness condition on the evolution of the latent angles, we derive three algorithms for joint estimation of the angles over all time points. Moreover, for one of the algorithms, we establish non-asymptotic recovery guarantees for the mean-squared error (MSE) under different statistical models. In particular, we show that the MSE converges to zero as $T$ increases under milder conditions than in the static setting. This includes the setting where the measurement graphs are highly sparse and disconnected, and also when the measurement noise is large and can potentially increase with $T$. We complement our theoretical results with experiments on synthetic data.
[434] arXiv:2406.04098 (cross-list from stat.ML) [pdf, other]: Title: A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

Authors: Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

Comments: 42 pages, 28 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on high-dimensional data. Additionally, they may lack appropriate tuning or evaluation procedures, or are qualitative reviews, rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable conclusions. We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets. The benchmark tunes for both a discrimination measure and a proper scoring rule to assess performance in different settings. Evaluating on 8 survival metrics, we assess discrimination, calibration, and overall predictive performance of the tested models. Using discrimination measures, we find that no method significantly outperforms the Cox model. However, (tuned) Accelerated Failure Time models were able to achieve significantly better results with respect to overall predictive performance as measured by the right-censored log-likelihood. Machine learning methods that performed comparably well include Oblique Random Survival Forests under discrimination, and Cox-based likelihood-boosting under overall predictive performance. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox Proportional Hazards model remains a simple and robust method, sufficient for practitioners.
[435] arXiv:2406.04132 (cross-list from math.DS) [pdf, ps, other]: Title: Realizability of Subgroups by Subshifts of Finite Type

Authors: Nicolás Bitar

Comments: 26 pages, 2 figures. Comments welcome

Subjects: Dynamical Systems (math.DS); Discrete Mathematics (cs.DM); Group Theory (math.GR)

We study the problem of realizing families of subgroups as the set of stabilizers of configurations from a subshift of finite type (SFT). This problem generalizes both the existence of strongly and weakly aperiodic SFTs. We show that a finitely generated normal subgroup is realizable if and only if the quotient by the subgroup admits a strongly aperiodic SFT. We also show that if a subgroup is realizable, its subgroup membership problem must be decidable. The article also contains the introduction of periodically rigid groups, which are groups for which every weakly aperiodic subshift of finite type is strongly aperiodic. We conjecture that the only finitely generated periodically rigid groups are virtually $\mathbb{Z}$ groups and torsion-free virtually $\mathbb{Z}^2$ groups. Finally, we show virtually nilpotent and polycyclic groups satisfy the conjecture.
[436] arXiv:2406.04142 (cross-list from math.OC) [pdf, other]: Title: Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Authors: Dimitris Oikonomou, Nicolas Loizou

Comments: 39 pages, 20 Figures

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Stochastic gradient descent with momentum, also known as Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent advantages of stochastic Polyak step-size in the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: MomSPS$_{\max}$, MomDecSPS, and MomAdaSPS. For MomSPS$_{\max}$, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using MomSPS$_{\max}$, SHB converges to the true solution at a fast rate matching the deterministic HB. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without prior knowledge of the problem parameters and without assuming interpolation. The convergence analysis of SHB is tight and obtains the convergence guarantees of SGD with stochastic Polyak step-sizes as a special case. We supplement our analysis with experiments that validate the theory and demonstrate the effectiveness and robustness of the new algorithms.
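
For orientation, the step-size that this abstract builds on can be sketched for plain SGD. The block below is a minimal illustration of the earlier stochastic Polyak step-size with an upper bound, $\gamma_k = \min\{(f_i(x_k) - f_i^*) / (c\,\|\nabla f_i(x_k)\|^2),\ \gamma_b\}$, and not of the new momentum variants (MomSPS$_{\max}$, MomDecSPS, MomAdaSPS), whose formulas are not given in the abstract. The least-squares problem, the parameter values, and all names are assumptions chosen so that interpolation holds and $f_i^* = 0$.

```python
import random


def sps_max_sgd(rows, targets, c=0.5, gamma_b=10.0, steps=2000, seed=0):
    """SGD on f_i(x) = 0.5 * (a_i . x - b_i)^2 with a bounded Polyak step-size."""
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    for _ in range(steps):
        i = rng.randrange(len(rows))
        a, b = rows[i], targets[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b
        loss = 0.5 * residual ** 2  # f_i(x); f_i^* = 0 is assumed (interpolation)
        grad = [residual * aj for aj in a]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue  # this sample is already fit exactly; no update needed
        step = min(loss / (c * grad_sq), gamma_b)
        x = [xj - step * g for xj, g in zip(x, grad)]
    return x


# Hypothetical consistent linear system, so every f_i attains zero at true_x.
rows = [[1.0, 2.0], [3.0, 1.0], [1.0, -1.0]]
true_x = [2.0, -1.0]
targets = [sum(a * t for a, t in zip(row, true_x)) for row in rows]
x_hat = sps_max_sgd(rows, targets)
print(x_hat)
```

With `c = 0.5` and this quadratic loss, the step reduces to $1/\|a_i\|^2$, so each update projects onto the hyperplane of the sampled equation; no learning rate has to be tuned by hand, which is the practical appeal the abstract refers to.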
[437] arXiv:2406.04149 (cross-list from eess.IV) [pdf, ps, other]: Title: Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis

Authors: Chengeng Liu, Sihong Liu, Chaomin Shen, Yupeng Gao, Yuxuan Liu

Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)

Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineation of detailed rock fragment size distributions was achieved through the analysis of drone-captured imagery, coupled with the application of an enhanced U-Net semantic segmentation model integrated with an expansion-based post-processing technique. The quarry slope was stratified into four vertical sections, with the size distribution of each section quantified via ellipsoid shape approximations. Our results reveal pronounced vertical segregation patterns, with finer particles concentrated in the upper slope regions and coarser particles in the lower. Utilizing relative characteristic diameters, we offer insight into the degree of segregation, thereby illustrating the spatial heterogeneity in fragment size more clearly. The techniques outlined in this study deliver a scalable and accurate method for assessing fragment size distribution, with the potential to better inform resource management and operational decisions in quarry management.
[438] arXiv:2406.04163 (cross-list from math.OC) [pdf, ps, other]: Title: Essentially Sharp Estimates on the Entropy Regularization Error in Discrete Discounted Markov Decision Processes

Authors: Johannes Müller, Semih Cayci

Comments: 25 pages, 1 figure

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY)

We study the error introduced by entropy regularization of infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. We provide a lower bound matching our upper bound up to a polynomial factor. Our proof relies on the correspondence of the solutions of entropy-regularized Markov decision processes with gradient flows of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. Further, this correspondence allows us to identify the limit of the gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of the Kakade gradient flow which corresponds to a time-continuous version of the natural policy gradient method. We use this to show that for entropy-regularized natural policy gradient methods the overall error decays exponentially in the square root of the number of iterations improving existing sublinear guarantees.
[439] arXiv:2406.04179 (cross-list from math.PR) [pdf, ps, other]: Title: On the zeros of partition functions with multi-spin interactions

Authors: Alexander Barvinok

Comments: 16 pages

Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Mathematical Physics (math-ph); Combinatorics (math.CO)

Let $X_1, \ldots, X_n$ be probability spaces, let $X$ be their direct product, let $\phi_1, \ldots, \phi_m: X \longrightarrow \mathbb{C}$ be random variables, each depending only on a few coordinates of a point $x=(x_1, \ldots, x_n)$, and let $f=\phi_1 + \ldots + \phi_m$. The expectation $E\, e^{\lambda f}$, where $\lambda \in \mathbb{C}$, appears in statistical physics as the partition function of a system with multi-spin interactions, and also in combinatorics and computer science, where it is known as the partition function of edge-coloring models, tensor network contractions or a Holant polynomial. Assuming that each $\phi_i$ is 1-Lipschitz in the Hamming metric of $X$, that each $\phi_i(x)$ depends on at most $r \geq 2$ coordinates $x_1, \ldots, x_n$ of $x \in X$, and that for each $j$ there are at most $c \geq 1$ functions $\phi_i$ that depend on the coordinate $x_j$, we prove that $E\, e^{\lambda f} \ne 0$ provided $|\lambda| \leq (3 c \sqrt{r-1})^{-1}$ and that the bound is sharp up to a factor logarithmic in $r$. As a corollary, the value of the expectation can be efficiently approximated, provided $\lambda$ lies in a slightly smaller disc.
[440] arXiv:2406.04188 (cross-list from eess.SP) [pdf, other]: Title: Digital Twin Aided RIS Communication: Robust Beamforming and Interference Management

Authors: Sadjad Alikhani, Ahmed Alkhateeb

Comments: Dataset and code files will be available soon on the DeepMIMO website: this https URL

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Reconfigurable intelligent surfaces (RISs) are envisioned to play a key role in future wireless communication networks. However, channel estimation in RIS-aided wireless networks is challenging due to their passive nature and the large number of reflective elements, leading to high channel estimation overhead. Additionally, conventional methods like beam sweeping, which do not rely on explicit channel state information, often struggle in managing interference in multi-user networks. In this paper, we propose a novel approach that leverages digital twins (DTs) of the physical environments to approximate channels using electromagnetic 3D models and ray tracing, thus relaxing the need for channel estimation and extensive over-the-air computations in RIS-aided wireless networks. To address the digital twins channel approximation errors, we further refine this approach with a DT-specific robust transmission design that reliably meets minimum desired rates. The results show that our method secures these rates over 90% of the time, significantly outperforming beam sweeping, which achieves these rates less than 8% of the time due to its poor management of transmitting power and interference.
[441] arXiv:2406.04203 (cross-list from math.PR) [pdf, other]: Title: Explicit Steady-State Approximations for Parallel Server Systems with Heterogeneous Servers

Authors: J. G. Dai, Yaosheng Xu

Subjects: Probability (math.PR); Systems and Control (eess.SY); Optimization and Control (math.OC)

The weighted-workload-task-allocation (WWTA) load-balancing policy is known to be throughput optimal for parallel server systems with heterogeneous servers. This work concerns the heavy traffic approximation of steady-state performance for parallel server systems operating under the WWTA policy. Under a relaxed complete-resource-pooling condition, we prove that WWTA achieves a "strong form" of state-space collapse in heavy traffic and that the scaled workload for each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Various steady-state performance measures are shown to be approximated from this exponential random variable. Instead of proving a stochastic process limit followed by an interchange of limits, a method that dominates the literature, our method works directly with a pre-limit basic adjoint relationship (BAR) that characterizes the stationary distribution of each pre-limit system.
[442] arXiv:2406.04212 (cross-list from eess.AS) [pdf, ps, other]: Title: Sound Event Bounding Boxes

Authors: Janek Ebbers, Francois G. Germain, Gordon Wichern, Jonathan Le Roux

Comments: Accepted for publication at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame-level presence decisions, with the extent of individual events determined by merging consecutive positive frames. In this paper, we show that frame-level thresholding degrades the prediction of the event extent by coupling it with the system's sound presence confidence. We propose to decouple the prediction of event extent and confidence by introducing SEBBs, which format each sound event prediction as a tuple of a class type, extent, and overall confidence. We also propose a change-detection-based algorithm to convert legacy frame-level outputs into SEBBs. We find the algorithm significantly improves the performance of DCASE 2023 Challenge systems, boosting the state of the art from .644 to .686 PSDS1.
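
The conventional pipeline described in this abstract (threshold frame-level confidences, then merge consecutive positive frames into events) can be sketched as follows. This is a minimal illustration of that legacy post-processing only, not of the SEBB conversion algorithm proposed in the paper; the function name, the scores, and the one-second frame length are made-up assumptions.

```python
def frames_to_events(scores, threshold, frame_seconds):
    """Threshold per-frame confidences and merge runs of positive frames.

    Returns a list of (onset_seconds, offset_seconds) tuples.
    """
    events = []
    onset = None  # index of the first frame of the currently open event
    for index, score in enumerate(scores):
        active = score > threshold
        if active and onset is None:
            onset = index
        elif not active and onset is not None:
            events.append((onset * frame_seconds, index * frame_seconds))
            onset = None
    if onset is not None:
        # The last event runs until the end of the clip.
        events.append((onset * frame_seconds, len(scores) * frame_seconds))
    return events


# Hypothetical frame-level confidences for one sound class.
scores = [0.1, 0.6, 0.8, 0.4, 0.7, 0.9, 0.2]
print(frames_to_events(scores, 0.5, 1.0))  # [(1.0, 3.0), (4.0, 6.0)]
print(frames_to_events(scores, 0.3, 1.0))  # [(1.0, 6.0)]
```

The two calls show the coupling the abstract criticizes: changing only the confidence threshold turns two short events into one long event, so the predicted extent depends on the operating point.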
[443] arXiv:2406.04243 (cross-list from math.OC) [pdf, other]: Title: Policy Optimization in Control: Geometry and Algorithmic Implications

Authors: Shahriar Talebi, Yang Zheng, Spencer Kraisler, Na Li, Mehran Mesbahi

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Differential Geometry (math.DG)

This survey explores the geometric perspective on policy optimization within the realm of feedback control systems, emphasizing the intrinsic relationship between control design and optimization. By adopting a geometric viewpoint, we aim to provide a nuanced understanding of how various "complete parameterizations" of control design problems (the policy parameters together with their Riemannian geometry) influence the stability and performance of local search algorithms. The paper is structured to address key themes such as policy parameterization, the topology and geometry of stabilizing policies, and their implications for various (non-convex) dynamic performance measures. We focus on a few iconic control design problems, including the Linear Quadratic Regulator (LQR), Linear Quadratic Gaussian (LQG) control, and $\mathcal{H}_\infty$ control. In particular, we first discuss the topology and Riemannian geometry of stabilizing policies, distinguishing between their static and dynamic realizations. Expanding on this geometric perspective, we then explore structural properties of the aforementioned performance measures and their interplay with the geometry of stabilizing policies in the presence of policy constraints; along the way, we address issues such as spurious stationary points, symmetries of dynamic feedback policies, and (non-)smoothness of the corresponding performance measures. We conclude the survey with algorithmic implications of policy optimization in feedback design.
[444] arXiv:2406.04245 (cross-list from quant-ph) [pdf, ps, other]: Title: Online learning of a panoply of quantum objects

Authors: Akshay Bansal, Ian George, Soumik Ghosh, Jamie Sikora, Alice Zheng

Comments: 34 pages. Comments welcome

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general subsets of positive semidefinite matrices via the regularized-follow-the-leader algorithm and apply it to various settings where one wishes to learn quantum objects. For concrete applications, we present a sublinear regret bound for learning quantum states, effects, channels, interactive measurements, strategies, co-strategies, and the collection of inner products of pure states. Our bound applies to many other quantum objects with compact, convex representations. In proving our regret bound, we establish various matrix analysis results useful in quantum information theory. This includes a generalization of Pinsker's inequality for arbitrary positive semidefinite operators with possibly different traces, which may be of independent interest and applicable to more general classes of divergences.
[445] arXiv:2406.04250 (cross-list from quant-ph) [pdf, other]: Title: Online learning of quantum processes

Authors: Asad Raza, Matthias C. Caro, Jens Eisert, Sumeet Khatri

Comments: 14 + 72 pages, 6 figures

Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

Among recent insights into learning quantum states, online learning and shadow tomography procedures are notable for their ability to accurately predict expectation values even of adaptively chosen observables. In contrast to the state case, quantum process learning tasks with a similarly adaptive nature have received little attention. In this work, we investigate online learning tasks for quantum processes. Whereas online learning is infeasible for general quantum channels, we show that channels of bounded gate complexity as well as Pauli channels can be online learned in the regret and mistake-bounded models of online learning. In fact, we can online learn probabilistic mixtures of any exponentially large set of known channels. We also provide a provably sample-efficient shadow tomography procedure for Pauli channels. Our results extend beyond quantum channels to non-Markovian multi-time processes, with favorable regret and mistake bounds, as well as a shadow tomography procedure. We complement our online learning upper bounds with mistake as well as computational lower bounds. On the technical side, we make use of the multiplicative weights update algorithm, classical adaptive data analysis, and Bell sampling, as well as tools from the theory of quantum combs for multi-time quantum processes. Our work initiates a study of online learning for classes of quantum channels and, more generally, non-Markovian quantum processes. Given the importance of online learning for state shadow tomography, this may serve as a step towards quantum channel variants of adaptive shadow tomography.
[446] arXiv:2406.04259 (cross-list from math.AT) [pdf, other]: Title: Topological Stability and Latschev-type Reconstruction Theorems for $\boldsymbol{\mathrm{CAT}(\kappa)}$ Spaces

Authors: Rafal Komendarczyk, Sushovan Majhi, Will Tran

Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Metric Geometry (math.MG)

We consider the problem of homotopy-type reconstruction of compact shapes $X\subset\mathbb{R}^N$ that are $\mathrm{CAT}(\kappa)$ in the intrinsic length metric. The reconstructed spaces are in the form of Vietoris-Rips complexes computed from a compact sample $S$, Hausdorff-close to the unknown shape $X$. Instead of the Euclidean metric on the sample, our reconstruction technique leverages a path-based metric to compute these complexes. As these questions emerge naturally in the framework of reconstruction, we also study the Gromov-Hausdorff topological stability and finiteness problem for general compact $\mathrm{CAT}(\kappa)$ spaces. Our techniques provide novel sampling conditions alternative to the existing and commonly used techniques using weak feature size and $\mu$-reach. In particular, we introduce a new parameter, called the restricted distortion, which is a generalization of the well-known global distortion of embedding. We show examples of Euclidean subspaces for which the known parameters such as the reach, $\mu$-reach and weak feature size vanish, whereas the restricted distortion is finite, making our reconstruction results applicable for such spaces.
[447] arXiv:2406.04269 (cross-list from eess.AS) [pdf, other]: Title: Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Authors: Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian

Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unexplored. Meanwhile, the majority of research focuses on small datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
[448] arXiv:2406.04282 (cross-list from eess.SP) [pdf, other]: Title: A Statistical Characterization of Wireless Channels Conditioned on Side Information

Authors: Benedikt Böck, Michael Baur, Nurettin Turan, Dominik Semmler, Wolfgang Utschick

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)

Statistical prior channel knowledge, such as the wide-sense-stationary-uncorrelated-scattering (WSSUS) property, and additional side information can both be used to enhance physical layer applications in wireless communication. Generally, the wireless channel's strongly fluctuating path phases and WSSUS property characterize the channel by a zero mean and Toeplitz-structured covariance matrices in different domains. In this work, we derive a framework to comprehensively categorize side information based on whether these statistical features are preserved or lost when conditioning on it. To accomplish this, we combine insights from a generic channel model with the representation of wireless channels as probabilistic graphs. Additionally, we exemplify several applications, ranging from channel modeling to estimation and clustering, which demonstrate how the proposed framework can practically enhance physical layer methods utilizing machine learning (ML).

Replacements for Fri, 7 Jun 24

[449] arXiv:1708.09157 (replaced) [pdf, other]: Title: Cross-lingual, Character-Level Neural Morphological Tagging

Authors: Ryan Cotterell, Georg Heigold

Comments: Published as a conference paper at EMNLP 2017; Fixed minor typos and cleaned up formatting

Subjects: Computation and Language (cs.CL)
[450] arXiv:1912.12095 (replaced) [pdf, other]: Title: One Point, One Object: Simultaneous 3D Object Segmentation and 6-DOF Pose Estimation

Authors: Hongsen Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[451] arXiv:2008.05195 (replaced) [pdf, other]: Title: Competitive Demand Learning: A Non-cooperative Pricing Algorithm with Coordinated Price Experimentation

Authors: Yongge Yang, Yu-Ching Lee, Po-An Chen

Journal-ref: Production and Operations Management 2024. Vol. 33(1)

Subjects: Computer Science and Game Theory (cs.GT)
[452] arXiv:2009.04553 (replaced) [pdf, other]: Title: Threshold rates for properties of random codes

Authors: Venkatesan Guruswami, Jonathan Mosheiff, Nicolas Resch, Shashwat Silas, Mary Wootters

Comments: November 2021 version

Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[453] arXiv:2106.03354 (replaced) [pdf, other]: Title: AI without networks

Authors: Partha P Mitra, Clément Sire

Comments: 47 pages with 8 figures + 33 pages supplementary with 7 figures and one table (total 80 pages)

Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Functional Analysis (math.FA); Machine Learning (stat.ML)
[454] arXiv:2109.11725 (replaced) [pdf, other]: Title: Punctured Low-Bias Codes Behave Like Random Linear Codes

Authors: Venkatesan Guruswami, Jonathan Mosheiff

Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Combinatorics (math.CO)
[455] arXiv:2112.14734 (replaced) [pdf, other]: Title: Sequential memory improves sample and memory efficiency in Episodic Control

Authors: Ismael T. Freire, Adrián F. Amil, Paul F.M.J. Verschure

Comments: 21 pages, 8 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Neurons and Cognition (q-bio.NC)
[456] arXiv:2203.00387 (replaced) [pdf, other]: Title: Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing

Authors: Ruiying Lu, Ziheng Cheng, Bo Chen, Xin Yuan

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457] arXiv:2203.12082 (replaced) [pdf, other]: Title: PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo

Authors: Jiachen Liu, Pan Ji, Nitin Bansal, Changjiang Cai, Qingan Yan, Xiaolei Huang, Yi Xu

Comments: CVPR 2022; source code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[458] arXiv:2205.08628 (replaced) [pdf, ps, other]: Title: Mechanized Analysis of Anselm's Modal Ontological Argument

Authors: John Rushby

Comments: This version includes a new postscript that considers alternative premises due to Andrzej Bilat (April 2021)

Journal-ref: International Journal for Philosophy of Religion, vol. 89, pp. 135-152, April 2021

Subjects: Logic in Computer Science (cs.LO)
[459] arXiv:2205.10192 (replaced) [pdf, other]: Title: On the Trade-off between Redundancy and Local Coherence in Summarization

Authors: Ronald Cardenas, Matthias Galle, Shay B. Cohen

Comments: Accepted to JAIR

Journal-ref: Journal of Artificial Intelligence Research, 80, 273-326 (2024)

Subjects: Computation and Language (cs.CL)
[460] arXiv:2206.06821 (replaced) [pdf, other]: Title: DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models

Authors: Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing

Journal-ref: Journal of Machine Learning Research 25(147), 2024

Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[461] arXiv:2206.07438 (replaced) [pdf, other]: Title: Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

Comments: Published at ACM TELO

Journal-ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[462] arXiv:2206.08465 (replaced) [pdf, other]: Title: Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks

Authors: Yunpeng Zhao, Ning Hao, Ji Zhu

Journal-ref: Journal of Machine Learning Research 25 (2024) 1-42

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[463] arXiv:2207.12264 (replaced) [pdf, ps, other]: Title: Dynamics and triggers of misinformation on vaccines

Authors: Emanuele Brugnoli, Marco Delmastro

Subjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[464] arXiv:2208.10790 (replaced) [pdf, other]: Title: Event-Triggered Time-Varying Bayesian Optimization

Authors: Paul Brunzema, Alexander von Rohr, Friedrich Solowjow, Sebastian Trimpe

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[465] arXiv:2209.00936 (replaced) [pdf, other]: Title: A Class-Aware Representation Refinement Framework for Graph Classification

Authors: Jiaxing Xu, Jinjie Ni, Yiping Ke

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[466] arXiv:2210.04288 (replaced) [pdf, other]: Title: CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

Authors: Khoa D. Doan, Jianwen Xie, Yaxuan Zhu, Yang Zhao, Ping Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[467] arXiv:2210.17180 (replaced) [pdf, other]: Title: Automated Dominative Subspace Mining for Efficient Neural Architecture Search

Authors: Yaofo Chen, Yong Guo, Daihai Liao, Fanbing Lv, Hengjie Song, James Tin-Yau Kwok, Mingkui Tan

Comments: Published in IEEE TCSVT

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[468] arXiv:2212.01976 (replaced) [pdf, other]: Title: FedCC: Robust Federated Learning against Model Poisoning Attacks

Authors: Hyejun Jeong, Hamin Son, Seohu Lee, Jayun Hyun, Tai-Myoung Chung

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[469] arXiv:2212.02459 (replaced) [pdf, ps, other]: Title: Resilient Distributed Optimization for Multi-Agent Cyberphysical Systems

Authors: Michal Yemini, Angelia Nedić, Andrea J. Goldsmith, Stephanie Gil

Subjects: Robotics (cs.RO); Signal Processing (eess.SP); Systems and Control (eess.SY)
[470] arXiv:2212.10192 (replaced) [pdf, other]: Title: Adam: Dense Retrieval Distillation with Adaptive Dark Examples

Authors: Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

Comments: 13 pages, 3 figures

Subjects: Computation and Language (cs.CL)
[471] arXiv:2212.13462 (replaced) [pdf, other]: Title: MVTN: Learning Multi-View Transformations for 3D Understanding

Authors: Abdullah Hamdi, Faisal AlZahrani, Silvio Giancola, Bernard Ghanem

Comments: Journal extension (under review) of the ICCV 2021 paper arXiv:2011.13244

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[472] arXiv:2301.02428 (replaced) [pdf, other]: Title: Sensitivity analysis using Physics-informed neural networks

Authors: John M. Hanna, José V. Aguado, Sebastien Comas-Cardona, Ramzi Askri, Domenico Borzacchiello

Comments: 22 pages, 11 figures

Subjects: Numerical Analysis (math.NA)
[473] arXiv:2301.06335 (replaced) [pdf, ps, other]: Title: Approximating the closest structured singular matrix polynomial

Authors: Miryam Gnazzo, Nicola Guglielmi

Comments: 28 pages

Subjects: Numerical Analysis (math.NA)
[474] arXiv:2301.08146 (replaced) [pdf, other]: Title: What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News

Authors: Deven Santosh Shah, Shiying He, Gosuddin Kamaruddin Siddiqi, Radhika Bansal

Comments: 8 pages, 2 figures, 5 tables

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
[475] arXiv:2302.01713 (replaced) [pdf, other]: Title: Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations

Authors: Jan Bode, Niklas Kühl, Dominik Kreuzberger, Sebastian Hirschl, Carsten Holtmann

Subjects: Artificial Intelligence (cs.AI)
[476] arXiv:2302.02785 (replaced) [pdf, other]: Title: An intelligent tutor for planning in large partially observable environments

Authors: Lovis Heindrich, Saksham Consul, Falk Lieder

Subjects: Artificial Intelligence (cs.AI)
[477] arXiv:2302.05372 (replaced) [pdf, ps, other]: Title: Towards Minimax Optimality of Model-based Robust Reinforcement Learning

Authors: Pierre Clavier, Erwan Le Pennec, Matthieu Geist

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[478] arXiv:2302.08053 (replaced) [pdf, ps, other]: Title: Selective Noise Suppression Methods Using Random SVPWM to Shape the Noise Spectrum of PMSMs

Authors: Jian Wen (1 and 2), Xiaobin Cheng (1 and 2), Peifeng Ji (1), Jun Yang (1 and 2), Feng Zhao (3) ((1) Institute of Acoustics, Chinese Academy of Sciences, (2) University of Chinese Academy of Sciences, (3) Institute of Electrical Engineering, Chinese Academy of Sciences)

Comments: 8 pages, 15 figures

Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
[479] arXiv:2302.12476 (replaced) [pdf, ps, other]: Title: Asymptotic behaviour of the semidiscrete FE approximations to weakly damped wave equations with minimal smoothness on initial data

Authors: P. Danumjaya, Anil Kumar, Amiya K. Pani

Comments: 28 pages, 18 figures, 5 tables

Subjects: Numerical Analysis (math.NA)
[480] arXiv:2303.00368 (replaced) [pdf, ps, other]: Title: Sufficient conditions for the surjectivity of radical curve parametrizations

Authors: Jorge Caravantes, J. Rafael Sendra, David Sevilla, Carlos Villarino

Comments: 18 pages, no figures

Journal-ref: Journal of Algebra, Volume 640, 2024, Pages 129-146, ISSN 0021-8693

Subjects: Algebraic Geometry (math.AG); Symbolic Computation (cs.SC)
[481] arXiv:2303.07139 (replaced) [pdf, other]: Title: Comparing statistical and machine learning methods for time series forecasting in data-driven logistics -- A simulation study

Authors: Lena Schmid, Moritz Roidl, Markus Pauly

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[482] arXiv:2304.07889 (replaced) [pdf, other]: Title: Ontology for Healthcare Artificial Intelligence Privacy in Brazil

Authors: Tiago Andres Vaz, José Miguel Silva Dora, Luís da Cunha Lamb, Suzi Alves Camey

Subjects: Artificial Intelligence (cs.AI)
[483] arXiv:2304.08650 (replaced) [pdf, other]: Title: UAV-based Maritime Communications: Relaying to Enhance the Link Quality

Authors: Abdullah Taha Çağan, Görkem Berkay Koç, Handan Yakın, Berk Çiloğlu, Muhammad Zeeshan Ashgar, Özgün Ersoy, Jyri Hämäläinen, Metin Öztürk

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[484] arXiv:2304.14545 (replaced) [pdf, other]: Title: Augmented balancing weights as linear regression

Authors: David Bruns-Smith, Oliver Dukes, Avi Feller, Elizabeth L. Ogburn

Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
[485] arXiv:2305.11915 (replaced) [pdf, other]: Title: PINNs error estimates for nonlinear equations in $\mathbb{R}$-smooth Banach spaces

Authors: Jiexing Gao, Yurii Zakharian

Comments: 30 pages, 9 figures

Subjects: Functional Analysis (math.FA); Machine Learning (cs.LG); Numerical Analysis (math.NA)
[486] arXiv:2305.12659 (replaced) [pdf, other]: Title: UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

Authors: Zhenghao Zhang, Shengfan Zhang, Zhichao Wei, Zuozhuo Dai, Siyu Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[487] arXiv:2305.12798 (replaced) [pdf, other]: Title: Word Embeddings Are Steers for Language Models

Authors: Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

Comments: ACL 2024 Long Paper, 9 pages, 3 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[488] arXiv:2305.14109 (replaced) [pdf, other]: Title: Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML

Authors: Mark Deutel, Georgios Kontes, Christopher Mutschler, Jürgen Teich

Comments: 14 pages, 9 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[489] arXiv:2305.14592 (replaced) [pdf, other]: Title: Meta-Tuning LLMs to Leverage Lexical Knowledge for Generalizable Language Style Understanding

Authors: Ruohao Guo, Wei Xu, Alan Ritter

Comments: Accepted to ACL 2024 main conference

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[490] arXiv:2305.15577 (replaced) [pdf, other]: Title: Minimizing $f$-Divergences by Interpolating Velocity Fields

Authors: Song Liu, Jiahao Yu, Jack Simons, Mingxuan Yi, Mark Beaumont

Comments: This manuscript is an extended version of the ICML 2024 paper. The code for reproducing our results can be found at this https URL

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[491] arXiv:2305.16209 (replaced) [pdf, other]: Title: C-MCTS: Safe Planning with Monte Carlo Tree Search

Authors: Dinesh Parthasarathy, Georgios Kontes, Axel Plinge, Christopher Mutschler

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[492] arXiv:2305.17139 (replaced) [pdf, other]: Title: A Measure-Theoretic Axiomatisation of Causality

Authors: Junhyung Park, Simon Buchholz, Bernhard Schölkopf, Krikamol Muandet

Subjects: Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
[493] arXiv:2305.17834 (replaced) [pdf, other]: Title: Streaming Audio Transformers for Online Audio Tagging

Authors: Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

Comments: Interspeech 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[494] arXiv:2306.01376 (replaced) [pdf, other]: Title: DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities

Authors: Tiehua Zhang, Rui Xu, Jianping Zhang, Yuze Liu, Xin Chen, Jun Yin, Xi Zheng

Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
[495] arXiv:2306.03061 (replaced) [pdf, other]: Title: Structured Voronoi Sampling

Authors: Afra Amini, Li Du, Ryan Cotterell

Comments: Accepted at NeurIPS 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[496] arXiv:2306.04815 (replaced) [pdf, other]: Title: Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

Authors: Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[497] arXiv:2306.05001 (replaced) [pdf, other]: Title: COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation

Authors: Jia-Qi Yang, Chenglei Dai, Dan OU, Dongshuai Li, Ju Huang, De-Chuan Zhan, Xiaoyi Zeng, Yang Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[498] arXiv:2306.06209 (replaced) [pdf, other]: Title: Backdoor Attack with Sparse and Invisible Trigger

Authors: Yinghua Gao, Yiming Li, Xueluan Gong, Zhifeng Li, Shu-Tao Xia, Qian Wang

Comments: This paper was accepted by IEEE Transactions on Information Forensics and Security (TIFS). The first two authors contributed equally to this work. 14 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[499] arXiv:2306.06844 (replaced) [pdf, other]: Title: Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation

Authors: Huong Ha, Vu Nguyen, Hung Tran-The, Hongyu Zhang, Xiuzhen Zhang, Anton van den Hengel

Comments: 25 pages, 5 figures

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[500] arXiv:2306.07550 (replaced) [pdf, ps, other]: Title: Nested Sequents for Intermediate Logics: The Case of Gödel-Dummett Logics

Authors: Tim S. Lyon

Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)
[501] arXiv:2306.08141 (replaced) [pdf, other]: Title: ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Authors: Kailas Vodrahalli, James Zou

Comments: 31 pages, 27 figures, ICML 2024

Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[502] arXiv:2306.09381 (replaced) [pdf, other]: Title: Spatiotemporal-Augmented Graph Neural Networks for Human Mobility Simulation

Authors: Yu Wang, Tongya Zheng, Shunyu Liu, Zunlei Feng, Kaixuan Chen, Yunzhi Hao, Mingli Song

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[503] arXiv:2306.09782 (replaced) [pdf, other]: Title: Full Parameter Fine-tuning for Large Language Models with Limited Resources

Authors: Kai Lv, Yuqing Yang, Tengxiao Liu, Qinghui Gao, Qipeng Guo, Xipeng Qiu

Comments: ACL 2024

Subjects: Computation and Language (cs.CL)
[504] arXiv:2306.13493 (replaced) [pdf, other]: Title: Smoothed Circulant Embedding with Applications to Multilevel Monte Carlo Methods for PDEs with Random Coefficients

Authors: Anastasia Istratuca, Aretha Teckentrup

Comments: 36 pages, 11 figures, submitted to IMA Journal of Numerical Analysis

Subjects: Numerical Analysis (math.NA)
[505] arXiv:2306.14075 (replaced) [pdf, ps, other]: Title: Join Size Bounds using Lp-Norms on Degree Sequences

Authors: Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, Dan Suciu

Subjects: Databases (cs.DB); Information Theory (cs.IT)
[506] arXiv:2306.17193 (replaced) [pdf, other]: Title: Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection

Authors: Niklas Risse, Marcel Böhme

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[507] arXiv:2307.02818 (replaced) [pdf, other]: Title: Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model

Authors: Sagnik Nandy, Bhaswar B. Bhattacharya

Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[508] arXiv:2307.05141 (replaced) [pdf, other]: Title: Deep Probabilistic Movement Primitives with a Bayesian Aggregator

Authors: Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[509] arXiv:2307.15593 (replaced) [pdf, other]: Title: Robust Distortion-free Watermarks for Language Models

Authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

Comments: reformatting of camera-ready version accepted to TMLR, with minor edits to introduction

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[510] arXiv:2307.16422 (replaced) [pdf, other]: Title: Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution

Authors: Elen Vardanyan, Sona Hunanyan, Tigran Galstyan, Arshak Minasyan, Arnak Dalalyan

Comments: ICML 2024

Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[511] arXiv:2308.06020 (replaced) [pdf, other]: Title: A direct sampling method based on the Green's function for time-dependent inverse scattering problems

Authors: Qingqing Yu, Bo Chen, Jiaru Wang, Yao Sun

Comments: 18 pages, 12 figures, 2 tables

Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
[512] arXiv:2308.07876 (replaced) [pdf, other]: Title: Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political Relation Classification

Authors: Yibo Hu, Erick Skorupa Parolin, Latifur Khan, Patrick T. Brandt, Javier Osorio, Vito J. D'Orazio

Comments: ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[513] arXiv:2308.08841 (replaced) [pdf, other]: Title: Machine Learning-Assisted Discovery of Flow Reactor Designs

Authors: Tom Savage, Nausheen Basha, Jonathan McDonough, James Krassowski, Omar K Matar, Ehecatl Antonio del Rio Chanona

Comments: 11 pages, 9 figures, as accepted by Nature Chemical Engineering

Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[514] arXiv:2308.08858 (replaced) [pdf, ps, other]: Title: Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
[515] arXiv:2308.12568 (replaced) [pdf, other]: Title: A Small and Fast BERT for Chinese Medical Punctuation Restoration

Authors: Tongtao Ling, Yutao Lai, Lei Chen, Shilei Huang, Yi Liu

Comments: 5 pages, 2 figures, Accepted by INTERSPEECH 2024

Subjects: Computation and Language (cs.CL)
[516] arXiv:2308.14915 (replaced) [pdf, other]: Title: Information-driven Affordance Discovery for Efficient Robotic Manipulation

Authors: Pietro Mazzaglia, Taco Cohen, Daniel Dijkman

Subjects: Robotics (cs.RO)
[517] arXiv:2309.00169 (replaced) [pdf, other]: Title: RepCodec: A Speech Representation Codec for Speech Tokenization

Authors: Zhichao Huang, Chutong Meng, Tom Ko

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[518] arXiv:2309.00610 (replaced) [pdf, other]: Title: CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Authors: Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu

Comments: CVPR 2024. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[519] arXiv:2309.06054 (replaced) [pdf, other]: Title: Breaking through the learning plateaus of in-context learning in Transformer

Authors: Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[520] arXiv:2309.07287 (replaced) [pdf, other]: Title: Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis

Authors: Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios

Comments: Accepted to Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[521] arXiv:2309.08047 (replaced) [pdf, other]: Title: Bias in News Summarization: Measures, Pitfalls and Corpora

Authors: Julius Steen, Katja Markert

Comments: Findings of ACL 2024, camera-ready

Subjects: Computation and Language (cs.CL)
[522] arXiv:2309.08511 (replaced) [pdf, other]: Title: Generalised Diffusion Probabilistic Scale-Spaces

Authors: Pascal Peter

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[523] arXiv:2309.09524 (replaced) [pdf, other]: Title: Improved Factorized Neural Transducer Model For text-only Domain Adaptation

Authors: Junzhe Liu, Jianwei Yu, Xie Chen

Comments: Interspeech 2024, camera-ready

Subjects: Computation and Language (cs.CL)
[524] arXiv:2309.09552 (replaced) [pdf, other]: Title: A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting

Authors: Yuang Li, Min Zhang, Chang Su, Yinglu Li, Xiaosong Qiao, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Shimin Tao, Hao Yang

Comments: 5 pages, 2 figures, Accepted to Interspeech 2024

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[525] arXiv:2309.09836 (replaced) [pdf, other]: Title: RECAP: Retrieval-Augmented Audio Captioning

Authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

Comments: ICASSP 2024. Code and data: this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[526] arXiv:2309.10740 (replaced) [pdf, other]: Title: ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Authors: Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[527] arXiv:2309.11361 (replaced) [pdf, other]: Title: Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM

Authors: Yuan An, Jane Greenberg, Alex Kalinowski, Xintong Zhao, Xiaohua Hu, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gómez-Gualdrón

Comments: In 17th International Conference on Metadata and Semantics Research, October 2023

Subjects: Artificial Intelligence (cs.AI)
[528] arXiv:2309.15402 (replaced) [pdf, other]: Title: Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu

Comments: Accepted to ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[529] arXiv:2309.16002 (replaced) [pdf, other]: Title: Robust Blockwise Random Pivoting: Fast and Accurate Adaptive Interpolative Decomposition

Authors: Yijun Dong, Chao Chen, Per-Gunnar Martinsson, Katherine Pearce

Subjects: Numerical Analysis (math.NA)
[530] arXiv:2309.17419 (replaced) [pdf, other]: Title: Enumerating minimal solution sets for metric graph problems

Authors: Benjamin Bergougnoux, Oscar Defrain, Fionn Mc Inerney

Comments: 26 pages, 4 figures

Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[531] arXiv:2310.00160 (replaced) [pdf, other]: Title: Self-Specialization: Uncovering Latent Expertise within Large Language Models

Authors: Junmo Kang, Hongyin Luo, Yada Zhu, Jacob Hansen, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky

Comments: ACL 2024 (Findings; Long Paper)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[532] arXiv:2310.00165 (replaced) [pdf, other]: Title: SCoRe: Submodular Combinatorial Representation Learning

Authors: Anay Majee, Suraj Kothawade, Krishnateja Killamsetty, Rishabh Iyer

Comments: Accepted to ICML 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[533] arXiv:2310.00530 (replaced) [pdf, ps, other]: Title: Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets

Authors: Ningli Xu, Rongjun Qin, Debao Huang, Fabio Remondino

Comments: 9 figures

Journal-ref: The Photogrammetric Record, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534] arXiv:2310.02442 (replaced) [pdf, other]: Title: GenCO: Generating Diverse Designs with Combinatorial Constraints

Authors: Aaron Ferber, Arman Zharmagambetov, Taoan Huang, Bistra Dilkina, Yuandong Tian

Comments: Accepted to ICML 2024

Subjects: Machine Learning (cs.LG)
[535] arXiv:2310.02721 (replaced) [pdf, other]: Title: Leveraging Temporal Graph Networks Using Module Decoupling

Authors: Or Feldman, Chaim Baskin

Subjects: Machine Learning (cs.LG)
[536] arXiv:2310.03309 (replaced) [pdf, other]: Title: Concise and Organized Perception Facilitates Reasoning in Large Language Models

Authors: Junjie Liu, Shaotian Yan, Chen Shen, Liang Xie, Wenxiao Wang, Jieping Ye

Comments: 26 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[537] arXiv:2310.03938 (replaced) [pdf, other]: Title: EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios

Authors: Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe

Comments: 5 pages, 2 figures, 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[538] arXiv:2310.04022 (replaced) [pdf, other]: Title: Nonlinear Methods for Shape Optimization Problems in Liquid Crystal Tactoids

Authors: James H. Adler, Anca S. Andrei, Timothy J. Atherton

Subjects: Numerical Analysis (math.NA)
[539] arXiv:2310.04400 (replaced) [pdf, other]: Title: On the Embedding Collapse when Scaling up Recommendation Models

Authors: Xingzhuo Guo, Junwei Pan, Ximei Wang, Baixu Chen, Jie Jiang, Mingsheng Long

Comments: Accepted at ICML 2024

Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[540] arXiv:2310.04406 (replaced) [pdf, other]: Title: Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

Comments: Code at this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[541] arXiv:2310.04764 (replaced) [pdf, other]: Title: Characterizations of Monadic Second Order Definable Context-Free Sets of Graphs

Authors: Radu Iosif, Florian Zuleger

Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
[542] arXiv:2310.05141 (replaced) [pdf, other]: Title: Transferable Availability Poisoning Attacks

Authors: Yiyong Liu, Michael Backes, Xiao Zhang

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[543] arXiv:2310.06430 (replaced) [pdf, other]: Title: Conformal Prediction for Deep Classifier via Label Ranking

Authors: Jianguo Huang, Huajun Xi, Linjun Zhang, Huaxiu Yao, Yue Qiu, Hongxin Wei

Comments: Accepted by ICML 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST)
[544] arXiv:2310.07579 (replaced) [pdf, other]: Title: In-Context Unlearning: Language Models as Few Shot Unlearners

Authors: Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju

Comments: Accepted at ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[545] arXiv:2310.09639 (replaced) [pdf, other]: Title: DPZero: Private Fine-Tuning of Language Models without Backpropagation

Authors: Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
[546] arXiv:2310.10195 (replaced) [pdf, other]: Title: AdaLomo: Low-memory Optimization with Adaptive Learning Rate

Authors: Kai Lv, Hang Yan, Qipeng Guo, Haijun Lv, Xipeng Qiu

Comments: ACL 2024 camera ready version

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[547] arXiv:2310.11897 (replaced) [pdf, other]: Title: Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

Authors: Yen-Ju Chen, Nai-Chieh Huang, Ching-Pei Lee, Ping-Chun Hsieh

Comments: 69 pages, 17 figures

Subjects: Machine Learning (cs.LG)
[548] arXiv:2310.12419 (replaced) [pdf, other]: Title: Toward Unbiased Multiple-Target Fuzzing with Path Diversity

Authors: Huanyao Rong, Wei You, Xiaofeng Wang, Tianhao Mao

Subjects: Cryptography and Security (cs.CR)
[549] arXiv:2310.12956 (replaced) [pdf, other]: Title: Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

Authors: David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox

Comments: Accepted at ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[550] arXiv:2310.13571 (replaced) [pdf, ps, other]: Title: Why Can Large Language Models Generate Correct Chain-of-Thoughts?

Authors: Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar

Subjects: Computation and Language (cs.CL)
[551] arXiv:2310.13585 (replaced) [pdf, other]: Title: POTLoc: Pseudo-Label Oriented Transformer for Point-Supervised Temporal Action Localization

Authors: Elahe Vahdani, Yingli Tian

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[552] arXiv:2310.18924 (replaced) [pdf, other]: Title: Remaining useful life prediction of Lithium-ion batteries using spatio-temporal multimodal attention networks

Authors: Sungho Suh, Dhruv Aditya Mittal, Hymalai Bello, Bo Zhou, Mayank Shekhar Jha, Paul Lukowicz

Subjects: Machine Learning (cs.LG)
[553] arXiv:2310.19220 (replaced) [pdf, other]: Title: From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals

Authors: Titing Cui, Su Jia, Thomas Lavastida

Comments: Authors are alphabetically ordered

Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[554] arXiv:2311.02462 (replaced) [pdf, ps, other]: Title: Levels of AGI for Operationalizing Progress on the Path to AGI

Authors: Meredith Ringel Morris, Jascha Sohl-Dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg

Comments: version 4 - Position Paper accepted to ICML 2024. Note that due to ICML position paper titling format requirements, the title has changed slightly from that of the original arXiv pre-print. The original pre-print title was "Levels of AGI: Operationalizing Progress on the Path to AGI" but the official published title for ICML 2024 is "Levels of AGI for Operationalizing Progress on the Path to AGI"

Journal-ref: Proceedings of ICML 2024

Subjects: Artificial Intelligence (cs.AI)
[555] arXiv:2311.02868 (replaced) [pdf, other]: Title: Sample Complexity Bounds for Estimating Probability Divergences under Invariances

Authors: Behrooz Tahmasebi, Stefanie Jegelka

Comments: ICML 2024

Subjects: Machine Learning (cs.LG)
[556] arXiv:2311.03688 (replaced) [pdf, ps, other]: Title: Generalized Hamming weights and minimal shifts of Orlik-Terao algebras

Authors: Stefan O. Tohaneanu

Comments: 11 pages

Subjects: Information Theory (cs.IT); Commutative Algebra (math.AC)
[557] arXiv:2311.05760 (replaced) [pdf, ps, other]: Title: Compressed and Sparse Models for Non-Convex Decentralized Learning

Authors: Andrew Campbell, Hang Liu, Leah Woldemariam, Anna Scaglione

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
[558] arXiv:2311.08967 (replaced) [pdf, other]: Title: Homomorphic Polynomial Public Key Cryptography for Quantum-secure Digital Signature

Authors: Randy Kuang, Maria Perepechaenko, Mahmoud Sayed, Dafu Lou

Comments: 16 pages, 1 figure

Subjects: Cryptography and Security (cs.CR)
[559] arXiv:2311.09033 (replaced) [pdf, other]: Title: MELA: Multilingual Evaluation of Linguistic Acceptability

Authors: Ziyin Zhang, Yikang Liu, Weifang Huang, Junyu Mao, Rui Wang, Hai Hu

Comments: ACL 2024 camera-ready

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[560] arXiv:2311.09048 (replaced) [pdf, other]: Title: GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models

Authors: Serwan Jassim, Mario Holubar, Annika Richter, Cornelius Wolff, Xenia Ohmer, Elia Bruni

Subjects: Computation and Language (cs.CL)
[561] arXiv:2311.09109 (replaced) [pdf, other]: Title: Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?

Authors: Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Comments: Accepted at NAACL 2024 main oral, 15 pages, 10 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[562] arXiv:2311.09213 (replaced) [pdf, other]: Title: GENEVA: GENErating and Visualizing branching narratives using LLMs

Authors: Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebojsa Jojic, Chris Brockett, Bill Dolan

Comments: Accepted at IEEE Conference on Games 2024

Subjects: Computation and Language (cs.CL)
[563] arXiv:2311.09562 (replaced) [pdf, other]: Title: TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction

Authors: Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji

Comments: Paper accepted by ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[564] arXiv:2311.09832 (replaced) [pdf, other]: Title: WatME: Towards Lossless Watermarking Through Lexical Redundancy

Authors: Liang Chen, Yatao Bian, Yang Deng, Deng Cai, Shuaiyi Li, Peilin Zhao, Kam-fai Wong

Comments: Accepted to ACL 2024 main conference

Subjects: Computation and Language (cs.CL)
[565] arXiv:2311.10680 (replaced) [pdf, other]: Title: Optimal Embedding Dimension for Sparse Subspace Embeddings

Authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson

Comments: STOC 2024

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[566] arXiv:2311.14251 (replaced) [pdf, ps, other]: Title: Optimal 1-bit Error Exponent for 2-hop Relaying with Binary-Input Channels

Authors: Yan Hao Ling, Jonathan Scarlett

Comments: IEEE Transactions on Information Theory

Subjects: Information Theory (cs.IT)
[567] arXiv:2311.17451 (replaced) [pdf, other]: Title: Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler

Authors: Zhenyu Tao, Wei Xu, Yongming Huang, Xiaoyun Wang, Xiaohu You

Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[568] arXiv:2311.18610 (replaced) [pdf, other]: Title: DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Authors: Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, Angela Dai

Comments: SIGGRAPH 2024, Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569] arXiv:2311.18717 (replaced) [pdf, other]: Title: NFT Wash Trading: Direct vs. Indirect Estimation

Authors: Brett Hemenway Falk, Gerry Tsoukalas, Niuniu Zhang

Subjects: General Economics (econ.GN); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Trading and Market Microstructure (q-fin.TR); Applications (stat.AP)
[570] arXiv:2312.01616 (replaced) [pdf, other]: Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System

Authors: Yunfei Fan, Tianyu Zhao, Guidong Wang

Comments: Accepted by CVPR2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[571] arXiv:2312.03668 (replaced) [pdf, other]: Title: Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Authors: Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Comments: 17 pages, 4 figures, 9 tables, accepted for Findings of ACL 2024. The model is available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[572] arXiv:2312.05601 (replaced) [pdf, other]: Title: A Meshless Solver for Blood Flow Simulations in Elastic Vessels Using Physics-Informed Neural Network

Authors: Han Zhang, Raymond Chan, Xue-Cheng Tai

Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
[573] arXiv:2312.07104 (replaced) [pdf, other]: Title: SGLang: Efficient Execution of Structured Language Model Programs

Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

Subjects: Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
[574] arXiv:2312.07364 (replaced) [pdf, other]: Title: Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval

Authors: Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Chao Shen

Comments: Accepted by ICML2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575] arXiv:2312.07671 (replaced) [pdf, ps, other]: Title: Reacting like Humans: Incorporating Intrinsic Human Behaviors into NAO through Sound-Based Reactions to Fearful and Shocking Events for Enhanced Sociability

Authors: Ali Ghadami, Mohammadreza Taghimohammadi, Mohammad Mohammadzadeh, Mohammad Hosseinipour, Alireza Taheri

Comments: 16 pages, 11 figures

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[576] arXiv:2312.08800 (replaced) [pdf, other]: Title: Evaluating Large Language Models for Health-related Queries with Presuppositions

Authors: Navreet Kaur, Monojit Choudhury, Danish Pruthi

Comments: Findings of ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[577] arXiv:2312.10104 (replaced) [pdf, other]: Title: Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

Authors: Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

Comments: 17 pages, 6 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[578] arXiv:2312.14591 (replaced) [pdf, other]: Title: Reasons to Reject? Aligning Language Models with Judgments

Authors: Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

Comments: Accepted at ACL 2024 Findings. Our source codes and models are publicly available at this https URL

Subjects: Computation and Language (cs.CL)
[579] arXiv:2312.14667 (replaced) [pdf, other]: Title: Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

Authors: Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

Comments: Accepted by AAAI 2024 (Main Track, Long Paper)

Subjects: Multimedia (cs.MM); Machine Learning (cs.LG)
[580] arXiv:2312.14792 (replaced) [pdf, ps, other]: Title: The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs

Authors: Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong

Comments: Paper accepted in IEEE Transactions on Signal Processing

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Probability (math.PR)
[581] arXiv:2312.14922 (replaced) [pdf, other]: Title: Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks

Authors: Eszter Székely, Lorenzo Bardone, Federica Gerace, Sebastian Goldt

Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)
[582] arXiv:2312.16752 (replaced) [pdf, other]: Title: Relationships Between Necessary Conditions for Feedback Stabilizability

Authors: Matthew D. Kvalheim

Comments: 15 pages, 2 figures; v2 adds the 2 figures and 3 new examples, and fixes some errors

Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Algebraic Topology (math.AT); Differential Geometry (math.DG)
[583] arXiv:2312.17518 (replaced) [pdf, ps, other]: Title: An algebraic characterization of binary CSS-T codes and cyclic CSS-T codes for quantum fault tolerance

Authors: Eduardo Camps-Moreno, Hiram H. López, Gretchen L. Matthews, Diego Ruano, Rodrigo San-José, Ivan Soprunov

Journal-ref: Quantum Inf Process 23, 230 (2024)

Subjects: Information Theory (cs.IT)
[584] arXiv:2401.00793 (replaced) [pdf, other]: Title: SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models

Authors: Jinglong Luo, Yehong Zhang, Zhuo Zhang, Jiaqi Zhang, Xin Mu, Hui Wang, Yue Yu, Zenglin Xu

Comments: Accepted by ACL 2024

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[585] arXiv:2401.01017 (replaced) [pdf, other]: Title: A Survey of Computation Offloading with Task Type

Authors: Siqi Zhang, Na Yi, Yi Ma

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[586] arXiv:2401.02058 (replaced) [pdf, other]: Title: Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model

Authors: Hien Dang, Tho Tran, Tan Nguyen, Nhat Ho

Comments: 2024 International Conference on Machine Learning

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[587] arXiv:2401.04621 (replaced) [pdf, other]: Title: DebugBench: Evaluating Debugging Capability of Large Language Models

Authors: Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Haotian Hui, Weichuan Liu, Zhiyuan Liu, Maosong Sun

Comments: Accepted as Findings of ACL 2024

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[588] arXiv:2401.05749 (replaced) [pdf, other]: Title: A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

Authors: Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico

Comments: Accepted at ACL Findings 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[589] arXiv:2401.06568 (replaced) [pdf, other]: Title: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

Authors: Xu Huang, Zhirui Zhang, Xiang Geng, Yichao Du, Jiajun Chen, Shujian Huang

Comments: Accepted by ACL2024 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[590] arXiv:2401.06688 (replaced) [pdf, other]: Title: Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation

Authors: Giorgos Vernikos, Andrei Popescu-Belis

Comments: Accepted at ACL 2024

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[591] arXiv:2401.07888 (replaced) [pdf, other]: Title: Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problems

Authors: Alexander Heinlein, Amanda A. Howard, Damien Beecroft, Panos Stinis

Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[592] arXiv:2401.08295 (replaced) [pdf, other]: Title: SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models

Authors: Weixiang Zhao, Shilong Wang, Yulin Hu, Yanyan Zhao, Bing Qin, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

Comments: To appear at ACL 2024

Subjects: Computation and Language (cs.CL)
[593] arXiv:2401.09670 (replaced) [pdf, other]: Title: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving

Authors: Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang

Comments: OSDI 2024

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[594] arXiv:2401.10186 (replaced) [pdf, other]: Title: Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

Authors: Zdeněk Kasner, Ondřej Dušek

Comments: Accepted to ACL 2024 Main Conference

Subjects: Computation and Language (cs.CL)
[595] arXiv:2401.10338 (replaced) [pdf, ps, other]: Title: MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time Series

Authors: Jingchao Ni, Gauthier Guinet, Peihong Jiang, Laurent Callot, Andrey Kan

Subjects: Machine Learning (cs.LG)
[596] arXiv:2401.10774 (replaced) [pdf, other]: Title: Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

Comments: The code for this implementation is available at this https URL

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[597] arXiv:2401.11382 (replaced) [pdf, other]: Title: Using Large Language Model for End-to-End Chinese ASR and NER

Authors: Yuang Li, Jiawei Yu, Min Zhang, Mengxin Ren, Yanqing Zhao, Xiaofeng Zhao, Shimin Tao, Jinsong Su, Hao Yang

Comments: 5 pages, 2 figures, Accepted to InterSpeech 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[598] arXiv:2401.13388 (replaced) [pdf, other]: Title: UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

Authors: Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao

Comments: Accepted by ACL 2024, Main Conference, Long Paper

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[599] arXiv:2401.13649 (replaced) [pdf, other]: Title: VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

Authors: Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried

Comments: Accepted to ACL 2024. 24 pages. Project page: this https URL

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[600] arXiv:2401.14556 (replaced) [pdf, other]: Title: Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling

Authors: David Dukić, Jan Šnajder

Comments: Accepted at ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[601] arXiv:2401.16467 (replaced) [pdf, other]: Title: ReGAL: Refactoring Programs to Discover Generalizable Abstractions

Authors: Elias Stengel-Eskin, Archiki Prasad, Mohit Bansal

Comments: ICML 2024 Camera-Ready; First two authors contributed equally; Code: this https URL

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
[602] arXiv:2401.17263 (replaced) [pdf, other]: Title: Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

Authors: Andy Zhou, Bo Li, Haohan Wang

Comments: Code available at this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[603] arXiv:2401.17264 (replaced) [pdf, other]: Title: Proactive Detection of Voice Cloning with Localized Watermarking

Authors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

Comments: Published at ICML 2024. Code at this https URL - webpage at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[604] arXiv:2401.18046 (replaced) [pdf, other]: Title: Multipath parsing in the brain

Authors: Berta Franzluebbers, Donald Dunagan, Miloš Stanojević, Jan Buys, John T. Hale

Comments: Accepted at ACL2024, main conference. 15 pages

Subjects: Computation and Language (cs.CL)
[605] arXiv:2402.00258 (replaced) [pdf, other]: Title: Multi-group Learning for Hierarchical Groups

Authors: Samuel Deng, Daniel Hsu

Comments: Accepted in International Conference on Machine Learning 2024 (ICML 2024)

Subjects: Machine Learning (cs.LG)
[606] arXiv:2402.00759 (replaced) [pdf, other]: Title: Building Expressive and Tractable Probabilistic Generative Models: A Review

Authors: Sahil Sidheekh, Sriraam Natarajan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[607] arXiv:2402.01156 (replaced) [pdf, other]: Title: An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support

Authors: Yongkun Liu, Jiachi Chen, Tingting Bi, John Grundy, Yanlin Wang, Jianxing Yu, Ting Chen, Yutian Tang, Zibin Zheng

Subjects: Software Engineering (cs.SE)
[608] arXiv:2402.01287 (replaced) [pdf, other]: Title: Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection

Authors: Lennard Bodden, Franziska Schwaiger, Duc Bach Ha, Lars Kreuzberg, Sven Behnke

Comments: 8 pages, 5 figures. Accepted at IJCNN 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[609] arXiv:2402.01344 (replaced) [pdf, other]: Title: Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks

Authors: Ruigang Wang, Krishnamurthy Dvijotham, Ian R. Manchester

Comments: International Conference on Machine Learning, Vienna, Austria, July 21 -- 27, 2024

Subjects: Machine Learning (cs.LG)
[610] arXiv:2402.01501 (replaced) [pdf, ps, other]: Title: Satisfiability Modulo Exponential Integer Arithmetic

Authors: Florian Frohn, Jürgen Giesl

Subjects: Logic in Computer Science (cs.LO)
[611] arXiv:2402.02500 (replaced) [pdf, other]: Title: Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

Authors: Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[612] arXiv:2402.03141 (replaced) [pdf, other]: Title: Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[613] arXiv:2402.03169 (replaced) [pdf, ps, other]: Title: A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation

Authors: Hugo Lebeau, Florent Chatelain, Romain Couillet

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[614] arXiv:2402.03412 (replaced) [pdf, other]: Title: See More Details: Efficient Image Super-Resolution by Experts Mining

Authors: Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yulun Zhang, Radu Timofte

Comments: Accepted at ICML 2024

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[615] arXiv:2402.03625 (replaced) [pdf, other]: Title: Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time

Authors: Sungyoon Kim, Mert Pilanci

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[616] arXiv:2402.03903 (replaced) [pdf, other]: Title: Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

Authors: Brett Daley, Martha White, Marlos C. Machado

Comments: ICML 2024. 27 pages, 7 figures, 3 tables

Subjects: Machine Learning (cs.LG)
[617] arXiv:2402.04356 (replaced) [pdf, other]: Title: Bidirectional Autoregressive Diffusion Model for Dance Generation

Authors: Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[618] arXiv:2402.04407 (replaced) [pdf, ps, other]: Title: Sharp Lower Bounds on the Manifold Widths of Sobolev and Besov Spaces

Authors: Jonathan W. Siegel

Subjects: Numerical Analysis (math.NA)
[619] arXiv:2402.04467 (replaced) [pdf, other]: Title: DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

Comments: ICML 2024; Code to reproduce our experiments is available at this https URL

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
[620] arXiv:2402.04610 (replaced) [pdf, other]: Title: Early Stopping of Untrained Convolutional Neural Networks

Authors: Tim Jahn, Bangti Jin

Subjects: Numerical Analysis (math.NA)
[621] arXiv:2402.04621 (replaced) [pdf, other]: Title: Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective

Authors: Soo Yong Lee, Sunwoo Kim, Fanchen Bu, Jaemin Yoo, Jiliang Tang, Kijung Shin

Comments: published in ICML 2024

Subjects: Machine Learning (cs.LG)
[622] arXiv:2402.04788 (replaced) [pdf, other]: Title: MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Authors: Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Pan Zhou, Yao Wan, Lichao Sun

Comments: ICML 2024 (Oral)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[623] arXiv:2402.04997 (replaced) [pdf, other]: Title: Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

Authors: Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola

Comments: 60 pages, 11 figures, 6 tables; ICML 2024

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[624] arXiv:2402.06031 (replaced) [pdf, other]: Title: An operator learning perspective on parameter-to-observable maps

Authors: Daniel Zhengyu Huang, Nicholas H. Nelsen, Margaret Trautner

Comments: 63 pages, 10 figures, 1 table

Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[625] arXiv:2402.06700 (replaced) [pdf, other]: Title: Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement

Authors: Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[626] arXiv:2402.06733 (replaced) [pdf, other]: Title: NICE: To Optimize In-Context Examples or Not?

Authors: Pragya Srivastava, Satvik Golechha, Amit Deshpande, Amit Sharma

Comments: Accepted as a full paper (9 pages) at ACL 2024 (Main)

Journal-ref: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics 2024 (Volume 1: Long Papers)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[627] arXiv:2402.06888 (replaced) [pdf, other]: Title: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

Authors: Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

Comments: Accepted to 2024 ICASSP Workshop of Self-supervision in Audio, Speech and Beyond (SASB)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[628] arXiv:2402.07214 (replaced) [pdf, other]: Title: Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification

Authors: Shanshan Xu, T.Y.S.S Santosh, Oana Ichim, Barbara Plank, Matthias Grabmair

Subjects: Computation and Language (cs.CL)
[629] arXiv:2402.07483 (replaced) [pdf, other]: Title: T-RAG: Lessons from the LLM Trenches

Authors: Masoomali Fatehkia, Ji Kim Lucas, Sanjay Chawla

Comments: Added Needle in a Haystack analysis for T-RAG

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[630] arXiv:2402.07640 (replaced) [pdf, other]: Title: CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

Authors: Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI)
[631] arXiv:2402.07844 (replaced) [pdf, other]: Title: Mercury: A Code Efficiency Benchmark for LLM Code Synthesis

Authors: Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
[632] arXiv:2402.07891 (replaced) [pdf, other]: Title: Label-Efficient Model Selection for Text Generation

Authors: Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch

Comments: Accepted to ACL (main conference)

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[633] arXiv:2402.08595 (replaced) [pdf, other]: Title: Homomorphism Counts for Graph Neural Networks: All About That Basis

Authors: Emily Jin, Michael Bronstein, Ismail Ilkan Ceylan, Matthias Lanzinger

Comments: Proceedings of the Forty-First International Conference on Machine Learning (ICML 2024). Code available at: this https URL

Subjects: Machine Learning (cs.LG)
[634] arXiv:2402.08876 (replaced) [pdf, other]: Title: DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling

Authors: Miguel Fainstein, Viviana Siless, Emmanuel Iarussi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR)
[635] arXiv:2402.09470 (replaced) [pdf, other]: Title: Rolling Diffusion Models

Authors: David Ruhe, Jonathan Heek, Tim Salimans, Emiel Hoogeboom

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[636] arXiv:2402.10013 (replaced) [pdf, other]: Title: Bridging the Empirical-Theoretical Gap in Neural Network Formal Language Learning Using Minimum Description Length

Authors: Nur Lan, Emmanuel Chemla, Roni Katzir

Comments: 9 pages, 5 figures, 3 appendix pages

Subjects: Computation and Language (cs.CL); Formal Languages and Automata Theory (cs.FL)
[637] arXiv:2402.10073 (replaced) [pdf, other]: Title: Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence

Authors: Weixiang Zhao, Zhuojun Li, Shilong Wang, Yang Wang, Yulin Hu, Yanyan Zhao, Chen Wei, Bing Qin

Comments: To appear at Findings of ACL 2024

Subjects: Computation and Language (cs.CL)
[638] arXiv:2402.10422 (replaced) [pdf, other]: Title: Pushing the Limits of Zero-shot End-to-End Speech Translation

Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

Comments: ACL 2024 (Findings)

Subjects: Computation and Language (cs.CL)
[639] arXiv:2402.10450 (replaced) [pdf, other]: Title: PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Authors: Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov

Comments: Accepted at the Forty-first International Conference on Machine Learning (ICML 2024)

Subjects: Machine Learning (cs.LG)
[640] arXiv:2402.10571 (replaced) [pdf, other]: Title: Direct Preference Optimization with an Offset

Authors: Afra Amini, Tim Vieira, Ryan Cotterell

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[641] arXiv:2402.10588 (replaced) [pdf, other]: Title: Do Llamas Work in English? On the Latent Language of Multilingual Transformers

Authors: Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

Comments: 12 pages, 28 with appendix

Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[642] arXiv:2402.10639 (replaced) [pdf, other]: Title: Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning

Authors: Tuc Nguyen, Thai Le

Comments: ACL Main 2024

Subjects: Computation and Language (cs.CL)
[643] arXiv:2402.10727 (replaced) [pdf, other]: Title: Predictive Uncertainty Quantification via Risk Decompositions for Strictly Proper Scoring Rules

Authors: Nikita Kotelevskii, Maxim Panov

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[644] arXiv:2402.10890 (replaced) [pdf, other]: Title: When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

Authors: Ziru Chen, Michael White, Raymond Mooney, Ali Payani, Yu Su, Huan Sun

Comments: ACL 2024 main

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[645] arXiv:2402.11138 (replaced) [pdf, other]: Title: Contrastive Instruction Tuning

Authors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

Comments: ACL 2024 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[646] arXiv:2402.11349 (replaced) [pdf, other]: Title: Language Models Don't Learn the Physical Manifestation of Language

Authors: Bruce W. Lee, JaeHyuk Lim

Comments: ACL 2024 Main

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[647] arXiv:2402.11463 (replaced) [pdf, other]: Title: Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

Authors: Jiaxi Hu, Yuehong Hu, Wei Chen, Ming Jin, Shirui Pan, Qingsong Wen, Yuxuan Liang

Comments: arXiv admin note: text overlap with arXiv:nlin/0307015 by other authors

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Chaotic Dynamics (nlin.CD)
[648] arXiv:2402.11485 (replaced) [pdf, other]: Title: LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation

Authors: Ikuya Yamada, Ryokan Ri

Comments: ACL Findings 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[649] arXiv:2402.11517 (replaced) [pdf, other]: Title: Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM

Authors: Zijin Hong, Zheng Yuan, Hao Chen, Qinggang Zhang, Feiran Huang, Xiao Huang

Comments: Accepted to ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[650] arXiv:2402.11548 (replaced) [pdf, other]: Title: KMMLU: Measuring Massive Multitask Language Understanding in Korean

Authors: Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman

Comments: Under Review

Subjects: Computation and Language (cs.CL)
[651] arXiv:2402.11597 (replaced) [pdf, other]: Title: Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Authors: Guijin Son, Sangwon Baek, Sangdae Nam, Ilgyun Jeong, Seungone Kim

Comments: ACL 2024 (main)

Subjects: Computation and Language (cs.CL)
[652] arXiv:2402.11674 (replaced) [pdf, other]: Title: A Fast Algorithm to Simulate Nonlinear Resistive Networks

Authors: Benjamin Scellier

Comments: ICML 2024

Subjects: Emerging Technologies (cs.ET); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
[653] arXiv:2402.11740 (replaced) [pdf, ps, other]: Title: Extraction of nonlinearity in neural networks with Koopman operator

Authors: Naoki Sugishita, Kayo Kinjo, Jun Ohkubo

Comments: 22 pages, 14 figures

Subjects: Machine Learning (cs.LG)
[654] arXiv:2402.11894 (replaced) [pdf, other]: Title: Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models

Authors: Jiahao Ying, Yixin Cao, Yushi Bai, Qianru Sun, Bo Wang, Wei Tang, Zhaojun Ding, Yizhe Yang, Xuanjing Huang, Shuicheng Yan

Subjects: Computation and Language (cs.CL)
[655] arXiv:2402.12343 (replaced) [pdf, other]: Title: Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

Authors: Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Comments: ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[656] arXiv:2402.12424 (replaced) [pdf, other]: Title: Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

Authors: Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea

Comments: Accepted to ACL 2024 Findings

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[657] arXiv:2402.12451 (replaced) [pdf, other]: Title: The Revolution of Multimodal Large Language Models: A Survey

Authors: Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Comments: ACL 2024 (Findings)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
[658] arXiv:2402.12621 (replaced) [pdf, other]: Title: Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Authors: Runlong Zhou, Simon S. Du, Beibin Li

Comments: ACL 2024

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[659] arXiv:2402.12691 (replaced) [pdf, other]: Title: Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision

Authors: Ryo Yoshida, Taiga Someya, Yohei Oseki

Comments: Accepted by ACL 2024 (Findings)

Subjects: Computation and Language (cs.CL)
[660] arXiv:2402.12991 (replaced) [pdf, other]: Title: TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

Authors: Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

Comments: Accepted at ACL 2024 (findings)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[661] arXiv:2402.13212 (replaced) [pdf, other]: Title: Soft Self-Consistency Improves Language Model Agents

Authors: Han Wang, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Comments: ACL 2024 Camera-Ready, the first three authors contributed equally; Code: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[662] arXiv:2402.13874 (replaced) [pdf, other]: Title: $Se^2$: Sequential Example Selection for In-Context Learning

Authors: Haoyu Liu, Jianfeng Liu, Shaohan Huang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Furu Wei, Qi Zhang

Comments: Accepted by ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[663] arXiv:2402.14008 (replaced) [pdf, other]: Title: OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Authors: Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

Comments: Accepted by ACL 2024 (main), update

Subjects: Computation and Language (cs.CL)
[664] arXiv:2402.14116 (replaced) [pdf, other]: Title: FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models

Authors: Andrew Zhu, Alyssa Hwang, Liam Dugan, Chris Callison-Burch

Comments: 18 pages, 2 figures. ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[665] arXiv:2402.14298 (replaced) [pdf, other]: Title: Multi-modal Stance Detection: New Datasets and Model

Authors: Bin Liang, Ang Li, Jingqian Zhao, Lin Gui, Min Yang, Yue Yu, Kam-Fai Wong, Ruifeng Xu

Comments: ACL'24 Findings

Subjects: Computation and Language (cs.CL)
[666] arXiv:2402.14328 (replaced) [pdf, other]: Title: Understanding and Patching Compositional Reasoning in LLMs

Authors: Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian, Ying Wei

Comments: Accepted by ACL'2024 Findings

Subjects: Computation and Language (cs.CL)
[667] arXiv:2402.14490 (replaced) [pdf, other]: Title: Imbalanced Data Clustering using Equilibrium K-Means

Authors: Yudong He

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[668] arXiv:2402.14569 (replaced) [pdf, other]: Title: Transformable Gaussian Reward Function for Socially-Aware Navigation with Deep Reinforcement Learning

Authors: Jinyeob Kim, Sumin Kang, Sungwoo Yang, Beomjoon Kim, Jargalbaatar Yura, Donghan Kim

Comments: 22 pages, 9 figures

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[669] arXiv:2402.14979 (replaced) [pdf, other]: Title: Optimizing Language Models for Human Preferences is a Causal Inference Problem

Authors: Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency

Comments: UAI 2024

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Methodology (stat.ME)
[670] arXiv:2402.15082 (replaced) [pdf, other]: Title: PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning

Authors: Zhisheng Lin, Han Fu, Chenghao Liu, Zhuo Li, Jianling Sun

Comments: Accepted to Findings of ACL 2024

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[671] arXiv:2402.15332 (replaced) [pdf, ps, other]: Title: Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

Authors: Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković

Comments: To appear in ICML 2024. Comments welcome. More info at categoricaldeeplearning.com

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Category Theory (math.CT); Rings and Algebras (math.RA); Machine Learning (stat.ML)
[672] arXiv:2402.15392 (replaced) [pdf, ps, other]: Title: Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

Authors: Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

Comments: International Conference on Machine Learning 41 (ICML 2024)

Subjects: Machine Learning (cs.LG)
[673] arXiv:2402.15637 (replaced) [pdf, other]: Title: Addressing Order Sensitivity of In-Context Demonstration Examples in Causal Language Models

Authors: Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He

Subjects: Computation and Language (cs.CL)
[674] arXiv:2402.15838 (replaced) [pdf, other]: Title: ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval

Authors: Soyoung Yoon, Eunbi Choi, Jiyeon Kim, Hyeongu Yun, Yireun Kim, Seung-won Hwang

Comments: Accepted to ACL 2024 main (long)

Subjects: Information Retrieval (cs.IR)
[675] arXiv:2402.16438 (replaced) [pdf, other]: Title: Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Authors: Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen

Comments: Accepted by ACL 2024

Subjects: Computation and Language (cs.CL)
[676] arXiv:2402.16775 (replaced) [pdf, other]: Title: A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Authors: Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Comments: ACL 2024 Findings

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[677] arXiv:2402.17120 (replaced) [pdf, other]: Title: LCEN: A Novel Feature Selection Algorithm for Nonlinear, Interpretable Machine Learning Models

Authors: Pedro Seber, Richard D. Braatz

Subjects: Machine Learning (cs.LG)
[678] arXiv:2402.17316 (replaced) [pdf, other]: Title: Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation

Authors: Yaofo Chen, Shuaicheng Niu, Yaowei Wang, Shoukai Xu, Hengjie Song, Mingkui Tan

Comments: Published in ICLR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[679] arXiv:2402.17447 (replaced) [pdf, other]: Title: Deep Learning Based Named Entity Recognition Models for Recipes

Authors: Mansi Goel, Ayush Agarwal, Shubham Agrawal, Janak Kapuriya, Akhil Vamshi Konam, Rishabh Gupta, Shrey Rastogi, Niharika, Ganesh Bagler

Comments: 13 pages, 6 main figures and 2 in appendices, and 3 main tables; Accepted for publication in LREC-COLING 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[680] arXiv:2402.17641 (replaced) [pdf, other]: Title: Variational Learning is Effective for Large Deep Networks

Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
[681] arXiv:2402.18059 (replaced) [pdf, other]: Title: Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

Authors: Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, Pengtao Xie

Comments: 22 pages, 13 figures, 5 tables

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
[682] arXiv:2402.18158 (replaced) [pdf, other]: Title: Evaluating Quantized Large Language Models

Authors: Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[683] arXiv:2402.18334 (replaced) [pdf, other]: Title: Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation

Authors: Nihal V. Nayak, Yiyang Nan, Avi Trost, Stephen H. Bach

Comments: ACL Findings 2024

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[684] arXiv:2403.00720 (replaced) [pdf, other]: Title: Subhomogeneous Deep Equilibrium Models

Authors: Pietro Sittoni, Francesco Tudisco

Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[685] arXiv:2403.01165 (replaced) [pdf, other]: Title: STAR: Constraint LoRA with Dynamic Active Learning for Data-Efficient Fine-Tuning of Large Language Models

Authors: Linhai Zhang, Jialong Wu, Deyu Zhou, Guoqiang Xu

Comments: Accepted by ACL 2024 (Findings)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[686] arXiv:2403.01166 (replaced) [pdf, other]: Title: DINER: Debiasing Aspect-based Sentiment Analysis with Multi-variable Causal Inference

Authors: Jialong Wu, Linhai Zhang, Deyu Zhou, Guoqiang Xu

Comments: Accepted by ACL 2024 (Findings)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[687] arXiv:2403.01931 (replaced) [pdf, other]: Title: VariErr NLI: Separating Annotation Error from Human Label Variation

Authors: Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank

Comments: 14 pages, accepted at ACL 2024 main

Subjects: Computation and Language (cs.CL)
[688] arXiv:2403.02271 (replaced) [pdf, other]: Title: RIFF: Learning to Rephrase Inputs for Few-shot Fine-tuning of Language Models

Authors: Saeed Najafi, Alona Fyshe

Comments: Final Version (Findings of ACL 2024)

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[689] arXiv:2403.02354 (replaced) [pdf, other]: Title: Spatio-Temporal Field Neural Networks for Air Quality Inference

Authors: Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Yuxuan Liang

Comments: We want to recheck our model and experimental design

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[690] arXiv:2403.02437 (replaced) [pdf, other]: Title: SoK: Challenges and Opportunities in Federated Unlearning

Authors: Hyejun Jeong, Shiqing Ma, Amir Houmansadr

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[691] arXiv:2403.02451 (replaced) [pdf, other]: Title: Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground

Authors: Adil Soubki, John Murzaku, Arash Yousefi Jordehi, Peter Zeng, Magdalena Markowska, Seyed Abolghasem Mirroshandel, Owen Rambow

Subjects: Computation and Language (cs.CL)
[692] arXiv:2403.02660 (replaced) [pdf, other]: Title: A randomized lattice rule without component-by-component construction

Authors: Takashi Goda

Comments: revision, 21 pages, 3 figures

Subjects: Numerical Analysis (math.NA)
[693] arXiv:2403.02977 (replaced) [pdf, other]: Title: Fast Iterative Region Inflation for Computing Large 2-D/3-D Convex Regions of Obstacle-Free Space

Authors: Qianhao Wang, Zhepei Wang, Mingyang Wang, Jialin Ji, Zhichao Han, Tianyue Wu, Rui Jin, Yuman Gao, Chao Xu, Fei Gao

Subjects: Robotics (cs.RO)
[694] arXiv:2403.03129 (replaced) [pdf, other]: Title: CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following

Authors: Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, Bowen Zhou

Comments: Accepted to ACL 2024 (Main Conference)

Subjects: Computation and Language (cs.CL)
[695] arXiv:2403.03167 (replaced) [pdf, other]: Title: PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset

Authors: Arda Uzunoglu, Abdalfatah Rashid Safa, Gözde Gül Şahin

Comments: 9 pages, ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[696] arXiv:2403.03234 (replaced) [pdf, other]: Title: Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Authors: Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov

Comments: ICML 2024; Code to reproduce our experiments is available at this https URL

Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
[697] arXiv:2403.04346 (replaced) [pdf, ps, other]: Title: BrainKnow -- Extracting, Linking, and Synthesizing Neuroscience Knowledge

Authors: Cunqing Huangfu, Kang Sun, Yi Zeng, Yuwei Wang, Dongsheng Wang, Zizhe Ruan

Comments: 22 pages, 7 figures

Subjects: Digital Libraries (cs.DL); Neurons and Cognition (q-bio.NC)
[698] arXiv:2403.05535 (replaced) [pdf, other]: Title: Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

Authors: Tarun Kalluri, Bodhisattwa Prasad Majumder, Manmohan Chandraker

Comments: ICML 2024 Camera-Ready. Project Page and Code: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[699] arXiv:2403.06189 (replaced) [pdf, other]: Title: Harmonious Group Choreography with Trajectory-Controllable Diffusion

Authors: Yuqin Dai, Wanlu Zhu, Ronghui Li, Zeping Ren, Xiangzheng Zhou, Xiu Li, Jun Li, Jian Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[700] arXiv:2403.06840 (replaced) [pdf, other]: Title: RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback

Authors: Yanming Liu, Xinyue Peng, Xuhong Zhang, Weihao Liu, Jianwei Yin, Jiannan Cao, Tianyu Du

Comments: 20 pages, multiple figures. Providing second version RA-ISF

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[701] arXiv:2403.06932 (replaced) [pdf, other]: Title: ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis

Authors: Yanming Liu, Xinyue Peng, Tianyu Du, Jianwei Yin, Weihao Liu, Xuhong Zhang

Comments: 15 pages, second version of ERA-CoT

Subjects: Computation and Language (cs.CL)
[702] arXiv:2403.07245 (replaced) [pdf, other]: Title: Dataset Condensation for Time Series Classification via Dual Domain Matching

Authors: Zhanyu Liu, Ke Hao, Guanjie Zheng, Yanwei Yu

Comments: Accepted by KDD 2024 research track

Subjects: Machine Learning (cs.LG)
[703] arXiv:2403.07723 (replaced) [pdf, ps, other]: Title: On the Last-Iterate Convergence of Shuffling Gradient Methods

Authors: Zijian Liu, Zhengyuan Zhou

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[704] arXiv:2403.07746 (replaced) [pdf, other]: Title: Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception

Authors: Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard Rigoll

Comments: 10 pages, 4 figures. Added eval on VoD

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[705] arXiv:2403.07974 (replaced) [pdf, other]: Title: LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Authors: Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Comments: Website - this https URL

Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG)
[706] arXiv:2403.09347 (replaced) [pdf, other]: Title: BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

Comments: 13 pages, 7 figures

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[707] arXiv:2403.09871 (replaced) [pdf, other]: Title: ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images

Authors: Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Gaowen Liu, Chris Xiaoxuan Lu

Comments: 15 pages, 6 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[708] arXiv:2403.10081 (replaced) [pdf, other]: Title: DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models

Authors: Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, Yiqun Liu

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
[709] arXiv:2403.13169 (replaced) [pdf, other]: Title: Wav2Gloss: Generating Interlinear Glossed Text from Speech

Authors: Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Comments: ACL 2024 camera ready version

Subjects: Computation and Language (cs.CL)
[710] arXiv:2403.13872 (replaced) [pdf, other]: Title: Spatial-Temporal Graph Representation Learning for Tactical Networks Future State Prediction

Authors: Liu Junhua, Albrethsen Justin, Goh Lincoln, Yau David, Lim Kwan Hui

Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[711] arXiv:2403.15097 (replaced) [pdf, other]: Title: Argument-Aware Approach To Event Linking

Authors: I-Hung Hsu, Zihan Xue, Nilay Pochh, Sahil Bansal, Premkumar Natarajan, Jayanth Srinivasa, Nanyun Peng

Comments: Paper accepted by ACL-findings 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[712] arXiv:2403.15191 (replaced) [pdf, other]: Title: VORTEX: Real-Time Off-Chain Payments and Cross-Chain Swaps for Cryptocurrencies

Authors: Di Wu, Jian Liu, Zhengwei Hou, Wu Wen, Kui Ren

Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[713] arXiv:2403.17270 (replaced) [pdf, other]: Title: Human Stress Response and Perceived Safety during Encounters with Quadruped Robots

Authors: Ryan Gupta, Hyonyoung Shin, Emily Norman, Keri K. Stephens, Nanshu Lu, Luis Sentis

Comments: 8 pages, 7 figs, 5 tables

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
[714] arXiv:2403.17673 (replaced) [pdf, other]: Title: How Private are DP-SGD Implementations?

Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Comments: Proceedings of ICML 2024

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
[715] arXiv:2403.18680 (replaced) [pdf, other]: Title: Non-Linear Inference Time Intervention: Improving LLM Truthfulness

Authors: Jakub Hoscilowicz, Adam Wiacek, Jan Chojnacki, Adam Cieslak, Leszek Michon, Vitalii Urbanevych, Artur Janicki

Comments: Accepted at the Interspeech 2024 conference. Code is available at this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[716] arXiv:2403.18953 (replaced) [pdf, ps, other]: Title: Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems

Authors: Ravi Chepuri, Dael Amzalag, Thomas Antonsen Jr., Michelle Girvan

Comments: 12 pages, 7 figures

Journal-ref: Chaos 1 June 2024; 34 (6): 063114

Subjects: Machine Learning (cs.LG)
[717] arXiv:2403.19223 (replaced) [pdf, ps, other]: Title: Computing large deviation rate functions of entropy production for diffusion processes by an interacting particle method

Authors: Zhizhang Wu, Renaud Raquépas, Jack Xin, Zhiwen Zhang

Subjects: Numerical Analysis (math.NA)
[718] arXiv:2403.19260 (replaced) [pdf, other]: Title: NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. Fraiberger

Comments: ACL 2024 main conference. Data and models available at this https URL

Subjects: Computation and Language (cs.CL)
[719] arXiv:2403.19589 (replaced) [pdf, other]: Title: TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

Authors: Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao

Comments: Code, data, and models are publicly available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[720] arXiv:2404.00929 (replaced) [pdf, other]: Title: A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

Authors: Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[721] arXiv:2404.05835 (replaced) [pdf, other]: Title: Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining

Authors: Henrik Hose, Alexander Gräfe, Sebastian Trimpe

Comments: Accepted to L4DC 2024

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
[722] arXiv:2404.09889 (replaced) [pdf, other]: Title: Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

Authors: Peter Baile Chen, Yi Zhang, Dan Roth

Comments: ACL 2024 camera ready

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[723] arXiv:2404.10496 (replaced) [pdf, other]: Title: Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering

Authors: Xiaoyang Chen, Ben He, Hongyu Lin, Xianpei Han, Tianshu Wang, Boxi Cao, Le Sun, Yingfei Sun

Comments: Accepted to ACL 2024

Subjects: Information Retrieval (cs.IR)
[724] arXiv:2404.12464 (replaced) [pdf, other]: Title: NormAd: A Benchmark for Measuring the Cultural Adaptability of Large Language Models

Authors: Abhinav Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap

Comments: Preprint. In Review

Subjects: Computation and Language (cs.CL)
[725] arXiv:2404.13195 (replaced) [pdf, ps, other]: Title: Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Authors: Junjie Li, Yinzhi Wang, Xiao Liang, Hang Liu

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[726] arXiv:2404.13874 (replaced) [pdf, other]: Title: VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

Authors: Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

Comments: ACL 2024 Findings

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[727] arXiv:2404.13936 (replaced) [pdf, ps, other]: Title: A bound preserving cut discontinuous Galerkin method for one dimensional hyperbolic conservation laws

Authors: Pei Fu, Gunilla Kreiss, Sara Zahedi

Comments: 32

Subjects: Numerical Analysis (math.NA)
[728] arXiv:2404.14461 (replaced) [pdf, other]: Title: Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Authors: Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian Tramèr

Comments: Competition Report

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[729] arXiv:2404.14745 (replaced) [pdf, other]: Title: TAAT: Think and Act from Arbitrary Texts in Text2Motion

Authors: Runqi Wang, Caoyuan Ma, Guopeng Li, Zheng Wang

Comments: Updated errors in author information

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[730] arXiv:2404.14964 (replaced) [pdf, other]: Title: Elucidating the theoretical underpinnings of surrogate gradient learning in spiking neural networks

Authors: Julia Gygax, Friedemann Zenke

Comments: 25 pages, 7 figures + 3 supplementary figures

Subjects: Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
[731] arXiv:2404.15004 (replaced) [pdf, other]: Title: TAXI: Evaluating Categorical Knowledge Editing for Language Models

Authors: Derek Powell, Walter Gerych, Thomas Hartvigsen

Comments: Accepted to ACL 2024 (Findings)

Subjects: Computation and Language (cs.CL)
[732] arXiv:2404.15522 (replaced) [pdf, other]: Title: LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

Authors: Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta Baral

Comments: Accepted at ACL 2024 (Main) | First version available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[733] arXiv:2404.15611 (replaced) [pdf, other]: Title: Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Authors: Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

Subjects: Cryptography and Security (cs.CR)
[734] arXiv:2404.16363 (replaced) [pdf, other]: Title: Byzantine Attacks Exploiting Penalties in Ethereum PoS

Authors: Ulysse Pavloff, Yackolley Amoussou-Genou, Sara Tucci-Piergiovanni

Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[735] arXiv:2404.16966 (replaced) [pdf, other]: Title: Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks

Authors: Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono

Subjects: Computation and Language (cs.CL)
[736] arXiv:2404.17140 (replaced) [pdf, other]: Title: Small Language Models Need Strong Verifiers to Self-Correct Reasoning

Authors: Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang

Comments: ACL Findings 2024 - Camera Ready

Subjects: Computation and Language (cs.CL)
[737] arXiv:2405.00301 (replaced) [pdf, other]: Title: Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression

Authors: Farima Fatahi Bayat, Xin Liu, H. V. Jagadish, Lu Wang

Comments: 13 pages, 5 figures

Subjects: Computation and Language (cs.CL)
[738] arXiv:2405.00892 (replaced) [pdf, other]: Title: Wake Vision: A Large-scale, Diverse Dataset and Benchmark Suite for TinyML Person Detection

Authors: Colby Banbury, Emil Njor, Matthew Stewart, Pete Warden, Manjunath Kudlur, Nat Jeffries, Xenofon Fafoutis, Vijay Janapa Reddi

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[739] arXiv:2405.00899 (replaced) [pdf, other]: Title: Characterising the Creative Process in Humans and Large Language Models

Authors: Surabhi S. Nath, Peter Dayan, Claire Stevenson

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
[740] arXiv:2405.02492 (replaced) [pdf, other]: Title: Investigating the Generalizability of Assistive Robots Models over Various Tasks

Authors: Hamid Osooli, Christopher Coco, Johnathan Spanos, Amin Majdi, Reza Azadeh

Comments: Accepted to 2024 21st International Conference on Ubiquitous Robots (UR)

Subjects: Robotics (cs.RO)
[741] arXiv:2405.02664 (replaced) [pdf, other]: Title: MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering

Authors: Roomani Srivastava, Suraj Prasad, Lipika Bhat, Sarvesh Deshpande, Barnali Das, Kshitij Jadhav

Comments: 4 pages, 3 figures, pre-print submitted to CIKM 2024

Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
[742] arXiv:2405.03035 (replaced) [pdf, other]: Title: Probabilistic Finite Automaton Emptiness is undecidable

Authors: Günter Rote

Comments: 63 pages, 14 figures, 2 tables, 53 footnotes, 11 sections plus 1 appendix. Added another proof and more history, which had been overlooked before

Subjects: Formal Languages and Automata Theory (cs.FL)
[743] arXiv:2405.03064 (replaced) [pdf, other]: Title: RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Sabrina Yang, Gang Wang, Xinyu Xing

Comments: Accepted by ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[744] arXiv:2405.04061 (replaced) [pdf, other]: Title: Generalized Cauchy-Schwarz Divergence and Its Deep Learning Applications

Authors: Mingfei Lu, Chenxu Li, Shujian Yu, Robert Jenssen, Badong Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[745] arXiv:2405.04776 (replaced) [pdf, other]: Title: Chain of Thoughtlessness? An Analysis of CoT in Planning

Authors: Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati

Subjects: Artificial Intelligence (cs.AI)
[746] arXiv:2405.05847 (replaced) [pdf, other]: Title: Learned feature representations are biased by complexity, learning order, position, and more

Authors: Andrew Kyle Lampinen, Stephanie C. Y. Chan, Katherine Hermann

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[747] arXiv:2405.07460 (replaced) [pdf, other]: Title: HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models

Authors: Aakash Tripathi, Asim Waqas, Yasin Yilmaz, Ghulam Rasool

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)
[748] arXiv:2405.07536 (replaced) [pdf, other]: Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Authors: Xin Li, Wenyang Gan, Pang Wen, Daqi Zhu

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[749] arXiv:2405.09005 (replaced) [pdf, other]: Title: Cons-training tensor networks

Authors: Javier Lopez-Piqueres, Jing Chen

Comments: v2: mostly improved Fig 1 and 13 for clarity, improved exposition of ideas, and fixed a couple of transcription bugs in the pseudo algo. 3

Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Quantum Physics (quant-ph)
[750] arXiv:2405.09482 (replaced) [pdf, other]: Title: Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts

Authors: Donya Rooein, Paul Rottger, Anastassia Shaitarova, Dirk Hovy

Subjects: Computation and Language (cs.CL)
[751] arXiv:2405.10150 (replaced) [pdf, other]: Title: Speaker Verification in Agent-Generated Conversations

Authors: Yizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang, Ee-Peng Lim

Subjects: Computation and Language (cs.CL)
[752] arXiv:2405.10467 (replaced) [pdf, other]: Title: Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

Subjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
[753] arXiv:2405.10517 (replaced) [pdf, other]: Title: Towards Better Question Generation in QA-based Event Extraction

Authors: Zijin Hong, Jian Liu

Comments: Accepted to ACL2024 Findings

Subjects: Computation and Language (cs.CL)
[754] arXiv:2405.11684 (replaced) [pdf, other]: Title: Learning Regularities from Data using Spiking Functions: A Theory

Authors: Canlin Zhang, Xiuwen Liu

Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
[755] arXiv:2405.11876 (replaced) [pdf, other]: Title: Understanding crypter-as-a-service in a popular underground marketplace

Authors: Alejandro de la Cruz, Sergio Pastrana

Comments: A short version of this paper was accepted at the 6th Workshop on Attackers and Cyber-Crime Operations (WACCO)

Subjects: Cryptography and Security (cs.CR)
[756] arXiv:2405.11968 (replaced) [pdf, other]: Title: Conditional Shift-Robust Conformal Prediction for Graph Neural Network

Authors: S. Akansha

Comments: 14 pages, 2 figures, 3 tables

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[757] arXiv:2405.12684 (replaced) [pdf, other]: Title: Model Free Prediction with Uncertainty Assessment

Authors: Yuling Jiao, Lican Kang, Jin Liu, Heng Peng, Heng Zuo

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[758] arXiv:2405.13034 (replaced) [pdf, other]: Title: Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality

Authors: Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo Cesar

Comments: Accepted by ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[759] arXiv:2405.13753 (replaced) [pdf, other]: Title: A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence

Authors: Tom Sühr, Samira Samadi, Chiara Farronato

Comments: 9 Pages and appendix

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); General Economics (econ.GN)
[760] arXiv:2405.13902 (replaced) [pdf, other]: Title: LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework

Authors: Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[761] arXiv:2405.14108 (replaced) [pdf, other]: Title: Deep Learning for Protein-Ligand Docking: Are We There Yet?

Authors: Alex Morehead, Nabin Giri, Jian Liu, Jianlin Cheng

Comments: 30 pages, 1 table, 27 figures. Under review. Code, data, tutorials, and benchmark results are available at this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
[762] arXiv:2405.14156 (replaced) [pdf, other]: Title: Unveiling the Tapestry of Consistency in Large Vision-Language Models

Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

Comments: This project is available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[763] arXiv:2405.15671 (replaced) [pdf, other]: Title: The Undecidability of Quantified Announcements

Authors: Thomas Ågotnes, Hans van Ditmarsch, Tim French

Comments: This paper contains a correction to the 2016 article, The Undecidability of Quantified Announcements, published in Studia Logica

Journal-ref: The undecidability of quantified announcements. Studia Logica, 104(4) pages 597-640, 2016

Subjects: Logic in Computer Science (cs.LO)
[764] arXiv:2405.15769 (replaced) [pdf, other]: Title: FastDrag: Manipulate Anything in One Step

Authors: Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

Comments: 13 pages, 13 figures, Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[765] arXiv:2405.16225 (replaced) [pdf, ps, other]: Title: Local Causal Structure Learning in the Presence of Latent Variables

Authors: Feng Xie, Zheng Li, Peng Wu, Yan Zeng, Chunchen Liu, Zhi Geng

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[766] arXiv:2405.16488 (replaced) [pdf, ps, other]: Title: Partial train and isolate, mitigate backdoor attack

Authors: Yong Li, Han Gao

Comments: 9 pages, 2 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[767] arXiv:2405.16526 (replaced) [pdf, other]: Title: Past, Present, and Future of Citation Practices in HCI

Authors: Jonas Oppenlaender

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Digital Libraries (cs.DL)
[768] arXiv:2405.16849 (replaced) [pdf, other]: Title: Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation

Authors: Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng Lin

Comments: Our project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[769] arXiv:2405.17234 (replaced) [pdf, other]: Title: Benchmarking General Purpose In-Context Learning

Authors: Fan Wang, Chuan Lin, Yang Cao, Yu Kang

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[770] arXiv:2405.17272 (replaced) [pdf, other]: Title: DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems

Authors: Zhi Zheng, Shunyu Yao, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Ke Tang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[771] arXiv:2405.17345 (replaced) [pdf, other]: Title: Exploring and steering the moral compass of Large Language Models

Authors: Alejandro Tlaie

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[772] arXiv:2405.17398 (replaced) [pdf, other]: Title: Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

Authors: Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li

Comments: Code and model: this https URL, video demos: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[773] arXiv:2405.17814 (replaced) [pdf, other]: Title: FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models

Authors: Hanjun Luo, Ziye Deng, Ruizhe Chen, Zuozhu Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[774] arXiv:2405.18353 (replaced) [pdf, other]: Title: Simulating infinite-dimensional nonlinear diffusion bridges

Authors: Gefan Yang, Elizabeth Louise Baker, Michael L. Severinsen, Christy Anna Hipsley, Stefan Sommer

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[775] arXiv:2405.18457 (replaced) [pdf, other]: Title: Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes

Authors: Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-Lobato

Comments: Preprint. arXiv admin note: text overlap with arXiv:2405.18328

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[776] arXiv:2405.18860 (replaced) [pdf, other]: Title: Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

Authors: Tianle Zhang, Dongjiang Li, Yihang Li, Zecui Zeng, Lin Zhao, Lei Sun, Yue Chen, Xuelong Wei, Yibing Zhan, Lusong Li, Xiaodong He

Subjects: Robotics (cs.RO)
[777] arXiv:2405.18942 (replaced) [pdf, other]: Title: Verifiably Robust Conformal Prediction

Authors: Linus Jeary, Tom Kuipers, Mehran Hosseini, Nicola Paoletti

Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[778] arXiv:2405.19732 (replaced) [pdf, other]: Title: Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

Authors: Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[779] arXiv:2405.19944 (replaced) [pdf, ps, other]: Title: Discrete-Time I&I Adaptive Interconnection and Damping Passivity-Based Control for Nonlinearly Parameterized Port-Controlled Hamiltonian Systems

Authors: Mohammed Alkrunz, Yaprak Yalcin

Comments: 31 pages, 9 figures

Subjects: Systems and Control (eess.SY)
[780] arXiv:2405.20172 (replaced) [pdf, other]: Title: Iterative Feature Boosting for Explainable Speech Emotion Recognition

Authors: Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Comments: Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)

Journal-ref: 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[781] arXiv:2405.20250 (replaced) [pdf, ps, other]: Title: Entropy annealing for policy mirror descent in continuous time and space

Authors: Deven Sethi, David Šiška, Yufei Zhang

Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Probability (math.PR)
[782] arXiv:2405.20267 (replaced) [pdf, other]: Title: Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions

Authors: Ruochen Zhao, Wenxuan Zhang, Yew Ken Chia, Deli Zhao, Lidong Bing

Subjects: Computation and Language (cs.CL)
[783] arXiv:2405.20607 (replaced) [pdf, other]: Title: Textual Inversion and Self-supervised Refinement for Radiology Report Generation

Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

Comments: This paper has been early accepted by MICCAI 2024!

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[784] arXiv:2405.20703 (replaced) [pdf, other]: Title: It is Simple Sometimes: A Study On Improving Aspect-Based Sentiment Analysis Performance

Authors: Laura Cabello, Uchenna Akujuobi

Comments: Accepted to ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[785] arXiv:2405.20988 (replaced) [pdf, other]: Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

Authors: Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[786] arXiv:2406.00083 (replaced) [pdf, other]: Title: BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Authors: Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[787] arXiv:2406.00199 (replaced) [pdf, ps, other]: Title: Exfiltration of personal information from ChatGPT via prompt injection

Authors: Gregory Schwartzman

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Emerging Technologies (cs.ET)
[788] arXiv:2406.00252 (replaced) [pdf, other]: Title: Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

Authors: Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Weijie J. Su, Camillo J. Taylor, Tanwi Mallick

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
[789] arXiv:2406.00307 (replaced) [pdf, other]: Title: HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Authors: Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

Comments: under submission

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[790] arXiv:2406.00329 (replaced) [pdf, other]: Title: Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images

Authors: Yundi Zhang, Chen Chen, Suprosanna Shit, Sophie Starck, Daniel Rueckert, Jiazhen Pan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[791] arXiv:2406.00670 (replaced) [pdf, other]: Title: Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Authors: Yunheng Li, ZhongYu Li, Quansheng Zeng, Qibin Hou, Ming-Ming Cheng

Comments: Accepted by ICML 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[792] arXiv:2406.00702 (replaced) [pdf, ps, other]: Title: Enhanced Classification of Heart Sounds Using Mel Frequency Cepstral Coefficients: A Comparative Study of Single and Ensemble Classifier Strategies

Authors: Amir Masoud Rahmani, Amir Haider, Parisa Khoshvaght, Mohammad Adeli, Entesar Gemeay, Yazeed Alkhrijah, Mokhtar Mohammadi, Mehdi Hosseinzadeh

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[793] arXiv:2406.00773 (replaced) [pdf, other]: Title: Diffusion Tuning: Transferring Diffusion Models via Chain of Forgetting

Authors: Jincheng Zhong, Xingzhuo Guo, Jiaxiang Dong, Mingsheng Long

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[794] arXiv:2406.00907 (replaced) [pdf, other]: Title: DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic Surgery

Authors: Yuning Zhou, Henry Badgery, Matthew Read, James Bailey, Catherine E. Davey

Comments: 29 pages, 16 figures; MIDL 2024 - Medical Imaging with Deep Learning

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[795] arXiv:2406.01026 (replaced) [pdf, other]: Title: Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

Authors: Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo Yin

Comments: Accepted at ACL 2024 Main

Journal-ref: ACL 2024

Subjects: Computation and Language (cs.CL)
[796] arXiv:2406.01057 (replaced) [pdf, other]: Title: Knapsack with Vertex Cover, Set Cover, and Hitting Set

Authors: Palash Dey, Ashlesha Hota, Sudeshna Kolay, Sipra Singh

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
[797] arXiv:2406.01133 (replaced) [pdf, ps, other]: Title: Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations

Authors: Valentin Rychkov (EDF R&D), Claudia Picoco (EDF R&D), Emilie Caleca (EDF R&D)

Subjects: Performance (cs.PF)
[798] arXiv:2406.01349 (replaced) [pdf, other]: Title: Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

Authors: Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

Comments: Project Page: this https URL, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[799] arXiv:2406.01392 (replaced) [pdf, other]: Title: Sparsity-Accelerated Training for Large Language Models

Authors: Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu

Comments: Accepted to ACL 2024 Findings

Subjects: Computation and Language (cs.CL)
[800] arXiv:2406.01425 (replaced) [pdf, other]: Title: Sensitivity-Informed Augmentation for Robust Segmentation

Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[801] arXiv:2406.01514 (replaced) [pdf, other]: Title: Decoupled Alignment for Robust Plug-and-Play Adaptation

Authors: Haozheng Luo, Jiahao Yu, Wenxin Zhang, Jialong Li, Jerry Yao-Chieh Hu, Xinyu Xing, Han Liu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[802] arXiv:2406.01548 (replaced) [pdf, other]: Title: How to discretize continuous state-action spaces in Q-learning: A symbolic control approach

Authors: Sadek Belamfedel Alaoui, Adnane Saoud

Comments: Q-learning, Symbolic control, Abstraction

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
[803] arXiv:2406.01624 (replaced) [pdf, other]: Title: Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition

Authors: Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Comments: Published in: Springer Nature International Journal of Applied Intelligence (2024)

Journal-ref: Applied Intelligence (2024), 1-24

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[804] arXiv:2406.01799 (replaced) [pdf, other]: Title: Online Control in Population Dynamics

Authors: Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[805] arXiv:2406.01852 (replaced) [pdf, other]: Title: Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO

Authors: Shilo Daum, Tal Shapira, Anat Bremler-Barr, David Hay

Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[806] arXiv:2406.01900 (replaced) [pdf, other]: Title: Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[807] arXiv:2406.01908 (replaced) [pdf, other]: Title: PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

Authors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

Comments: Accepted by ICML 2024

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[808] arXiv:2406.02004 (replaced) [pdf, ps, other]: Title: Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

Authors: Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan

Comments: Accepted to Interspeech'24

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[809] arXiv:2406.02061 (replaced) [pdf, other]: Title: Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

Authors: Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev

Comments: v1.1

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[810] arXiv:2406.02126 (replaced) [pdf, other]: Title: CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination

Authors: Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[811] arXiv:2406.02169 (replaced) [src]: Title: A multilingual dataset for offensive language and hate speech detection for hausa, yoruba and igbo languages

Authors: Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala

Comments: The experimental result was erroneously reported and we also omitted other authors

Subjects: Computation and Language (cs.CL)
[812] arXiv:2406.02265 (replaced) [pdf, other]: Title: Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Authors: Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott

Comments: 9 pages, long paper at ACL 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[813] arXiv:2406.02290 (replaced) [pdf, other]: Title: A Study of Optimizations for Fine-tuning Large Language Models

Authors: Arjun Singh, Nikhil Pandey, Anup Shirgaonkar, Pavan Manoj, Vijay Aski

Comments: 10 pages, 4 figures. Revised text for clarity, updated references

Subjects: Machine Learning (cs.LG)
[814] arXiv:2406.02343 (replaced) [pdf, other]: Title: Cluster-Aware Similarity Diffusion for Instance Retrieval

Authors: Jifei Luo, Hantao Yao, Changsheng Xu

Comments: This paper has been accepted by ICML2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[815] arXiv:2406.02347 (replaced) [pdf, other]: Title: Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Authors: Clement Chadebec, Onur Tasar, Eyal Benaroche, Benjamin Aubin

Comments: 16 pages + 16 pages appendices

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[816] arXiv:2406.02381 (replaced) [pdf, other]: Title: Kirigami: large convolutional kernels improve deep learning-based RNA secondary structure prediction

Authors: Marc Harary, Chengxin Zhang

Comments: Updated authorship and acknowledgements

Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
[817] arXiv:2406.02541 (replaced) [pdf, other]: Title: Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

Comments: Project page at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[818] arXiv:2406.02614 (replaced) [pdf, other]: Title: Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting

Authors: Zhanyu Liu, Jianrong Ding, Guanjie Zheng

Comments: Accepted by ECMLPKDD 2024 (Research Track)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[819] arXiv:2406.02616 (replaced) [pdf, other]: Title: Adaptive Layer Splitting for Wireless LLM Inference in Edge Computing: A Model-Based Reinforcement Learning Approach

Authors: Yuxuan Chen, Rongpeng Li, Xiaoxue Yu, Zhifeng Zhao, Honggang Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[820] arXiv:2406.02624 (replaced) [pdf, other]: Title: Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation

Authors: Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing

Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[821] arXiv:2406.02749 (replaced) [pdf, other]: Title: Efficient Leverage Score Sampling for Tensor Train Decomposition

Authors: Vivek Bharadwaj, Beheshteh T. Rakhshan, Osman Asif Malik, Guillaume Rabusseau

Subjects: Data Structures and Algorithms (cs.DS)
[822] arXiv:2406.02778 (replaced) [pdf, other]: Title: MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning

Authors: Shay Deutsch, Lionel Yelibi, Alex Tong Lin, Arjun Ravi Kannan

Subjects: Machine Learning (cs.LG)
[823] arXiv:2406.02847 (replaced) [pdf, other]: Title: Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

Authors: Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi

Comments: Accepted to ICML 2024

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[824] arXiv:2406.02875 (replaced) [pdf, other]: Title: Leveraging KANs For Enhanced Deep Koopman Operator Discovery

Authors: George Nehma, Madhur Tiwari

Comments: 6 pages, 4 figures, 2 tables

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Applied Physics (physics.app-ph); Computational Physics (physics.comp-ph)
[825] arXiv:2406.02876 (replaced) [pdf, other]: Title: LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

Authors: Zengkui Sun, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

Comments: ACL2024 Findings, Codes are at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[826] arXiv:2406.02881 (replaced) [pdf, other]: Title: Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter

Authors: Peng Xing, Ning Wang, Jianbo Ouyang, Zechao Li

Comments: technical report

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[827] arXiv:2406.02882 (replaced) [pdf, other]: Title: Outdated Issue Aware Decoding for Factual Knowledge Editing

Authors: Zengkui Sun, Yijin Liu, Jiaan Wang, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

Comments: ACL2024 Findings, Codes are at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[828] arXiv:2406.02886 (replaced) [pdf, other]: Title: PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

Comments: Findings of ACL 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[829] arXiv:2406.02887 (replaced) [pdf, other]: Title: USM RNN-T model weights binarization

Authors: Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[830] arXiv:2406.02918 (replaced) [pdf, other]: Title: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

Authors: Chenxin Li, Xinyu Liu, Wuyang Li, Cheng Wang, Hengyu Liu, Yixuan Yuan

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[831] arXiv:2406.02966 (replaced) [pdf, ps, other]: Title: Generative AI and Digital Neocolonialism in Global Education: Towards an Equitable Framework

Authors: Matthew Nyaaba, Alyson Wright, Gyu Lim Choi

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
[832] arXiv:2406.03051 (replaced) [pdf, other]: Title: Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision

Authors: Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[833] arXiv:2406.03095 (replaced) [pdf, other]: Title: EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Authors: Ryo Fujii, Hideo Saito, Hiroki Kajita

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[834] arXiv:2406.03099 (replaced) [pdf, other]: Title: Graph Convolutional Branch and Bound

Authors: Lorenzo Sciandra, Roberto Esposito, Andrea Cesare Grosso, Laura Sacerdote, Cristina Zucca

Comments: Submitted to European Journal of Operational Research

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[835] arXiv:2406.03145 (replaced) [pdf, other]: Title: E(n) Equivariant Message Passing Cellular Networks

Authors: Veljko Kovač, Erik J. Bekkers, Pietro Liò, Floor Eijkelboom

Subjects: Machine Learning (cs.LG)
[836] arXiv:2406.03151 (replaced) [pdf, other]: Title: Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

Authors: Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic

Comments: Published in ACL 2024 Findings

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[837] arXiv:2406.03154 (replaced) [pdf, other]: Title: Detecting Model Misspecification in Amortized Bayesian Inference with Neural Networks: An Extended Investigation

Authors: Marvin Schmitt, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev

Comments: Extended version of the conference paper this https URL. arXiv admin note: text overlap with arXiv:2112.08866

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[838] arXiv:2406.03170 (replaced) [pdf, other]: Title: StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

Authors: Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger

Comments: This work is accepted at ACL Findings 2024

Subjects: Computation and Language (cs.CL)
[839] arXiv:2406.03248 (replaced) [pdf, other]: Title: Large Language Models as Evaluators for Recommendation Explanations

Authors: Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
[840] arXiv:2406.03253 (replaced) [pdf, other]: Title: Generating Explanations for Cellular Neural Networks

Authors: Akshit Sinha, Sreeram Vennam, Charu Sharma, Ponnurangam Kumaraguru

Subjects: Machine Learning (cs.LG)
[841] arXiv:2406.03262 (replaced) [pdf, other]: Title: ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

Authors: Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[842] arXiv:2406.03337 (replaced) [pdf, other]: Title: Identifying latent state transition in non-linear dynamical systems

Authors: Çağlar Hızlı, Çağatay Yıldız, Matthias Bethge, ST John, Pekka Marttinen

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[843] arXiv:2406.03345 (replaced) [pdf, other]: Title: Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

Authors: Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen

Comments: ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[844] arXiv:2406.03437 (replaced) [pdf, other]: Title: Transfer Learning for Latent Variable Network Models

Authors: Akhil Jalan, Arya Mazumdar, Soumendu Sundar Mukherjee, Purnamrita Sarkar

Subjects: Machine Learning (cs.LG)
[845] arXiv:2406.03452 (replaced) [pdf, other]: Title: Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types

Authors: Pierluigi Cassotti, Stefano De Pascale, Nina Tahmasebi

Subjects: Computation and Language (cs.CL)
[846] arXiv:2406.03488 (replaced) [pdf, other]: Title: Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training

Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

Comments: 12 pages, 4 figures, 6 tables

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
