Computer Science
New submissions
New submissions for Fri, 29 Mar 24
- [1] arXiv:2403.18825 [pdf, ps, other]
Title: Numerical evaluation of code live-load models for estimating the forces caused by actual vehicles that act on bridge substructures
Subjects: Computational Engineering, Finance, and Science (cs.CE)
The present paper assesses the efficacy of code live-load models in accurately estimating the vehicular loads transferred to bridge substructures, such as abutments, piers, and foundations. Realistic traffic vehicle data are represented using four Weigh-in-Motion databases, which provide an authentic representation of vehicle information and thus a realistic basis for examining the bridges studied. The evaluation includes various bridge models, such as single-span girder bridges and two-, three-, and four-span continuous pinned-support girder bridges. By analyzing exceedance rates, the study compares the extreme force values obtained for vehicles in the databases with those predicted by selected code live-load models. These exceedance rates are presented in spectra format, as a function of the span length. The significant variations observed in the exceedance rates highlight the need for improving existing code live-load models to achieve more accurate estimations of the forces transferred to bridge substructures. Such improvements would lead to more uniform reliability levels for any limit state, such as resistance, fatigue, serviceability, and cracking.
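At its core, the exceedance-rate comparison described above counts, for a given span length and force effect, how often the load effects derived from WIM vehicle records exceed the code-model prediction. A minimal sketch (the function name and all force values are hypothetical placeholders, not from the paper):

```python
# Illustrative sketch: exceedance rate of measured load effects relative to a
# code live-load model value. Numbers are invented for demonstration.

def exceedance_rate(measured_effects, code_model_value):
    """Fraction of recorded load effects that exceed the code-predicted value."""
    exceeding = sum(1 for e in measured_effects if e > code_model_value)
    return exceeding / len(measured_effects)

# Hypothetical pier axial forces (kN) computed from WIM vehicle records
measured = [310.0, 295.5, 402.1, 288.0, 515.3, 330.2, 298.7, 450.9]
rate = exceedance_rate(measured, code_model_value=400.0)
print(f"exceedance rate: {rate:.3f}")  # 3 of 8 values exceed 400 kN -> 0.375
```

Repeating this per span length yields the spectra the abstract refers to.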
- [2] arXiv:2403.18827 [pdf, other]
Title: Bridging Generative Networks with the Common Model of Cognition
Authors: Robert L. West, Spencer Eckler, Brendan Conway-Smith, Nico Turcas, Eilene Tomkins-Flanagan, Mary Alexandria Kelly
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
This article presents a theoretical framework for adapting the Common Model of Cognition to large generative network models within the field of artificial intelligence. This can be accomplished by restructuring modules within the Common Model into shadow production systems that are peripheral to a central production system, which handles higher-level reasoning based on the shadow productions' output. Implementing this novel structure within the Common Model allows for a seamless connection between cognitive architectures and generative neural networks.
- [3] arXiv:2403.18830 [pdf, other]
Title: Cloudy with a Chance of Green: Measuring the Predictability of 18,009 Traffic Lights in Hamburg
Comments: Preprint submitted to IEEE IV 2024
Subjects: Networking and Internet Architecture (cs.NI)
Informing drivers about the predicted state of upcoming traffic lights is considered a key solution to reduce unneeded energy expenditure and dilemma zones at intersections. However, newer traffic lights can react to traffic demand, resulting in spontaneous switching behavior and poor predictability. To assess whether future traffic light assistance services are viable, it is crucial to understand how strongly predictability is affected by such spontaneous switching behavior. Previous studies have so far only reported percentages of adaptivity-capable traffic lights, but the actual switching behavior has not been measured. Addressing this research gap, we conduct a large-scale predictability evaluation based on 424 million recorded switching cycles over four weeks for 18,009 individual traffic lights in Hamburg. Two characteristics of predictability are studied: cycle discrepancy and wait time diversity. Results indicate that fewer traffic lights exhibit hard-to-predict switching behavior than suggested by previous work, considering a reported number of 90.7% adaptive traffic lights in Hamburg. Contrasting previous work, we find that not all traffic lights capable of adaptiveness may necessarily exhibit low predictability. We critically review these results and derive avenues for future research.
- [4] arXiv:2403.18832 [pdf, other]
Title: Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer
Subjects: Software Engineering (cs.SE)
Code commit messages can contain useful information on why a developer has made a change. However, the presence and structure of rationale in real-world code commit messages is not well studied. Here, we detail the creation of a labelled dataset to analyze the code commit messages of the Linux Kernel Out-Of-Memory Killer component. We study aspects of rationale information, such as presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples for our labelling.
- [5] arXiv:2403.18835 [pdf, ps, other]
Title: Frog-Snake prey-predation Relationship Optimization (FSRO): A novel nature-inspired metaheuristic algorithm for feature selection
Subjects: Neural and Evolutionary Computing (cs.NE)
Swarm intelligence algorithms have traditionally been designed for continuous optimization problems, and these algorithms have been modified and extended for application to discrete optimization problems. Notably, their application in feature selection for machine learning has demonstrated improvements in model accuracy, reduction of unnecessary data, and decreased computational time. This study proposes the Frog-Snake prey-predation Relationship Optimization (FSRO) algorithm, inspired by the prey-predation relationship between frogs and snakes, for application to discrete optimization problems. The algorithm models the three stages of a snake's foraging behavior, "search", "approach", and "capture", as well as the frog's characteristic behavior of staying still to attract the snake and then escaping. Furthermore, the introduction of concepts from evolutionary game theory enables dynamic control of the search process. We conduct computational experiments on feature selection using 26 machine learning datasets to analyze the algorithm's performance and identify improvements. In these experiments, the proposed algorithm outperformed the comparison algorithms in terms of the best fitness value, the standard deviation of fitness, and accuracy. The results also show that dynamic search control by evolutionary game theory is an effective method, and that the proposed algorithm achieves a well-balanced search, meeting the two objectives of improving accuracy and reducing data.
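Binary metaheuristics like FSRO are commonly scored in feature selection with a fitness that trades classification error against the number of selected features. A hedged sketch of one widely used form of such a fitness (the weight `alpha` and all numbers are illustrative assumptions, not taken from the paper):

```python
# Toy fitness for metaheuristic feature selection: lower is better.
# alpha near 1 prioritizes accuracy; the remainder rewards smaller subsets.

def feature_selection_fitness(accuracy, n_selected, n_total, alpha=0.99):
    """Weighted sum of error rate and selected-feature ratio (lower is better)."""
    error = 1.0 - accuracy
    ratio = n_selected / n_total
    return alpha * error + (1.0 - alpha) * ratio

# Hypothetical comparison: full feature set vs. a 10-feature subset
full = feature_selection_fitness(accuracy=0.90, n_selected=26, n_total=26)
subset = feature_selection_fitness(accuracy=0.92, n_selected=10, n_total=26)
assert subset < full  # higher accuracy with fewer features scores better
```

A search algorithm then evolves binary feature masks to minimize this value, which captures the abstract's dual goals of improving accuracy and reducing data.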
- [6] arXiv:2403.18838 [pdf, ps, other]
Title: Unleashing the Power of AI. A Systematic Review of Cutting-Edge Techniques in AI-Enhanced Scientometrics, Webometrics, and Bibliometrics
Comments: to be published in Library Hi Tech; 30 pages; 80 references; 4 figures; 3 tables
Subjects: Digital Libraries (cs.DL); Artificial Intelligence (cs.AI); Physics and Society (physics.soc-ph)
Purpose: The study aims to analyze the synergy of Artificial Intelligence (AI) with scientometrics, webometrics, and bibliometrics, and to highlight the potential applications and benefits of AI algorithms in these fields.
Design/methodology/approach: By conducting a systematic literature review, our aim is to explore the potential of AI in revolutionizing the methods used to measure and analyze scholarly communication, identify emerging research trends, and evaluate the impact of scientific publications. To achieve this, we implemented a comprehensive search strategy across reputable databases such as ProQuest, IEEE Xplore, EBSCO, Web of Science, and Scopus. Our search encompassed articles published from January 1, 2000, to September 2022, resulting in a thorough review of 61 relevant articles.
Findings: (i) Regarding scientometrics, the application of AI yields various distinct advantages, such as conducting analyses of publications, citations, research impact prediction, collaboration, research trend analysis, and knowledge mapping, in a more objective and reliable framework. (ii) In terms of webometrics, AI algorithms are able to enhance web crawling and data collection, web link analysis, web content analysis, social media analysis, web impact analysis, and recommender systems. (iii) Moreover, automation of data collection, analysis of citations, disambiguation of authors, analysis of co-authorship networks, assessment of research impact, text mining, and recommender systems are considered as the potential of AI integration in the field of bibliometrics.
Originality/value: This study highlights the new benefits and potential of AI-enhanced scientometrics, webometrics, and bibliometrics, underscoring the significant prospects of this synergy.
- [7] arXiv:2403.18841 [pdf, other]
Title: Minimal activation with maximal reach: Reachability clouds of bio-inspired slender manipulators
Comments: 13 pages, 4 figures
Subjects: Robotics (cs.RO)
In the field of soft robotics, flexibility, adaptability, and functionality define a new era of robotic systems that can safely deform, reach, and grasp. To optimize the design of soft robotic systems, it is critical to understand their configuration space and full range of motion across a wide variety of design parameters. Here we integrate extreme mechanics and soft robotics to provide quantitative insights into the design of bio-inspired soft slender manipulators using the concept of reachability clouds. For a minimal three-actuator design inspired by the elephant trunk, we establish an efficient and robust reduced-order method to generate reachability clouds of almost half a million points each to visualize the accessible workspace of a wide variety of manipulator designs. We generate an atlas of 256 reachability clouds by systematically varying the key design parameters including the fiber count, revolution, tapering angle, and activation magnitude. Our results demonstrate that reachability clouds not only offer an immediately clear perspective into the inverse problem of control, but also introduce powerful metrics to characterize reachable volumes, unreachable regions, and actuator redundancy to quantify the performance of soft slender robots. Our study provides new insights into the design of soft robotic systems with minimal activation and maximal reach with potential applications in medical robotics, flexible manufacturing, and the autonomous exploration of space.
- [8] arXiv:2403.18843 [pdf, other]
Title: JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually. To mitigate this challenge, this paper introduces an advanced knowledge distillation approach using a Joint-Embedding Predictive Architecture (JEPA), named JEP-KD, designed to more effectively utilize audio features during model training. Central to JEP-KD is the inclusion of a generative network within the embedding layer, which enhances the video encoder's capacity for semantic feature extraction and brings it into closer alignment with the audio features from a pre-trained ASR model's encoder. This approach aims to progressively reduce the performance gap between VSR and ASR. Moreover, a comprehensive multimodal, multistage training regimen for the JEP-KD framework is established, bolstering the robustness and efficacy of the training process. Experiment results demonstrate that JEP-KD significantly improves the performance of VSR models and demonstrates versatility across different VSR platforms, indicating its potential for broader application within other multimodal tasks.
- [9] arXiv:2403.18845 [pdf, ps, other]
Title: On The Peer Review Reports: Does Size Matter?
Comments: arXiv admin note: substantial text overlap with arXiv:2309.02000
Journal-ref: Scientometrics (2024)
Subjects: Digital Libraries (cs.DL)
Amidst the ever-expanding realm of scientific production and the proliferation of predatory journals, the focus on peer review remains paramount for scientometricians and sociologists of science. Despite this attention, there is a notable scarcity of empirical investigations into the tangible impact of peer review on publication quality. This study aims to address this gap by conducting a comprehensive analysis of how peer review contributes to the quality of scholarly publications, as measured by the citations they receive. Utilizing an adjusted dataset comprising 57,482 publications from Publons to Web of Science and employing the Raking Ratio method, our study reveals intriguing insights. Specifically, our findings shed light on a nuanced relationship between the length of reviewer reports and the subsequent citations received by publications. Through a robust regression analysis, we establish that, beginning from 947 words, the length of reviewer reports is significantly associated with an increase in citations. These results not only confirm the initial hypothesis that longer reports indicate requested improvements, thereby enhancing the quality and visibility of articles, but also underscore the importance of timely and comprehensive reviewer reports. Furthermore, insights from Publons' data suggest that open access to reports can influence reviewer behavior, encouraging more detailed reports. Beyond the scholarly landscape, our findings prompt a reevaluation of the role of reviewers, emphasizing the need to recognize and value this resource-intensive yet underappreciated activity in institutional evaluations. Additionally, the study sounds a cautionary note regarding the challenges faced by peer review in the context of an increasing volume of submissions, potentially compromising the vigilance of peers in swiftly assessing numerous articles.
- [10] arXiv:2403.18846 [pdf, other]
Title: The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
The lack of an efficient preamble detection algorithm remains a challenge for solving preamble collision problems in intelligent massive random access (RA) in practical communication scenarios. To solve this problem, we present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model at the first step of the grant-based RA procedure. A novel blind normalized Stein variational gradient descent (SVGD)-based detector is proposed to obtain an approximate solution to the MLE model. First, by exploring the relationship between the Hadamard transform and wavelet transform, a new modified Hadamard transform (MHT) is developed to separate high-frequency components from important components using the second-order derivative filter. Next, to eliminate noise and mitigate the vanishing gradients problem in the SVGD-based detectors, the block MHT layer is designed based on the MHT, scaling layer, soft-thresholding layer, inverse MHT and sparsity penalty. Then, the blind normalized SVGD algorithm is derived to perform preamble detection without prior knowledge of noise power and the number of active devices. The experimental results show the proposed block MHT layer outperforms other transform-based methods in terms of computation costs and denoising performance. Furthermore, with the assistance of the block MHT layer, the proposed blind normalized SVGD algorithm achieves a higher preamble detection accuracy and throughput than other state-of-the-art detection methods.
- [11] arXiv:2403.18855 [pdf, other]
Title: Directed Criteria Citation Recommendation and Ranking Through Link Prediction
Comments: Extended Abstract at the International Conference of AI in Finance (ICAIF '20)
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
We explore link prediction as a proxy for automatically surfacing documents from existing literature that might be topically or contextually relevant to a new document. Our model uses transformer-based graph embeddings to encode the meaning of each document, presented as a node within a citation network. We show that the semantic representations that our model generates can outperform other content-based methods in recommendation and ranking tasks. This provides a holistic approach to exploring citation graphs in domains where it is critical that these documents properly cite each other, so as to minimize the possibility of any inconsistencies.
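Once each document node has an embedding, the recommendation and ranking step reduces to nearest-neighbor search in that space. A toy sketch of that ranking step (the vectors and document names are invented; the paper derives embeddings from a transformer-based graph model):

```python
# Rank candidate citations by cosine similarity to a new document's embedding.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings (real ones would be much larger)
query = [0.9, 0.1, 0.3]
candidates = {
    "doc_a": [0.8, 0.2, 0.4],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.0, 0.5],
}
ranked = sorted(candidates, key=lambda d: cosine(query, candidates[d]),
                reverse=True)
print(ranked)  # most similar candidate first
```

Link prediction then asks whether the top-ranked pairs should be connected by a citation edge.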
- [12] arXiv:2403.18866 [pdf, other]
Title: Graph Bayesian Optimization for Multiplex Influence Maximization
Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 2024
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
Influence maximization (IM) is the problem of identifying a limited number of initial influential users within a social network to maximize the number of influenced users. However, previous research has mostly focused on individual information propagation, neglecting the simultaneous and interactive dissemination of multiple information items. In reality, when users encounter a piece of information, such as a smartphone product, they often associate it with related products in their minds, such as earphones or computers from the same brand. Additionally, information platforms frequently recommend related content to users, amplifying this cascading effect and leading to multiplex influence diffusion.
This paper first formulates the Multiplex Influence Maximization (Multi-IM) problem using multiplex diffusion models with an information association mechanism. In this problem, the seed set is a combination of influential users and information. To effectively manage the combinatorial complexity, we propose Graph Bayesian Optimization for Multi-IM (GBIM). The multiplex diffusion process is thoroughly investigated using a highly effective global kernelized attention message-passing module. This module, in conjunction with Bayesian linear regression (BLR), produces a scalable surrogate model. A data acquisition module incorporating the exploration-exploitation trade-off is developed to optimize the seed set further. Extensive experiments on synthetic and real-world datasets have proven our proposed framework effective. The code is available at https://github.com/zirui-yuan/GBIM.
- [13] arXiv:2403.18868 [pdf, other]
Title: A recommender network perspective on the informational value of critics and crowds
Subjects: Social and Information Networks (cs.SI)
How do the ratings of critics and amateurs compare and how should they be combined? Previous research has produced mixed results about the first question, while the second remains unanswered. We have created a new, unique dataset, with wine ratings from critics and amateurs, and simulated a recommender system using the k-nearest-neighbor algorithm. We then formalized the advice seeking network spanned by that algorithm and studied people's relative influence. We find that critics are more consistent than amateurs, and thus their advice is more predictive than advice from amateurs. Getting advice from both groups can further boost performance. Our network theoretic approach allows us to identify influential critics, talented amateurs, and the information flow between groups. Our results provide evidence about the informational function of critics, while our framework is broadly applicable and can be leveraged to devise good decision strategies and more transparent recommender systems.
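A k-nearest-neighbor recommender of the kind simulated above predicts a rating from the advisors whose past ratings best match the target's. A minimal sketch with hypothetical data (the advisor names, rating histories, and choice of `k` are illustrative only, not from the study's dataset):

```python
# Toy k-NN rating prediction: estimate a target user's rating for a new wine
# from the k advisors whose past ratings are closest to the target's.

def knn_predict(target_history, advisors, k=2):
    """advisors maps name -> (history, rating_for_new_wine);
    histories align item-by-item with target_history."""
    def distance(history):
        return sum(abs(a - b) for a, b in zip(target_history, history))
    nearest = sorted(advisors.values(), key=lambda av: distance(av[0]))[:k]
    return sum(rating for _, rating in nearest) / k

target = [4, 3, 5]  # the target user's past ratings of three wines
advisors = {
    "critic_1":  ([4, 3, 4], 5),
    "amateur_1": ([1, 5, 2], 2),
    "critic_2":  ([5, 3, 5], 4),
}
pred = knn_predict(target, advisors, k=2)
print(pred)  # mean of the two closest advisors' ratings -> 4.5
```

The advice-seeking network in the paper is spanned by exactly these nearest-neighbor links, which is what lets the authors trace influence between critics and amateurs.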
- [14] arXiv:2403.18869 [pdf, other]
Title: Efficient Unsupervised Community Search with Pre-trained Graph Transformer
Subjects: Social and Information Networks (cs.SI); Databases (cs.DB)
Community search has attracted widespread interest in the past decades. Among existing solutions, the learning-based models exhibit outstanding performance in terms of accuracy by leveraging labels to 1) train the model for community score learning, and 2) select the optimal threshold for community identification. However, labeled data are not always available in real-world scenarios. To address this notable limitation of learning-based models, we propose a pre-trained graph Transformer based community search framework that uses Zero label (i.e., unsupervised), termed TransZero. TransZero has two key phases, i.e., the offline pre-training phase and the online search phase. Specifically, in the offline pre-training phase, we design an efficient and effective community search graph transformer (CSGphormer) to learn node representation. To pre-train CSGphormer without the usage of labels, we introduce two self-supervised losses, i.e., personalization loss and link loss, motivated by the inherent uniqueness of node and graph topology, respectively. In the online search phase, with the representation learned by the pre-trained CSGphormer, we compute the community score without using labels by measuring the similarity of representations between the query nodes and the nodes in the graph. To free the framework from the usage of a label-based threshold, we define a new function named expected score gain to guide the community identification process. Furthermore, we propose two efficient and effective algorithms for the community identification process that run without the usage of labels. Extensive experiments over 10 public datasets illustrate the superior performance of TransZero regarding both accuracy and efficiency.
- [15] arXiv:2403.18870 [pdf, ps, other]
Title: SugarcaneNet2024: An Optimized Weighted Average Ensemble Approach of LASSO Regularized Pre-trained Models for Sugarcane Disease Classification
Comments: 32 pages, 11 Figures, 13 Tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Sugarcane, a key crop for the world's sugar industry, is prone to several diseases that have a substantial negative influence on both its yield and quality. To effectively manage and implement preventative initiatives, diseases must be detected promptly and accurately. In this study, we present a unique model called sugarcaneNet2024 that outperforms previous methods for automatically and quickly detecting sugarcane disease through leaf image processing. Our proposed model consolidates an optimized weighted average ensemble of seven customized, LASSO-regularized pre-trained models, including InceptionV3, InceptionResNetV2, DenseNet201, DenseNet169, Xception, and ResNet152V2. Initially, we added three more dense layers with 0.0001 LASSO regularization, three 30% dropout layers, and three batch normalizations with renorm enabled at the bottom of these pre-trained models to improve the performance. The accuracy of sugarcane leaf disease classification was greatly increased by this addition. Following this, several comparative studies between the average ensemble and individual models were carried out, indicating that the ensemble technique performed better. The average ensemble of all modified pre-trained models produced outstanding outcomes: 100%, 99%, 99%, and 99.45% for F1 score, precision, recall, and accuracy, respectively. Performance was further enhanced by the implementation of an optimized weighted average ensemble technique incorporated with grid search. This optimized sugarcaneNet2024 model performed the best for detecting sugarcane diseases, having achieved accuracy, precision, recall, and F1 score of 99.67%, 100%, 100%, and 100%, respectively.
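A weighted-average ensemble with grid-searched weights, as described, can be sketched as follows. The two toy models, their class probabilities, and the selection criterion are placeholders for illustration, not the paper's trained networks or its actual search grid:

```python
# Toy grid search over ensemble weights for combining per-model probabilities.
import itertools

def weighted_ensemble(prob_lists, weights):
    """Weighted average of each model's class-probability vector."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
            for c in range(n_classes)]

def grid_search_weights(prob_lists, true_class, grid=(0.0, 0.5, 1.0)):
    """Pick weights maximizing the true class's probability (toy criterion;
    a real search would score a whole validation set)."""
    best, best_score = None, -1.0
    for ws in itertools.product(grid, repeat=len(prob_lists)):
        if sum(ws) == 0:
            continue  # avoid division by zero for the all-zero combination
        score = weighted_ensemble(prob_lists, ws)[true_class]
        if score > best_score:
            best, best_score = ws, score
    return best, best_score

# Two hypothetical models' probabilities for one leaf image (3 disease classes)
probs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
weights, score = grid_search_weights(probs, true_class=0)
print(weights, round(score, 3))
```

In practice the grid would be evaluated against validation accuracy across all images, but the mechanics of weighting and searching are the same.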
- [16] arXiv:2403.18871 [pdf, ps, other]
Title: Clinical Domain Knowledge-Derived Template Improves Post Hoc AI Explanations in Pneumothorax Classification
Authors: Han Yuan, Chuan Hong, Pengtao Jiang, Gangming Zhao, Nguyen Tuan Anh Tran, Xinxing Xu, Yet Yen Yan, Nan Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Background: Pneumothorax is an acute thoracic disease caused by abnormal air collection between the lungs and chest wall. To address the opaqueness often associated with deep learning (DL) models, explainable artificial intelligence (XAI) methods have been introduced to outline regions related to pneumothorax diagnoses made by DL models. However, these explanations sometimes diverge from actual lesion areas, highlighting the need for further improvement. Method: We propose a template-guided approach to incorporate the clinical knowledge of pneumothorax into model explanations generated by XAI methods, thereby enhancing the quality of these explanations. Utilizing one lesion delineation created by radiologists, our approach first generates a template that represents potential areas of pneumothorax occurrence. This template is then superimposed on model explanations to filter out extraneous explanations that fall outside the template's boundaries. To validate its efficacy, we carried out a comparative analysis of three XAI methods with and without our template guidance when explaining two DL models in two real-world datasets. Results: The proposed approach consistently improved baseline XAI methods across twelve benchmark scenarios built on three XAI methods, two DL models, and two datasets. The average incremental percentages, calculated by the performance improvements over the baseline performance, were 97.8% in Intersection over Union (IoU) and 94.1% in Dice Similarity Coefficient (DSC) when comparing model explanations and ground-truth lesion areas. Conclusions: In the context of pneumothorax diagnoses, we proposed a template-guided approach for improving AI explanations. We anticipate that our template guidance will forge a fresh approach to elucidating AI models by integrating clinical domain expertise.
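At its core, the template-guided step intersects the XAI-produced explanation mask with the anatomical template, then scores the result against the ground-truth lesion with IoU and Dice. A toy sketch on tiny binary masks (all masks are invented for illustration; real inputs would be full-resolution radiograph masks):

```python
# Filter an explanation mask with an anatomical template, then score with
# IoU and Dice against a ground-truth lesion mask. Masks are 0/1 grids.

def apply_template(explanation, template):
    """Zero out explanation pixels that fall outside the template."""
    return [[e & t for e, t in zip(er, tr)]
            for er, tr in zip(explanation, template)]

def iou(a, b):
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union

def dice(a, b):
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    size = sum(x for r in a for x in r) + sum(x for r in b for x in r)
    return 2 * inter / size

explanation = [[1, 1, 0], [0, 1, 1], [1, 0, 0]]  # raw XAI output (toy)
template    = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]  # plausible pneumothorax region
lesion      = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]  # ground-truth delineation

filtered = apply_template(explanation, template)
print(iou(explanation, lesion), iou(filtered, lesion))  # filtering raises IoU
```

Here filtering removes the two spurious pixels outside the template, which is exactly the improvement the IoU/Dice gains in the abstract quantify.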
- [17] arXiv:2403.18872 [pdf, other]
Title: Targeted Visualization of the Backbone of Encoder LLMs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Attention-based Large Language Models (LLMs) are the state of the art in natural language processing (NLP). The two most common architectures are encoders such as BERT, and decoders like the GPT models. Despite the success of encoder models, on which we focus in this work, they also bear several risks, including issues with bias or their susceptibility to adversarial attacks, signifying the necessity for explainable AI to detect such issues. While various local explainability methods exist that focus on the prediction of single inputs, global methods based on dimensionality reduction for classification inspection, which have emerged in other domains and go further than just using t-SNE in the embedding space, are not widespread in NLP.
To reduce this gap, we investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain. While in previous work, DeepView has been used to inspect deep image classification models, we demonstrate how to apply it to BERT-based NLP classifiers and investigate its usability in this domain, including settings with adversarially perturbed input samples and pre-trained, fine-tuned, and multi-task models.
- [18] arXiv:2403.18874 [pdf, other]
Title: Neural Attributed Community Search at Billion Scale
Subjects: Social and Information Networks (cs.SI)
Community search has been extensively studied in the past decades. In recent years, there is a growing interest in attributed community search that aims to identify a community based on both the query nodes and query attributes. A set of techniques have been investigated. Though the recent methods based on advanced learning models such as graph neural networks (GNNs) can achieve state-of-the-art performance in terms of accuracy, we notice that 1) they suffer from severe efficiency issues; 2) they directly model community search as a node classification problem and thus cannot make good use of interdependence among different entities in the graph. Motivated by these, in this paper, we propose a new neurAL attrIbuted Community sEarch model for large-scale graphs, termed ALICE. ALICE first extracts a candidate subgraph to reduce the search scope and subsequently predicts the community by the Consistency-aware Net, termed ConNet. Specifically, in the extraction phase, we introduce the density sketch modularity that uses a unified form to combine the strengths of two existing powerful modularities, i.e., classical modularity and density modularity. Based on the new modularity metric, we first adaptively obtain the candidate subgraph, formed by the k-hop neighbors of the query nodes, with the maximum modularity. Then, we construct a node-attribute bipartite graph to take attributes into consideration. After that, ConNet adopts a cross-attention encoder to encode the interaction between the query and the graph. The training of the model is guided by the structure-attribute consistency and the local consistency to achieve better performance. Extensive experiments over 11 real-world datasets including one billion-scale graph demonstrate the superiority of ALICE in terms of accuracy, efficiency, and scalability.
- [19] arXiv:2403.18878 [pdf, other]
Title: AIC-UNet: Anatomy-informed Cascaded UNet for Robust Multi-Organ Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Imposing key anatomical features, such as the number of organs, their shapes, sizes, and relative positions, is crucial for building a robust multi-organ segmentation model. Current attempts to incorporate anatomical features include broadening effective receptive fields (ERF) size with resource- and data-intensive modules such as self-attention or introducing organ-specific topology regularizers, which may not scale to multi-organ segmentation problems where inter-organ relation also plays a huge role. We introduce a new approach to impose anatomical constraints on any existing encoder-decoder segmentation model by conditioning model prediction with learnable anatomy prior. More specifically, given an abdominal scan, a part of the encoder spatially warps a learnable prior to align with the given input scan using thin plate spline (TPS) grid interpolation. The warped prior is then integrated during the decoding phase to guide the model for more anatomy-informed predictions. Code is available at https://anonymous.4open.science/r/AIC-UNet-7048.
- [20] arXiv:2403.18882 [pdf, ps, other]
Title: Towards a Cost-Benefit Analysis of Additive Manufacturing as a Service
Comments: In Proceedings of the 14th International Conference on Cloud Computing and Services Science (CLOSER 2024). Angers, France
Subjects: Other Computer Science (cs.OH)
The landscape of traditional industrial manufacturing is undergoing a pivotal shift from resource-intensive production and long supply chains to more sustainable and regionally focused economies. In this evolving scenario, the move towards local, on-demand manufacturing is emerging as a remedy to the environmentally damaging practice of mass-producing products in distant countries and then transporting them over long distances to customers. This paradigm shift significantly empowers customers, giving them greater control over the manufacturing process by enabling on-demand production and favouring local production sites over traditional mass production and extensive shipping practices. In this position paper we propose a cloud-native Manufacturing as a Service (MaaS) platform that integrates advances in three-dimensional (3D) printing technology into a responsive and eco-conscious manufacturing ecosystem. In this context, we propose a high-level architectural design for a cloud-based MaaS platform that connects web shops of local stores with small and medium-sized enterprises (SMEs) operating 3D printers. Furthermore, we outline an experimental design, including a cost-benefit analysis, to empirically evaluate the operational effectiveness and economic feasibility in a cloud-based additive manufacturing ecosystem. The proposed cloud-based MaaS platform enables on-demand additive manufacturing and opens up a profit sharing opportunity between different stakeholders.
- [21] arXiv:2403.18883 [pdf, other]
-
Title: Towards a Cloud-based Smart Office Solution for Shared Workplace Individualization
Comments: In Proceedings of the 14th International Conference on Cloud Computing and Services Science (CLOSER 2024). Angers, France
Subjects: Other Computer Science (cs.OH)
In the evolving landscape of workplace dynamics, the shift towards hybrid working models has highlighted inefficiencies in the use of traditional office space and the need for an improved employee experience. In this position paper we propose a Smart Office solution that addresses these challenges by integrating a microservice architecture with Internet of Things (IoT) technologies to provide a flexible, personalized workspace environment. The position paper focuses on the technical implementation of this solution, including the design of a Workplace Environment Index (WEI) to monitor and improve office conditions. By using cloud technology, IoT devices with sensors, and following a user-centred design, the proposed solution shows how Shared Open Workspaces can be transformed into adaptive, efficient environments that support the diverse needs of the modern workforce. This position paper paves the way for future experimentation in real-world office environments to validate the effectiveness of the Smart Office solution and provide insights into its potential to redefine the workplace for improved productivity and employee satisfaction.
- [22] arXiv:2403.18886 [pdf, other]
-
Title: Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Continual learning aims to learn from a stream of continuously arriving data with minimum forgetting of previously learned knowledge. While previous works have explored the effectiveness of leveraging the generalizable knowledge from pre-trained models in continual learning, existing parameter-efficient fine-tuning approaches focus on the use of a predetermined or task-wise set of adapters or prompts. However, these approaches still suffer from forgetting due to task interference on jointly used parameters or restricted flexibility. The reliance on a static model architecture may lead to the allocation of excessive parameters that are not essential or, conversely, inadequate adaptation for downstream tasks, given that the scale and distribution of incoming data are unpredictable in continual learning. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel fine-tuning approach which automatically decides to reuse or add adapter modules on demand in continual learning, depending on whether a drastic distribution shift that cannot be handled by existing modules is detected at different representation levels. Each adapter module consists of an adapter and a representation descriptor, the latter implemented as an autoencoder. The representation descriptor serves as a distribution-shift indicator during training and triggers adapter expansion. For better usage of the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. By comparing with vision-transformer-based continual learning adaptation methods, we demonstrate that the proposed framework outperforms the state-of-the-art without memory rehearsal.
- [23] arXiv:2403.18908 [pdf, other]
-
Title: Enhancing Multiple Object Tracking Accuracy via Quantum Annealing
Authors: Yasuyuki Ihara
Comments: 19 pages, 15 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Quantum Physics (quant-ph)
Multiple object tracking (MOT), a key task in image recognition, presents a persistent challenge in balancing processing speed and tracking accuracy. This study introduces a novel approach that leverages quantum annealing (QA) to expedite computation speed, while enhancing tracking accuracy through the ensembling of object tracking processes. A method to improve the matching integration process is also proposed. By utilizing the sequential nature of MOT, this study further augments the tracking method via reverse annealing (RA). Experimental validation confirms the maintenance of high accuracy with an annealing time of a mere 3 $\mu$s per tracking process. The proposed method holds significant potential for real-time MOT applications, including traffic flow measurement for urban traffic light control, collision prediction for autonomous robots and vehicles, and management of products mass-produced in factories.
- [24] arXiv:2403.18910 [pdf, other]
-
Title: A Geometric Explanation of the Likelihood OOD Detection Paradox
Authors: Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.
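The detection recipe sketched in this abstract, pairing a model's likelihood with a local intrinsic dimension (LID) estimate, can be illustrated in miniature. The following is a hedged sketch, not the authors' implementation: the Levina-Bickel-style k-NN estimator and the dual-threshold decision rule are assumptions chosen for illustration.

```python
import math

def knn_lid(distances):
    """Maximum-likelihood LID estimate from a point's sorted k-NN
    distances (Levina-Bickel style): the inverse mean log-ratio of the
    k-th neighbour distance to the closer ones."""
    d_k = distances[-1]
    log_ratios = [math.log(d_k / d) for d in distances[:-1]]
    return len(log_ratios) / sum(log_ratios)

def is_ood(log_likelihood, lid, ll_threshold, lid_threshold):
    """Flag a point as OOD when the model assigns it high likelihood but
    its estimated LID is low: a large density confined to a manifold of
    too few dimensions can still carry negligible probability mass."""
    return log_likelihood >= ll_threshold and lid <= lid_threshold
```

In the paper the LID estimates come from the pre-trained DGM itself; the thresholds here are placeholders that would be calibrated on in-distribution data.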
- [25] arXiv:2403.18913 [pdf, other]
-
Title: UniDepth: Universal Monocular Metric Depth Estimation
Authors: Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains. Departing from the existing MMDE methods, UniDepth directly predicts metric 3D points from the input image at inference time without any additional information, striving for a universal and flexible MMDE solution. In particular, UniDepth implements a self-promptable camera module predicting dense camera representation to condition depth features. Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations. In addition, we propose a geometric invariance loss that promotes the invariance of camera-prompted depth features. Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth, even when compared with methods directly trained on the testing domains. Code and models are available at: https://github.com/lpiccinelli-eth/unidepth
- [26] arXiv:2403.18915 [pdf, other]
-
Title: PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper introduces a novel approach to temporal action localization (TAL) in few-shot learning. Our work addresses the inherent limitations of conventional single-prompt learning methods that often lead to overfitting due to the inability to generalize across varying contexts in real-world videos. Recognizing the diversity of camera views, backgrounds, and objects in videos, we propose a multi-prompt learning framework enhanced with optimal transport. This design allows the model to learn a set of diverse prompts for each action, capturing general characteristics more effectively and distributing the representation to mitigate the risk of overfitting. Furthermore, by employing optimal transport theory, we efficiently align these prompts with action features, optimizing for a comprehensive representation that adapts to the multifaceted nature of video data. Our experiments demonstrate significant improvements in action localization accuracy and robustness in few-shot settings on the standard challenging datasets of THUMOS-14 and EpicKitchens100, highlighting the efficacy of our multi-prompt optimal transport approach in overcoming the challenges of conventional few-shot TAL methods.
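Aligning a set of prompts with action features via optimal transport is commonly done with entropic regularization and Sinkhorn iterations. The sketch below shows that standard machinery only; the cost matrix, regularization strength, and histogram weights are placeholders, not the paper's formulation.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan between histograms a
    and b for a cost matrix `cost` (list of rows), computed by Sinkhorn
    scaling of the Gibbs kernel K = exp(-cost / eps)."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b)))
             for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a)))
             for j in range(len(b))]
    # transport plan: P[i][j] = u[i] * K[i][j] * v[j]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))]
            for i in range(len(a))]
```

With a cost matrix measuring prompt-to-feature distance, the resulting plan gives the soft assignment weights that distribute each action's prompts across the video's feature set.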
- [27] arXiv:2403.18916 [pdf, ps, other]
-
Title: Modelling the Raft Distributed Consensus Protocol in mCRL2
Authors: Parth Bora (Eindhoven University of Technology), Pham Duc Minh (Eindhoven University of Technology), Tim A.C. Willemse (Eindhoven University of Technology)
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 7-20
Subjects: Logic in Computer Science (cs.LO); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
The consensus problem is a fundamental problem in distributed systems. It involves a set of actors, or entities, that need to agree on some values or decisions. The Raft algorithm is a solution to the consensus problem that has gained widespread popularity as an easy-to-understand and implement alternative to Lamport's Paxos algorithm. In this paper we discuss a formalisation of the Raft algorithm and its associated correctness properties in the mCRL2 specification language.
- [28] arXiv:2403.18917 [pdf, other]
-
Title: Formal Verification of Consistency for Systems with Redundant Controllers
Authors: Bjarne Johansson (ABB AB, Västerås, Sweden), Bahman Pourvatan (Mälardalen University, Västerås, Sweden), Zahra Moezkarimi (Mälardalen University, Västerås, Sweden), Alessandro Papadopoulos (Mälardalen University, Västerås, Sweden), Marjan Sirjani (Mälardalen University, Västerås, Sweden)
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 169-191
Subjects: Software Engineering (cs.SE)
A potential problem that may arise in the domain of distributed control systems is the existence of more than one primary controller in redundancy plans that may lead to inconsistency. An algorithm called NRP FD is proposed to solve this issue by prioritizing consistency over availability. In this paper, we demonstrate how by using modeling and formal verification, we discovered an issue in NRP FD where we may have two primary controllers at the same time. We then provide a solution to mitigate the identified issue, thereby enhancing the robustness and reliability of such systems.
- [29] arXiv:2403.18918 [pdf, other]
-
Title: Sliced Online Model Checking for Optimizing the Beam Scheduling Problem in Robotic Radiation Therapy
Authors: Lars Beckers (TUHH), Stefan Gerlach (TUHH), Ole Lübke (TUHH), Alexander Schlaefer (TUHH), Sibylle Schupp (TUHH)
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 193-209
Subjects: Computational Engineering, Finance, and Science (cs.CE)
In robotic radiation therapy, high-energy photon beams from different directions are directed at a target within the patient. Target motion can be tracked by robotic ultrasound and then compensated by synchronous beam motion. However, moving the beams may result in beams passing through the ultrasound transducer or the robot carrying it. While this can be avoided by pausing the beam delivery, the treatment time would increase. Typically, the beams are delivered in an order which minimizes the robot motion and thereby the overall treatment time. However, this order can be changed, i.e., instead of pausing beams, other feasible beams could be delivered.
We address this problem of dynamically ordering the beams by applying a model checking paradigm to select feasible beams. Since breathing patterns are complex and change rapidly, any offline model would be too imprecise. Thus, model checking must be conducted online, predicting the patient's current breathing pattern for a short amount of time and checking which beams can be delivered safely. Monitoring the treatment delivery online provides the option to reschedule beams dynamically in order to avoid pausing and hence to reduce treatment time.
While human breathing patterns are complex and may change rapidly, we need a model which can be verified quickly, and therefore approximate the motion by a superposition of sine curves. Further, we simplify the 3D breathing motion into separate 1D models. We compensate for the simplification by adding noise inside the model itself. In turn, we synchronize between the multiple models representing the different spatial directions, the treatment simulation, and corresponding verification queries.
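The breathing model described above, a superposition of sine curves per spatial direction with added noise compensating for the 1D simplification, can be sketched as follows. This is an illustrative rendering, not the authors' verified model; the tuple layout and noise form are assumptions.

```python
import math
import random

def breathing_model(t, components, noise_sd=0.0, rng=None):
    """1D breathing displacement at time t, modelled as a superposition
    of sine curves given as (amplitude, frequency_hz, phase) tuples.
    Optional Gaussian noise stands in for the error introduced by
    collapsing the 3D motion into separate 1D models."""
    x = sum(a * math.sin(2.0 * math.pi * f * t + p)
            for a, f, p in components)
    if noise_sd > 0.0:
        x += (rng or random).gauss(0.0, noise_sd)
    return x
```

One such model per spatial axis, evaluated over a short prediction horizon, is what the online model checker would query to decide which beams can be delivered safely.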
Our preliminary results show a 16.02 % to 37.21 % mean improvement on the idle time compared to a static beam schedule, depending on an additional safety margin. Note that an additional safety margin around the ultrasound robot can decrease idle times but also compromises plan quality by limiting the range of available beam directions. In contrast, the approach using online model checking maintains the plan quality. Further, we compare to a naive machine learning approach that does not achieve its goals while being harder to reason about.
- [30] arXiv:2403.18920 [pdf, other]
-
Title: CPR: Retrieval Augmented Generation for Copyright Protection
Authors: Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto
Comments: CVPR 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private user data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG techniques for image generation may lead to parts of the retrieved samples being copied in the model's output. To reduce the risk of leaking private information contained in the retrieved set, we introduce Copy-Protected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees in a mixed-private setting for diffusion models. CPR allows the output of diffusion models to be conditioned on a set of retrieved images, while also guaranteeing that unique identifiable information about those examples is not exposed in the generated outputs. In particular, it does so by sampling from a mixture of a public (safe) distribution and a private (user) distribution, merging their diffusion scores at inference. We prove that CPR satisfies Near Access Freeness (NAF), which bounds the amount of information an attacker may be able to extract from the generated images. We provide two algorithms for copyright protection, CPR-KL and CPR-Choose. Unlike previously proposed rejection-sampling-based NAF methods, our methods enable efficient copyright-protected sampling with a single run of backward diffusion. We show that our method can be applied to any pre-trained conditional diffusion model, such as Stable Diffusion or unCLIP. In particular, we empirically show that applying CPR on top of unCLIP improves the quality and text-to-image alignment of the generated results (81.4 to 83.17 on the TIFA benchmark), while enabling credit attribution, copyright protection, and deterministic, constant-time unlearning.
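The score-merging step, sampling from a mixture of the public and private distributions by combining their diffusion scores at inference, can be sketched generically. The simple convex combination below is an illustrative assumption; the paper's CPR-KL and CPR-Choose algorithms define the actual merging rules.

```python
def merge_scores(score_public, score_private, w):
    """Combine the public (safe) and private (retrieval-conditioned)
    score estimates at one reverse-diffusion step. The weight w in
    [0, 1] trades fidelity to the retrieved set against protection
    from copying it; w=0 ignores the private distribution entirely."""
    return [w * s_priv + (1.0 - w) * s_pub
            for s_priv, s_pub in zip(score_private, score_public)]
```

In an actual sampler this combination would be applied at every denoising step before the update rule, so a single backward diffusion pass yields the protected sample.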
- [31] arXiv:2403.18921 [pdf, other]
-
Title: SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction
Comments: 12 pages, 8 figures, 5 tables
Subjects: Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in numerous vision tasks. However, their high processing requirements necessitate efficient hardware acceleration to meet the application's performance targets. In the space of FPGAs, streaming-based dataflow architectures are often adopted by users, as significant performance gains can be achieved through layer-wise pipelining and reduced off-chip memory access by retaining data on-chip. However, modern topologies, such as the UNet, YOLO, and X3D models, utilise long skip connections, requiring significant on-chip storage and thus limiting the performance achieved by such system architectures. The paper addresses the above limitation by introducing weight and activation eviction mechanisms to off-chip memory along the computational pipeline, taking into account the available compute and memory resources. The proposed mechanism is incorporated into an existing toolflow, expanding the design space by utilising off-chip memory as a buffer. This enables the mapping of such modern CNNs to devices with limited on-chip memory, under the streaming architecture design approach. SMOF has demonstrated the capacity to deliver competitive and, in some cases, state-of-the-art performance across a spectrum of computer vision tasks, achieving up to 10.65x throughput improvement compared to previous works.
- [32] arXiv:2403.18922 [pdf, other]
-
Title: Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Comments: Computer Vision and Pattern Recognition Conference (CVPR), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets. At the same time, there has been renewed interest in 3D scene representations such as neural radiance fields from multi-view images. However, the availability of 3D or multiview data is still substantially limited compared to 2D image datasets, making extending 2D vision models to 3D data highly desirable but also very challenging. Indeed, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task and often requires per-scene optimization. In this paper, we ask the question of whether any 2D vision model can be lifted to make 3D consistent predictions. We answer this question in the affirmative; our new Lift3D method trains to predict unseen views on feature spaces generated by a few visual models (i.e. DINO and CLIP), but then generalizes to novel vision operators and tasks, such as style transfer, super-resolution, open vocabulary segmentation and image colorization; for some of these tasks, there is no comparable previous 3D method. In many cases, we even outperform state-of-the-art methods specialized for the task in question. Moreover, Lift3D is a zero-shot method, in the sense that it requires no task-specific training, nor scene-specific optimization.
- [33] arXiv:2403.18923 [pdf, other]
-
Title: Nature-Guided Cognitive Evolution for Predicting Dissolved Oxygen Concentrations in North Temperate Lakes
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Predicting dissolved oxygen (DO) concentrations in north temperate lakes requires a comprehensive study of phenological patterns across various ecosystems, which highlights the significance of selecting phenological features and feature interactions. Process-based models are limited by partial process knowledge or oversimplified feature representations, while machine learning models face challenges in efficiently selecting relevant feature interactions for different lake types and tasks, especially under the infrequent nature of DO data collection. In this paper, we propose a Nature-Guided Cognitive Evolution (NGCE) strategy, which represents a multi-level fusion of adaptive learning with natural processes. Specifically, we utilize metabolic process-based models to generate simulated DO labels. Using these simulated labels, we implement a multi-population cognitive evolutionary search, where models, mirroring natural organisms, adaptively evolve to select relevant feature interactions within populations for different lake types and tasks. These models are not only capable of undergoing crossover and mutation mechanisms within intra-populations but also, albeit infrequently, engage in inter-population crossover. The second stage involves refining these models by retraining them with real observed labels. We have tested the performance of our NGCE strategy in predicting daily DO concentrations across a wide range of lakes in the Midwest, USA. These lakes, varying in size, depth, and trophic status, represent a broad spectrum of north temperate lakes. Our findings demonstrate that NGCE not only produces accurate predictions with few observed labels but also, through gene maps of models, reveals sophisticated phenological patterns of different lakes.
- [34] arXiv:2403.18926 [pdf, other]
-
Title: Enhancing Efficiency in Sparse Models with Sparser Selection
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency, since a significant number of parameters are unnecessarily involved in computations by multiplying values with zero or low activation values. To address this issue, we present XMoE, a novel MoE designed to enhance both the efficacy and efficiency of sparse MoE models. XMoE leverages small experts and a threshold-based router to enable tokens to selectively engage only essential parameters. Our extensive experiments on language modeling and machine translation tasks demonstrate that XMoE can decrease the computation load at MoE layers by over 50% without sacrificing performance. Furthermore, we demonstrate the versatility of XMoE by applying it to dense models, enabling sparse computation during inference. We provide a comprehensive analysis and make our code available at https://anonymous.4open.science/r/XMoE.
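A threshold-based router of the kind described above can be sketched in a few lines: softmax the gate logits, keep only the experts whose probability clears a threshold, and renormalize. This is a generic illustration, not the paper's exact router; the threshold value and renormalization choice are assumptions.

```python
import math

def threshold_route(gate_logits, tau):
    """Route a token to experts whose softmax probability is at least
    tau; returns {expert_index: weight} with the kept weights
    renormalized to sum to 1. Assumes tau is small enough that at
    least one expert survives (the top expert always clears any
    tau <= its probability)."""
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    kept = {i: p for i, p in enumerate(probs) if p >= tau}
    z = sum(kept.values())
    return {i: p / z for i, p in kept.items()}
```

Unlike fixed top-k routing, the number of engaged experts here varies per token, which is what lets confidently-routed tokens touch fewer parameters.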
- [35] arXiv:2403.18929 [pdf, other]
-
Title: A Review of Neuroscience-Inspired Machine Learning
Comments: 13 pages, 1 figure
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
One major criticism of deep learning centers around the biological implausibility of the credit assignment schema used for learning -- backpropagation of errors. This implausibility translates into practical limitations, spanning scientific fields, including incompatibility with hardware and non-differentiable implementations, thus leading to expensive energy requirements. In contrast, biologically plausible credit assignment is compatible with practically any learning condition and is energy-efficient. As a result, it accommodates hardware and scientific modeling, e.g. learning with physical systems and non-differentiable behavior. Furthermore, it can lead to the development of real-time, adaptive neuromorphic processing systems. In addressing this problem, an interdisciplinary branch of artificial intelligence research that lies at the intersection of neuroscience, cognitive science, and machine learning has emerged. In this paper, we survey several vital algorithms that model bio-plausible rules of credit assignment in artificial neural networks, discussing the solutions they provide for different scientific fields as well as their advantages on CPUs, GPUs, and novel implementations of neuromorphic hardware. We conclude by discussing the future challenges that will need to be addressed in order to make such algorithms more useful in practical applications.
- [36] arXiv:2403.18930 [pdf, other]
-
Title: Optimizing Wireless Networks with Deep Unfolding: Comparative Study on Two Deep Unfolding Mechanisms
Comments: 11 pages, 13 figures
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
In this work, we conduct a comparative study of two deep unfolding mechanisms for efficiently performing power control in next generation wireless networks. The power control problem is formulated as energy efficiency maximization over multiple interference links and is nonconvex. We employ a fractional programming transformation to design two solutions to the problem: a numerical solution and a closed-form solution. Based on the numerical solution, we design a semi-unfolded deep learning model that combines domain knowledge of wireless communications with recent advances in data-driven deep learning. Moreover, building on the closed-form solution, we design a fully unfolded deep learning model that leverages the expressive closed-form power control solution together with deep learning advances. In the simulation results, we compare the performance of the proposed deep learning models and the iterative solutions in terms of accuracy and inference speed to show their suitability for real-time applications in next generation networks.
- [37] arXiv:2403.18932 [pdf, other]
-
Title: Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Comments: 16 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-sourced LLMs and showed that our proposed framework is easily scalable to other topics and is explainable.
- [38] arXiv:2403.18933 [pdf, other]
-
Title: SemEval Task 1: Semantic Textual Relatedness for African and Asian Languages
Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad
Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638
Subjects: Computation and Language (cs.CL)
We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by the relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tasks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and the most effective approaches for the three different tracks.
- [39] arXiv:2403.18935 [pdf, ps, other]
-
Title: On the Semantic Security in the General Bounded Storage Model: A New Proof
Comments: 24 pages
Subjects: Information Theory (cs.IT)
In the bounded storage model introduced by Maurer, the adversary is computationally unbounded but has a bounded storage capacity. In this model, information-theoretic secrecy is guaranteed by using a publicly available random string whose length exceeds the adversary's storage capacity. The protocol proposed by Maurer is simple to implement and efficient in terms of the initial secret key size and random string length. However, his security proof covers only the case where the adversary can access a constant fraction of the random string and store only original bits of the random string. In this paper, we provide a new proof of the security of Maurer's protocol in the general bounded storage model, i.e., where the adversary can access all bits of the random string and store the output of any Boolean function on the string. We reaffirm that the protocol is semantically secure in the general bounded storage model.
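The protocol's core idea, deriving a one-time pad from a long public random string using a short secret key, can be sketched schematically. This is an illustration only, under the simplifying assumption that the key just names bit positions in the randomizer; Maurer's actual construction and its security analysis are in the paper.

```python
import secrets

def sample_key(n, k):
    """Short secret key: k positions into a public random string of
    length n (hypothetical keying scheme, for illustration only)."""
    return [secrets.randbelow(n) for _ in range(k)]

def derive_pad(random_string, key_positions):
    # Both parties read the same randomizer bits at the secret positions
    # while the string is broadcast; an adversary with bounded storage
    # cannot retain enough of it to recover the pad later.
    return [random_string[i] for i in key_positions]

def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists (encrypt = decrypt)."""
    return [x ^ y for x, y in zip(a, b)]
```

Encryption XORs the message with the derived pad, and applying the same pad again recovers the plaintext.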
- [40] arXiv:2403.18938 [pdf, ps, other]
-
Title: Reshaping Free-Text Radiology Notes Into Structured Reports With Generative Transformers
Authors: Laura Bergomi, Tommaso M. Buonocore, Paolo Antonazzo, Lorenzo Alberghi, Riccardo Bellazzi, Lorenzo Preda, Chandra Bortolotto, Enea Parimbelli
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness and information retrieval. We propose a pipeline to extract information from free-text radiology reports, that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS: Our work aims to leverage the potential of Natural Language Processing (NLP) and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 radiology reports, we investigate a rule-free generative Question Answering approach based on a domain-specific version of T5 (IT5). Two strategies (batch-truncation and ex-post combination) are implemented to comply with the model's context length limitations. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. A 5-point Likert scale questionnaire is used to collect human-expert feedback on the similarity between medical annotations and generated answers. RESULTS: The combination of fine-tuning and batch splitting allows IT5 to achieve notable results; it performs on par with GPT-3.5 despite being a thousand times smaller in terms of parameters. Human-based assessment scores show a high correlation (Spearman's correlation coefficients > 0.88, p-values < 0.001) with AI performance metrics (F1) and confirm the superior ability of LLMs (i.e., GPT-3.5, 175B parameters) in generating plausible human-like statements.
- [41] arXiv:2403.18946 [pdf, other]
-
Title: Random Aggregate Beamforming for Over-the-Air Federated Learning in Large-Scale Networks
Comments: 30 pages, 11 figures
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
At present, there is a trend to deploy ubiquitous artificial intelligence (AI) applications at the edge of the network. As a promising framework that enables secure edge intelligence, federated learning (FL) has received widespread attention, and over-the-air computing (AirComp) has been integrated to further improve communication efficiency. In this paper, we consider a joint device selection and aggregate beamforming design with the objectives of minimizing the aggregate error and maximizing the number of selected devices. This yields a combinatorial problem, which is difficult to solve, especially in large-scale networks. To tackle the problem in a cost-effective manner, we propose a random aggregate beamforming-based scheme, which generates the aggregator beamforming vector via random sampling rather than optimization. The implementation of the proposed scheme does not require channel estimation. We additionally use asymptotic analysis to study the obtained aggregate error and the number of selected devices when the number of devices becomes large. Furthermore, a refined method that runs with multiple randomizations is also proposed for performance improvement. Extensive simulation results are presented to demonstrate the effectiveness of the proposed random aggregate beamforming-based scheme as well as the refined method.
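The sample-then-select idea can be sketched as a toy simulation. This is an illustrative sketch, not the paper's algorithm: the selection criterion (alignment magnitude above a threshold), the function name, and the use of known simulated channels for evaluation are all our assumptions; the paper's actual scheme notably avoids channel estimation.

```python
import random

def random_aggregate_beamforming(H, n_trials, threshold, seed=0):
    """Toy sketch: sample random unit-norm beamforming vectors and keep the
    one that lets the most devices satisfy a per-device criterion.
    H: one complex channel vector per device (used here only to simulate).
    Criterion |m^H h_k| >= threshold is an illustrative placeholder."""
    rng = random.Random(seed)
    dim = len(H[0])
    best_m, best_sel = None, []
    for _ in range(n_trials):
        # Draw a complex Gaussian vector and normalize it (no optimization).
        m = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        norm = sum(abs(x) ** 2 for x in m) ** 0.5
        m = [x / norm for x in m]
        # Select the devices whose channel aligns well enough with m.
        sel = [k for k, h in enumerate(H)
               if abs(sum(a.conjugate() * b for a, b in zip(m, h))) >= threshold]
        if len(sel) > len(best_sel):
            best_m, best_sel = m, sel
    return best_m, best_sel
```

The refined method mentioned in the abstract would correspond to increasing `n_trials` and keeping the best draw.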
- [42] arXiv:2403.18947 [pdf, other]
-
Title: Self-Supervised Interpretable Sensorimotor Learning via Latent Functional Modularity
Comments: 10 pages, 6 figures. Accepted for an oral presentation at the AAAI 2024 Workshop on Explainable AI Approaches for Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
We introduce MoNet, a novel method that combines end-to-end learning with modular network architectures for self-supervised and interpretable sensorimotor learning. MoNet is composed of three functionally distinct neural modules: Perception, Planning, and Control. Leveraging its inherent modularity through a cognition-guided contrastive loss function, MoNet efficiently learns task-specific decision-making processes in latent space, without requiring task-level supervision. Moreover, our method incorporates an online post-hoc explainability approach, which enhances the interpretability of the end-to-end inferences without a trade-off in sensorimotor performance. In real-world indoor environments, MoNet demonstrates effective visual autonomous navigation, surpassing baseline models by 11% to 47% in task specificity analysis. We further delve into the interpretability of our network through the post-hoc analysis of perceptual saliency maps and latent decision vectors. This offers insights into the incorporation of explainable artificial intelligence within the realm of robotic learning, encompassing both perceptual and behavioral perspectives.
- [43] arXiv:2403.18949 [pdf, ps, other]
-
Title: An IoT Based Water-Logging Detection System: A Case Study of Dhaka
Authors: Md Manirul Islam, Md. Sadad Mahamud, Umme Salsabil, A.A.M. Mazharul Amin, Samiul Haque Suman
Comments: Global Conference on Technology and Information Management
Subjects: Other Computer Science (cs.OH)
With its large and rapidly growing population, many problems are arising in Dhaka, the capital city of Bangladesh. Water-logging is one of the major issues among them. Heavy rainfall, lack of awareness, and poor maintenance cause a bad sewerage system in the city. As a result, water overflows onto the roads and sometimes mixes with the drinking water. To overcome this problem, this paper explores the potential of using the Internet of Things (IoT) to combat water-logging in drainage pipes, which are used to move waste as well as rainwater away from the city. The proposed system will continuously monitor the real-time water level, water flow, and gas level inside the drainage pipe. Moreover, all the monitoring data will be stored in a central database for graphical representation and further analysis. In addition, if any emergency arises in the drainage system, an alert will be sent directly to the nearest maintenance office.
- [44] arXiv:2403.18953 [pdf, ps, other]
-
Title: Hybridizing Traditional and Next-Generation Reservoir Computing to Accurately and Efficiently Forecast Dynamical Systems
Comments: 10 pages, 7 figures
Subjects: Machine Learning (cs.LG)
Reservoir computers (RCs) are powerful machine learning architectures for time series prediction. Recently, next generation reservoir computers (NGRCs) have been introduced, offering distinct advantages over RCs, such as reduced computational expense and lower data requirements. However, NGRCs have their own practical difficulties distinct from those of RCs, including sensitivity to sampling time and type of nonlinearities in the data. Here, we introduce a hybrid RC-NGRC approach for time series forecasting of complex and chaotic dynamical systems. We show that our hybrid approach can produce accurate short term predictions and capture the long term statistics of dynamical systems in situations where the RC and NGRC components alone are insufficient. The advantage of the hybrid RC-NGRC approach is most pronounced when both components are limited in their prediction capabilities, e.g. for a small RC and a large sampling time in the training data. Under these conditions, we show for several chaotic systems that the hybrid RC-NGRC method with a small reservoir ($N \approx 100$) can achieve prediction performance rivaling that of a pure RC with a much larger reservoir ($N \approx 1000$), illustrating that the hybrid approach offers significant gains in computational efficiency over traditional RCs while simultaneously addressing some of the limitations of NGRCs.
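The NGRC half of such a hybrid builds its feature vectors directly from time-delayed observations and their products, with a linear readout trained on top. A minimal sketch of that feature map for a scalar series follows; the function name and the restriction to quadratic monomials are our illustrative choices, not the paper's implementation.

```python
from itertools import combinations_with_replacement

def ngrc_features(series, k=2):
    """NGRC-style feature vectors for a scalar time series: a constant term,
    the k most recent observations (linear part), and all their pairwise
    products with repetition (quadratic part)."""
    feats = []
    for t in range(k - 1, len(series)):
        lin = list(series[t - k + 1 : t + 1])  # the k-step delay window
        quad = [a * b for a, b in combinations_with_replacement(lin, 2)]
        feats.append([1.0] + lin + quad)
    return feats
```

In the hybrid RC-NGRC scheme described above, features like these would be concatenated with the (small) reservoir's state before fitting the single linear readout, e.g. by ridge regression.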
- [45] arXiv:2403.18955 [pdf, other]
-
Title: Structurally Prune Anything: Any Architecture, Any Framework, Any Time
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Neural network pruning serves as a critical technique for enhancing the efficiency of deep learning models. Unlike unstructured pruning, which only sets specific parameters to zero, structured pruning eliminates entire channels, thus yielding direct computational and storage benefits. However, the diverse patterns for coupling parameters, such as residual connections and group convolutions, the diverse deep learning frameworks, and the various time stages at which pruning can be performed make existing pruning methods less adaptable to different architectures, frameworks, and pruning criteria. To address this, we introduce Structurally Prune Anything (SPA), a versatile structured pruning framework that can prune neural networks with any architecture, from any framework, and at any stage of training. SPA leverages a standardized computational graph and ONNX representation to prune diverse neural network architectures without the need for manual intervention. SPA employs a group-level importance estimation method, which groups dependent computational operators, estimates their importance, and prunes unimportant coupled channels. This enables the transfer of various existing pruning criteria into a structured group style. As a result, SPA supports pruning at any time, either before training, after training with fine-tuning, or after training without fine-tuning. In the context of the latter, we introduce Optimal Brain SPA (OBSPA), an algorithm that achieves state-of-the-art pruning results needing neither fine-tuning nor calibration data. In extensive experiments, SPA shows pruning performance competitive with the state of the art across various architectures, from popular frameworks, at different pruning times.
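The group-level importance idea can be illustrated with a small sketch: per-channel importance scores are summed across every weight tensor that shares a channel dimension (e.g. a conv layer and its residual partner), and the lowest-scoring coupled channels are removed together. The L1-norm criterion and both function names below are our illustrative choices, not SPA's exact method.

```python
def group_channel_importance(weight_groups):
    """Sum per-channel L1 norms across all weight tensors in a coupled group.
    weight_groups: list of tensors; each tensor is a list (one entry per
    channel) of that channel's flattened weights."""
    n_ch = len(weight_groups[0])
    importance = [0.0] * n_ch
    for tensor in weight_groups:
        for c, channel_weights in enumerate(tensor):
            importance[c] += sum(abs(w) for w in channel_weights)
    return importance

def channels_to_prune(importance, ratio):
    """Indices of the lowest-importance channels to remove as a group."""
    k = int(len(importance) * ratio)
    order = sorted(range(len(importance)), key=lambda c: importance[c])
    return sorted(order[:k])
```

Because the score is aggregated over the whole group, a channel is only pruned if it is unimportant in every operator that depends on it, which is what keeps coupled structures like residual additions shape-consistent.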
- [46] arXiv:2403.18957 [pdf, other]
-
Title: Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models
Authors: Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu
Comments: To Appear in the 33rd USENIX Security Symposium, August 14-16, 2024
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Online user-generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.
- [47] arXiv:2403.18958 [pdf, other]
-
Title: A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.
- [48] arXiv:2403.18960 [pdf, other]
-
Title: Robust In-Hand Manipulation with Extrinsic Contacts
Comments: Accepted at ICRA 24
Subjects: Robotics (cs.RO)
We present in-hand manipulation tasks in which a robot moves an object in its grasp, maintains the object's external contact mode with the environment, and adjusts its in-hand pose simultaneously. The proposed manipulation task leads to complex contact interactions that can be very susceptible to uncertainties in kinematic and physical parameters. Therefore, we propose a robust in-hand manipulation method that consists of two parts: first, an in-gripper mechanics model that computes a naïve motion cone assuming all parameters are precise; then, a robust planning method that refines the motion cone to maintain the desired contact mode regardless of parametric errors. Real-world experiments were conducted to illustrate the accuracy of the mechanics model and the effectiveness of the robust planning framework in the presence of kinematic parameter errors.
- [49] arXiv:2403.18962 [pdf, other]
-
Title: High Recall, Small Data: The Challenges of Within-System Evaluation in a Live Legal Search System
Subjects: Information Retrieval (cs.IR)
This paper illustrates some challenges of common ranking evaluation methods for legal information retrieval (IR). We show these challenges with log data from a live legal search system and two user studies. We provide an overview of aspects of legal IR, and the implications of these aspects for the expected challenges of common evaluation methods: test collections based on explicit and implicit feedback, user surveys, and A/B testing. Next, we illustrate the challenges of common evaluation methods using data from a live, commercial, legal search engine. We specifically focus on methods for monitoring the effectiveness of (continuous) changes to document ranking by a single IR system over time. We show how the combination of characteristics in legal IR systems and limited user data can lead to challenges that cause the common evaluation methods discussed to be sub-optimal. In our future work we will therefore focus on less common evaluation methods, such as cost-based evaluation models.
- [50] arXiv:2403.18965 [pdf, other]
-
Title: LORD: Large Models based Opposite Reward Design for Autonomous Driving
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, the desired linguistic goals for autonomous driving, such as "drive safely", are ambiguous and incomprehensible to pretrained models. On the other hand, undesired linguistic goals like "collision" are more concrete and tractable. In this work, we introduce LORD, a novel large-models-based opposite reward design that works through undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models to achieve safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities, as it outperforms counterpart methods across diverse and challenging driving scenarios.
- [51] arXiv:2403.18969 [pdf, other]
-
Title: A Survey on Large Language Models from Concept to Implementation
Comments: Preprint in ArXiv template, total 24 pages, 5 figures, 5 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)
Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications, transcending their initial use in chatbot technology. This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series. This exploration focuses on the transformative impact of artificial intelligence (AI) driven tools in revolutionizing traditional tasks like coding and problem-solving, while also paving new paths in research and development across diverse industries. From code interpretation and image captioning to facilitating the construction of interactive systems and advancing computational domains, Transformer models exemplify a synergy of deep learning, data analysis, and neural network design. This survey provides an in-depth look at the latest research in Transformer models, highlighting their versatility and the potential they hold for transforming diverse application sectors, thereby offering readers a comprehensive understanding of the current and future landscape of Transformer-based LLMs in practical applications.
- [52] arXiv:2403.18970 [pdf, other]
-
Title: Two-level overlapping Schwarz preconditioners with universal coarse spaces for $2m$th-order elliptic problems
Authors: Jongho Park
Comments: 20 pages, 7 figures
Subjects: Numerical Analysis (math.NA)
We propose a novel universal construction of two-level overlapping Schwarz preconditioners for $2m$th-order elliptic boundary value problems, where $m$ is a positive integer. The word "universal" here signifies that the coarse space construction can be applied to any finite element discretization for any $m$ that satisfies some common assumptions. We present numerical results for conforming, nonconforming, and discontinuous Galerkin-type finite element discretizations for high-order problems to demonstrate the scalability of the proposed two-level overlapping Schwarz preconditioners.
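For context, a two-level additive overlapping Schwarz preconditioner combines local solves on overlapping subdomains with one coarse-space solve; the universal coarse space proposed here supplies the coarse pair for any admissible discretization. The generic textbook form (our notation, not necessarily the paper's) is:

```latex
M^{-1} = R_0^{T} A_0^{-1} R_0 + \sum_{i=1}^{N} R_i^{T} A_i^{-1} R_i,
\qquad A_i = R_i A R_i^{T},
```

where $R_i$ ($i \geq 1$) restricts to the $i$th overlapping subdomain, $R_0$ restricts to the coarse space, and $A$ is the stiffness matrix of the $2m$th-order problem.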
- [53] arXiv:2403.18971 [pdf, other]
-
Title: Concurrent level set and fiber orientation optimization of composite structures
Subjects: Numerical Analysis (math.NA)
By adjusting both the structural shape and fiber orientation, this research aims to optimize the design of Fiber Reinforced Composite (FRC) structures. The structural geometry is represented by a level set function, which is approximated by quadratic B-spline functions. The fiber orientation field is parameterized with quadratic/cubic B-splines on hierarchically refined meshes. Different levels of B-spline mesh refinement for the level set and fiber orientation fields are studied to obtain a smooth fiber layout. To facilitate FRC manufacturing, the parallel alignment and smoothness of fiber paths are enforced by introducing penalty terms, referred to as "misalignment penalty and curvature penalty", which are incorporated into the optimization process. A geometric interpretation of the penalties is provided. The material behavior of the FRCs is modeled by the Mori-Tanaka homogenization scheme, and the macroscopic structural response is modeled by linear elasticity under static multi-loading conditions. The governing equations are discretized by a Heaviside-enriched eXtended IsoGeometric Analysis (XIGA) to avoid the need to generate conformal meshes. Instabilities in XIGA are mitigated by the facet-oriented ghost stabilization technique. This work considers mass and strain energy in the formulation of the optimization objective, along with misalignment and curvature penalties and additional regularization terms. Constraints are imposed on the volume of the structure. The resulting optimization problems are solved by a gradient-based algorithm. The design sensitivities are computed by the adjoint method. Numerical examples with two-dimensional and three-dimensional configurations demonstrate that the proposed method is efficient in simultaneously optimizing the macroscopic shape and the fiber layout while improving manufacturability by promoting parallel and smooth fiber paths.
- [54] arXiv:2403.18972 [pdf, other]
-
Title: Risk-Aware Robotics: Tail Risk Measures in Planning, Control, and Verification
Authors: Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Lars Lindemann, Margaret P. Chapman, George J. Pappas, Aaron D. Ames, Joel W. Burdick
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
The need for a systematic approach to risk assessment has increased in recent years due to the ubiquity of autonomous systems that alter our day-to-day experiences and their need for safety, e.g., for self-driving vehicles, mobile service robots, and bipedal robots. These systems are expected to function safely in unpredictable environments and interact seamlessly with humans, whose behavior is notably challenging to forecast. We present a survey of risk-aware methodologies for autonomous systems. We adopt a contemporary risk-aware approach to mitigate rare and detrimental outcomes by advocating the use of tail risk measures, a concept borrowed from financial literature. This survey will introduce these measures and explain their relevance in the context of robotic systems for planning, control, and verification applications.
- [55] arXiv:2403.18973 [pdf, other]
-
Title: Conformal Intent Classification and Clarification for Fast and Accurate Intent Recognition
Comments: 9 pages, 2 figures, 3 tables, 6 appendices, to be published in ACL's NAACL Findings 2024
Subjects: Computation and Language (cs.CL)
We present Conformal Intent Classification and Clarification (CICC), a framework for fast and accurate intent classification for task-oriented dialogue systems. The framework turns heuristic uncertainty scores of any intent classifier into a clarification question that is guaranteed to contain the true intent at a pre-defined confidence level. By disambiguating between a small number of likely intents, the user query can be resolved quickly and accurately. Additionally, we propose to augment the framework for out-of-scope detection. In a comparative evaluation using seven intent recognition datasets we find that CICC generates small clarification questions and is capable of out-of-scope detection. CICC can help practitioners and researchers substantially in improving the user experience of dialogue agents with specific clarification questions.
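The conversion from heuristic scores to a guaranteed set follows the standard split-conformal recipe: calibrate a nonconformity threshold on held-out examples, then include every intent whose score clears it. The sketch below shows that generic recipe, not CICC's exact procedure; the `1 - softmax` nonconformity score and both function names are our illustrative assumptions.

```python
import math

def conformal_threshold(true_class_scores, alpha):
    """Calibrate on held-out examples: nonconformity = 1 - classifier score
    of the true intent. Returns the conservative (n+1)-adjusted quantile so
    the prediction set contains the true intent with probability >= 1-alpha."""
    noncon = sorted(1.0 - s for s in true_class_scores)
    n = len(noncon)
    idx = min(n - 1, math.ceil((n + 1) * (1.0 - alpha)) - 1)
    return noncon[idx]

def prediction_set(intent_scores, qhat):
    """All intents whose nonconformity falls within the calibrated threshold.
    If more than one intent survives, the dialogue system would ask a
    clarification question over exactly this small set."""
    return sorted(i for i, s in intent_scores.items() if 1.0 - s <= qhat)
```

A singleton set resolves the query immediately; an empty or very large set can be treated as out-of-scope, matching the augmentation described in the abstract.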
- [56] arXiv:2403.18975 [pdf, other]
-
Title: A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models
Authors: Namu Park, Kevin Lybarger, Giridhar Kaushik Ramachandran, Spencer Lewis, Aashka Damani, Ozlem Uzuner, Martin Gunn, Meliha Yetisgen
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Medical imaging is critical to the diagnosis, surveillance, and treatment of many health conditions, including oncological, neurological, cardiovascular, and musculoskeletal disorders, among others. Radiologists interpret these complex, unstructured images and articulate their assessments through narrative reports that remain largely unstructured. This unstructured narrative must be converted into a structured semantic representation to facilitate secondary applications such as retrospective analyses or clinical decision support. Here, we introduce the Corpus of Annotated Medical Imaging Reports (CAMIR), which includes 609 annotated radiology reports from three imaging modality types: Computed Tomography, Magnetic Resonance Imaging, and Positron Emission Tomography-Computed Tomography. Reports were annotated using an event-based schema that captures clinical indications, lesions, and medical problems. Each event consists of a trigger and multiple arguments, and a majority of the argument types, including anatomy, normalize the spans to pre-defined concepts to facilitate secondary use. CAMIR uniquely combines a granular event structure and concept normalization. To extract CAMIR events, we explored two BERT (Bi-directional Encoder Representation from Transformers)-based architectures, including an existing architecture (mSpERT) that jointly extracts all event information and a multi-step approach (PL-Marker++) that we augmented for the CAMIR schema.
- [57] arXiv:2403.18976 [pdf, other]
-
Title: "Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
Authors: Vipula Rawte, S.M Towhidul Islam Tonmoy, S M Mehedi Zaman, Prachi Priya, Aman Chadha, Amit P. Sheth, Amitava Das
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Hallucination has emerged as the most vulnerable aspect of contemporary Large Language Models (LLMs). In this paper, we introduce the Sorry, Come Again (SCA) prompting, aimed to avoid LLM hallucinations by enhancing comprehension through: (i) optimal paraphrasing and (ii) injecting [PAUSE] tokens to delay LLM generation. First, we provide an in-depth analysis of linguistic nuances: formality, readability, and concreteness of prompts for 21 LLMs, and elucidate how these nuances contribute to hallucinated generation. Prompts with lower readability, formality, or concreteness pose comprehension challenges for LLMs, similar to those faced by humans. In such scenarios, an LLM tends to speculate and generate content based on its imagination (associative memory) to fill these information gaps. Although these speculations may occasionally align with factual information, their accuracy is not assured, often resulting in hallucination. Recent studies reveal that an LLM often neglects the middle sections of extended prompts, a phenomenon termed as lost in the middle. While a specific paraphrase may suit one LLM, the same paraphrased version may elicit a different response from another LLM. Therefore, we propose an optimal paraphrasing technique to identify the most comprehensible paraphrase of a given prompt, evaluated using Integrated Gradient (and its variations) to guarantee that the LLM accurately processes all words. While reading lengthy sentences, humans often pause at various points to better comprehend the meaning read thus far. We have fine-tuned an LLM with injected [PAUSE] tokens, allowing the LLM to pause while reading lengthier prompts. This has brought several key contributions: (i) determining the optimal position to inject [PAUSE], (ii) determining the number of [PAUSE] tokens to be inserted, and (iii) introducing reverse proxy tuning to fine-tune the LLM for [PAUSE] insertion.
- [58] arXiv:2403.18978 [pdf, other]
-
Title: TextCraftor: Your Text Encoder Can be Image Quality Controller
Authors: Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs with carefully crafted prompts are required to achieve satisfactory results. To mitigate these limitations, numerous studies have endeavored to fine-tune the pre-trained diffusion models, i.e., UNet, utilizing various technologies. Yet, amidst these efforts, a pivotal question of text-to-image diffusion model training has remained largely unexplored: Is it possible and feasible to fine-tune the text encoder to improve the performance of text-to-image diffusion models? Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments. Interestingly, our technique also empowers controllable image generation through the interpolation of different text encoders fine-tuned with various rewards. We also demonstrate that TextCraftor is orthogonal to UNet finetuning, and can be combined to further improve generative quality.
- [59] arXiv:2403.18985 [pdf, ps, other]
-
Title: Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning
Authors: Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Avisek Naug, Sahand Ghorbanpour
Comments: AAAI Proceedings reference: this https URL
Journal-ref: 2024 Proceedings of the AAAI Conference on Artificial Intelligence
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multiagent Systems (cs.MA)
We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types spanning from ECG signal analysis (1D), image classification (2D), and video classification (3D). The framework focuses on identifying sensitive regions and inducing misclassifications with minimal distortions and various distortion types. The novel RL method outperforms state-of-the-art methods for all three applications, proving its efficiency. Our RL approach produces superior localization masks, enhancing interpretability for image classification and ECG analysis models. For applications such as ECG analysis, our platform highlights critical ECG segments for clinicians while ensuring resilience against prevalent distortions. This comprehensive tool aims to bolster both resilience with adversarial training and transparency across varied applications and data types.
- [60] arXiv:2403.18989 [pdf, other]
-
Title: Dealing with Imbalanced Classes in Bot-IoT Dataset
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
With the rapidly spreading usage of Internet of Things (IoT) devices, a network intrusion detection system (NIDS) plays an important role in detecting and protecting various types of attacks in the IoT network. To evaluate the robustness of the NIDS in the IoT network, the existing work proposed a realistic botnet dataset in the IoT network (Bot-IoT dataset) and applied it to machine learning-based anomaly detection. This dataset contains imbalanced normal and attack packets because the number of normal packets is much smaller than that of attack ones. The nature of imbalanced data may make it difficult to identify the minority class correctly. In this thesis, to address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method with synthetic minority over-sampling techniques (SMOTE). The proposed classifier aims to detect attack packets and overcome the class imbalance problem using the SMOTE algorithm. Through numerical results, we demonstrate the proposed classifier's fundamental characteristics and the impact of imbalanced data on its performance.
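SMOTE itself is well defined: each synthetic minority sample is a random interpolation between a minority example and one of its k nearest minority neighbours. A minimal from-scratch sketch follows (the function name and the pure-Python implementation are ours; in practice one would use a library implementation such as imbalanced-learn's):

```python
import math
import random

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority neighbours, and interpolate a random
    fraction of the way between them."""
    rng = random.Random(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # k nearest neighbours within the minority class (brute force, O(n^2)).
    nn = []
    for i, x in enumerate(X_min):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(x, X_min[j]))
        nn.append(order[:k])
    out = []
    for _ in range(n_new):
        i = rng.randrange(n)                      # a minority sample
        j = nn[i][rng.randrange(k)]               # one of its neighbours
        gap = rng.random()                        # interpolation factor
        out.append([a + gap * (b - a) for a, b in zip(X_min[i], X_min[j])])
    return out
```

Oversampling the minority (attack or normal, whichever is rarer) class this way before training the binary classifier is what counteracts the imbalance the abstract describes.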
- [61] arXiv:2403.18995 [pdf, other]
-
Title: Algebraic Reasoning Meets Automata in Solving Linear Integer Arithmetic
Subjects: Logic in Computer Science (cs.LO)
We present a new angle on solving quantified linear integer arithmetic based on combining the automata-based approach, where numbers are understood as bitvectors, with ideas from (nowadays prevalent) algebraic approaches, which work directly with numbers. This combination is enabled by a fine-grained version of the duality between automata and arithmetic formulae. In particular, we employ a construction where the states of the automaton are obtained as derivatives of arithmetic formulae: every state then corresponds to a formula. Optimizations based on techniques and ideas transferred from the world of algebraic methods are applied to thousands of automata states, which dramatically amplifies their effect. The merit of this combination of automata with algebraic methods is demonstrated by our prototype implementation being competitive with, and even superior to, state-of-the-art SMT solvers.
- [62] arXiv:2403.18996 [pdf, other]
-
Title: Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Explaining Deep Learning models is becoming increasingly important in the face of daily emerging multimodal models, particularly in safety-critical domains like medical imaging. However, the lack of detailed investigations into the performance of explainability methods on these models is widening the gap between their development and safe deployment. In this work, we analyze the performance of various explainable AI methods on a vision-language model, MedCLIP, to demystify its inner workings. We also provide a simple methodology to overcome the shortcomings of these methods. Our work offers a new perspective on the explainability of a recent, well-known VLM in the medical domain, and our assessment method is generalizable to other current and future VLMs.
- [63] arXiv:2403.18998 [pdf, other]
-
Title: Few-Shot Cross-System Anomaly Trace Classification for Microservice-based systems
Authors: Yuqing Wang, Mika V. Mantylä, Serge Demeyer, Mutlu Beyazit, Joanna Kisaakye, Jesse Nyyssölä
Comments: 12 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Microservice-based systems (MSS) may experience failures in various fault categories due to their complex and dynamic nature. To effectively handle failures, AIOps tools utilize trace-based anomaly detection and root cause analysis. In this paper, we propose a novel framework for few-shot abnormal trace classification for MSS. Our framework comprises two main components: (1) a Multi-Head Attention Autoencoder for constructing system-specific trace representations, which enables (2) Transformer Encoder-based Model-Agnostic Meta-Learning to perform effective and efficient few-shot learning for abnormal trace classification. The proposed framework is evaluated on two representative MSS, Trainticket and OnlineBoutique, with open datasets. The results show that our framework can adapt the learned knowledge to classify new, unseen abnormal traces of novel fault categories, both within the same system it was initially trained on and even in a different MSS. Within the same MSS, our framework achieves an average accuracy of 93.26% and 85.2% across 50 meta-testing tasks for Trainticket and OnlineBoutique, respectively, when provided with 10 instances for each task. In a cross-system context, our framework achieves an average accuracy of 92.19% and 84.77% for the same meta-testing tasks of the respective system, also with 10 instances provided for each task. Our work demonstrates the feasibility of few-shot abnormal trace classification for MSS and shows how it can enable cross-system adaptability. This opens an avenue for building more generalized AIOps tools that require less system-specific data labeling for anomaly detection and root cause analysis.
- [64] arXiv:2403.18999 [pdf, other]
Title: Deciding Boolean Separation Logic via Small Models (Technical Report)
Comments: An extended version of a paper accepted to TACAS 2024
Subjects: Logic in Computer Science (cs.LO)
We present a novel decision procedure for a fragment of separation logic (SL) with arbitrary nesting of separating conjunctions with boolean conjunctions, disjunctions, and guarded negations, together with support for the most common variants of linked lists. Our method is based on a model-based translation to SMT for which we introduce several optimisations; the most important of these is based on bounding the size of predicate instantiations within models of larger formulae, which leads to a much more efficient translation of SL formulae to SMT. Through a series of experiments, we show that, on the frequently used symbolic heap fragment, our decision procedure is competitive with other existing approaches, and it can outperform them outside the symbolic heap fragment. Moreover, our decision procedure can also handle some formulae for which no decision procedure has been implemented so far.
- [65] arXiv:2403.19001 [pdf, other]
Title: Cross-domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score Prediction
Authors: Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'Donnell
Comments: 2 figures, 11 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
Shape plays an important role in computer graphics, offering informative features to convey an object's morphology and functionality. Shape analysis in brain imaging can help interpret correlations between the structure and function of the human brain. In this work, we investigate the shape of the brain's 3D white matter connections and its potential predictive relationship to human cognitive function. We reconstruct brain connections as sequences of 3D points using diffusion magnetic resonance imaging (dMRI) tractography. To describe each connection, we extract 12 shape descriptors in addition to traditional dMRI connectivity and tissue microstructure features. We introduce a novel framework, the Shape-fused Fiber Cluster Transformer (SFFormer), that leverages a multi-head cross-attention feature fusion module to predict subject-specific language performance based on dMRI tractography. We assess the performance of the method on a large dataset including 1065 healthy young adults. The results demonstrate that both the transformer-based SFFormer model and its inter/intra feature fusion with shape, microstructure, and connectivity are informative, and together, they improve the prediction of subject-specific language performance scores. Overall, our results indicate that the shape of the brain's connections is predictive of human language function.
- [66] arXiv:2403.19002 [pdf, other]
Title: Robust Active Speaker Detection in Noisy Environments
Comments: 15 pages, 5 figures
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation as guidance to learn noise-free audio features. These features are then utilized in an ASD model, and both tasks are jointly optimized in an end-to-end framework. Our proposed framework mitigates residual noise and audio quality reduction issues that can occur in a naive cascaded two-stage framework that directly uses separated speech for ASD, and enables the two tasks to be optimized simultaneously. To further enhance the robustness of the audio features and handle inherent speech noises, we propose a dynamic weighted loss approach to train the speech separator. We also collected a real-world noise audio dataset to facilitate investigations. Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments. The framework is general and can be applied to different ASD approaches to improve their robustness. Our code, models, and data will be released.
- [67] arXiv:2403.19004 [pdf, other]
Title: Discrete Poincaré inequality and Discrete Trace inequality in Piece-wise Polynomial Hybridizable Spaces
Authors: Yukun Yue
Subjects: Numerical Analysis (math.NA)
In this paper, we establish discrete versions of the Poincaré and trace inequalities for hybridizable finite element spaces. These spaces are made of piecewise polynomial functions defined both within the interiors of elements and across all faces in a mesh's skeleton, serving as the basis for both the hybridizable discontinuous Galerkin (HDG) and hybrid high-order (HHO) methods. Additionally, we present a specific adaptation of these inequalities for the HDG method and apply them to demonstrate the stability of the related numerical schemes for second-order elliptic equations under the minimal regularity assumptions for the source term and boundary data.
- [68] arXiv:2403.19006 [pdf, ps, other]
Title: Ensuring Safe Autonomy: Navigating the Future of Autonomous Vehicles
Authors: Patrick Wolf
Comments: S. Bernardi, T. Zoppi (Editors), "Fast Abstracts and Student Forum Proceedings" - EDCC 2024 - 19th European Dependable Computing Conference, Leuven, Belgium, 8-11 April
Subjects: Robotics (cs.RO)
Autonomous driving vehicles provide vast potential for realizing use cases in the on-road and off-road domains. Consequently, remarkable solutions exist for autonomous systems' environmental perception and control. Nevertheless, proof of safety remains an open challenge preventing such machinery from being introduced to markets and deployed in the real world. Traditional approaches for safety assurance of autonomously driving vehicles often lead to underperformance due to conservative safety assumptions that cannot handle the overall complexity. Moreover, more sophisticated safety systems rely on the vehicle's perception systems. However, perception is often unreliable due to uncertainties resulting from disturbances or the lack of context incorporation for data interpretation. Accordingly, this paper illustrates the potential of a modular, self-adaptive autonomy framework with integrated dynamic risk management to overcome the aforementioned drawbacks.
- [69] arXiv:2403.19009 [pdf, other]
Title: Towards Sustainable SecureML: Quantifying Carbon Footprint of Adversarial Machine Learning
Comments: Accepted at GreenNet Workshop @ IEEE International Conference on Communications (IEEE ICC 2024)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
The widespread adoption of machine learning (ML) across various industries has raised sustainability concerns due to its substantial energy usage and carbon emissions. This issue becomes more pressing in adversarial ML, which focuses on enhancing model security against different network-based attacks. Implementing defenses in ML systems often necessitates additional computational resources and network security measures, exacerbating their environmental impacts. In this paper, we pioneer the first investigation into adversarial ML's carbon footprint, providing empirical evidence connecting greater model robustness to higher emissions. Addressing the critical need to quantify this trade-off, we introduce the Robustness Carbon Trade-off Index (RCTI). This novel metric, inspired by economic elasticity principles, captures the sensitivity of carbon emissions to changes in adversarial robustness. We demonstrate the RCTI through an experiment involving evasion attacks, analyzing the interplay between robustness against attacks, performance, and carbon emissions.
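The abstract does not give the exact formula for the RCTI, but an elasticity-style index of the kind it describes can be sketched as the ratio of the relative change in emissions to the relative change in robustness. The function name and exact form below are illustrative assumptions, not the paper's definition:

```python
def rcti(emissions_base, emissions_robust, robustness_base, robustness_robust):
    """Elasticity-style trade-off index: percentage change in carbon
    emissions per percentage change in adversarial robustness, comparing
    an undefended model against its hardened counterpart."""
    d_emissions = (emissions_robust - emissions_base) / emissions_base
    d_robustness = (robustness_robust - robustness_base) / robustness_base
    return d_emissions / d_robustness
```

Under this reading, a model whose training emissions rise from 10 kg to 15 kg CO2e while robust accuracy rises from 60% to 75% has an index of 2.0: emissions grow twice as fast, in relative terms, as robustness.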
- [70] arXiv:2403.19010 [pdf, other]
Title: Gaussian Process-based Traversability Analysis for Terrain Mapless Navigation
Comments: This paper has been accepted for publication at 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)
Subjects: Robotics (cs.RO)
Efficient navigation through uneven terrain remains a challenging endeavor for autonomous robots. We propose a new geometric-based uneven terrain mapless navigation framework combining a Sparse Gaussian Process (SGP) local map with a Rapidly-Exploring Random Tree* (RRT*) planner. Our approach begins with the generation of a high-resolution SGP local map, providing an interpolated representation of the robot's immediate environment. This map captures crucial environmental variations, including height, uncertainties, and slope characteristics. Subsequently, we construct a traversability map based on the SGP representation to guide our planning process. The RRT* planner efficiently generates real-time navigation paths, avoiding untraversable terrain in pursuit of the goal. This combination of SGP-based terrain interpretation and RRT* planning enables ground robots to safely navigate environments with varying elevations and steep obstacles. We evaluate the performance of our proposed approach through robust simulation testing, highlighting its effectiveness in achieving safe and efficient navigation compared to existing methods.
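The step from the SGP local map to the traversability map can be sketched as a per-cell test on the quantities the abstract names (slope and predictive uncertainty). The thresholds and function names below are assumptions for illustration; the paper's actual traversability criterion may combine these quantities differently:

```python
def traversable(slope, sigma, slope_max=0.4, sigma_max=0.2):
    """Illustrative test for one map cell: a cell counts as traversable
    only if both the estimated slope and the SGP predictive uncertainty
    fall below (assumed) thresholds."""
    return slope <= slope_max and sigma <= sigma_max

def traversability_map(slopes, sigmas, **kw):
    """Boolean grid over the local map; an RRT* planner would reject
    samples and edges that land on untraversable cells."""
    return [[traversable(s, u, **kw) for s, u in zip(row_s, row_u)]
            for row_s, row_u in zip(slopes, sigmas)]
```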
- [71] arXiv:2403.19012 [pdf, other]
Title: ReflectSumm: A Benchmark for Course Reflection Summarization
Comments: LREC-COLING 2024 camera ready; code and dataset are available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper introduces ReflectSumm, a novel summarization dataset specifically designed for summarizing students' reflective writing. The goal of ReflectSumm is to facilitate developing and evaluating novel summarization techniques tailored to real-world scenarios with little training data, with potential implications for the opinion summarization domain in general and the educational domain in particular. The dataset encompasses a diverse range of summarization tasks and includes comprehensive metadata, enabling the exploration of various research questions and supporting different applications. To showcase its utility, we conducted extensive evaluations using multiple state-of-the-art baselines. The results provide benchmarks for facilitating further research in this area.
- [72] arXiv:2403.19014 [pdf, ps, other]
Title: Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning
Comments: 14 pages, 9 figures, 1 table, journal
Journal-ref: Machine Learning and Applications: An International Journal (MLAIJ), vol. 11, no. 1, pp. 1-14, Mar. 2024
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
In this study, we present a method for emotion recognition in Virtual Reality (VR) using pupillometry. We analyze pupil diameter responses to both visual and auditory stimuli via a VR headset and focus on extracting key features in the time-domain, frequency-domain, and time-frequency domain from VR generated data. Our approach utilizes feature selection to identify the most impactful features using Maximum Relevance Minimum Redundancy (mRMR). By applying a Gradient Boosting model, an ensemble learning technique using stacked decision trees, we achieve an accuracy of 98.8% with feature engineering, compared to 84.9% without it. This research contributes significantly to the Thelxinoë framework, aiming to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions. Our findings open new avenues for developing more immersive and interactive VR environments, paving the way for future advancements in virtual touch technology.
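The mRMR selection step can be sketched as a greedy loop: at each iteration, pick the unused feature whose relevance to the target, minus its mean redundancy with the features already chosen, is largest. The correlation-based variant below is an assumption for illustration (mRMR is often defined with mutual information instead), and the function names are hypothetical:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def mrmr_select(features, target, k):
    """Greedy mRMR sketch.  `features` maps name -> value list; returns
    the k names maximizing relevance minus mean redundancy."""
    rel = {n: abs(pearson(v, target)) for n, v in features.items()}
    selected = [max(rel, key=rel.get)]  # start with the most relevant
    while len(selected) < k:
        def score(n):
            red = sum(abs(pearson(features[n], features[s]))
                      for s in selected) / len(selected)
            return rel[n] - red
        rest = [n for n in features if n not in selected]
        selected.append(max(rest, key=score))
    return selected
```

Note how a feature that merely duplicates an already-selected one scores poorly even when its relevance is high, which is exactly the behavior that motivates mRMR over plain relevance ranking.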
- [73] arXiv:2403.19016 [pdf, ps, other]
Title: Resource Allocation in Large Language Model Integrated 6G Vehicular Networks
Comments: This paper appears in the 2024 IEEE 99th Vehicular Technology Conference (VTC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP); Systems and Control (eess.SY); Optimization and Control (math.OC)
In the upcoming 6G era, vehicular networks are shifting from simple Vehicle-to-Vehicle (V2V) communication to the more complex Vehicle-to-Everything (V2X) connectivity. At the forefront of this shift is the incorporation of Large Language Models (LLMs) into vehicles. Known for their sophisticated natural language processing abilities, LLMs change how users interact with their vehicles. This integration facilitates voice-driven commands and interactions, departing from the conventional manual control systems. However, integrating LLMs into vehicular systems presents notable challenges. The substantial computational demands and energy requirements of LLMs pose significant challenges, especially in the constrained environment of a vehicle. Additionally, the time-sensitive nature of tasks in vehicular networks adds another layer of complexity. In this paper, we consider an edge computing system where vehicles process the initial layers of LLM computations locally and offload the remaining LLM computation tasks to the Roadside Units (RSUs), envisioning a vehicular ecosystem where LLM computations seamlessly interact with the ultra-low latency and high-bandwidth capabilities of 6G networks. To balance the trade-off between completion time and energy consumption, we formulate a multi-objective optimization problem to minimize the total cost of the vehicles and RSUs. The problem is then decomposed into two sub-problems, which are solved using the sequential quadratic programming (SQP) method and a fractional programming technique, respectively. The simulation results clearly indicate that the proposed algorithm is highly effective in reducing both the completion time and energy consumption of the system.
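The split between locally processed and offloaded LLM layers can be sketched as a one-dimensional search over the split point. The cost model below is a deliberate simplification of the paper's multi-objective formulation (serial time, a single weighted objective, fixed transmit power), and all names and parameters are assumptions:

```python
def offload_cost(k, flops, out_bits, f_veh, f_rsu, rate,
                 p_comp, p_tx, w_time=0.5, w_energy=0.5):
    """Weighted time/energy cost of running the first k layers on the
    vehicle and the rest on the RSU.  `flops[i]` is layer i's work,
    `out_bits[k]` the size of the activation sent after layer k
    (out_bits[len(flops)] == 0: nothing is offloaded)."""
    t_local = sum(flops[:k]) / f_veh          # on-vehicle compute time
    t_tx = out_bits[k] / rate                 # uplink time to the RSU
    t_rsu = sum(flops[k:]) / f_rsu            # RSU compute time
    time = t_local + t_tx + t_rsu
    energy = p_comp * t_local + p_tx * t_tx   # vehicle-side energy only
    return w_time * time + w_energy * energy

def best_split(flops, out_bits, **kw):
    """Brute-force the split layer minimizing the weighted cost."""
    return min(range(len(flops) + 1),
               key=lambda k: offload_cost(k, flops, out_bits, **kw))
```

With a large input but a small intermediate activation, the minimum lands at an interior split: computing one layer locally shrinks the uplink payload enough to beat both full offloading and fully local execution.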
- [74] arXiv:2403.19019 [pdf, ps, other]
Title: The Correlations of Scene Complexity, Workload, Presence, and Cybersickness in a Task-Based VR Game
Authors: Mohammadamin Sanaei, Stephen B. Gilbert, Nikoo Javadpour, Hila Sabouni, Michael C. Dorneich, Jonathan W. Kelly
Subjects: Human-Computer Interaction (cs.HC)
This investigation examined the relationships among scene complexity, workload, presence, and cybersickness in virtual reality (VR) environments. Numerous factors can influence the overall VR experience, and existing research on this matter is not yet conclusive, warranting further investigation. In this between-subjects experimental setup, 44 participants engaged in the Pendulum Chair game, with half exposed to a simple scene with lower optic flow and lower familiarity, and the remaining half to a complex scene characterized by higher optic flow and greater familiarity. The study measured the dependent variables workload, presence, and cybersickness and analyzed their correlations. Equivalence testing was also used to compare the simple and complex environments. Results revealed that despite the visible differences between the environments, within the 10% boundaries of the maximum possible value for workload and presence, and 13.6% of the maximum SSQ value, a statistically significant equivalence was observed between the simple and complex scenes. Additionally, a moderate, negative correlation emerged between workload and SSQ scores. The findings suggest two key points: (1) the nature of the task can mitigate the impact of scene complexity factors such as optic flow and familiarity, and (2) the correlation between workload and cybersickness may vary, showing either a positive or negative relationship.
- [75] arXiv:2403.19021 [pdf, other]
Title: Towards LLM-RecSys Alignment with Textual ID Learning
Comments: Accepted in SIGIR 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Generative recommendation based on Large Language Models (LLMs) has transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using concise yet meaningful ID representations. To better align LLMs with recommendation needs, we propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID using human language tokens. This is achieved by training a textual ID generator alongside the LLM-based recommender, enabling seamless integration of personalized recommendations into natural language generation. Notably, as user history is expressed in natural language and decoupled from the original dataset, our approach suggests the potential for a foundational generative recommendation model. Experiments show that our framework consistently surpasses existing models in sequential recommendation under the standard experimental setting. We then explore the possibility of training a foundation recommendation model with the proposed method on data collected from 19 different datasets and test its recommendation performance on 6 unseen datasets across different platforms under a completely zero-shot setting. The results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models based on supervised training, showing the potential of the IDGen paradigm to serve as the foundation model for generative recommendation. Code and data are open-sourced at https://github.com/agiresearch/IDGenRec.
- [76] arXiv:2403.19022 [pdf, other]
Title: WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion
Comments: To appear in CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale labeled ground-truth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, unoccluded 3D objects are identified automatically and composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art image with pseudo-groundtruth enables efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects like vehicles and people in urban scenes.
- [77] arXiv:2403.19024 [pdf, other]
Title: Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry. However, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model: the reward may not satisfy the same symmetries as the dynamics. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory where symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics which, by construction, exhibit specified symmetries. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.
- [78] arXiv:2403.19026 [pdf, other]
Title: Egocentric Scene-aware Human Trajectory Prediction
Comments: 14 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to predict the ego motion of the wearer based on egocentric vision and the surrounding scene. In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We present a method to predict human motion conditioning on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. We introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
- [79] arXiv:2403.19027 [pdf, other]
Title: Should I Help a Delivery Robot? Cultivating Prosocial Norms through Observations
Comments: Accepted as a Late Breaking Work at CHI'24
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
We propose leveraging prosocial observations to cultivate new social norms to encourage prosocial behaviors toward delivery robots. With an online experiment, we quantitatively assess updates in norm beliefs regarding human-robot prosocial behaviors through observational learning. Results demonstrate the initially perceived normativity of helping robots is influenced by familiarity with delivery robots and perceptions of robots' social intelligence. Observing human-robot prosocial interactions notably shifts peoples' normative beliefs about prosocial actions; thereby changing their perceived obligations to offer help to delivery robots. Additionally, we found that observing robots offering help to humans, rather than receiving help, more significantly increased participants' feelings of obligation to help robots. Our findings provide insights into prosocial design for future mobility systems. Improved familiarity with robot capabilities and portraying them as desirable social partners can help foster wider acceptance. Furthermore, robots need to be designed to exhibit higher levels of interactivity and reciprocal capabilities for prosocial behavior.
- [80] arXiv:2403.19028 [pdf, other]
Title: Nonlinear Model Predictive Control for Enhanced Navigation of Autonomous Surface Vessels
Subjects: Systems and Control (eess.SY)
This article proposes an approach for collision avoidance, path following, and anti-grounding of autonomous surface vessels under consideration of environmental forces, based on Nonlinear Model Predictive Control (NMPC). Artificial Potential Fields (APFs) set the foundation for the cost function of the optimal control problem in terms of collision avoidance and anti-grounding. Depending on the risk of a collision given by the resulting force of the APFs, the controller optimizes the heading and travel speed while additionally following a desired path. For this purpose, nonlinear vessel dynamics are used for the NMPC. To extend the situational awareness concerning environmental disturbances caused by wind, waves, and sea currents, a nonlinear disturbance observer is coupled to the entire NMPC scheme, allowing for the correction of incorrect vessel motion due to external forces. In addition, the most essential rules according to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs) are considered. The results of the simulations show that the proposed framework can control an autonomous surface vessel under various challenging scenarios, including environmental disturbances, to avoid collisions and follow desired paths.
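The APF contribution to such a cost function can be sketched with the classic repulsive potential: zero outside an influence radius around an obstacle or shallow area, and growing sharply as the distance shrinks. The functional form, thresholds, and gains below are the textbook choice and an assumption about this paper's setup, not its published cost function:

```python
def repulsive_potential(d, d0=10.0, eta=100.0):
    """Classic APF repulsive term for one obstacle: zero beyond the
    influence radius d0, growing without bound as the distance d to the
    obstacle (or grounding hazard) approaches zero."""
    if d <= 0:
        raise ValueError("distance must be positive")
    if d >= d0:
        return 0.0
    return 0.5 * eta * (1.0 / d - 1.0 / d0) ** 2
```

Summing such terms over nearby obstacles and shoals and adding the result to the path-following stage cost is what lets the NMPC trade heading and speed adjustments against collision and grounding risk.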
- [81] arXiv:2403.19031 [pdf, ps, other]
Title: Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have demonstrated remarkable success in NLP tasks. However, there is a paucity of studies that attempt to evaluate their performance on social media-based health-related natural language processing tasks, for which it has traditionally been difficult to achieve high scores. We benchmarked one supervised classic machine learning model based on Support Vector Machines (SVMs), three supervised pretrained language models (PLMs) based on RoBERTa, BERTweet, and SocBERT, and two LLM-based classifiers (GPT-3.5 and GPT-4), across 6 text classification tasks. We developed three approaches for leveraging LLMs for text classification: employing LLMs as zero-shot classifiers, using LLMs as annotators to annotate training data for supervised classifiers, and utilizing LLMs with few-shot examples for augmentation of manually annotated data. Our comprehensive experiments demonstrate that employing data augmentation using LLMs (GPT-4) with relatively small human-annotated data to train lightweight supervised classification models achieves superior results compared to training with human-annotated data alone. Supervised learners also outperform GPT-4 and GPT-3.5 in zero-shot settings. By leveraging this data augmentation strategy, we can harness the power of LLMs to develop smaller, more effective domain-specific NLP models. LLM-annotated data without human guidance for training lightweight supervised classification models is an ineffective strategy. However, LLM, as a zero-shot classifier, shows promise in excluding false negatives and potentially reducing the human effort required for data annotation. Future investigations are imperative to explore optimal training data sizes and the optimal amounts of augmented data.
- [82] arXiv:2403.19036 [pdf, other]
Title: Tessellation and interactive visualization of four-dimensional spacetime geometries
Authors: Philip Claude Caplan
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computational Geometry (cs.CG)
This paper addresses two problems needed to support four-dimensional ($3d + t$) spacetime numerical simulations. The first contribution is a general algorithm for producing conforming spacetime meshes of moving geometries. Here, the surface points of the geometry are embedded in a four-dimensional space as the geometry moves in time. The geometry is first tessellated at prescribed time steps and then these tessellations are connected in the parameter space of each geometry entity to form tetrahedra. In contrast to previous work, this approach allows the resolution of the geometry to be controlled at each time step. The only restriction on the algorithm is the requirement that no topological changes to the geometry are made (i.e. the hierarchical relations between all geometry entities are maintained) as the geometry moves in time. The validity of the final mesh topology is verified by ensuring the tetrahedralizations represent a closed 3-manifold. For some analytic problems, the $4d$ volume of the tetrahedralization is also verified. The second problem addressed in this paper is the design of a system to interactively visualize four-dimensional meshes, including tetrahedra (embedded in $4d$) and pentatopes. Algorithms that either include or exclude a geometry shader are described, and the efficiency of each approach is then compared. Overall, the results suggest that visualizing tetrahedra (either those bounding the domain, or extracted from a pentatopal mesh) using a geometry shader achieves the highest frame rate, in the range of $20-30$ frames per second for meshes with about $50$ million tetrahedra.
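The "connect consecutive tessellations into simplices" step has a well-known lower-dimensional analogue: a triangle swept between two time levels forms a prism, which splits into three tetrahedra (a tetrahedron swept in time forms a 4D prism splitting into four pentatopes analogously). The sketch below shows that standard split, not the paper's parameter-space construction, and the names are assumptions; note that in a mesh, neighboring prisms must choose compatible diagonals on shared faces, which is part of what the paper's algorithm handles:

```python
def split_prism(bottom, top):
    """Split the spacetime prism swept by a triangle between two time
    levels into three tetrahedra (each a 4-tuple of vertex ids).
    `bottom` and `top` list the triangle's vertex ids at times t and
    t+dt in the same order."""
    a, b, c = bottom
    A, B, C = top
    return [(a, b, c, C), (a, b, C, B), (a, B, C, A)]

def tet_volume(p0, p1, p2, p3):
    """Unsigned tetrahedron volume via the scalar triple product / 6;
    useful for sanity-checking that a split fills the prism exactly."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    w = [p3[i] - p0[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0
```

Summing the three tetrahedral volumes recovers the prism volume (base area times height), which is the same kind of check the paper performs on the 4D volume for analytic problems.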
- [83] arXiv:2403.19037 [pdf, other]
Title: Women are less comfortable expressing opinions online than men and report heightened fears for safety: Surveying gender differences in experiences of online harms
Authors: Francesca Stevens, Florence E. Enock, Tvesha Sippy, Jonathan Bright, Miranda Cross, Pica Johansson, Judy Wajcman, Helen Z. Margetts
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Online harms, such as hate speech, trolling and self-harm promotion, continue to be widespread. While some work suggests women are disproportionately affected, other studies find mixed evidence for gender differences in experiences with content of this kind. Using a nationally representative survey of UK adults (N=1992), we examine exposure to a variety of harms, fears surrounding being targeted, the psychological impact of online experiences, the use of safety tools to protect against harm, and comfort with various forms of online participation across men and women. We find that while men and women see harmful content online to a roughly similar extent, women are more at risk than men of being targeted by harms including online misogyny, cyberstalking and cyberflashing. Women are significantly more fearful of being targeted by harms overall, and report greater negative psychological impact as a result of particular experiences. Perhaps in an attempt to mitigate risk, women report higher use of a range of safety tools and less comfort with several forms of online participation, with just 23% of women comfortable expressing political views online compared to 40% of men. We also find direct associations between fears surrounding harms and comfort with online behaviours. For example, fear of being trolled significantly decreases comfort expressing opinions, and fear of being targeted by misogyny significantly decreases comfort sharing photos. Our results are important because with much public discourse happening online, we must ensure all members of society feel safe and able to participate in online spaces.
- [84] arXiv:2403.19040 [pdf, other]
-
Title: Visualizing High-Dimensional Temporal Data Using Direction-Aware t-SNE
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Many real-world data sets contain a temporal component or involve transitions from state to state. For exploratory data analysis, we can represent these high-dimensional data sets in two-dimensional maps, using embeddings of the data objects under exploration and representing their temporal relationships with directed edges. Most existing dimensionality reduction techniques, such as t-SNE and UMAP, do not take into account the temporal or relational nature of the data when constructing the embeddings, resulting in temporally cluttered visualizations that obscure potentially interesting patterns. To address this problem, we propose two complementary, direction-aware loss terms in the optimization function of t-SNE that emphasize the temporal aspects of the data, guiding the optimization and the resulting embedding to reveal temporal patterns that might otherwise go unnoticed. The Directional Coherence Loss (DCL) encourages nearby arrows connecting two adjacent time-series points to point in the same direction, while the Edge Length Loss (ELL) penalizes arrows, which effectively represent time gaps in the visualized embedding, based on their length. Both loss terms are differentiable and can be easily incorporated into existing dimensionality reduction techniques. By promoting local directionality of the directed edges, our procedure produces more temporally meaningful and less cluttered visualizations. We demonstrate the effectiveness of our approach on a toy dataset and two real-world datasets.
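The two loss terms can be sketched as follows (a minimal NumPy illustration of the ideas only; the paper defines differentiable terms inside the t-SNE objective, and the neighborhood radius used here to decide which arrows are "nearby" is our assumption):

```python
import numpy as np

def directional_coherence_loss(Y, edges, radius=2.0):
    """DCL sketch: penalize nearby arrows pointing in different directions.

    Y: (n, 2) array of 2D embedding positions.
    edges: list of directed temporal edges (i, j).
    radius: neighborhood radius for "nearby" arrows (an assumption here).
    """
    dirs = np.array([Y[j] - Y[i] for i, j in edges], dtype=float)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True) + 1e-12
    mids = np.array([(Y[i] + Y[j]) / 2.0 for i, j in edges])
    loss, count = 0.0, 0
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            if np.linalg.norm(mids[a] - mids[b]) < radius:
                loss += 1.0 - dirs[a] @ dirs[b]   # 1 - cosine similarity
                count += 1
    return loss / max(count, 1)

def edge_length_loss(Y, edges):
    """ELL sketch: penalize long arrows, which correspond to time gaps."""
    return float(np.mean([np.sum((Y[j] - Y[i]) ** 2) for i, j in edges]))
```

On a trajectory whose consecutive arrows all point the same way, the DCL term vanishes; both terms push the embedding toward such smooth, short-arrow trajectories.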
- [85] arXiv:2403.19042 [pdf, ps, other]
-
Title: Orchestrating Mixed-Criticality Cloud Workloads in Reconfigurable Manufacturing Systems
Comments: S. Bernardi, T. Zoppi (Editors), Fast Abstracts and Student Forum Proceedings - EDCC 2024 - 19th European Dependable Computing Conference, Leuven, Belgium, 8-11 April 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The adoption of cloud computing technologies in the industry is paving the way to new manufacturing paradigms. In this paper we propose a model to optimize the orchestration of workloads with differentiated criticality levels on a cloud-enabled factory floor. Preliminary results show that it is possible to optimize the guarantees to deployed jobs without penalizing the number of schedulable jobs. We indicate future research paths to quantitatively evaluate job isolation.
- [86] arXiv:2403.19043 [pdf, other]
-
Title: Illicit object detection in X-ray images using Vision Transformers
Authors: Jorgen Cani, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers' inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.
- [87] arXiv:2403.19046 [pdf, other]
-
Title: LITA: Language Instructed Temporal-Localization Assistant
Authors: De-An Huang, Shijia Liao, Subhashree Radhakrishnan, Hongxu Yin, Pavlo Molchanov, Zhiding Yu, Jan Kautz
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
There has been tremendous progress in multimodal Large Language Models (LLMs). Recent works have extended these models to video input with promising instruction following capabilities. However, an important missing piece is temporal localization. These models cannot accurately answer the "When?" questions. We identify three key aspects that limit their temporal localization capabilities: (i) time representation, (ii) architecture, and (iii) data. We address these shortcomings by proposing Language Instructed Temporal-Localization Assistant (LITA) with the following features: (1) We introduce time tokens that encode timestamps relative to the video length to better represent time in videos. (2) We introduce SlowFast tokens in the architecture to capture temporal information at fine temporal resolution. (3) We emphasize temporal localization data for LITA. In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task. Reasoning temporal localization requires both the reasoning and temporal localization of Video LLMs. LITA demonstrates strong performance on this challenging task, nearly doubling the temporal mean intersection-over-union (mIoU) of baselines. In addition, we show that our emphasis on temporal localization also substantially improves video-based text generation compared to existing Video LLMs, including a 36% relative improvement of Temporal Understanding. Code is available at: https://github.com/NVlabs/LITA
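The relative time-token idea can be illustrated with a small sketch (the token vocabulary size and token naming are our assumptions; LITA's actual tokenization is defined in the paper and code):

```python
def to_time_token(t_seconds, video_length, num_tokens=100):
    # Map an absolute timestamp onto one of `num_tokens` discrete tokens
    # covering [0, video_length); encoding time relative to video length
    # keeps tokens comparable across videos of different durations.
    idx = min(int(num_tokens * t_seconds / video_length), num_tokens - 1)
    return f"<t_{idx}>"

def from_time_token(token, video_length, num_tokens=100):
    # Decode a time token back to the midpoint of its time bin.
    idx = int(token.strip("<>").split("_")[1])
    return (idx + 0.5) * video_length / num_tokens
```

For example, the 30-second mark of a 60-second video and the 5-second mark of a 10-second video map to the same token, which is what makes the representation length-invariant.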
- [88] arXiv:2403.19049 [pdf, ps, other]
-
Title: Power and Play: Investigating "License to Critique" in Teams' AI Ethics Discussions
Comments: Accepted to CSCW 2024
Subjects: Computers and Society (cs.CY)
Past work has sought to design AI ethics interventions, such as checklists or toolkits, to help practitioners design more ethical AI systems. However, other work demonstrates how these interventions and the principles they're based on may serve to instead limit critique to those addressed within the intervention, while rendering broader concerns illegitimate. In this paper, drawing on work examining how standards enact discursive closure and how power relations affect whether and how people raise critique, we recruit three corporate teams, and one activist team, each with prior context working with one another, to play a game designed to trigger broad discussion around AI ethics. We use this as a point of contrast to trigger reflection on their teams' past discussions, examining factors which may affect their "license to critique" in AI ethics discussions. We then report on how particular affordances of this game may influence discussion, and find that the hypothetical context created in the game is unlikely to be a viable mechanism for real world change. We discuss how power dynamics within a group and notions of "scope" affect whether people may be willing to raise critique in AI ethics discussions, and discuss our finding that games are unlikely to enable direct changes to products or practice, but may be more likely to allow members to find critically-aligned allies for future collective action.
- [89] arXiv:2403.19050 [pdf, other]
-
Title: Detecting Generative Parroting through Overfitting Masked Autoencoders
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.
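The thresholding step described above is simple enough to sketch (the abstract specifies the mean-loss threshold; that a memorized sample shows *below*-threshold reconstruction loss under the overfitted MAE is our reading, stated as an assumption):

```python
import numpy as np

def fit_threshold(train_losses):
    # Detection threshold = mean reconstruction loss over the training
    # dataset, as stated in the abstract.
    return float(np.mean(train_losses))

def flag_parroted(losses, threshold):
    # An overfitted MAE reconstructs memorized (parroted) samples with
    # unusually low loss; the comparison direction is our assumption.
    return [loss < threshold for loss in losses]
```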
- [90] arXiv:2403.19052 [pdf, ps, other]
-
Title: On Orbital Labeling with Circular Contours
Comments: Presented at EuroCG24, 12 pages, 10 figures
Subjects: Computational Geometry (cs.CG)
Schematic depictions in textbooks and maps often need to label specific point features with a text label. We investigate one variant of such a labeling, where the image contour is a circle and the labels are placed as circular arcs along the circumference of this circle. To map the labels to the feature points, we use orbital-radial leaders, which consist of a circular arc concentric with the image contour circle and a radial line to the contour. In this paper, we provide a framework that captures various dimensions of the problem space, as well as several polynomial-time algorithms and complexity results for some problem variants.
- [91] arXiv:2403.19056 [pdf, other]
-
Title: CAUSE: Counterfactual Assessment of User Satisfaction Estimation in Task-Oriented Dialogue Systems
Authors: Amin Abolghasemi, Zhaochun Ren, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke, Suzan Verberne
Subjects: Computation and Language (cs.CL)
An important unexplored aspect of previous work on user satisfaction estimation for Task-Oriented Dialogue (TOD) systems is its evaluation in terms of robustness for the identification of user dissatisfaction: current benchmarks for user satisfaction estimation in TOD systems are highly skewed towards dialogues for which the user is satisfied. The effect of having a more balanced set of satisfaction labels on performance is unknown. However, balancing the data with more dissatisfactory dialogue samples requires further data collection and human annotation, which is costly and time-consuming. In this work, we leverage large language models (LLMs) and unlock their ability to generate satisfaction-aware counterfactual dialogues to augment the set of original dialogues of a test collection. We gather human annotations to ensure the reliability of the generated samples. We evaluate two open-source LLMs as user satisfaction estimators on our augmented collection against state-of-the-art fine-tuned models. Our experiments show that when used as few-shot user satisfaction estimators, open-source LLMs show higher robustness to the increase in the number of dissatisfaction labels in the test collection than the fine-tuned state-of-the-art models. Our results shed light on the need for data augmentation approaches for user satisfaction estimation in TOD systems. We release our aligned counterfactual dialogues, which are curated by human annotation, to facilitate further research on this topic.
- [92] arXiv:2403.19057 [pdf, ps, other]
-
Title: Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions
Subjects: Machine Learning (cs.LG)
This study investigates how machine learning (ML) models can predict hospital readmissions for diabetic patients fairly and accurately across different demographics (age, gender, race). We compared models like Deep Learning, Generalized Linear Models, Gradient Boosting Machines (GBM), and Naive Bayes. GBM stood out with an F1-score of 84.3% and accuracy of 82.2%, accurately predicting readmissions across demographics. A fairness analysis was conducted across all the models. GBM minimized disparities in predictions, achieving balanced results across genders and races. It showed low False Discovery Rates (FDR) (6-7%) and False Positive Rates (FPR) (5%) for both genders. Additionally, FDRs remained low for racial groups, such as African Americans (8%) and Asians (7%). Similarly, FPRs were consistent across age groups (4%) for both patients under 40 and those above 40, indicating its precision and ability to reduce bias. These findings emphasize the importance of choosing ML models carefully to ensure both accuracy and fairness for all patients. By showcasing the effectiveness of various models with fairness metrics, this study promotes personalized medicine and the need for fair ML algorithms in healthcare. This can ultimately reduce disparities and improve outcomes for diabetic patients of all backgrounds.
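The per-group FDR and FPR metrics reported above are standard and can be computed as follows (a plain-Python sketch; the study's exact grouping and label conventions are not given in the abstract):

```python
def group_fairness_rates(y_true, y_pred, groups):
    # Per-group False Discovery Rate (FP / predicted positives) and
    # False Positive Rate (FP / actual negatives), the two rates the
    # study compares across genders, races, and age groups.
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        tp = sum(y_pred[i] == 1 and y_true[i] == 1 for i in idx)
        fp = sum(y_pred[i] == 1 and y_true[i] == 0 for i in idx)
        tn = sum(y_pred[i] == 0 and y_true[i] == 0 for i in idx)
        rates[g] = {
            "FDR": fp / (fp + tp) if fp + tp else 0.0,
            "FPR": fp / (fp + tn) if fp + tn else 0.0,
        }
    return rates
```

A fair model in this sense is one for which these rates are close across all groups, as reported for GBM.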
- [93] arXiv:2403.19060 [pdf, other]
-
Title: Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers
Comments: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workflow fluency while respecting construction labor's skilled nature. We conduct an in-depth study on deploying a robotic system in carpentry formwork, showcasing a prototype that emphasizes mobility, safety, and comfortable worker-robot collaboration in dynamic environments through a contextual Reinforcement Learning (RL)-driven modular framework. Our research advances robotic applications in construction, advocating for collaborative models where adaptive robots support rather than replace humans, underscoring the potential for an interactive and collaborative human-robot workforce.
- [94] arXiv:2403.19061 [pdf, other]
-
Title: One Code Fits All: Strong stuck-at codes for versatile memory encoding
Subjects: Information Theory (cs.IT)
In this work we consider a generalization of the well-studied problem of coding for "stuck-at" errors, which we refer to as "strong stuck-at" codes. In the traditional framework of stuck-at codes, the task involves encoding a message into a one-dimensional binary vector. However, a certain number of the bits in this vector are "frozen", meaning they are fixed at a predetermined value and cannot be altered by the encoder. The decoder, aware of the proportion of frozen bits but not their specific positions, is responsible for deciphering the intended message. We consider a more challenging version of this problem in which the decoder also does not know the fraction of frozen bits. We construct explicit and efficient encoding and decoding algorithms that get arbitrarily close to capacity in this scenario. Furthermore, to the best of our knowledge, our construction is the first fully explicit construction of stuck-at codes that approaches capacity.
- [95] arXiv:2403.19062 [pdf, other]
-
Title: GENESIS-RL: GEnerating Natural Edge-cases with Systematic Integration of Safety considerations and Reinforcement Learning
Authors: Hsin-Jung Yang, Joe Beck, Md Zahid Hasan, Ekin Beyazit, Subhadeep Chakraborty, Tichakorn Wongpiromsarn, Soumik Sarkar
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
In the rapidly evolving field of autonomous systems, the safety and reliability of the system components are fundamental requirements. These components are often vulnerable to complex and unforeseen environments, making natural edge-case generation essential for enhancing system resilience. This paper presents GENESIS-RL, a novel framework that leverages system-level safety considerations and reinforcement learning techniques to systematically generate naturalistic edge cases. By simulating challenging conditions that mimic real-world situations, our framework aims to rigorously test the entire system's safety and reliability. Although demonstrated within the autonomous driving application, our methodology is adaptable across diverse autonomous systems. Our experimental validation, conducted on a high-fidelity simulator, underscores the overall effectiveness of this framework.
- [96] arXiv:2403.19063 [pdf, other]
-
Title: Instruction-based Hypergraph Pretraining
Comments: Accepted by SIGIR'24
Subjects: Information Retrieval (cs.IR)
Pretraining has been widely explored to augment the adaptability of graph learning models to transfer knowledge from large datasets to a downstream task, such as link prediction or classification. However, the gap between training objectives and the discrepancy between data distributions in pretraining and downstream tasks hinders the transfer of the pretrained knowledge. Inspired by instruction-based prompts widely used in pretrained language models, we introduce instructions into graph pretraining. In this paper, we propose a novel pretraining framework named Instruction-based Hypergraph Pretraining (IHP). To overcome the discrepancy between pretraining and downstream tasks, text-based instructions are applied to provide explicit guidance on specific tasks for representation learning. Compared to learnable prompts, whose effectiveness depends on the quality and the diversity of training data, text-based instructions intrinsically encapsulate task information and enable the model to generalize beyond the structure seen during pretraining. To capture high-order relations with task information in a context-aware manner, a novel prompting hypergraph convolution layer is devised to integrate instructions into information propagation in hypergraphs. Extensive experiments conducted on three public datasets verify the superiority of IHP in various scenarios.
- [97] arXiv:2403.19066 [pdf, other]
-
Title: Generative Quanta Color Imaging
Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We find this problem particularly difficult for standard colorization approaches due to the substantial degree of exposure variation. The core innovation of our paper is an exposure synthesis model framed under a neural ordinary differential equation (Neural ODE) that allows us to generate a continuum of exposures from a single observation. This innovation ensures consistent exposure across the binary images that colorizers operate on, resulting in notably enhanced colorization. We demonstrate applications of the method in single-image and burst colorization and show superior generative performance over baselines. The project website can be found at https://vishal-s-p.github.io/projects/2023/generative_quanta_color.html.
- [98] arXiv:2403.19067 [pdf, other]
-
Title: Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge. Currently, there is a lack of focus on guiding this delicate trade-off. In this study, we approach the problem from the perspective of Singular Value Decomposition (SVD) of pre-trained parameter matrices, providing insights into the tuning dynamics of existing methods. Building upon this understanding, we propose a Residual-based Low-Rank Rescaling (RLRR) fine-tuning strategy. This strategy not only enhances flexibility in parameter tuning but also ensures that new parameters do not deviate excessively from the pre-trained model through a residual design. Extensive experiments demonstrate that our method achieves competitive performance across various downstream image classification tasks, all while maintaining comparable new parameters. We believe this work takes a step forward in offering a unified perspective for interpreting existing methods and serves as motivation for the development of new approaches that move closer to effectively considering the crucial trade-off mentioned above. Our code is available at https://github.com/zstarN70/RLRR.git.
- [99] arXiv:2403.19072 [pdf, other]
-
Title: AssetHarvester: A Static Analysis Tool for Detecting Assets Protected by Secrets in Software Artifacts
Authors: Setu Kumar Basak, K. Virgil English, Ken Ogura, Vitesh Kambara, Bradley Reaves, Laurie Williams
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, indicating a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, making it difficult for developers to filter secrets by looking only at the secret value, or forcing them to find the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secrets removal by providing the asset information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates 97% precision, 90% recall, and a 94% F1-score in detecting secret-asset pairs. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving the recall of secret detection tools.
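To illustrate the simplest kind of secret-asset co-location, consider a database connection URL, which carries both the secret (the password) and the asset it protects (the host) in one string. The regex and URL format below are illustrative assumptions for a toy pattern-matching approach, not AssetHarvester's actual patterns:

```python
import re

# Toy "same-string" co-location pattern: user:password@host in a DB URL.
CONN_RE = re.compile(
    r"(?P<scheme>\w+)://(?P<user>[^:/\s]+):(?P<secret>[^@\s]+)@(?P<asset>[^/:\s]+)"
)

def extract_secret_asset_pairs(text):
    # Return (secret, asset) pairs found in the given source text.
    return [(m.group("secret"), m.group("asset")) for m in CONN_RE.finditer(text)]
```

The harder cases the paper addresses are those where secret and asset are defined far apart, which pattern matching alone cannot pair, hence the data flow analysis and heuristics.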
- [100] arXiv:2403.19073 [pdf, ps, other]
-
Title: Dataflow-Aware PIM-Enabled Manycore Architecture for Deep Learning Workloads
Comments: Presented at DATE Conference, Valencia, Spain 2024
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Processing-in-memory (PIM) has emerged as an enabler for the energy-efficient and high-performance acceleration of deep learning (DL) workloads. Resistive random-access memory (ReRAM) is one of the most promising technologies to implement PIM. However, as the complexity of deep convolutional neural networks (DNNs) grows, we need to design a manycore architecture with multiple ReRAM-based processing elements (PEs) on a single chip. Existing PIM-based architectures mostly focus on computation while ignoring the role of communication. ReRAM-based tiled manycore architectures often involve many PEs, which need to be interconnected via an efficient on-chip communication infrastructure. Simply allocating more resources (ReRAMs) to speed up only computation is ineffective if the communication infrastructure cannot keep up with it. In this paper, we highlight the design principles of a dataflow-aware PIM-enabled manycore platform tailor-made for various types of DL workloads. We consider the design challenges with both 2.5D interposer- and 3D integration-enabled architectures.
- [101] arXiv:2403.19075 [pdf, other]
-
Title: Efficient Preference Elicitation in Iterative Combinatorial Auctions with Many Participants
Subjects: Computer Science and Game Theory (cs.GT)
We study the problem of achieving high efficiency in iterative combinatorial auctions (ICAs). ICAs are a kind of combinatorial auction where the auctioneer interacts with bidders to gather their valuation information using a limited number of queries, aiming for efficient allocation. Preference elicitation, a process that incrementally asks bidders to value bundles while refining the outcome allocation, is a commonly used technique in ICAs. Recently, the integration of machine learning (ML) into ICAs has significantly improved preference elicitation. This approach employs ML models that match the number of bidders, estimating each bidder's valuation functions based on their reported valuations. However, most current studies train a separate model for each bidder, which can be inefficient when there are numerous bidders with similar valuation functions and a limited number of available queries. In this study, we introduce a multi-task learning method to learn valuation functions more efficiently. Specifically, we propose to share model parameters during training to grasp the intrinsic relationships between valuations. We assess the performance of our method using a spectrum auction simulator. The findings demonstrate that our method achieves higher efficiency than existing methods, especially in scenarios with many bidders and items but a limited number of maximum queries.
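The parameter-sharing idea can be sketched as a hard-parameter-sharing model: one bundle encoder shared by all bidders plus a small bidder-specific head. Everything below (the sizes, the tanh encoder, the 0/1 bundle encoding) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ITEMS, HIDDEN, NUM_BIDDERS = 16, 8, 5

# Shared encoder parameters (trained on all bidders' reported valuations)
# capture relationships common to similar valuation functions; the small
# per-bidder heads capture individual differences.
W_shared = rng.normal(size=(NUM_ITEMS, HIDDEN))
heads = rng.normal(size=(NUM_BIDDERS, HIDDEN))

def predict_valuation(bundle, bidder):
    # bundle: length-16 0/1 indicator vector of the items in the bundle.
    h = np.tanh(bundle @ W_shared)      # shared representation
    return float(h @ heads[bidder])     # bidder-specific valuation
```

With many similar bidders and few queries, most observations update the shared encoder, which is the efficiency gain multi-task learning offers over training one model per bidder.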
- [102] arXiv:2403.19076 [pdf, other]
-
Title: Tiny Machine Learning: Progress and Futures
Comments: IEEE Circuits and Systems Magazine (2023). arXiv admin note: text overlap with arXiv:2206.15472
Journal-ref: IEEE Circuits and Systems Magazine, 23(3), pp. 8-34, October 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, TinyML is challenging due to hardware constraints: the tiny memory resource makes it difficult to hold deep learning models designed for cloud and mobile platforms. There is also limited compiler and inference engine support for bare-metal devices. Therefore, we need to co-design the algorithm and system stack to enable TinyML. In this review, we will first discuss the definition, challenges, and applications of TinyML. We then survey the recent progress in TinyML and deep learning on MCUs. Next, we will introduce MCUNet, showing how we can achieve ImageNet-scale AI applications on IoT devices with system-algorithm co-design. We will further extend the solution from inference to training and introduce tiny on-device training techniques. Finally, we present future directions in this area. Today's large model might be tomorrow's tiny model. The scope of TinyML should evolve and adapt over time.
- [103] arXiv:2403.19078 [pdf, other]
-
Title: MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck
Comments: Accepted by TPAMI
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Self-supervised learning aims to learn representation that can be effectively generalized to downstream tasks. Many self-supervised approaches regard two views of an image as both the input and the self-supervised signals, assuming that either view contains the same task-relevant information and the shared information is (approximately) sufficient for predicting downstream tasks. Recent studies show that discarding superfluous information not shared between the views can improve generalization. Hence, the ideal representation is sufficient for downstream tasks and contains minimal superfluous information, termed minimal sufficient representation. One can learn this representation by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information. Nevertheless, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed multi-view entropy bottleneck (MVEB) to learn minimal sufficient representation effectively. MVEB simplifies the minimal sufficient learning to maximizing both the agreement between the embeddings of two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance. For example, it achieves top-1 accuracy of 76.9% on ImageNet with a vanilla ResNet-50 backbone on linear evaluation. To the best of our knowledge, this is the new state-of-the-art result with ResNet-50.
- [104] arXiv:2403.19079 [pdf, other]
-
Title: A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement
Comments: accepted by ICRA24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, significant progress has been made in the field of underwater image enhancement (UIE). However, its practical utility for high-level vision tasks, such as underwater object detection (UOD) in Autonomous Underwater Vehicles (AUVs), remains relatively unexplored. It may be attributed to several factors: (1) Existing methods typically employ UIE as a pre-processing step, which inevitably introduces considerable computational overhead and latency. (2) The process of enhancing images prior to training object detectors may not necessarily yield performance improvements. (3) The complex underwater environments can induce significant domain shifts across different scenarios, seriously deteriorating the UOD performance. To address these challenges, we introduce EnYOLO, an integrated real-time framework designed for simultaneous UIE and UOD with domain-adaptation capability. Specifically, both the UIE and UOD task heads share the same network backbone and utilize a lightweight design. Furthermore, to ensure balanced training for both tasks, we present a multi-stage training strategy aimed at consistently enhancing their performance. Additionally, we propose a novel domain-adaptation strategy to align feature embeddings originating from diverse underwater environments. Comprehensive experiments demonstrate that our framework not only achieves state-of-the-art (SOTA) performance in both UIE and UOD tasks, but also shows superior adaptability when applied to different underwater scenarios. Our efficiency analysis further highlights the substantial potential of our framework for onboard deployment.
- [105] arXiv:2403.19080 [pdf, other]
-
Title: MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
Different from a unimodal model whose input is from a single modality, the input (called multi-modal input) of a multi-modal model is from multiple modalities such as image, 3D points, audio, text, etc. Similar to unimodal models, many existing studies show that a multi-modal model is also vulnerable to adversarial perturbation, where an attacker could add small perturbation to all modalities of a multi-modal input such that the multi-modal model makes incorrect predictions for it. Existing certified defenses are mostly designed for unimodal models, which achieve sub-optimal certified robustness guarantees when extended to multi-modal models as shown in our experimental results. In our work, we propose MMCert, the first certified defense against adversarial attacks to a multi-modal model. We derive a lower bound on the performance of our MMCert under arbitrary adversarial attacks with bounded perturbations to both modalities (e.g., in the context of auto-driving, we bound the number of changed pixels in both RGB image and depth image). We evaluate our MMCert using two benchmark datasets: one for the multi-modal road segmentation task and the other for the multi-modal emotion recognition task. Moreover, we compare our MMCert with a state-of-the-art certified defense extended from unimodal models. Our experimental results show that our MMCert outperforms the baseline.
- [106] arXiv:2403.19082 [pdf, other]
-
Title: Enhancing Conformal Prediction Using E-Test Statistics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Statistics Theory (math.ST)
Conformal Prediction (CP) serves as a robust framework that quantifies uncertainty in predictions made by Machine Learning (ML) models. Unlike traditional point predictors, CP generates statistically valid prediction regions, also known as prediction intervals, based on the assumption of data exchangeability. Typically, the construction of conformal predictions hinges on p-values. This paper, however, ventures down an alternative path, harnessing the power of e-test statistics to augment the efficacy of conformal predictions by introducing a BB-predictor (bounded-from-below predictor).
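The p-value-style construction that the paper takes as its baseline can be sketched as standard split conformal prediction: calibration residuals serve as nonconformity scores, and a finite-sample-corrected quantile widens the point prediction into an interval. The function below is an illustrative sketch of that baseline (all names and the toy data are ours, not the paper's; the proposed BB-predictor replaces this quantile step with e-test statistics):

```python
import numpy as np

def split_conformal_interval(cal_preds, cal_labels, test_pred, alpha=0.1):
    """Split conformal prediction: turn a point prediction into an
    interval using absolute residuals on a held-out calibration set."""
    scores = np.abs(cal_labels - cal_preds)  # nonconformity scores
    n = len(scores)
    # finite-sample-corrected quantile level: ceil((n+1)(1-alpha)) / n
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q

# toy calibration data: predictions plus noisy labels
rng = np.random.default_rng(0)
cal_preds = rng.normal(size=200)
cal_labels = cal_preds + rng.normal(scale=0.5, size=200)
lo, hi = split_conformal_interval(cal_preds, cal_labels, test_pred=1.0)
```

Under exchangeability, the interval covers the true label with probability at least 1 - alpha.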
- [107] arXiv:2403.19083 [pdf, other]
-
Title: Improving Cancer Imaging Diagnosis with Bayesian Networks and Deep Learning: A Bayesian Deep Learning Approach
Authors: Pei Xi (Alex) Lin
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
With recent advancements in the development of artificial intelligence applications using theories and algorithms in machine learning, many accurate models can be created to train and predict on given datasets. With the realization of the importance of imaging interpretation in cancer diagnosis, this article aims to investigate the theory behind Deep Learning and Bayesian Network prediction models. Based on the advantages and drawbacks of each model, different approaches will be used to construct a Bayesian Deep Learning Model, combining the strengths while minimizing the weaknesses. Finally, the applications and accuracy of the resulting Bayesian Deep Learning approach in the health industry in classifying images will be analyzed.
- [108] arXiv:2403.19085 [pdf, other]
-
Title: Real-time accident detection and physiological signal monitoring to enhance motorbike safety and emergency response
Authors: S. M. Kayser Mehbub Siam, Khadiza Islam Sumaiya, Md Rakib Al-Amin, Tamim Hasan Turjo, Ahsanul Islam, A.H.M.A. Rahim, Md Rakibul Hasan
Subjects: Systems and Control (eess.SY); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Rapid urbanization and improved living standards have led to a substantial increase in the number of vehicles on the road, consequently resulting in a rise in the frequency of accidents. Among these accidents, motorbike accidents pose a particularly high risk, often resulting in serious injuries or deaths. A significant number of these fatalities occur due to delayed or inadequate medical attention. To this end, we propose a novel automatic detection and notification system specifically designed for motorbike accidents. The proposed system comprises two key components: a detection system and a physiological signal monitoring system. The detection system is integrated into the helmet and consists of a microcontroller, accelerometer, GPS, GSM, and Wi-Fi modules. The physio-monitoring system incorporates a sensor for monitoring pulse rate and SpO$_{2}$ saturation. All collected data are presented on an LCD display and wirelessly transmitted to the detection system through the microcontroller of the physiological signal monitoring system. If the accelerometer readings consistently deviate from the specified threshold decided through extensive experimentation, the system identifies the event as an accident and transmits the victim's information -- including the GPS location, pulse rate, and SpO$_{2}$ saturation rate -- to the designated emergency contacts. Preliminary results demonstrate the efficacy of the proposed system in accurately detecting motorbike accidents and promptly alerting emergency contacts. We firmly believe that the proposed system has the potential to significantly mitigate the risks associated with motorbike accidents and save lives.
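The accident-detection rule described above, where accelerometer readings must consistently exceed an experimentally tuned threshold before an alert fires, can be sketched as follows (the threshold and streak length here are hypothetical placeholders, not the paper's tuned values):

```python
import math

def accel_magnitude(ax, ay, az):
    """Resultant acceleration in g from three-axis accelerometer readings."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def detect_accident(samples, threshold_g=4.0, min_consecutive=3):
    """Flag an accident when the acceleration magnitude stays beyond a
    threshold for several consecutive samples, filtering one-off spikes.
    The threshold and streak length are hypothetical placeholders."""
    streak = 0
    for ax, ay, az in samples:
        if accel_magnitude(ax, ay, az) > threshold_g:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False
```

On detection, the helmet firmware would then transmit the GPS location, pulse rate, and SpO$_{2}$ readings to the designated emergency contacts.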
- [109] arXiv:2403.19090 [pdf, other]
-
Title: A Stabilized Physics Informed Neural Networks Method for Wave Equations
Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph)
In this article, we propose a novel Stabilized Physics Informed Neural Networks method (SPINNs) for solving wave equations. This method not only demonstrates theoretical convergence but also exhibits higher efficiency compared to the original PINNs. By replacing the $L^2$ norm with the $H^1$ norm in the learning of the initial and boundary conditions, we theoretically prove that the error of the solution can be bounded above by the risk in SPINNs. Based on this, we decompose the error of SPINNs into approximation error, statistical error, and optimization error. Furthermore, by applying the approximation theory of $ReLU^3$ networks and learning theory on the Rademacher complexity, covering number, and pseudo-dimension of neural networks, we present a systematic non-asymptotic convergence analysis of our method, which shows that the error of SPINNs can be well controlled if the number of training samples and the depth and width of the deep neural networks are appropriately chosen. Two illustrative numerical examples on 1-dimensional and 2-dimensional wave equations demonstrate that SPINNs achieve faster and better convergence than the classical PINNs method.
- [110] arXiv:2403.19093 [pdf, other]
-
Title: Task2Morph: Differentiable Task-inspired Framework for Contact-Aware Robot Design
Comments: 9 pages, 10 figures, published to IROS
Journal-ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2023: 452-459
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Optimizing the morphologies and the controllers that adapt to various tasks is a critical issue in the field of robot design, also known as embodied intelligence. Previous works typically model it as a joint optimization problem and use search-based methods to find the optimal solution in the morphology space. However, they ignore the implicit knowledge of the task-to-morphology mapping, which can directly inspire robot design. For example, flipping heavier boxes tends to require more muscular robot arms. This paper proposes a novel and general differentiable task-inspired framework for contact-aware robot design called Task2Morph. We abstract task features highly related to task performance and use them to build a task-to-morphology mapping. Further, we embed the mapping into a differentiable robot design process, where the gradient information is leveraged for both the mapping learning and the whole optimization. The experiments are conducted on three scenarios, and the results validate that Task2Morph outperforms DiffHand, which lacks a task-inspired morphology module, in terms of efficiency and effectiveness.
- [111] arXiv:2403.19094 [pdf, other]
-
Title: Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song
Subjects: Computation and Language (cs.CL)
Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correcting reasoning framework for LLMs that eliminates the need for human feedback, external tools, and handcrafted prompts. The proposed framework, based on a multi-step reasoning paradigm \textbf{Le}arning from \textbf{Co}rrectness (\textsc{LeCo}), improves reasoning performance without needing to learn from errors. This paradigm prioritizes learning from correct reasoning steps and introduces a unique method to measure confidence for each reasoning step based on generation logits. Experimental results across various multi-step reasoning tasks demonstrate the effectiveness of the framework in improving reasoning performance with reduced token consumption.
- [112] arXiv:2403.19095 [pdf, ps, other]
-
Title: Purposeful remixing with generative AI: Constructing designer voice in multimodal composing
Subjects: Computers and Society (cs.CY)
Voice, the discursive construction of the writer's identity, has been extensively studied and theorized in composition studies. In multimodal writing, students are able to mobilize both linguistic and non-linguistic resources to express their real or imagined identities. At the same time, when students are limited to choosing from available online resources, their voices might be compromised by the incompatibility between their authorial intentions and the existing materials. This study, therefore, investigates whether the use of generative AI tools could help student authors construct a more consistent voice in multimodal writing. We designed a photo essay assignment in which students recount a story in the form of photo essays and prompt AI image-generating tools to create photos for their storytelling. Drawing on interview data, written reflections, written annotations, and multimodal products from seven focal participants, we identified two remixing practices through which students attempted to establish a coherent and unique voice in writing. The study sheds light on the intentional and discursive nature of multimodal writing with AI as afforded by the technological flexibility, while also highlighting the practical and ethical challenges that could be attributed to students' insufficient prompt and multimodal literacy and the innate limitations of AI systems. This study provides important implications for incorporating AI tools in designing multimodal writing tasks.
- [113] arXiv:2403.19096 [pdf, other]
-
Title: SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection
Comments: Accepted by ISSTA 2024
Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)
Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated performance superior to other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, existing pre-trained model-based approaches generally employ code sequences as input during prediction and may ignore vulnerability-related structural information, as reflected in the following two aspects. First, they often fail to infer the semantics of code statements with complex logic, such as those containing multiple operators and pointers. Second, they struggle to comprehend various code execution sequences, which is essential for precise vulnerability detection.
To mitigate the challenges, we propose a Structured Natural Language Comment tree-based vulnerAbiLity dEtection framework based on the pre-trained models, named SCALE. The proposed Structured Natural Language Comment Tree (SCT) integrates the semantics of code statements with code execution sequences based on the Abstract Syntax Trees (ASTs). Specifically, SCALE comprises three main modules: (1) Comment Tree Construction, which aims at enhancing the model's ability to infer the semantics of code statements by first incorporating Large Language Models (LLMs) for comment generation and then adding the comment node to ASTs. (2) Structured Natural Language Comment Tree Construction, which aims at explicitly involving code execution sequence by combining the code syntax templates with the comment tree. (3) SCT-Enhanced Representation, which finally incorporates the constructed SCTs for well capturing vulnerability patterns.
- [114] arXiv:2403.19098 [pdf, other]
-
Title: GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving
Authors: Yunpeng Zhang, Deheng Qian, Ding Li, Yifeng Pan, Yong Chen, Zhenbao Liang, Zhiyao Zhang, Shurui Zhang, Hongxu Li, Maolei Fu, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Modeling complicated interactions among the ego-vehicle, road agents, and map elements has been a crucial part of safety-critical autonomous driving. Previous works on end-to-end autonomous driving rely on the attention mechanism for handling heterogeneous interactions, which fails to capture geometric priors and is also computationally intensive. In this paper, we propose the Interaction Scene Graph (ISG) as a unified method to model the interactions among the ego-vehicle, road agents, and map elements. With the representation of the ISG, the driving agents aggregate essential information from the most influential elements, including the road agents with potential collisions and the map elements to follow. Since a large number of unnecessary interactions are omitted, the more efficient scene-graph-based framework is able to focus on indispensable connections, leading to better performance. We evaluate the proposed method for end-to-end autonomous driving on the nuScenes dataset. Compared with strong baselines, our method significantly outperforms them in full-stack driving tasks, including perception, prediction, and planning. Code will be released at https://github.com/zhangyp15/GraphAD.
- [115] arXiv:2403.19101 [pdf, other]
-
Title: AAPMT: AGI Assessment Through Prompt and Metric Transformer
Authors: Benhao Huang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The emergence of text-to-image models marks a significant milestone in the evolution of AI-generated images (AGIs), expanding their use in diverse domains like design, entertainment, and more. Despite these breakthroughs, the quality of AGIs often remains suboptimal, highlighting the need for effective evaluation methods. These methods are crucial for assessing the quality of images relative to their textual descriptions, and they must accurately mirror human perception. Substantial progress has been achieved in this domain, with innovative techniques such as BLIP and DBCNN contributing significantly. However, recent studies, including AGIQA-3K, reveal a notable discrepancy between current methods and state-of-the-art (SOTA) standards. This gap emphasizes the necessity for a more sophisticated and precise evaluation metric. In response, our objective is to develop a model that rates images on metrics such as perceptual quality, authenticity, and the correspondence between text and image, aligning more closely with human perception. In our paper, we introduce a range of effective methods, including prompt designs and the Metric Transformer. The Metric Transformer is a novel structure inspired by the complex interrelationships among various AGI quality metrics. The code is available at https://github.com/huskydoge/CS3324-Digital-Image-Processing/tree/main/Assignment1
- [116] arXiv:2403.19102 [pdf, other]
-
Title: Automatic Fingerpad Customization for Precise and Stable Grasping of 3D-Print Parts
Subjects: Robotics (cs.RO)
The rise in additive manufacturing comes with unique opportunities and challenges. Massive part customization and rapid design changes are made possible with additive manufacturing; however, manufacturing industries that desire the implementation of robotics automation to improve production efficiency could face challenges in gripper design and grasp planning due to the highly complex geometrical shapes resulting from massive part customization. Yet, current gripper designs for such objects are often manual and rely on ad-hoc design intuition. This is limiting, as such grippers lack the ability to grasp different objects or grasp points, which is important for practical implementations. Hence, we introduce a fast, end-to-end approach to customize rigid gripper fingerpads that achieves precise and stable grasping for different objects at multiple grasp points. Our approach relies on two key components: (i) a method based on set Boolean operations, e.g. intersections, subtractions, and unions, to extract object features and synthesize gripper surfaces that conform to different local shapes to form caging grasps; (ii) a method to evaluate the grasp quality of the synthesized grippers. We experimentally demonstrate the validity of our approach by synthesizing fingerpads that, once mounted on a physical robot gripper, are able to grasp different objects at multiple grasp points, all with tightly constrained grasps.
- [117] arXiv:2403.19103 [pdf, other]
-
Title: Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Authors: Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Prompt engineering is effective for controlling the output of text-to-image (T2I) generative models, but it is also laborious due to the need for manually crafted prompts. This challenge has spurred the development of algorithms for automated prompt generation. However, these methods often struggle with transferability across T2I models, require white-box access to the underlying model, and produce non-intuitive prompts. In this work, we introduce PRISM, an algorithm that automatically identifies human-interpretable and transferable prompts that can effectively generate desired concepts given only black-box access to T2I models. Inspired by large language model (LLM) jailbreaking, PRISM leverages the in-context learning ability of LLMs to iteratively refine the candidate prompt distribution for given reference images. Our experiments demonstrate the versatility and effectiveness of PRISM in generating accurate prompts for objects, styles and images across multiple T2I models, including Stable Diffusion, DALL-E, and Midjourney.
- [118] arXiv:2403.19104 [pdf, other]
-
Title: CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. Still, LiDAR is relatively expensive, which hinders adoption of this technology in consumer automobiles. Alternatively, camera and radar are commonly deployed on vehicles already on the road today, but the performance of Camera-Radar (CR) fusion falls behind LC fusion. In this work, we propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework. The project page for CRKD is https://song-jingyu.github.io/CRKD.
- [119] arXiv:2403.19105 [pdf, ps, other]
-
Title: Pilot Signal and Channel Estimator Co-Design for Hybrid-Field XL-MIMO
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This paper addresses the intricate task of hybrid-field channel estimation in extremely large-scale MIMO (XL-MIMO) systems, critical for the progression of 6G communications. Within these systems, comprising a line-of-sight (LoS) channel component alongside far-field and near-field scattering channel components, our objective is to tackle the channel estimation challenge. We encounter two central hurdles for ensuring dependable sparse channel recovery: the design of pilot signals and channel estimators tailored for hybrid-field communications. To overcome the first challenge, we propose a method to derive optimal pilot signals, aimed at minimizing the mutual coherence of the sensing matrix within the context of compressive sensing (CS) problems. These optimal signals are derived using the alternating direction method of multipliers (ADMM), ensuring robust performance in sparse channel recovery. Additionally, leveraging the acquired optimal pilot signal, we introduce a two-stage channel estimation approach that sequentially estimates the LoS channel component and the hybrid-field scattering channel components. Simulation results attest to the superiority of our co-designed approach for pilot signal and channel estimation over conventional CS-based methods, providing more reliable sparse channel recovery in practical scenarios.
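Mutual coherence, the quantity the pilot-signal design minimizes, is the largest absolute correlation between distinct columns of the sensing matrix. A minimal sketch of how it is computed (the ADMM-based minimization itself is beyond this snippet):

```python
import numpy as np

def mutual_coherence(A):
    """Mutual coherence of a sensing matrix: the largest absolute inner
    product between distinct l2-normalized columns."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(A.T @ A)
    np.fill_diagonal(gram, 0.0)  # ignore self-correlations
    return gram.max()
```

Lower coherence strengthens sparse-recovery guarantees in compressive sensing, which is why the pilot design targets it.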
- [120] arXiv:2403.19107 [pdf, ps, other]
-
Title: Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs
Authors: John R. McNulty, Lee Kho, Alexandria L. Case, Charlie Fornaca, Drew Johnston, David Slater, Joshua M. Abzug, Sybil A. Russell
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In medical imaging, access to data is commonly limited due to patient privacy restrictions and the issue that it can be difficult to acquire enough data in the case of rare diseases.[1] The purpose of this investigation was to develop a reusable open-source synthetic image generation pipeline, the GAN Image Synthesis Tool (GIST), that is easy to use as well as easy to deploy. The pipeline helps to improve and standardize AI algorithms in the digital health space by generating high quality synthetic image data that is not linked to specific patients. Its image generation capabilities include the ability to generate imaging of pathologies or injuries with low incidence rates. This improvement of digital health AI algorithms could improve diagnostic accuracy, aid in patient care, decrease medicolegal claims, and ultimately decrease the overall cost of healthcare. The pipeline builds on existing Generative Adversarial Networks (GANs) algorithms, and preprocessing and evaluation steps were included for completeness. For this work, we focused on ensuring the pipeline supports radiography, with a focus on synthetic knee and elbow x-ray images. In designing the pipeline, we evaluated the performance of current GAN architectures, studying the performance on available x-ray data. We show that the pipeline is capable of generating high quality and clinically relevant images based on a lay person's evaluation and the Fr\'echet Inception Distance (FID) metric.
- [121] arXiv:2403.19109 [pdf, ps, other]
-
Title: Enhancing Evolutionary Solver Efficiency for NP Hard Single Machine Scheduling Problems
Comments: 11 pages, 13 figures, International Journal of Science and Research (IJSR), ISSN: 2319-7064, Volume 13 Issue 28, November 2023
Journal-ref: International Journal of Science and Research (IJSR), ISSN: 2319-7064, Volume 13 Issue 28, November 2023
Subjects: Computational Engineering, Finance, and Science (cs.CE)
The study explores the optimization of evolutionary solver parameters for minimizing total tardiness in single machine scheduling, an NP-hard problem with zero ready times included. It investigates various parameter combinations, including population sizes, mutation rates, and a constant convergence rate, both above and below default values. The aim is to enhance the solver's effectiveness in addressing this complex challenge. The findings contribute to improving scheduling efficiency in manufacturing and operations management contexts.
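A minimal sketch of the setup the study tunes: an evolutionary solver over job permutations, minimizing total tardiness with zero ready times, parameterized by population size and mutation rate (the operators and parameter defaults below are illustrative, not the solver's):

```python
import random

def total_tardiness(order, proc, due):
    """Total tardiness of a job sequence on one machine, zero ready times."""
    t = tard = 0
    for j in order:
        t += proc[j]
        tard += max(0, t - due[j])
    return tard

def evolve(proc, due, pop_size=30, mutation_rate=0.2, generations=300, seed=0):
    """Tiny elitist evolutionary solver over job permutations; operators
    and parameter defaults are illustrative placeholders."""
    rng = random.Random(seed)
    n = len(proc)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: total_tardiness(s, proc, due))
        survivors = pop[: pop_size // 2]  # elitism: keep the best half
        children = []
        for parent in survivors:
            child = parent[:]
            if rng.random() < mutation_rate:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda s: total_tardiness(s, proc, due))
```

Raising the mutation rate widens exploration at the cost of convergence speed, which is the kind of trade-off the study's parameter sweep examines.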
- [122] arXiv:2403.19111 [pdf, other]
-
Title: Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Video Anomaly Detection (VAD), aiming to identify abnormalities within a specific context and timeframe, is crucial for intelligent Video Surveillance Systems. While recent deep learning-based VAD models have shown promising results by generating high-resolution frames, they often lack competence in preserving detailed spatial and temporal coherence in video frames. To tackle this issue, we propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task. Specifically, we introduce a two-branch vision transformer network designed to capture deep visual features of video frames, addressing spatial and temporal dimensions responsible for modeling appearance and motion patterns, respectively. The inter-patch relationship in each dimension is decoupled into inter-patch similarity and the order information of each patch. To mitigate memory consumption, we convert the order information prediction task into a multi-label learning problem, and the inter-patch similarity prediction task into a distance matrix regression problem. Comprehensive experiments demonstrate the effectiveness of our method, surpassing pixel-generation-based methods by a significant margin across three public benchmarks. Additionally, our approach outperforms other self-supervised learning-based methods.
- [123] arXiv:2403.19112 [pdf, other]
-
Title: Uncover the Premeditated Attacks: Detecting Exploitable Reentrancy Vulnerabilities by Identifying Attacker Contracts
Comments: Accepted by ICSE 2024
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Reentrancy, a notorious vulnerability in smart contracts, has led to millions of dollars in financial loss. However, current smart contract vulnerability detection tools suffer from a high false positive rate in identifying contracts with reentrancy vulnerabilities. Moreover, only a small portion of the detected reentrant contracts can actually be exploited by hackers, making these tools less effective in securing the Ethereum ecosystem in practice.
In this paper, we propose BlockWatchdog, a tool that focuses on detecting reentrancy vulnerabilities by identifying attacker contracts. These attacker contracts are deployed by hackers to exploit vulnerable contracts automatically. By focusing on attacker contracts, BlockWatchdog effectively detects truly exploitable reentrancy vulnerabilities by identifying reentrant call flow. Additionally, BlockWatchdog is capable of detecting new types of reentrancy vulnerabilities caused by poor designs when using ERC tokens or user-defined interfaces, which cannot be detected by current rule-based tools. We implement BlockWatchdog using cross-contract static dataflow techniques based on attack logic obtained from an empirical study that analyzes attacker contracts from 281 attack incidents. BlockWatchdog is evaluated on 421,889 Ethereum contract bytecodes and identifies 113 attacker contracts that target 159 victim contracts, leading to the theft of Ether and tokens valued at approximately 908.6 million USD. Notably, only 18 of the identified 159 victim contracts can be reported by current reentrancy detection tools.
- [124] arXiv:2403.19113 [pdf, other]
-
Title: FACTOID: FACtual enTailment fOr hallucInation Detection
Authors: Vipula Rawte, S.M Towhidul Islam Tonmoy, Krishnav Rajbangshi, Shravani Nag, Aman Chadha, Amit P. Sheth, Amitava Das
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The widespread adoption of Large Language Models (LLMs) has facilitated numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check if the text produced by LLMs is supported or contradicted, compared to retrieved documents. This paper argues that conventional TE methods are inadequate for spotting hallucinations in content generated by LLMs. For instance, consider a prompt about the "USA's stance on the Ukraine war". The AI-generated text states, "...U.S. President Barack Obama says the U.S. will not put troops in Ukraine..." However, the U.S. president during the war is Joe Biden, which contradicts factual reality. Moreover, current TE systems are unable to accurately annotate the given text and identify the exact portion that is contradicted. To address this, we introduce a new type of TE called "Factual Entailment (FE)", which aims to detect factual inaccuracies in content generated by LLMs while also highlighting the specific text segment that contradicts reality. We present FACTOID (FACTual enTAILment for hallucInation Detection), a benchmark dataset for FE. We propose a multi-task learning (MTL) framework for FE, incorporating state-of-the-art (SoTA) long text embeddings such as e5-mistral-7b-instruct, along with GPT-3, SpanBERT, and RoFormer. The proposed MTL architecture for FE achieves an average 40\% improvement in accuracy on the FACTOID benchmark compared to SoTA TE methods. As FE automatically detects hallucinations, we assessed 15 modern LLMs and ranked them using our proposed Auto Hallucination Vulnerability Index (HVI_auto). This index quantifies and offers a comparative scale to evaluate and rank LLMs according to their hallucinations.
- [125] arXiv:2403.19114 [pdf, other]
-
Title: Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically for code generation. To evaluate the ability of LLMs on code, both academic and industry practitioners rely on popular handcrafted benchmarks. However, prior benchmarks contain only a very limited set of problems, both in quantity and variety. Further, due to popularity and age, many benchmarks are prone to data leakage where example solutions can be readily found on the web and thus potentially in training data. Such limitations inevitably lead us to inquire: Is the leaderboard performance on existing benchmarks reliable and comprehensive enough to measure the program synthesis ability of LLMs? To address this, we introduce EvoEval -- a program synthesis benchmark suite created by evolving existing benchmarks into different targeted domains for a comprehensive evaluation of LLM coding abilities. Our study on 51 LLMs shows that compared to the high performance obtained on standard benchmarks like HumanEval, there is a significant drop in performance (on average 39.4%) when using EvoEval. Additionally, the decrease in performance can range from 19.6% to 47.7%, leading to drastic ranking changes amongst LLMs and showing potential overfitting of existing benchmarks. Furthermore, we showcase various insights, including the brittleness of instruction-following models when encountering rewording or subtle changes as well as the importance of learning problem composition and decomposition. EvoEval not only provides comprehensive benchmarks, but can be used to further evolve arbitrary problems to keep up with advances and the ever-changing landscape of LLMs for code. We have open-sourced our benchmarks, tools, and complete LLM generations at https://github.com/evo-eval/evoeval
- [126] arXiv:2403.19115 [pdf, other]
-
Title: HiRoPE: Length Extrapolation for Code Models
Subjects: Software Engineering (cs.SE)
Addressing the limitation of context length in large language models for code-related tasks is the primary focus of this paper. Existing LLMs are constrained by their pre-trained context lengths, leading to performance issues in handling long complex code sequences. Inspired by how human programmers navigate code, we introduce Hierarchical Rotary Position Embedding (HiRoPE), a novel approach that enhances the traditional rotary position embedding into a hierarchical format based on the hierarchical structure of source code. HiRoPE offers easy integration into existing LLMs without extra training costs. Our method is extensively evaluated with various LLMs, demonstrating stable performance in tasks such as language modeling and long code completion. We also introduce a new long code understanding task with real-world code projects, in hopes of promoting further development in this code-related field. Theoretically and experimentally, we find that HiRoPE also addresses the out-of-distribution issue in position encoding. Our HiRoPE significantly expands the context length capabilities of LLMs, enabling inference at lengths exponentially greater than the training length.
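For context, standard rotary position embedding rotates each two-dimensional slice of a query or key by an angle proportional to its position, so attention scores depend only on relative positions. A minimal sketch follows, with a hypothetical two-level (function index, in-function offset) position mapping to suggest the hierarchical idea; the paper's actual HiRoPE scheme may differ:

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotary position embedding: rotate each 2-D slice of x by an angle
    proportional to the integer position pos."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # per-pair frequencies
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

def hierarchical_position(func_index, offset_in_func, scale=1024):
    """Hypothetical two-level position: fold a (function, token-offset)
    pair into one effective position -- an illustration of the
    hierarchical idea, not HiRoPE's actual scheme."""
    return func_index * scale + offset_in_func
```

Because the rotation is length-preserving and only relative positions enter the attention score, position schemes like this extrapolate more gracefully beyond the training length.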
- [127] arXiv:2403.19116 [pdf, ps, other]
-
Title: MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering
Comments: 8 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In today's fast-paced industry, professionals face the challenge of summarizing a large number of documents and extracting vital information from them on a daily basis. This information is frequently hidden away in tables and/or their nested hyperlinks. To address this challenge, the approach of Table Question Answering (QA) has been developed to extract the relevant information. However, traditional Table QA training tasks that provide a table and answer(s) from gold cell coordinate(s) for a question may not always ensure extraction of the accurate answer(s). Recent advancements in Large Language Models (LLMs) have opened up new possibilities for extracting information from tabular data using prompts. In this paper, we introduce the Multi-hop Few-shot Open Rich Table QA (MFORT-QA) approach, which consists of two major steps. The first step involves Few-Shot Learning (FSL), where relevant tables and the associated contexts of hyperlinks are retrieved based on a given question. The retrieved content is then used to construct few-shot prompts as inputs to an LLM, such as ChatGPT. To tackle the challenge of answering complex questions, the second step leverages Chain-of-thought (CoT) prompting to decompose the complex question into a sequential chain of questions and reasoning thoughts in a multi-hop manner. Retrieval-Augmented Generation (RAG) enhances this process by retrieving tables and hyperlink contexts relevant to the resulting reasoning thoughts and questions. These additional contexts are then used to supplement the prompt used in the first step, resulting in more accurate answers from an LLM. Empirical results on OTT-QA demonstrate that our abstractive QA approach significantly improves the accuracy of extractive Table QA methods.
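The two-step prompting pipeline described above can be sketched schematically: the first step serializes retrieved tables into a few-shot prompt, and the second asks the model to decompose the question into a multi-hop chain. The prompt templates and data layout below are our own illustrative assumptions, not the authors' exact prompts:

```python
def build_fewshot_prompt(question, retrieved_tables, examples):
    """Step 1 (sketch): retrieved tables and hyperlink context are
    serialized and prepended as few-shot demonstrations."""
    parts = []
    for ex in examples:
        parts.append(f"Table:\n{ex['table']}\n"
                     f"Question: {ex['question']}\nAnswer: {ex['answer']}\n")
    context = "\n".join(retrieved_tables)
    parts.append(f"Table:\n{context}\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)

def build_cot_prompt(question):
    """Step 2 (sketch): ask the LLM to decompose a complex question
    into a multi-hop chain of sub-questions before answering."""
    return (f"Question: {question}\n"
            "Break the question into a chain of simpler sub-questions, "
            "answer each in turn, then give the final answer.")
```

In the full method, each sub-question produced by the CoT step triggers another retrieval round whose results are folded back into the step-1 prompt.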
- [128] arXiv:2403.19117 [pdf, other]
-
Title: A Faster Algorithm for Pigeonhole Equal Sums
Comments: 11 pages
Subjects: Data Structures and Algorithms (cs.DS)
An important area of research in exact algorithms is to solve Subset-Sum-type problems faster than meet-in-middle. In this paper we study Pigeonhole Equal Sums, a total search problem proposed by Papadimitriou (1994): given $n$ positive integers $w_1,\dots,w_n$ of total sum $\sum_{i=1}^n w_i < 2^n-1$, the task is to find two distinct subsets $A, B \subseteq [n]$ such that $\sum_{i\in A}w_i=\sum_{i\in B}w_i$.
Similar to the status of the Subset Sum problem, the best known algorithm for Pigeonhole Equal Sums runs in $O^*(2^{n/2})$ time, via either meet-in-middle or dynamic programming (Allcock, Hamoudi, Joux, Klingelh\"{o}fer, and Santha, 2022).
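For context, the pigeonhole guarantee is visible already in the trivial $O^*(2^n)$ baseline: since the total sum is below $2^n-1$, the $2^n$ subset sums cannot all be distinct, so scanning subsets until a sum repeats must succeed. The sketch below is only this naive baseline, far from the paper's improved algorithms:

```python
def pigeonhole_equal_sums(w):
    """Brute-force baseline for Pigeonhole Equal Sums: enumerate all
    2^n subsets; because sum(w) < 2^n - 1, some subset sum must repeat,
    and the first repetition yields the two required distinct subsets."""
    n = len(w)
    seen = {}  # subset sum -> bitmask of the first subset attaining it
    for mask in range(2 ** n):
        s = sum(w[i] for i in range(n) if mask >> i & 1)
        if s in seen:
            A = {i for i in range(n) if seen[s] >> i & 1}
            B = {i for i in range(n) if mask >> i & 1}
            return A, B
        seen[s] = mask
    return None  # unreachable when sum(w) < 2**n - 1
```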
Our main result is an improved algorithm for Pigeonhole Equal Sums in $O^*(2^{0.4n})$ time. We also give a polynomial-space algorithm in $O^*(2^{0.75n})$ time. Unlike many previous works in this area, our approach does not use the representation method, but rather exploits a simple structural characterization of input instances with few solutions.
- [129] arXiv:2403.19119 [pdf, other]
-
Title: Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part II: Joint Precoder, Radar Code, and Receive Filters Design
Comments: 25 pages, 5 figures. arXiv admin note: text overlap with arXiv:2006.14774
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
We address the challenge of spectral sharing between a statistical multiple-input multiple-output (MIMO) radar and an in-band full-duplex (IBFD) multi-user MIMO (MU-MIMO) communications system operating simultaneously in the same frequency band. Existing research on joint MIMO-radar-MIMO-communications (MRMC) systems has limitations, such as focusing on colocated MIMO radars, half-duplex MIMO communications, single-user scenarios, neglecting practical constraints, or employing separate transmit/receive units for MRMC coexistence. This paper, along with companion papers (Part I and III), proposes a comprehensive MRMC framework that addresses all these challenges. In the previous companion paper (Part I), we presented signal processing techniques for a distributed IBFD MRMC system. In this paper, we introduce joint design of statistical MIMO radar codes, uplink/downlink precoders, and corresponding receive filters using a novel metric called compounded-and-weighted sum mutual information. To solve the resulting highly non-convex problem, we employ a combination of block coordinate descent (BCD) and alternating projection methods. Numerical experiments show convergence of our algorithm, mitigation of uplink interference, and stable data rates under varying noise levels, channel estimate imperfections, and self-interference. The subsequent companion paper (Part III) extends the discussion to multiple targets and evaluates the tracking performance of our MRMC system.
- [130] arXiv:2403.19120 [pdf, other]
-
Title: Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part III: Multi-Target Tracking
Comments: 29 pages, 8 figures, 1 table
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
As a next-generation wireless technology, in-band full-duplex (IBFD) transmission enables simultaneous transmission and reception of signals over the same frequency, thereby doubling spectral efficiency. Further, a continuous up-scaling of wireless network carrier frequencies arising from ever-increasing data traffic is driving research on integrated sensing and communications (ISAC) systems. In this context, we study the co-design of common waveforms, precoders, and filters for an IBFD multi-user (MU) multiple-input multiple-output (MIMO) communications system coexisting with a distributed MIMO radar. This paper, along with the companion papers (Parts I and II), proposes a comprehensive MRMC framework that addresses all these challenges. In the companion papers, we developed signal processing and joint design algorithms for this distributed system. In this paper, we tackle multi-target detection, localization, and tracking. This co-design problem, which includes practical MU-MIMO constraints on power and quality-of-service, is highly non-convex. We propose a low-complexity procedure based on the Barzilai-Borwein gradient algorithm to obtain the design parameters and a mixed-integer linear program for distributed target localization. Numerical experiments demonstrate the feasibility and accuracy of multi-target sensing of the distributed FD ISAC system. Finally, we localize and track multiple targets by adapting the joint probabilistic data association and extended Kalman filter for this system.
- [131] arXiv:2403.19121 [pdf, other]
-
Title: Code Comparison Tuning for Code Large Language Models
Comments: Preprint
Subjects: Computation and Language (cs.CL)
We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version containing manually added code errors, we use token-level preference loss for detailed token-level comparisons. Additionally, we combine code segments to create a new instruction tuning sample for sequence-level comparisons, enhancing the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 scores by up to 4 points across diverse code LLMs, and extensive analysis demonstrates the effectiveness of our method.
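A token-level preference loss of the kind described can be sketched as a logistic (DPO-style) margin between the model's per-token log-probabilities on the original code and on the error-injected version; this is our reading of the idea, not the paper's exact loss:

```python
import math

def token_preference_loss(logp_correct, logp_buggy):
    """Sketch of a token-level preference loss: at each aligned
    position, penalize the model unless the correct token is assigned
    higher likelihood than its buggy counterpart, via -log sigmoid of
    the log-probability margin (an illustrative formulation)."""
    loss = 0.0
    for lc, lb in zip(logp_correct, logp_buggy):
        margin = lc - lb
        loss += -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss / len(logp_correct)
```

When the model is indifferent (zero margin) the per-token loss is log 2; it decays toward zero as the model learns to prefer the correct token.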
- [132] arXiv:2403.19122 [pdf, other]
-
Title: Safety-Critical Planning and Control for Dynamic Obstacle Avoidance Using Control Barrier Functions
Comments: 9 pages, 4 figures. arXiv admin note: text overlap with arXiv:2210.04361
Subjects: Robotics (cs.RO); Optimization and Control (math.OC)
Dynamic obstacle avoidance is a challenging topic for optimal control and optimization-based trajectory planning, especially in tight environments. Many existing works use control barrier functions (CBFs) to enforce safety constraints within control systems. In these works, CBFs are usually formulated under a model predictive control (MPC) framework to anticipate future states and make informed decisions, or are integrated with path planning algorithms as a safety enhancement tool. However, these approaches usually require knowledge of the obstacle boundary equations or suffer from slow computation. In this paper, we propose a novel framework that combines iterative MPC with discrete-time CBFs (DCBFs) to generate a collision-free trajectory. The DCBFs are obtained from convex polyhedra generated in sequential grid maps, without the need to know the boundary equations of obstacles. Additionally, a path planning algorithm is incorporated into this framework to ensure the global optimality of the generated trajectory. We demonstrate through numerical examples that our framework enables a unicycle robot to safely and efficiently navigate through tight and dynamically changing environments, tackling both convex and nonconvex obstacles with remarkable computing efficiency and reliability in control and trajectory generation.
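For reference, a discrete-time CBF constraint typically takes the form $h(x_{k+1}) \ge (1-\gamma)\,h(x_k)$ for a safety function $h$ that is positive on the safe set: the safety margin may shrink by at most a factor $\gamma$ per step, so the state can approach but never cross the unsafe boundary. A minimal check for a circular obstacle (an illustrative safety function, not the polyhedral DCBFs constructed in the paper):

```python
def h(state, obstacle, radius):
    """Safety function: positive outside a circular obstacle."""
    dx = state[0] - obstacle[0]
    dy = state[1] - obstacle[1]
    return (dx * dx + dy * dy) ** 0.5 - radius

def dcbf_satisfied(x_k, x_next, obstacle, radius, gamma=0.2):
    """Discrete-time CBF condition h(x_{k+1}) >= (1 - gamma) * h(x_k):
    enforced as a constraint on each MPC step, it keeps the closed-loop
    trajectory inside the safe set."""
    return h(x_next, obstacle, radius) >= (1.0 - gamma) * h(x_k, obstacle, radius)
```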
- [133] arXiv:2403.19123 [pdf, ps, other]
-
Title: Schrödingerisation based computationally stable algorithms for ill-posed problems in partial differential equations
Subjects: Numerical Analysis (math.NA)
We introduce a simple and stable computational method for ill-posed partial differential equation (PDE) problems. The method is based on Schr\"odingerisation, introduced in [S. Jin, N. Liu and Y. Yu, Phys. Rev. A, 108 (2023), 032603], which maps all linear PDEs into Schr\"odinger-type equations in one higher dimension, for quantum simulations of these PDEs. Although the original problem is ill-posed, the Schr\"odingerized equations are Hamiltonian systems and time-reversible, allowing stable computation both forward and backward in time. The original variable can be recovered from data in a suitably chosen domain of the extended dimension. We use the backward heat equation and the linear convection equation with imaginary wave speed as examples. Error analysis of these algorithms is conducted and verified numerically. The methods apply to both classical and quantum computers, and we also lay out the quantum algorithms for these methods.
- [134] arXiv:2403.19124 [pdf, other]
-
Title: PoCo: A Self-Supervised Approach via Polar Transformation Based Progressive Contrastive Learning for Ophthalmic Disease Diagnosis
Authors: Jinhong Wang, Tingting Chen, Jintai Chen, Yixuan Wu, Yuyang Xu, Danny Chen, Haochao Ying, Jian Wu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Automatic ophthalmic disease diagnosis on fundus images is important in clinical practice. However, due to complex fundus textures and limited annotated data, developing an effective automatic method for this problem remains challenging. In this paper, we present a self-supervised method via polar transformation based progressive contrastive learning, called PoCo, for ophthalmic disease diagnosis. Specifically, we inject the polar transformation into contrastive learning to 1) make contrastive pre-training faster and more stable and 2) naturally capture task-free and rotation-related textures, which provides insights into disease recognition on fundus images. Beneficially, simple translation-invariant convolution on transformed images can equivalently replace the complex rotation-invariant and sector convolution on raw images. After that, we develop a progressive contrastive learning method to efficiently utilize large numbers of unannotated images, together with a novel progressive hard negative sampling scheme that gradually reduces the number of negative samples for efficient training and performance enhancement. Extensive experiments on three public ophthalmic disease datasets show that our PoCo achieves state-of-the-art performance with good generalization ability, validating that our method can reduce annotation efforts and provide reliable diagnosis. Codes are available at \url{https://github.com/wjh892521292/PoCo}.
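The key property exploited here is that a polar transform about the image centre turns rotations into cyclic shifts along the angular axis, so ordinary translation-invariant convolutions on the transformed image behave rotation-invariantly on the original. A minimal nearest-neighbour sketch (illustrative only, not the paper's implementation):

```python
import math

def polar_transform(img, n_r, n_theta):
    """Nearest-neighbour polar resampling of a 2-D image (nested lists)
    about its centre: output row ti corresponds to angle
    2*pi*ti/n_theta, so a rotation of the input becomes a cyclic row
    shift of the output."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    out = []
    for ti in range(n_theta):
        theta = 2 * math.pi * ti / n_theta
        row = []
        for ri in range(n_r):
            r = r_max * (ri + 1) / n_r
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            row.append(img[y][x])
        out.append(row)
    return out
```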
- [135] arXiv:2403.19126 [pdf, other]
-
Title: Harnessing Data for Accelerating Model Predictive Control by Constraint Removal
Subjects: Systems and Control (eess.SY)
Model predictive control (MPC) solves a receding-horizon optimization problem in real-time, which can be computationally demanding when there are thousands of constraints. To accelerate online computation of MPC, we utilize data to adaptively remove the constraints while maintaining the MPC policy unchanged. Specifically, we design the removal rule based on the Lipschitz continuity of the MPC policy. This removal rule can use the information of historical data according to the Lipschitz constant and the distance between the current state and historical states. In particular, we provide the explicit expression for calculating the Lipschitz constant by the model parameters. Finally, simulations are performed to validate the effectiveness of the proposed method.
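The Lipschitz-based removal rule can be illustrated schematically: if some stored historical state satisfied a constraint with a slack larger than the Lipschitz constant times its distance to the current state, that constraint cannot be active now and can be dropped from the online problem. The data layout and names below are illustrative assumptions, not the paper's formulation:

```python
def removable_constraints(x, history, lipschitz_L):
    """Sketch of a data-driven constraint-removal rule. `history` is a
    list of (stored_state, per_constraint_slacks) pairs; a constraint i
    is removable at the current state x if some stored state satisfied
    it with slack exceeding L * ||x - stored_state||, since the
    Lipschitz bound then guarantees it stays inactive at x."""
    removable = set()
    for x_h, slacks in history:
        dist = sum((a - b) ** 2 for a, b in zip(x, x_h)) ** 0.5
        for i, slack in enumerate(slacks):
            if slack > lipschitz_L * dist:
                removable.add(i)
    return removable
```

The remaining (non-removable) constraints form a much smaller quadratic program with the same optimizer as the full MPC problem.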
- [136] arXiv:2403.19128 [pdf, other]
-
Title: OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Authors: Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions. Various methods have been proposed to address the challenging problem of VsTP. However, due to the diversified targets and heterogeneous schemas, previous works usually design task-specific architectures and objectives for individual tasks, which inadvertently leads to modal isolation and complex workflow. In this paper, we propose a unified paradigm for parsing visually-situated text across diverse scenarios. Specifically, we devise a universal model, called OmniParser, which can simultaneously handle three typical visually-situated text parsing tasks: text spotting, key information extraction, and table recognition. In OmniParser, all tasks share the unified encoder-decoder architecture, the unified objective: point-conditioned text generation, and the unified input & output representation: prompt & structured sequences. Extensive experiments demonstrate that the proposed OmniParser achieves state-of-the-art (SOTA) or highly competitive performances on 7 datasets for the three visually-situated text parsing tasks, despite its unified, concise design. The code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.
- [137] arXiv:2403.19129 [pdf, other]
-
Title: Stable Object Placing using Curl and Diff Features of Vision-based Tactile Sensors
Comments: 9 pages, 7 figures
Subjects: Robotics (cs.RO)
Ensuring stable object placement is crucial to prevent objects from toppling over, breaking, or causing spills. When an object makes initial contact with a surface and some force is exerted, the moment of rotation caused by the instability of the object's placement can cause the object to rotate in a certain direction (henceforth referred to as the direction of corrective rotation). Existing methods often employ a Force/Torque (F/T) sensor to estimate the direction of corrective rotation by detecting the moment of rotation as a torque. However, its effectiveness may be hampered by sensor noise and the tension of the external wiring of robot cables. To address these issues, we propose a method for stable object placing using GelSights, vision-based tactile sensors, as an alternative to F/T sensors. Our method estimates the direction of corrective rotation of objects using the displacement of the black dot pattern on the elastomeric surface of the GelSight. We calculate the Curl, from vector analysis, indicative of the magnitude and direction of the rotational field of the black dot displacements. Simultaneously, we calculate the difference (Diff) of displacement between the left and right fingers' GelSight black dots. The robot can then manipulate the object's pose using the Curl and Diff features, facilitating stable placing. Across experiments handling 18 differently characterized objects, our method achieves precise placing accuracy (less than 1-degree error) in nearly 100% of cases. An accompanying video is available at the following link: https://youtu.be/fQbmCksVHlU
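The Curl feature can be approximated from the marker displacement field with finite differences; the sketch below (our illustration, not the authors' code) computes the mean z-curl of a displacement field sampled on a regular grid, plus a simple left/right Diff:

```python
def mean_curl(ux, uy):
    """Mean z-component of the curl of the dot-displacement field,
    curl = d(uy)/dx - d(ux)/dy, via central differences on a unit grid.
    ux[i][j], uy[i][j] are the displacement components of the marker in
    row i (y-axis), column j (x-axis)."""
    rows, cols = len(ux), len(ux[0])
    total, count = 0.0, 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            duy_dx = (uy[i][j + 1] - uy[i][j - 1]) / 2.0
            dux_dy = (ux[i + 1][j] - ux[i - 1][j]) / 2.0
            total += duy_dx - dux_dy
            count += 1
    return total / count

def diff_feature(left_mean_disp, right_mean_disp):
    """Diff feature: difference of mean marker displacement between
    the two fingers' sensors (a hypothetical reading of the feature)."""
    return tuple(l - r for l, r in zip(left_mean_disp, right_mean_disp))
```

A rigid rotation of the marker pattern at angular rate ω yields a mean curl of 2ω, so the sign of the curl indicates the direction of corrective rotation.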
- [138] arXiv:2403.19130 [pdf, ps, other]
-
Title: Gamu Blue: A Practical Tool for Game Theory Security Equilibria
Subjects: Computer Science and Game Theory (cs.GT)
The application of game theory in cybersecurity enables strategic analysis, adversarial modeling, and optimal decision-making to address security threats' complex and dynamic nature. Previous studies by Abraham et al. and Bi\c{c}er et al. presented various definitions of equilibria to examine the security aspects of games involving multiple parties. Nonetheless, these definitions lack practical and easy-to-use implementations. Our primary contribution is addressing this gap by developing Gamu Blue, an easy-to-use tool with implementations for computing the equilibria definitions including k-resiliency, l-repellence, t-immunity, (l, t)-resistance, and m-stability.
- [139] arXiv:2403.19135 [pdf, other]
-
Title: Compressing Large Language Models by Streamlining the Unimportant Layer
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) have been extensively applied in various natural language tasks and domains, but their applicability is constrained by their large number of parameters. Consequently, there is an increasing emphasis on compact models that exhibit high performance. In this study, we observe that different layers in an LLM have varying degrees of perturbation on the hidden states, which allows us to identify less important layers. Based on this phenomenon, we propose LLM-Streamline, which consists of two parts: layer pruning, where we remove a set of consecutive layers with the lowest importance in the model according to the target sparsity; and layer replacement, where we train a lightweight model to substitute for the pruned layers, thereby mitigating the performance degradation caused by pruning. In our experiments, we utilize structures such as a multi-layer perceptron (MLP) and a transformer layer as lightweight models and ultimately demonstrate that a single MLP can effectively fit the pruned layers. Comprehensive experiments show that our proposed method, LLM-Streamline, outperforms previous state-of-the-art (SOTA) model pruning methods.
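The layer-pruning step reduces, in essence, to finding the block of consecutive layers with the smallest total importance; a minimal sketch with importance scores given as plain numbers (the scoring itself, via hidden-state perturbation, is not reproduced here):

```python
def least_important_block(importances, k):
    """Return (start, end) of the k consecutive layers whose summed
    importance is smallest; these are the layers to prune and replace
    with a single lightweight module."""
    best_start, best_sum = 0, float("inf")
    for s in range(len(importances) - k + 1):
        total = sum(importances[s:s + k])
        if total < best_sum:
            best_start, best_sum = s, total
    return best_start, best_start + k
```

The replacement module (e.g., a single MLP) is then trained to map the hidden states entering layer `start` to those leaving layer `end - 1`.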
- [140] arXiv:2403.19137 [pdf, other]
-
Title: CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
Comments: Work under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned. Recently, pre-trained vision-language models such as CLIP, with powerful generalization ability, have been gaining traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks calls for finetuning of the CLIP on the latter. The deterministic nature of the existing finetuning methods makes them overlook the many possible interactions across the modalities and renders them unsafe for high-risk CL tasks requiring reliable uncertainty estimation. To address these, our work proposes Continual LeArning with Probabilistic finetuning (CLAP). CLAP develops probabilistic modeling over task-specific modules with visual-guided text features, providing more reliable fine-tuning in CL. It further alleviates forgetting by exploiting the rich pre-trained knowledge of CLIP for weight initialization and distribution regularization of task-specific modules. Cooperating with the diverse range of existing prompting methods, CLAP can surpass the predominant deterministic finetuning approaches for CL with CLIP. Lastly, we study the superior uncertainty estimation abilities of CLAP for novel data detection and exemplar selection within CL setups. Our code is available at \url{https://github.com/srvCodes/clap4clip}.
- [141] arXiv:2403.19139 [pdf, other]
-
Title: Symbiotic Control of Uncertain Dynamical Systems: Harnessing Synergy Between Fixed-Gain Control and Adaptive Learning Architectures
Subjects: Systems and Control (eess.SY)
Both fixed-gain control and adaptive learning architectures aim to mitigate the effects of uncertainties. In particular, fixed-gain control offers more predictable closed-loop system behavior but requires the knowledge of uncertainty bounds. In contrast, while adaptive learning does not necessarily require such knowledge, it often results in less predictable closed-loop system behavior compared to fixed-gain control. To this end, this paper presents a novel symbiotic control framework that offers the strengths of fixed-gain control and adaptive learning architectures. Specifically, this framework synergistically integrates these architectures to mitigate the effects of uncertainties in a more predictable manner as compared to adaptive learning alone and it does not require any knowledge on such uncertainties. Both parametric and nonparametric uncertainties are considered, where we utilize neural networks to approximate the unknown uncertainty basis for the latter case. Counterintuitively, the proposed framework has the ability to achieve a desired level of closed-loop system behavior even with an insufficient number of neurons (e.g., when the neural network approximation error is large) or in the face of injudiciously selected adaptive learning parameters (e.g., high leakage term parameters).
- [142] arXiv:2403.19140 [pdf, other]
-
Title: QNCD: Quantization Noise Correction for Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity. However, their widespread adoption is hindered by the intensive computation required during the iterative denoising process. Post-training quantization (PTQ) presents a solution to accelerate sampling, albeit at the expense of sample quality, especially in low-bit settings. Addressing this, our study introduces a unified Quantization Noise Correction Scheme (QNCD), aimed at minimizing quantization noise throughout the sampling process. We identify two primary quantization challenges: intra and inter quantization noise. Intra quantization noise, mainly exacerbated by embeddings in the resblock module, extends activation quantization ranges, increasing disturbances in each single denoising step. In addition, inter quantization noise stems from cumulative quantization deviations across the entire denoising process, altering data distributions step-by-step. QNCD combats these through embedding-derived feature smoothing for eliminating intra quantization noise and an effective runtime noise estimation module for dynamically filtering inter quantization noise. Extensive experiments demonstrate that our method outperforms previous quantization methods for diffusion models, achieving lossless results in W4A8 and W8A8 quantization settings on ImageNet (LDM-4). Code is available at: https://github.com/huanpengchu/QNCD
- [143] arXiv:2403.19142 [pdf, other]
-
Title: A Tulu Resource for Machine Translation
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
We present the first parallel dataset for English-Tulu translation. Tulu, classified within the South Dravidian linguistic family branch, is predominantly spoken by approximately 2.5 million individuals in southwestern India. Our dataset is constructed by integrating human translations into the multilingual machine translation resource FLORES-200. Furthermore, we use this dataset for evaluation purposes in developing our English-Tulu machine translation model. For the model's training, we leverage resources available for related South Dravidian languages. We adopt a transfer learning approach that exploits similarities between high-resource and low-resource languages. This method enables the training of a machine translation system even in the absence of parallel data between the source and target language, thereby overcoming a significant obstacle in machine translation development for low-resource languages. Our English-Tulu system, trained without using parallel English-Tulu data, outperforms Google Translate by 19 BLEU points (in September 2023). The dataset and code are available here: https://github.com/manunarayanan/Tulu-NMT.
- [144] arXiv:2403.19143 [pdf, ps, other]
-
Title: Tiny Graph Neural Networks for Radio Resource Management
Comments: Accepted as a full paper by the tinyML Research Symposium 2024
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
The surge in demand for efficient radio resource management has necessitated the development of sophisticated yet compact neural network architectures. In this paper, we introduce a novel approach to Graph Neural Networks (GNNs) tailored for radio resource management by presenting a new architecture: the Low Rank Message Passing Graph Neural Network (LR-MPGNN). The cornerstone of LR-MPGNN is the implementation of a low-rank approximation technique that substitutes the conventional linear layers with their low-rank counterparts. This innovative design significantly reduces the model size and the number of parameters. We evaluate the performance of the proposed LR-MPGNN model based on several key metrics: model size, number of parameters, weighted sum rate of the communication system, and the distribution of eigenvalues of weight matrices. Our extensive evaluations demonstrate that the LR-MPGNN model achieves a sixtyfold decrease in model size, and the number of model parameters can be reduced by up to 98%. Performance-wise, the LR-MPGNN demonstrates robustness with a marginal 2% reduction in the best-case scenario in the normalized weighted sum rate compared to the original MPGNN model. Additionally, the distribution of eigenvalues of the weight matrices in the LR-MPGNN model is more uniform and spans a wider range, suggesting a strategic redistribution of weights.
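The core substitution is elementary: a dense layer $y = Wx$ with $W \in \mathbb{R}^{m\times n}$ is replaced by two factors $U \in \mathbb{R}^{m\times r}$ and $V \in \mathbb{R}^{r\times n}$, cutting the parameter count from $mn$ to $r(m+n)$ when the rank $r$ is small. A dependency-free sketch (illustrative of the low-rank idea, not the LR-MPGNN code):

```python
def matmul(A, B):
    """Plain matrix product of nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

class LowRankLinear:
    """A dense layer W x approximated as U (m x r) times V (r x n):
    applying V first and U second costs r*(m + n) multiplications per
    input column instead of m*n."""
    def __init__(self, U, V):
        self.U, self.V = U, V

    def __call__(self, x):  # x is a column vector as a list of 1-lists
        return matmul(self.U, matmul(self.V, x))
```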
- [145] arXiv:2403.19144 [pdf, other]
-
Title: MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models aimed to address these limitations and improve fidelity. However, they still face challenges, including extensive sampling times and difficulties in maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose a novel motion-disentangled diffusion model for high-quality talking head generation, dubbed MoDiTalker. We introduce two modules: audio-to-motion (AToM), designed to generate synchronized lip motion from audio, and motion-to-video (MToV), designed to produce high-quality head video following the generated motion. AToM excels in capturing subtle lip movements by leveraging an audio attention mechanism. In addition, MToV enhances temporal consistency by leveraging an efficient tri-plane representation. Our experiments conducted on standard benchmarks demonstrate that our model achieves superior performance compared to existing models. We also provide comprehensive ablation studies and user study results.
- [146] arXiv:2403.19146 [pdf, ps, other]
-
Title: Improving the Bit Complexity of Communication for Distributed Convex Optimization
Authors: Mehrdad Ghadiri, Yin Tat Lee, Swati Padmanabhan, William Swartworth, David Woodruff, Guanghao Ye
Comments: To appear in STOC '24. Abstract shortened to meet the arXiv limits. Comments welcome!
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression, $p$-norm regression (for $1\leq p\leq 2$), linear programming, minimizing the sum of finitely many convex nonsmooth functions with varying supports, and low rank approximation; for a number of these fundamental problems our bounds are nearly optimal, as proven by our lower bounds.
Among our techniques, we use the notion of block leverage scores, which have been relatively unexplored in this context, as well as dropping all but the ``middle" bits in Richardson-style algorithms. We also introduce a new communication problem for accurately approximating inner products and establish a lower bound using the spherical Radon transform. Our lower bound can be used to show the first separation of linear programming and linear systems in the distributed model when the number of constraints is polynomial, addressing an open question in prior work.
- [147] arXiv:2403.19147 [pdf, ps, other]
-
Title: Resilience-Oriented Operation of Micro-Grids in both Grid-Connected and Isolated Conditions within Sustainable Active Distribution Networks
Journal-ref: Journal of Operation and Automation in Power Engineering 2023
Subjects: Systems and Control (eess.SY)
Due to the increasing occurrence of natural disasters, the importance of maintaining sustainable energy for cities and society is felt more than ever. On the other hand, power loss reduction is a challenging issue in active distribution networks (ADNs). In this paper, a new convex optimization model is proposed with two objective functions: energy loss reduction in normal operating mode and minimization of system load shedding in critical conditions after the occurrence of natural disasters. This purpose is fulfilled through optimal allocation of distributed generation (DG) units of both conventional and renewable types as well as energy storage systems (ESSs). In addition, a new formulation is derived to form optimal micro-grids (MGs) aimed at energy loss reduction in normal operating conditions and resiliency index improvement under emergency situations. The developed model is implemented in GAMS software and tested and analyzed on the IEEE 33-bus system. The results verify the effectiveness of the proposed method in terms of energy loss reduction as well as resilience enhancement under extreme operating conditions following severe disruptions in the system.
- [148] arXiv:2403.19148 [pdf, ps, other]
-
Title: GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education
Authors: Mike Perkins (1), Jasper Roe (2), Binh H. Vu (1), Darius Postma (1), Don Hickerson (1), James McGaughran (1), Huy Q. Khuat (1) ((1) British University Vietnam, (2) James Cook University Singapore)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified using techniques designed to evade detection by these tools (n=805). The results demonstrate that the detectors' already low accuracy rates (39.5%) show major reductions in accuracy (17.4%) when faced with manipulated content, with some techniques proving more effective than others in evading detection.
The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred, underscoring the challenges educators face in maintaining inclusive and fair assessment practices. However, they may have a role in supporting student learning and maintaining academic integrity when used in a non-punitive manner.
These results underscore the need for a combined approach to addressing the challenges posed by GenAI in academia to promote the responsible and equitable use of these emerging technologies. The study concludes that the current limitations of AI text detectors require a critical approach to any possible implementation in HE, and it highlights possible alternatives to AI-based assessment strategies.
- [149] arXiv:2403.19149 [pdf, other]
-
Title: Topological Cycle Graph Attention Network for Brain Functional Connectivity
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
In this study, we introduce a novel Topological Cycle Graph Attention Network (CycGAT), designed to delineate a functional backbone within the brain functional graph -- the key pathways essential for signal transmission -- from non-essential, redundant connections that form cycles around this core structure. We first introduce a cycle incidence matrix that establishes an independent cycle basis within a graph, mapping its relationship with edges. We then propose a cycle graph convolution that leverages a cycle adjacency matrix, derived from the cycle incidence matrix, to specifically filter edge signals in the domain of cycles. Additionally, we strengthen the representational power of the cycle graph convolution with an attention mechanism, further augmented by edge positional encodings in cycles, to enhance the topological awareness of CycGAT. We demonstrate CycGAT's localization properties through simulation and its efficacy on fMRI data from the ABCD study (n=8765), comparing it with baseline models. CycGAT outperforms these models, identifying a functional backbone with significantly fewer cycles that is crucial for understanding neural circuits related to general intelligence. Our code will be released upon acceptance.
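As an illustration of the independent cycle basis on which a cycle incidence matrix rests, here is a minimal sketch (plain Python, not the authors' code, and assuming integer node labels) that extracts one fundamental cycle per non-tree edge of a spanning forest:

```python
from collections import defaultdict

def fundamental_cycle_basis(n, edges):
    """Return an independent cycle basis of an undirected graph with nodes
    0..n-1: build a spanning forest by DFS, then close each remaining
    (non-tree) edge (u, v) through the unique tree path between u and v."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = {}
    tree_edges = set()
    for root in range(n):
        if root in parent:
            continue
        parent[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    tree_edges.add(frozenset((u, v)))
                    stack.append(v)

    def path_to_root(u):
        path = [u]
        while parent[u] != u:
            u = parent[u]
            path.append(u)
        return path

    basis = []
    for u, v in edges:
        if frozenset((u, v)) in tree_edges or u == v:
            continue
        pu, pv = path_to_root(u), path_to_root(v)
        # trim the shared suffix so both paths end at the lowest common ancestor
        while len(pu) > 1 and len(pv) > 1 and pu[-2] == pv[-2]:
            pu.pop()
            pv.pop()
        basis.append(pu[:-1] + pv[::-1])  # u .. LCA .. v, closed by edge (v, u)
    return basis
```

The number of basis cycles equals |E| - |V| + (number of connected components), matching the dimension of the cycle space.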
- [150] arXiv:2403.19150 [pdf, other]
-
Title: Towards Understanding Dual BN In Hybrid Adversarial Training
Comments: Accepted at TMLR
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
There is a growing concern about applying batch normalization (BN) in adversarial training (AT), especially when the model is trained on both adversarial samples and clean samples (termed Hybrid-AT). Under the assumption that adversarial and clean samples come from two different domains, a common practice in prior work is to adopt Dual BN, where separate branches (BN_adv and BN_clean) are used for adversarial and clean samples, respectively. A popular belief motivating Dual BN is that estimating the normalization statistics of this mixture distribution is challenging, and that disentangling them for normalization achieves stronger robustness. In contrast to this belief, we reveal that disentangling statistics plays a smaller role than disentangling affine parameters in model training. This finding aligns with prior work (Rebuffi et al., 2023), and we build upon their research for further investigation. We demonstrate that the domain gap between adversarial and clean samples is not very large, which is counter-intuitive considering the significant influence of adversarial perturbation on model accuracy. We further propose a two-task hypothesis which serves as the empirical foundation and a unified framework for Hybrid-AT improvement. We also investigate Dual BN at test time and reveal that affine parameters characterize robustness during inference. Overall, our work sheds new light on the mechanism of Dual BN in Hybrid-AT and its underlying justification.
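As a toy illustration of the affine-parameter disentanglement discussed in the abstract (an illustrative sketch, not the paper's implementation), a Dual BN layer can share normalization statistics across branches while keeping a separate gamma/beta pair per branch:

```python
import math

class DualBN:
    """Toy Dual BN for Hybrid-AT: shared normalization statistics, but
    distinct affine parameters (gamma, beta) for the 'clean' and 'adv'
    branches -- reflecting the finding that the affine split matters most."""

    def __init__(self, num_features, eps=1e-5):
        self.eps = eps
        self.gamma = {"clean": [1.0] * num_features, "adv": [1.0] * num_features}
        self.beta = {"clean": [0.0] * num_features, "adv": [0.0] * num_features}

    def __call__(self, batch, branch):
        # batch: list of samples, each a list of num_features floats
        n = len(batch)
        dims = len(batch[0])
        means = [sum(x[d] for x in batch) / n for d in range(dims)]
        vars_ = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dims)]
        g, b = self.gamma[branch], self.beta[branch]
        return [
            [g[d] * (x[d] - means[d]) / math.sqrt(vars_[d] + self.eps) + b[d]
             for d in range(dims)]
            for x in batch
        ]
```

Because the statistics are shared, switching the branch changes only the affine transform applied after normalization.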
- [151] arXiv:2403.19153 [pdf, other]
-
Title: Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication
Authors: Haoyu Dong, Tram Thi Minh Tran, Rutger Verstegen, Silvia Cazacu, Ruolin Gao, Marius Hoggenmüller, Debargha Dey, Mervyn Franssen, Markus Sasalovici, Pavlo Bazilinskyy, Marieke Martens
Subjects: Human-Computer Interaction (cs.HC)
Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospects of bridging these two seemingly distinct domains. Through a participatory workshop with automotive user interface researchers and practitioners, we facilitated a critical exploration of holistic HMI design by having workshop participants collaboratively develop interaction scenarios involving AVs, in-vehicle users, and external road users. The discussion offers insights into the escalation of interface elements as an HMI design strategy, the direct interactions between different users, and an expanded understanding of holistic HMI design. This work reflects a collaborative effort to understand the practical aspects of this holistic design approach, offering new perspectives and encouraging further investigation into this underexplored aspect of automotive user interfaces.
- [152] arXiv:2403.19154 [pdf, other]
-
Title: STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
When prompting language models to complete a task, users often leave important aspects unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models often struggle to ask good questions. We explore a language model's ability to self-improve (STaR; Zelikman et al., 2022) by rewarding the model for generating useful questions -- a simple method we dub STaR-GATE. We generate a synthetic dataset of 25,500 unique persona-task prompts to simulate conversations between a pretrained language model -- the Questioner -- and a Roleplayer whose preferences are unknown to the Questioner. By asking questions, the Questioner elicits preferences from the Roleplayer. The Questioner is iteratively finetuned on questions that increase the probability of high-quality responses to the task, which are generated by an Oracle with access to the Roleplayer's latent preferences. After two iterations of self-improvement, the Questioner asks better questions, allowing it to generate responses that are preferred over responses from the initial model on 72% of tasks. Our results indicate that teaching a language model to ask better questions leads to better personalized responses.
- [153] arXiv:2403.19158 [pdf, other]
-
Title: Uncertainty-Aware Deep Video Compression with Ensembles
Comments: Published in IEEE Transactions on Multimedia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Deep learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error. Although these two-stage models are end-to-end optimized, the epistemic uncertainty in the motion estimation and the aleatoric uncertainty from the quantization operation lead to errors in the intermediate representations and introduce artifacts in the reconstructed frames. This inherent flaw limits the potential for higher bit rate savings. To address this issue, we propose an uncertainty-aware video compression model that can effectively capture the predictive uncertainty with deep ensembles. Additionally, we introduce an ensemble-aware loss to encourage the diversity among ensemble members and investigate the benefits of incorporating adversarial training in the video compression task. Experimental results on 1080p sequences show that our model can effectively save bits by more than 20% compared to DVC Pro.
- [154] arXiv:2403.19159 [pdf, other]
-
Title: Disentangling Length from Quality in Direct Preference Optimization
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is often rated more highly by users, even when it is less helpful and objective. A number of approaches have been developed to control these biases in the classical RLHF literature, but the problem remains relatively under-explored for Direct Alignment Algorithms such as Direct Preference Optimization (DPO). Unlike classical RLHF, DPO does not train a separate reward model or use reinforcement learning directly, so previous approaches developed to control verbosity cannot be directly applied to this setting. Our work makes several contributions. For the first time, we study the length problem in the DPO setting, showing significant exploitation in DPO and linking it to out-of-distribution bootstrapping. We then develop a principled but simple regularization strategy that prevents length exploitation while still maintaining improvements in model quality. We demonstrate these effects across summarization and dialogue datasets, where we achieve up to a 20% improvement in win rates when controlling for length, despite the GPT-4 judge's well-known verbosity bias.
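One common form of such length regularization (an illustrative sketch; the paper's exact formulation may differ) subtracts a penalty proportional to the length difference of the chosen and rejected responses from the DPO margin, removing the incentive to prefer answers merely for being longer:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
             len_w=0, len_l=0, beta=0.1, alpha=0.0):
    """Length-regularized DPO loss for one preference pair (sketch).

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_*: the same quantities under the frozen reference model.
    alpha > 0 penalizes the margin when the chosen response is longer,
    so length alone cannot drive the preference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    margin -= alpha * (len_w - len_l)  # length penalty (alpha=0 recovers vanilla DPO)
    return -math.log(sigmoid(margin))
```

With alpha=0 this is the standard DPO objective; a positive alpha increases the loss whenever the chosen response wins partly by being longer.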
- [155] arXiv:2403.19160 [pdf, other]
-
Title: Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from reliance on a single pose as conditional input, resulting in ambiguity when mapping one pose to multiple appearances. In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states. Therefore, we introduce Dyco, a novel method utilizing a delta pose sequence representation for non-rigid deformations in canonical space to effectively model temporal appearance variations. To prevent a decrease in the model's ability to generalize to novel poses, we further propose a low-dimensional global context to reduce unnecessary inter-body-part dependencies, and a quantization operation to mitigate overfitting to the delta pose sequence. To validate the effectiveness of our approach, we collected a novel dataset named I3D-Human, focused on capturing temporal changes in clothing appearance under similar poses. Through extensive experiments on both I3D-Human and existing datasets, our approach demonstrates superior qualitative and quantitative performance. In addition, our inertia-aware 3D human method can simulate appearance changes caused by inertia at different velocities, which was not possible with previous methods.
- [156] arXiv:2403.19161 [pdf, other]
-
Title: Improving Vietnamese-English Medical Machine Translation
Comments: To appear in Proceedings of LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnamese-English neural machine translation models and pre-trained bilingual/multilingual sequence-to-sequence models on our new MedEV dataset. Experimental results show that the best performance is achieved by fine-tuning "vinai-translate" for each translation direction. We publicly release our dataset to promote further research.
- [157] arXiv:2403.19163 [pdf, other]
-
Title: D'OH: Decoder-Only random Hypernetworks for Implicit Neural Representations
Comments: 29 pages, 17 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Deep implicit functions have been found to be an effective tool for efficiently encoding all manner of natural signals. Their attractiveness stems from their ability to compactly represent signals with little to no off-line training data. Instead, they leverage the implicit bias of deep networks to decouple hidden redundancies within the signal. In this paper, we explore the hypothesis that additional compression can be achieved by leveraging the redundancies that exist between layers. We propose to use a novel run-time decoder-only hypernetwork - that uses no offline training data - to better model this cross-layer parameter redundancy. Previous applications of hyper-networks with deep implicit functions have applied feed-forward encoder/decoder frameworks that rely on large offline datasets that do not generalize beyond the signals they were trained on. We instead present a strategy for the initialization of run-time deep implicit functions for single-instance signals through a Decoder-Only randomly projected Hypernetwork (D'OH). By directly changing the dimension of a latent code to approximate a target implicit neural architecture, we provide a natural way to vary the memory footprint of neural representations without the costly need for neural architecture search on a space of alternative low-rate structures.
- [158] arXiv:2403.19164 [pdf, other]
-
Title: RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Authors: Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image stitching from different captures often results in non-rectangular boundaries, which are often considered unappealing. Current solutions involve cropping, which discards image content; inpainting, which can introduce unrelated content; or warping, which can distort non-linear features and introduce artifacts. To overcome these issues, we introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling. The framework combines Motion Diffusion Models (MDM), which generate motion fields effecting the transition from the stitched image's irregular borders to a geometrically corrected intermediary, with Content Diffusion Models (CDM) for image detail refinement. Notably, our sampling process utilizes a weighted map to identify regions needing correction during each iteration of CDM. RecDiffusion ensures geometric accuracy and overall visual appeal, surpassing all previous methods in both quantitative and qualitative measures when evaluated on public benchmarks. Code is released at https://github.com/lhaippp/RecDiffusion.
- [159] arXiv:2403.19165 [pdf, ps, other]
-
Title: Evaluating Fair Feature Selection in Machine Learning for Healthcare
Comments: 10 pages, 7 figures
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
With the universal adoption of machine learning in healthcare, the potential for the automation of societal biases to further exacerbate health disparities poses a significant risk. We explore algorithmic fairness from the perspective of feature selection. Traditional feature selection methods identify features for better decision making by removing resource-intensive, correlated, or non-relevant features, but overlook how these factors may differ across subgroups. To counter these issues, we evaluate a fair feature selection method that gives equal importance to all demographic groups. We jointly consider a fairness metric and an error metric within the feature selection process to ensure a balance between minimizing both bias and global classification error. We tested our approach on three publicly available healthcare datasets. On all three datasets, we observed improvements in fairness metrics coupled with minimal degradation of balanced accuracy. Our approach addresses both distributive and procedural fairness within the fair machine learning context.
- [160] arXiv:2403.19167 [pdf, other]
-
Title: Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models have manifested remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions through step-by-step reasoning chains. Despite its success, the efficacy of such reasoning is inherently contingent upon the quality of CoT. However, flawless CoT reasoning cannot be guaranteed due to the presence of indecomposable questions and the potential for erroneous reasoning chains, particularly in the case of small-scale language models. To tackle this challenge, we propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain. We proceed with CoT reasoning when the reasoning chain demonstrates confidence; otherwise, we opt to predict the answer directly. SelF-Reasoner improves the fine-tuned T5 baseline consistently over the ScienceQA, ECQA, and LastLetter tasks. Code is available at https://github.com/LibroWu/SelF-Reasoner.
- [161] arXiv:2403.19168 [pdf, ps, other]
-
Title: Tunable Superconducting Magnetic Levitation with Self-Stability
Comments: 15 pages, 5 figures
Subjects: Systems and Control (eess.SY)
Magnetic levitation based on the flux-pinning nature of type II superconductors has the merit of self-stability, making it appealing for applications such as high-speed bearings, maglev trains, and space generators. However, such levitation systems physically rely on the superconductor pre-capturing magnetic flux (i.e., a field-cooling process) before establishing the levitation state, which is non-adjustable afterwards. Moreover, practical type II superconductors in the levitation system inevitably suffer from various sources of energy loss, leading to continuous levitation-force decay. These intrinsic drawbacks make superconducting maglev inflexible and impractical for long-term operation. Here we propose and demonstrate a new form of superconducting maglev that is both tunable and self-stable. The maglev system uses a closed-loop type II superconducting coil to lock the flux of a magnet, establishing self-stable levitation between the two objects. A flux pump modulates the total magnetic flux of the coil without breaking its superconductivity, flexibly tuning the levitation force and height while maintaining self-stability. For the first time, we experimentally demonstrate a self-stable type II superconducting maglev system that can counteract long-term levitation-force decay, adjust the levitation force and equilibrium position, and establish levitation under zero-field-cooling conditions. These breakthroughs may bridge the gap between demonstrations and practical applications of type II superconducting maglev.
- [162] arXiv:2403.19171 [pdf, other]
-
Title: Mining Bug Repositories for Multi-Fault Programs
Subjects: Software Engineering (cs.SE)
Datasets such as Defects4J and BugsInPy, which contain bugs from real-world software projects, are necessary for a realistic evaluation of automated debugging tools. However, these datasets largely identify only a single bug in each entry, while real-world software projects (including those used in Defects4J and BugsInPy) typically contain multiple bugs at the same time. We lift this limitation and describe an extension to these datasets in which multiple bugs are identified in individual entries. We use test case transplantation and fault location translation to expose and locate the bugs, respectively. We thus provide datasets of true multi-fault versions within real-world software projects, which maintain the properties and usability of the original datasets.
- [163] arXiv:2403.19174 [pdf, other]
-
Title: Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration
Authors: Louie Søs Meyer, Johanne Engel Aaen, Anitamalina Regitse Tranberg, Peter Kun, Matthias Freiberger, Sebastian Risi, Anders Sundnes Løvlie
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
This Research through Design paper explores how object detection may be applied to a large digital art museum collection to facilitate new ways of encountering and experiencing art. We present the design and evaluation of an interactive application called SMKExplore, which allows users to explore a museum's digital collection of paintings by browsing through objects detected in the images, as a novel form of open-ended exploration. We provide three contributions. First, we show how an object detection pipeline can be integrated into a design process for visual exploration. Second, we present the design and development of an app that enables exploration of an art museum's collection. Third, we offer reflections on future possibilities for museums and HCI researchers to incorporate object detection techniques into the digitalization of museums.
- [164] arXiv:2403.19176 [pdf, other]
-
Title: Design and Evaluation of a DC Microgrid Testbed for DER Integration and Power Management
Comments: 2024 12th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES)
Subjects: Systems and Control (eess.SY)
This paper presents a DC microgrid testbed setup that consists of various Distributed Energy Resources (DERs), including solar Photovoltaics (PV), supercapacitors for voltage regulation, and Battery Energy Storage Systems (BESS). The DC microgrid accommodates both non-flexible and flexible loads, the latter of which can be dynamically adjusted based on PV power availability. The integration of the setup with the Hyphae Autonomous Power Interchange System (APIS) framework automates energy transfer within the BESS, ensuring efficient power management and optimizing the overall efficiency of the DC microgrid. Furthermore, the efficacy of the proposed model is validated via real-time simulation, facilitated by the Speedgoat baseline real-time target Hardware-in-the-Loop (HIL) machine. The results demonstrate the model's adeptness at efficiently managing power sharing, emphasizing the capabilities of the DC microgrid setup in terms of performance and reliability in dynamic energy scenarios, as well as enhancing the resilience of the grid amidst PV uncertainties.
- [165] arXiv:2403.19177 [pdf, other]
-
Title: Rethinking Information Loss in Medical Image Segmentation with Various-sized Targets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Medical image segmentation presents the challenge of segmenting targets of various sizes, demanding that the model effectively capture both local and global information. Despite recent efforts using CNNs and ViTs to predict annotations at different scales, these approaches often struggle to balance the detection of targets across varying sizes. Simply combining local information from CNNs and global relationships from ViTs, without considering the potentially significant divergence in their latent feature distributions, may result in substantial information loss. To address this issue, this paper introduces a novel Stagger Network (SNet) and argues that a well-designed fusion structure can mitigate the divergence in latent feature distributions between CNNs and ViTs, thereby reducing information loss. Specifically, to emphasize both global dependencies and local focus, we design a Parallel Module to bridge the semantic gap. Meanwhile, we propose the Stagger Module to fuse selected features that are more semantically similar. An Information Recovery Module is further adopted to recover complementary information back into the network. As a key contribution, we theoretically show that the proposed parallel and stagger strategies lead to less information loss, certifying SNet's rationale. Experimental results show that SNet outperforms recent state-of-the-art methods in segmentation on the Synapse dataset, where targets vary in size. It also demonstrates superiority on the ACDC and MoNuSeg datasets, where target dimensions are more consistent.
- [166] arXiv:2403.19178 [pdf, other]
-
Title: Enhancing Trust and Privacy in Distributed Networks: A Comprehensive Survey on Blockchain-based Federated Learning
Comments: 25 pages, accepted by KAIS 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
While centralized servers pose a risk of being a single point of failure, decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities. Merging distributed computing with cryptographic techniques, decentralized technologies introduce a novel computing paradigm. Blockchain ensures secure, transparent, and tamper-proof data management by validating and recording transactions via consensus across network nodes. Federated Learning (FL), as a distributed machine learning framework, enables participants to collaboratively train models while safeguarding data privacy by avoiding direct raw data exchange. Despite the growing interest in decentralized methods, their application in FL remains underexplored. This paper presents a thorough investigation into Blockchain-based FL (BCFL), spotlighting the synergy between blockchain's security features and FL's privacy-preserving model training capabilities. First, we present the taxonomy of BCFL from three aspects, including decentralized, separate networks, and reputation-based architectures. Then, we summarize the general architecture of BCFL systems, providing a comprehensive perspective on FL architectures informed by blockchain. Afterward, we analyze the application of BCFL in healthcare, IoT, and other privacy-sensitive areas. Finally, we identify future research directions of BCFL.
- [167] arXiv:2403.19181 [pdf, other]
-
Title: Make Large Language Model a Better Ranker
Comments: 10 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
The evolution of Large Language Models (LLMs) has significantly enhanced capabilities across various fields, leading to a paradigm shift in how Recommender Systems (RSs) are conceptualized and developed. However, existing research primarily focuses on point-wise and pair-wise recommendation paradigms. These approaches prove inefficient in LLM-based recommenders due to the high computational cost of utilizing Large Language Models. While some studies have delved into list-wise approaches, they fall short in ranking tasks. This shortfall is attributed to the misalignment between the objectives of ranking and language generation. To this end, this paper introduces the Language Model Framework with Aligned Listwise Ranking Objectives (ALRO). ALRO is designed to bridge the gap between the capabilities of LLMs and the nuanced requirements of ranking tasks within recommender systems. A key feature of ALRO is the introduction of soft lambda loss, an adaptation of lambda loss tailored to suit language generation tasks. Additionally, ALRO incorporates a permutation-sensitive learning mechanism that addresses position bias, a prevalent issue in generative models, without imposing additional computational burdens during inference. Our evaluative studies reveal that ALRO outperforms existing embedding-based recommendation methods and the existing LLM-based recommendation baselines, highlighting its efficacy.
- [168] arXiv:2403.19183 [pdf, other]
-
Title: Empirical Analysis for Unsupervised Universal Dependency Parse Tree Aggregation
Subjects: Computation and Language (cs.CL)
Dependency parsing is an essential task in NLP, and the quality of dependency parsers is crucial for many downstream tasks. Parser quality often varies depending on the domain and the language involved, so it is essential to combat this variation to achieve stable performance. In various NLP tasks, aggregation methods applied as a post-processing step have been shown to combat the issue of varying quality. However, such post-processing aggregation methods have not been sufficiently studied for dependency parsing. In an extensive empirical study, we compare different unsupervised post-processing aggregation methods to identify the most suitable dependency tree structure aggregation method.
- [169] arXiv:2403.19185 [pdf, other]
-
Title: Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the vertical and horizontal polarization directions tend to exhibit high polarization correlation. To fully exploit the inherent propagation similarity within dual-polarized channels, we propose a disentangled representation neural network (NN) for CSI feedback, referred to as DiReNet. The proposed DiReNet disentangles dual-polarized CSI into three components: polarization-shared information, vertical polarization-specific information, and horizontal polarization-specific information. This disentanglement of dual-polarized CSI enables the minimization of information redundancy caused by the polarization correlation and improves the performance of CSI compression and recovery. Additionally, flexible quantization and network extension schemes are designed. Consequently, our method provides a pragmatic solution for CSI feedback to harness the physical MIMO polarization as a priori information. Our experimental results show that the performance of our proposed DiReNet surpasses that of existing DL-based networks, while also effectively reducing the number of network parameters by nearly one third.
- [170] arXiv:2403.19193 [pdf, other]
-
Title: Text Data-Centric Image Captioning with Interactive Prompts
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Supervised image captioning approaches have made great progress, but it is challenging to collect high-quality human-annotated image-text data. Recently, large-scale vision and language models (e.g., CLIP) and large-scale generative language models (e.g., GPT-2) have shown strong performance on various tasks, which also offers new solutions for image captioning with web paired data, unpaired data, or even text-only data. Among them, the mainstream solution is to project image embeddings into the text embedding space with the assistance of consistent representations between image-text pairs from the CLIP model. However, current methods still face several challenges in adapting to the diversity of data configurations in a unified solution, accurately estimating image-text embedding bias, and correcting unsatisfactory predictions in the inference stage. This paper proposes a new Text data-centric approach with Interactive Prompts for image Captioning, named TIPCap. 1) We consider four different settings which gradually reduce the dependence on paired data. 2) We construct a mapping module driven by a multivariate Gaussian distribution to mitigate the modality gap, which is applicable to all four settings. 3) We propose a prompt interaction module that can incorporate optional prompt information before generating captions. Extensive experiments show that TIPCap outperforms other weakly supervised or unsupervised image captioning methods and achieves new state-of-the-art performance on two widely used datasets, i.e., MS-COCO and Flickr30K.
- [171] arXiv:2403.19194 [pdf, other]
-
Title: Detecting and taking Project Interactions into account in Participatory Budgeting
Subjects: Computer Science and Game Theory (cs.GT)
The aim of this paper is to introduce models and algorithms for the Participatory Budgeting problem when projects can interact with each other. In this problem, the objective is to select a set of projects that fits within a given budget. Voters express their preferences over the projects and the goal is then to find a consensus set of projects that does not exceed the budget. Our goal is to detect such interactions from the preferences expressed by the voters. Through the projects selected by the voters, we detect positive and negative interactions between the projects by identifying projects that are consistently chosen together. In the presence of project interactions, it is preferable to select projects that interact positively rather than negatively, all other things being equal. We introduce desirable properties that utility functions should have in the presence of project interactions, and we build a utility function that fulfills them. We then give axiomatic properties of aggregation rules, and we study three classical aggregation rules: the maximization of the sum of the utilities, of the product of the utilities, or of the minimal utility. We show that in all three cases the problems solved by these rules are NP-hard, and we propose a branch-and-bound algorithm to solve them. We conclude the paper with experiments.
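The budgeted selection with interaction terms described above can be sketched as a small branch-and-bound search. This is an illustrative reconstruction only: the abstract does not give the paper's utility function or bounding rule, so the pairwise-interaction utility and the optimistic bound below are assumptions.

```python
from itertools import combinations

def utility(selected, base, pair):
    """Base utility of each selected project plus pairwise interaction
    terms (positive for synergy, negative for conflict)."""
    u = sum(base[p] for p in selected)
    u += sum(pair.get(frozenset(e), 0.0) for e in combinations(selected, 2))
    return u

def branch_and_bound(cost, base, pair, budget):
    """Return (best utility, best project set) within the budget."""
    projects = sorted(base)
    best = [float("-inf"), frozenset()]

    def bound(selected, i, remaining):
        # Optimistic bound: every remaining affordable project contributes
        # its positive base utility plus all positive interactions it
        # could possibly bring (an over-count, hence a valid upper bound).
        b = utility(selected, base, pair)
        for p in projects[i:]:
            if cost[p] <= remaining:
                b += max(base[p], 0.0)
                b += sum(max(v, 0.0) for e, v in pair.items() if p in e)
        return b

    def explore(selected, i, remaining):
        u = utility(selected, base, pair)
        if u > best[0]:
            best[0], best[1] = u, selected
        if i == len(projects) or bound(selected, i, remaining) <= best[0]:
            return  # prune: no completion can beat the incumbent
        p = projects[i]
        if cost[p] <= remaining:
            explore(selected | {p}, i + 1, remaining - cost[p])
        explore(selected, i + 1, remaining)

    explore(frozenset(), 0, budget)
    return best[0], best[1]
```

On a toy instance, the search matches exhaustive enumeration while pruning dominated branches.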
- [172] arXiv:2403.19197 [pdf, ps, other]
-
Title: Ordering Collective Unit Tasks: from Scheduling to Computational Social Choice
Subjects: Computer Science and Game Theory (cs.GT)
We study the collective schedules problem, which consists of computing a one-machine schedule for a set of tasks, knowing that a set of individuals (also called voters) have preferences regarding the order of execution of the tasks. Our aim is to return a consensus schedule. We consider the setting in which all tasks have the same length -- such a schedule can therefore also be viewed as a ranking. We study two rules, one based on a distance criterion and another based on a binary criterion, and we show that these rules extend classic scheduling criteria. We also consider time constraints and precedence constraints between the tasks, and focus on two cases: either the preferences of the voters fulfill these constraints, or they do not (but the collective schedule should fulfill them). In each case, either we show that the problem is NP-hard, or we provide a polynomial-time algorithm which solves it. We also provide an analysis of a heuristic, which appears to be a 2-approximation with respect to Spearman's rule.
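A distance-based consensus rule of the kind mentioned above can be illustrated, for unit tasks, by brute-force minimization of the summed Kendall tau distance to the voters' orders (a Kemeny-style sketch; the paper's actual rules and efficient algorithms may differ):

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of task pairs ordered differently by the two rankings."""
    pos1 = {t: i for i, t in enumerate(r1)}
    pos2 = {t: i for i, t in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
    )

def consensus_schedule(voter_orders):
    """Brute-force schedule minimizing total distance to all voters
    (exponential; only for tiny instances, consistent with NP-hardness)."""
    tasks = voter_orders[0]
    return min(
        permutations(tasks),
        key=lambda s: sum(kendall_tau(s, v) for v in voter_orders),
    )
```

The exponential enumeration is consistent with the hardness results stated in the abstract; practical instances would need the heuristic.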
- [173] arXiv:2403.19200 [pdf, other]
-
Title: Cell-Free MIMO Perceptive Mobile Networks: Cloud vs. Edge Processing
Comments: 30 pages, 11 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Perceptive mobile networks (PMNs) implement sensing and communication by reusing existing cellular infrastructure. Cell-free multiple-input multiple-output (MIMO), thanks to the cooperation among distributed access points, supports the deployment of multistatic radar sensing, while providing high spectral efficiency for data communication services. To this end, the distributed access points communicate over fronthaul links with a central processing unit acting as a cloud processor. This work explores four different types of PMN uplink solutions based on cell-free MIMO, in which the sensing and decoding functionalities are carried out at either the cloud or the edge. Accordingly, we investigate and compare joint cloud-based decoding and sensing (CDCS), hybrid cloud-based decoding and edge-based sensing (CDES), hybrid edge-based decoding and cloud-based sensing (EDCS) and edge-based decoding and sensing (EDES). In all cases, we target a unified design problem formulation whereby the fronthaul quantization of the signals received in the training and data phases is jointly designed to maximize the achievable rate under sensing requirements and fronthaul capacity constraints. Via numerical results, the four implementation scenarios are compared as a function of the available fronthaul resources, highlighting the relative merits of edge- and cloud-based sensing and communications. This study provides guidelines on the optimal functional allocation in fronthaul-constrained networks implementing integrated sensing and communications.
- [174] arXiv:2403.19201 [pdf, ps, other]
-
Title: Understanding Archives: Towards New Research Interfaces Relying on the Semantic Annotation of Documents
Comments: in French language. CiDE.23: Document et archivage: pratiques formelles et informelles, Oct 2023, Grenoble, France
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)
The digitisation campaigns carried out by libraries and archives in recent years have facilitated access to the documents in their collections. However, exploring and exploiting these documents remain difficult tasks due to the sheer quantity of documents available for consultation. In this article, we show how semantic annotation of the textual content of study corpora of archival documents facilitates their exploitation and valorisation. First, we present a methodological framework for the construction of new interfaces based on textual semantics, then address the current technological obstacles and their potential solutions. We conclude by presenting a practical case of the application of this framework.
- [175] arXiv:2403.19205 [pdf, other]
-
Title: From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In the realm of computer vision, Neural Fields have gained prominence as a contemporary tool harnessing neural networks for signal representation. Despite the remarkable progress in adapting these networks to solve a variety of problems, the field still lacks a comprehensive theoretical framework. This article aims to address this gap by delving into the intricate interplay between initialization and activation, providing a foundational basis for the robust optimization of Neural Fields. Our theoretical insights reveal a deep-seated connection among network initialization, architectural choices, and the optimization process, emphasizing the need for a holistic approach when designing cutting-edge Neural Fields.
- [176] arXiv:2403.19206 [pdf, other]
-
Title: CogniDot: Vasoactivity-based Cognitive Load Monitoring with a Miniature On-skin Sensor
Subjects: Human-Computer Interaction (cs.HC)
Vascular activities offer valuable signatures for psychological monitoring applications. We present CogniDot, an affordable, miniature skin sensor placed on the temporal area of the head that senses cognitive load with a single-pixel color sensor. With its energy-efficient design, bio-compatible adhesive, and compact size (22mm diameter, 8.5mm thickness), it is well suited for long-term monitoring of mental state. We describe the hardware design of our sensor in detail. The user study results with 12 participants show that CogniDot can accurately differentiate between three levels of cognitive load with a within-user accuracy of 97%. We also discuss its potential for broader applications.
- [177] arXiv:2403.19211 [pdf, other]
-
Title: Dual-Personalizing Adapter for Federated Foundation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning on large amounts of instruction data. Notably, federated foundation models have emerged as a privacy-preserving method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. To alleviate communication and computation overhead, parameter-efficient methods have been introduced, and some research has adapted personalization methods to federated foundation models for better alignment with user preferences. However, a critical gap in existing research is the neglect of test-time distribution shifts in real-world applications. Therefore, to bridge this gap, we propose a new setting, termed test-time personalization, which not only concentrates on the targeted local task but also extends to other tasks that exhibit test-time distribution shifts. To address challenges in this new setting, we explore a simple yet effective solution to learn a comprehensive foundation model. Specifically, we propose a dual-personalizing adapter architecture (FedDPA), comprising a global adapter and a local adapter for addressing test-time distribution shifts and personalization, respectively. Additionally, we introduce an instance-wise dynamic weighting mechanism to optimize the balance between the global and local adapters, enhancing overall performance. The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks.
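The instance-wise dynamic weighting between the global and local adapters could look like the following sketch. The abstract does not specify the mechanism, so the sigmoid gate, its parameters, and the convex combination below are all assumptions for illustration:

```python
import numpy as np

def gate_weight(features, w_gate, b_gate):
    """Hypothetical instance-wise gate: one scalar in (0, 1) per input."""
    return 1.0 / (1.0 + np.exp(-(features @ w_gate + b_gate)))

def dual_adapter_output(features, global_out, local_out, w_gate, b_gate):
    """Convex combination of global- and local-adapter outputs,
    weighted per instance (a FedDPA-style sketch, not the paper's code)."""
    w = gate_weight(features, w_gate, b_gate)[:, None]
    return w * global_out + (1.0 - w) * local_out
```

A large positive gate bias pushes every instance toward the global adapter; a large negative one toward the local adapter.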
- [178] arXiv:2403.19213 [pdf, other]
-
Title: Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting
Authors: Weihao Jiang, Zhaozhi Xie, Yuxiang Lu, Longjie Qi, Jingyong Cai, Hiroyuki Uchiyama, Bin Chen, Yue Ding, Hongtao Lu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Mask-guided matting networks have achieved significant improvements and have shown great potential in practical applications in recent years. However, because they learn matting representations only from synthetic matting data that lacks real-world diversity, these approaches tend to overfit low-level details in wrong regions, lack generalization to objects with complex structures and to real-world scenes such as shadows, and suffer from interference from background lines or textures. To address these challenges, in this paper, we propose a novel auxiliary learning framework for mask-guided matting models, incorporating three auxiliary tasks besides matting: semantic segmentation, edge detection, and background line detection, to learn different and effective representations from different types of data and annotations. Our framework and model introduce the following key aspects: (1) to learn real-world adaptive semantic representation for objects with diverse and complex structures under real-world scenes, we introduce extra semantic segmentation and edge detection tasks on more diverse real-world data with segmentation annotations; (2) to avoid overfitting on low-level details, we propose a module that utilizes the inconsistency between learned segmentation and matting representations to regularize detail refinement; (3) we introduce a novel background line detection task into our auxiliary learning framework, to suppress interference from background lines or textures. In addition, we propose a high-quality matting benchmark, Plant-Mat, to evaluate matting methods on complex structures. Extensive quantitative and qualitative results show that our approach outperforms state-of-the-art mask-guided methods.
- [179] arXiv:2403.19216 [pdf, other]
-
Title: Are Large Language Models Good at Utility Judgments?
Comments: Accepted by SIGIR 2024
Subjects: Information Retrieval (cs.IR)
Retrieval-augmented generation (RAG) is considered a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has received widespread attention from researchers recently. Due to the limitations in the semantic understanding of retrieval models, the success of RAG relies heavily on the ability of LLMs to identify passages with utility. Recent efforts have explored the ability of LLMs to assess the relevance of passages in retrieval, but there has been limited work on evaluating the utility of passages in supporting question answering. In this work, we conduct a comprehensive study of the capabilities of LLMs in utility evaluation for open-domain QA. Specifically, we introduce a benchmarking procedure and a collection of candidate passages with different characteristics, facilitating a series of experiments with five representative LLMs. Our experiments reveal that: (i) well-instructed LLMs can distinguish between relevance and utility, and LLMs are highly receptive to newly generated counterfactual passages. Moreover, (ii) we scrutinize key factors that affect utility judgments in the instruction design. Finally, (iii) to verify the efficacy of utility judgments in practical retrieval augmentation applications, we delve into LLMs' QA capabilities using the evidence judged with utility and direct dense retrieval results. (iv) We propose a k-sampling, listwise approach to reduce the dependency of LLMs on the sequence of input passages, thereby facilitating subsequent answer generation. We believe that the way we formalize and study the problem, along with our findings, contributes to a critical assessment of retrieval-augmented LLMs. Our code and benchmark can be found at \url{https://github.com/ict-bigdatalab/utility_judgments}.
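The k-sampling idea of point (iv) can be sketched as follows: query the judge on k shuffled orderings of the passage list and majority-vote per passage, so no single input order dominates. The `judge` callable stands in for the LLM call; the exact aggregation in the paper may differ:

```python
import random

def k_sampling_judgments(passages, judge, k=5, seed=0):
    """Present the passage list in k random orders, collect the judge's
    per-order utility picks, and keep passages selected by a majority."""
    rng = random.Random(seed)
    votes = {p: 0 for p in passages}
    for _ in range(k):
        order = passages[:]
        rng.shuffle(order)       # a fresh permutation of the input list
        for p in judge(order):   # judge returns the passages it deems useful
            votes[p] += 1
    # Keep original order for the survivors, easing downstream generation.
    return [p for p in passages if votes[p] > k // 2]
```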
- [180] arXiv:2403.19218 [pdf, other]
-
Title: A piecewise neural network method for solving large interval solution to initial value problem of ordinary differential equations
Comments: 26 pages, 13 figures
Subjects: Numerical Analysis (math.NA)
Various traditional numerical methods for solving initial value problems of differential equations often produce local solutions near the initial value point, despite the problems having solutions on larger intervals. Even current popular neural network algorithms or deep learning methods cannot guarantee yielding large interval solutions for these problems. In this paper, we propose a piecewise neural network approach to obtain a large interval numerical solution for initial value problems of differential equations. In this method, we first divide the solution interval, on which the initial value problem is to be solved, into several smaller intervals. Neural networks with a unified structure are then employed on each sub-interval to solve the related sub-problems. By assembling these neural network solutions, a piecewise expression of the large interval solution to the problem is constructed, referred to as the piecewise neural network solution. The continuous differentiability of the solution over the entire interval, except at finitely many points, is proven through theoretical analysis and a parameter transfer technique. Additionally, parameter transfer and multiple rounds of pre-training are utilized to enhance the accuracy of the approximate solution. Compared with existing neural network algorithms, this method does not increase the network size or training data scale for training the network on each sub-domain. Finally, several numerical experiments are presented to demonstrate the efficiency of the proposed algorithm.
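The piecewise idea can be sketched on the toy problem u' = u, u(0) = 1: a small network per sub-interval, with the previous endpoint value transferred as the next initial condition. To keep the sketch self-contained, each "network" here is a fixed random tanh feature layer whose output weights are fit by linear least squares on collocation points; the paper's architectures and training procedure differ:

```python
import numpy as np

def solve_subinterval(t0, t1, u0, n_hidden=50, n_col=100, seed=0):
    """Fit the trial solution u(t) = u0 + (t - t0) * (w . phi(t)) to
    u' = u on [t0, t1]; with fixed random tanh features phi, the
    collocation residual is linear in the output weights w."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-2.0, 2.0, n_hidden)
    b = rng.uniform(-2.0, 2.0, n_hidden)
    t = np.linspace(t0, t1, n_col)[:, None]
    phi = np.tanh(a * t + b)
    dphi = a * (1.0 - phi ** 2)          # derivative of the tanh features
    # Residual u'(t) - u(t) = 0 rearranges to a linear system A w = u0.
    A = phi + (t - t0) * dphi - (t - t0) * phi
    w, *_ = np.linalg.lstsq(A, np.full(n_col, u0), rcond=None)
    return lambda s: u0 + (s - t0) * (np.tanh(a * s + b) @ w)

def piecewise_solve(t_end=2.0, n_pieces=4, u0=1.0):
    """Assemble the piecewise solution, transferring each endpoint value
    as the initial condition of the next sub-interval."""
    edges = np.linspace(0.0, t_end, n_pieces + 1)
    pieces = []
    for t0, t1 in zip(edges[:-1], edges[1:]):
        f = solve_subinterval(t0, t1, u0)
        pieces.append((t0, t1, f))
        u0 = f(t1)
    return pieces, u0
```

The trial form forces u(t0) = u0 on every piece, so the assembled solution is continuous at the knots by construction.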
- [181] arXiv:2403.19219 [pdf, other]
-
Title: Collaborative Knowledge Infusion for Low-resource Stance Detection
Comments: 13 pages, 3 figures, Big Data Mining and Analysis
Subjects: Computation and Language (cs.CL)
Stance detection determines the view expressed towards a specific target in a given context (e.g., tweets, commercial reviews). Target-related knowledge is often needed to assist stance detection models in understanding the target well and making correct predictions. However, prevailing works for knowledge-infused stance detection predominantly incorporate target knowledge from a single source, without knowledge verification and with limited domain coverage. Low-resource training data further increases the challenge for data-driven large models on this task. To address those challenges, we propose a collaborative knowledge infusion approach for low-resource stance detection tasks, employing a combination of aligned knowledge enhancement and efficient parameter learning techniques. Specifically, our stance detection approach leverages target background knowledge collaboratively from different knowledge sources with the help of knowledge alignment. Additionally, we introduce a parameter-efficient collaborative adaptor with a staged optimization algorithm, which collaboratively addresses the challenges associated with low-resource stance detection tasks from both network structure and learning perspectives. To assess the effectiveness of our method, we conduct extensive experiments on three public stance detection datasets, including low-resource and cross-target settings. The results demonstrate significant performance improvements compared to existing stance detection approaches.
- [182] arXiv:2403.19220 [pdf, other]
-
Title: GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Point clouds captured by different sensors such as RGB-D cameras and LiDAR possess non-negligible domain gaps. Most existing methods design different network architectures and train separately on point clouds from various sensors. Typically, point-based methods achieve outstanding performance on evenly distributed dense point clouds from RGB-D cameras, while voxel-based methods are more efficient for large-range sparse LiDAR point clouds. In this paper, we propose geometry-to-voxel auxiliary learning to enable voxel representations to access point-level geometric information, which supports better generalisation of the voxel-based backbone with additional interpretations of multi-sensor point clouds. Specifically, we construct hierarchical geometry pools generated by a voxel-guided dynamic point network, which efficiently provide auxiliary fine-grained geometric information adapted to different stages of voxel features. We conduct experiments on joint multi-sensor datasets to demonstrate the effectiveness of GeoAuxNet. Benefiting from elaborate geometric information, our method outperforms other models collectively trained on multi-sensor datasets, and achieves competitive results with state-of-the-art experts on each single dataset.
- [183] arXiv:2403.19221 [pdf, other]
-
Title: Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
Comments: Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of a single auxiliary modality, which is impractical given the diversity and unpredictable nature of real-world scenarios. To this end, we propose a Missing-Resistant framework MR-VPC that effectively harnesses all available auxiliary inputs and maintains resilience even in the absence of certain modalities. Under this framework, we propose the Multimodal VPC (MVPC) architecture integrating video, speech, and event boundary inputs in a unified manner to process various auxiliary inputs. Moreover, to fortify the model against incomplete data, we introduce DropAM, a data augmentation strategy that randomly omits auxiliary inputs, paired with DistillAM, a regularization target that distills knowledge from teacher models trained on modality-complete data, enabling efficient learning in modality-deficient environments. Through exhaustive experimentation on YouCook2 and ActivityNet Captions, MR-VPC has proven to deliver superior performance on modality-complete and modality-missing test data. This work highlights the significance of developing resilient VPC models and paves the way for more adaptive, robust multimodal video understanding.
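The DropAM augmentation described above, which randomly omits auxiliary inputs, can be sketched as a simple data transform. The modality names and dictionary interface below are assumptions for illustration:

```python
import random

def drop_aux_modalities(sample, p_drop=0.5, seed=None):
    """DropAM-style augmentation (a sketch): randomly omit each auxiliary
    modality while always keeping the core video input."""
    rng = random.Random(seed)
    out = {"video": sample["video"]}
    for name in ("speech", "event_boundaries"):
        if name in sample and rng.random() >= p_drop:
            out[name] = sample[name]
    return out
```

Training on such samples exposes the model to every subset of auxiliary inputs, which is the resilience the abstract targets.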
- [184] arXiv:2403.19223 [pdf, ps, other]
-
Title: Computing large deviation rate functions of entropy production for diffusion processes in the vanishing-noise limit and high dimensions by an interacting particle method
Subjects: Numerical Analysis (math.NA)
We study an interacting particle method (IPM) for computing the large deviation rate function of entropy production for diffusion processes, with emphasis on the vanishing-noise limit and high dimensions. The crucial ingredient to obtain the rate function is the computation of the principal eigenvalue $\lambda$ of elliptic, non-self-adjoint operators. We show that this principal eigenvalue can be approximated in terms of the spectral radius of a discretized evolution operator obtained from an operator splitting scheme and an Euler--Maruyama scheme with a small time step size, and we show that this spectral radius can be accessed through a large number of iterations of this discretized semigroup, suitable for the IPM. The IPM applies naturally to problems in unbounded domains, scales easily to high dimensions, and adapts to singular behaviors in the vanishing-noise limit. We show numerical examples in dimensions up to 16. The numerical results show that our numerical approximation of $\lambda$ converges to the analytical vanishing-noise limit with a fixed number of particles and a fixed time step size. Our paper appears to be the first one to obtain numerical results of principal eigenvalue problems for non-self-adjoint operators in such high dimensions.
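The route from iterated semigroup applications to the principal eigenvalue can be sketched with a matrix stand-in for the discretized evolution operator: power iteration recovers the spectral radius, and lambda = log(rho) / dt. In the paper the interacting particle method plays the role of the repeated operator application; the matrix here is only an illustration:

```python
import numpy as np

def principal_eigenvalue(P, dt, n_iter=500, seed=0):
    """Estimate the principal eigenvalue of the generator from the
    spectral radius of a discretized evolution operator P ~ exp(dt * L),
    accessed through many iterations (power iteration)."""
    rng = np.random.default_rng(seed)
    v = rng.random(P.shape[0])
    rho = 0.0
    for _ in range(n_iter):
        w = P @ v
        rho = np.linalg.norm(w) / np.linalg.norm(v)  # growth factor
        v = w / np.linalg.norm(w)
    return np.log(rho) / dt
```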
- [185] arXiv:2403.19224 [pdf, other]
-
Title: Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
Comments: Accepted by 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The mainstream paradigm of speech emotion recognition (SER) is identifying the single emotion label of an entire utterance. This line of work neglects emotion dynamics at fine temporal granularity and mostly fails to explicitly leverage the linguistic information of the speech signal. In this paper, we propose the Emotion Neural Transducer for fine-grained speech emotion recognition with automatic speech recognition (ASR) joint training. We first extend a typical neural transducer with an emotion joint network to construct an emotion lattice for fine-grained SER. Then we propose lattice max pooling on the alignment lattice to facilitate distinguishing emotional and non-emotional frames. To adapt fine-grained SER to the transducer inference manner, we further make blank, the special symbol of ASR, serve as an underlying emotion indicator as well, yielding the Factorized Emotion Neural Transducer. For typical utterance-level SER, our ENT models outperform state-of-the-art methods on IEMOCAP while maintaining a low word error rate. Experiments on IEMOCAP and the latest speech emotion diarization dataset ZED also demonstrate the superiority of fine-grained emotion modeling. Our code is available at https://github.com/ECNU-Cross-Innovation-Lab/ENT.
- [186] arXiv:2403.19225 [pdf, other]
-
Title: Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Weakly-supervised action segmentation is the task of learning to partition a long video into several action segments, where training videos are accompanied only by transcripts (ordered lists of actions). Most existing methods need to infer a pseudo segmentation for training by serial alignment between all frames and the transcript, which is time-consuming and hard to parallelize during training. In this work, we aim to escape from this inefficient alignment with massive but redundant frames, and instead directly localize a few action transitions for pseudo segmentation generation, where a transition refers to the change from an action segment to its next adjacent one in the transcript. As the true transitions are submerged in noisy boundaries due to intra-segment visual variation, we propose a novel Action-Transition-Aware Boundary Alignment (ATBA) framework to efficiently and effectively filter out noisy boundaries and detect transitions. In addition, to boost semantic learning in the case that noise is inevitably present in the pseudo segmentation, we also introduce video-level losses to utilize the trusted video-level supervision. Extensive experiments show the effectiveness of our approach in terms of both performance and training speed.
- [187] arXiv:2403.19231 [pdf, other]
-
Title: Numerical approximations of a lattice Boltzmann scheme with a family of partial differential equations
Subjects: Numerical Analysis (math.NA)
Is it possible to consider a lattice Boltzmann scheme as an approximation of a partial differential equation? For a nonhomogeneous advection problem in one spatial dimension, we propose equivalent partial differential equations at various orders. We compare the lattice Boltzmann results and a spectral approximation of the differential equations. No simple correlation is obtained for a stationary problem. For an unsteady situation, we show that the initialization scheme of the microscopic moments plays a crucial role.
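For readers unfamiliar with the scheme family being analyzed, a minimal D1Q2 BGK lattice Boltzmann scheme for the 1D advection equation u_t + c u_x = 0 looks like the following (a generic textbook-style sketch, not the paper's specific nonhomogeneous scheme; it requires |c| <= lam for stability, and with c = lam, tau = 1 it degenerates to an exact shift):

```python
import numpy as np

def lbm_advect(rho0, c, lam=1.0, tau=1.0, steps=10):
    """D1Q2 BGK lattice Boltzmann scheme for u_t + c u_x = 0 on a
    periodic grid; lam = dx/dt is the lattice speed."""
    fp = 0.5 * rho0.astype(float) * (1.0 + c / lam)  # right-moving population
    fm = 0.5 * rho0.astype(float) * (1.0 - c / lam)  # left-moving population
    for _ in range(steps):
        rho = fp + fm
        feq_p = 0.5 * rho * (1.0 + c / lam)  # equilibria carry the flux c*rho
        feq_m = 0.5 * rho * (1.0 - c / lam)
        fp += (feq_p - fp) / tau             # BGK collision (relaxation)
        fm += (feq_m - fm) / tau
        fp = np.roll(fp, 1)                  # streaming: shift right movers
        fm = np.roll(fm, -1)                 # streaming: shift left movers
    return fp + fm
```

The equivalent-PDE analysis in the paper expands exactly this kind of collide-and-stream update in powers of the time step.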
- [188] arXiv:2403.19232 [pdf, other]
-
Title: AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Training-free network architecture search (NAS) aims to discover high-performing networks with zero-cost proxies, capturing network characteristics related to the final performance. However, network rankings estimated by previous training-free NAS methods have shown weak correlations with the performance. To address this issue, we propose AZ-NAS, a novel approach that leverages an ensemble of various zero-cost proxies to substantially enhance the correlation between the predicted ranking of networks and their ground-truth performance. To achieve this, we introduce four novel zero-cost proxies that are complementary to each other, analyzing distinct traits of architectures in terms of expressivity, progressivity, trainability, and complexity. The proxy scores can be obtained simultaneously within a single forward and backward pass, making the overall NAS process highly efficient. In order to integrate the rankings predicted by our proxies effectively, we introduce a non-linear ranking aggregation method that highlights networks that are consistently highly ranked across all the proxies. Experimental results conclusively demonstrate the efficacy and efficiency of AZ-NAS, outperforming state-of-the-art methods on standard benchmarks, all while maintaining a reasonable runtime cost.
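A non-linear aggregation that rewards consistency across proxies can be sketched as a sum of log-normalized ranks (a geometric-mean-style score, so a single poor rank sharply penalizes a network). The abstract does not give AZ-NAS's exact formula, so this specific choice is an assumption:

```python
import numpy as np

def aggregate_rankings(scores):
    """scores: (n_proxies, n_networks), higher is better per proxy.
    Convert each proxy's scores to normalized ranks in (0, 1], then sum
    their logs: a network must rank high on *all* proxies to score well."""
    n_proxies, n_nets = scores.shape
    agg = np.zeros(n_nets)
    for s in scores:
        ranks = np.empty(n_nets)
        ranks[np.argsort(s)] = np.arange(1, n_nets + 1)  # rank 1 = worst
        agg += np.log(ranks / n_nets)
    return agg  # argsort descending gives the final network ranking
```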
- [189] arXiv:2403.19234 [pdf, other]
-
Title: Regularized dynamical parametric approximation
Subjects: Numerical Analysis (math.NA)
This paper studies the numerical approximation of evolution equations by nonlinear parametrizations $u(t)=\Phi(q(t))$ with time-dependent parameters $q(t)$, which are to be determined in the computation. The motivation comes from approximations in quantum dynamics by multiple Gaussians and approximations of various dynamical problems by tensor networks and neural networks. In all these cases, the parametrization is typically irregular: the derivative $\Phi'(q)$ can have arbitrarily small singular values and may have varying rank. We derive approximation results for a regularized approach in the time-continuous case as well as in time-discretized cases. With a suitable choice of the regularization parameter and the time stepsize, the approach can be successfully applied in irregular situations, even though it runs counter to the basic principle in numerical analysis to avoid solving ill-posed subproblems when aiming for a stable algorithm. Numerical experiments with sums of Gaussians for approximating quantum dynamics and with neural networks for approximating the flow map of a system of ordinary differential equations illustrate and complement the theoretical results.
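The core difficulty above is that the parameter velocity solves a least-squares problem with the Jacobian Phi'(q), which may be nearly singular. A standard Tikhonov-regularized solve illustrates why regularization keeps the update bounded (the specific formula is a common choice and may differ in its details from the paper's scheme):

```python
import numpy as np

def regularized_velocity(J, rhs, eps):
    """Solve min ||J qdot - rhs||^2 + eps^2 ||qdot||^2 via the regularized
    normal equations: stable even when J has tiny singular values."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + eps**2 * np.eye(n), J.T @ rhs)
```

With a singular value of 1e-12 in J, the unregularized solution would blow up to order 1e12, while the regularized one stays O(1).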
- [190] arXiv:2403.19235 [pdf, other]
-
Title: DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Authors: Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with the nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context. Existing personalization methods either require time-consuming optimization or learn additional encoders, and remain adept at "identity re-contextualization". However, they often struggle with detailed and sensitive tasks like human face editing. To address these challenges, we introduce DreamSalon, a noise-guided, staged-editing framework, uniquely focusing on detailed image manipulations and identity-context preservation. By discerning editing and boosting stages via the frequency and gradient of predicted noises, DreamSalon first performs detailed manipulations on specific features in the editing stage, guided by high-frequency information, and then employs stochastic denoising in the boosting stage to improve image quality. For more precise editing, DreamSalon semantically mixes source and target textual prompts, guided by differences in their embedding covariances, to direct the model's focus on specific manipulation areas. Our experiments demonstrate DreamSalon's ability to efficiently and faithfully edit fine details on human faces, outperforming existing methods both qualitatively and quantitatively.
- [191] arXiv:2403.19238 [pdf, other]
-
Title: Taming Lookup Tables for Efficient Image Retouching
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose the Image Color Enhancement Lookup Table (ICELUT), which adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order of magnitude faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT.
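The network-to-LUT conversion principle can be illustrated on a single channel: precompute a pointwise operator on every quantized input level once, then replace all runtime computation with a table lookup. A gamma curve stands in below for ICELUT's learned 1x1 network, which has more inputs and a larger table:

```python
import numpy as np

def build_lut(pointwise_fn, levels=256):
    """Precompute the pointwise operator on every quantized input level."""
    x = np.arange(levels) / (levels - 1)
    y = np.clip(np.rint(pointwise_fn(x) * (levels - 1)), 0, levels - 1)
    return y.astype(np.uint8)

def apply_lut(image_u8, lut):
    """Inference is a pure table lookup: no convolution at runtime."""
    return lut[image_u8]
```

Because the input domain is finite (256 levels per channel), the lookup is exactly equivalent to evaluating the original function on quantized inputs.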
- [192] arXiv:2403.19242 [pdf, other]
-
Title: RTracker: Recoverable Tracking via PN Tree Structured Memory
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Existing tracking methods mainly focus on learning better target representations or developing more robust prediction models to improve tracking performance. While tracking performance has significantly improved, target loss occurs frequently due to tracking failures, complete occlusion, or out-of-view situations. However, considerably less attention is paid to the self-recovery issue of tracking methods, which is crucial for practical applications. To this end, we propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery ability. Specifically, we propose a Positive-Negative (PN) Tree-structured memory to chronologically store and maintain positive and negative target samples. Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios. Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss. The favorable performance in comparison against the state-of-the-art methods on numerous challenging benchmarks demonstrates the effectiveness of the proposed algorithm.
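The relative distance-based criterion at the core of the method can be sketched as a nearest-support comparison in feature space. The specific distances and thresholds are assumptions; the paper additionally layers tree walking rules on top of this test:

```python
import numpy as np

def target_lost(feature, pos_support, neg_support):
    """Relative distance-based criterion (a sketch): declare the target
    lost when the candidate feature lies closer to the negative support
    samples than to the positive ones."""
    d_pos = np.min(np.linalg.norm(pos_support - feature, axis=1))
    d_neg = np.min(np.linalg.norm(neg_support - feature, axis=1))
    return d_neg < d_pos
```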
- [193] arXiv:2403.19243 [pdf, other]
-
Title: Sine Activated Low-Rank Matrices for Parameter Efficient LearningComments: The first two authors contributed equallySubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Low-rank decomposition has emerged as a vital tool for enhancing parameter efficiency in neural network architectures, gaining traction across diverse applications in machine learning. These techniques significantly lower the number of parameters, striking a balance between compactness and performance. However, a common challenge has been the compromise between parameter efficiency and the accuracy of the model, where reduced parameters often lead to diminished accuracy compared to their full-rank counterparts. In this work, we propose a novel theoretical framework that integrates a sinusoidal function within the low-rank decomposition process. This approach not only preserves the benefits of the parameter efficiency characteristic of low-rank methods but also increases the decomposition's rank, thereby enhancing model accuracy. Our method proves to be an adaptable enhancement for existing low-rank models, as evidenced by its successful application in Vision Transformers (ViT), Large Language Models (LLMs), Neural Radiance Fields (NeRF), and 3D shape modeling. This demonstrates the wide-ranging potential and efficiency of our proposed technique.
- [194] arXiv:2403.19245 [pdf, ps, other]
-
Title: The use of ChatGPT in higher education: The advantages and disadvantagesAuthors: Joshua Ebere ChukwuereSubjects: Computers and Society (cs.CY)
Higher education scholars are interested in ChatGPT, an artificial intelligence (AI) technology developed by OpenAI. Whether ChatGPT can improve learning is still a topic of debate among experts. This concise overview of the literature examines the application of ChatGPT in higher education for comprehending and producing high-level instruction. By examining the essential literature, this study seeks to provide a thorough assessment of the advantages and disadvantages of utilizing ChatGPT in higher education settings. For this rapid review, the researcher searched Google Scholar, Scopus, and other databases for prior research published between January 2023 and July 2023, and examined the resulting studies. The study found that employing ChatGPT in higher education is beneficial for a number of reasons: it can provide individualized instruction and prompt feedback, facilitate access to learning, and promote student interaction. These benefits could improve the learning environment and make it more engaging for academics and students. The disadvantages of ChatGPT are equally present, including its inability to comprehend emotions, the lack of opportunities for social interaction, technological limitations, and the danger of depending too heavily on ChatGPT in higher education. Higher education should therefore combine ChatGPT with other teaching techniques to provide students and lecturers with a comprehensive education. It remains crucial to weigh the positives, negatives, and ethical issues before adopting ChatGPT in the classroom.
- [195] arXiv:2403.19246 [pdf, other]
-
Title: MPXGAT: An Attention based Deep Learning Model for Multiplex Graphs EmbeddingSubjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI)
Graph representation learning has rapidly emerged as a pivotal field of study. Despite its growing popularity, the majority of research has been confined to embedding single-layer graphs, which fall short in representing complex systems with multifaceted relationships. To bridge this gap, we introduce MPXGAT, an innovative attention-based deep learning model tailored to multiplex graph embedding. Leveraging the robustness of Graph Attention Networks (GATs), MPXGAT captures the structure of multiplex networks by harnessing both intra-layer and inter-layer connections. This exploitation facilitates accurate link prediction within and across the network's multiple layers. Our comprehensive experimental evaluation, conducted on various benchmark datasets, confirms that MPXGAT consistently outperforms state-of-the-art competing algorithms.
- [196] arXiv:2403.19248 [pdf, other]
-
Title: Genos: General In-Network Unsupervised Intrusion Detection by Rule ExtractionComments: accepted by IEEE International Conference on Computer Communications (INFOCOM 2024)Subjects: Cryptography and Security (cs.CR)
Anomaly-based network intrusion detection systems (A-NIDS) use unsupervised models to detect unforeseen attacks. However, existing A-NIDS solutions suffer from low throughput, lack of interpretability, and high maintenance costs. Recent in-network intelligence (INI) exploits programmable switches to offer line-rate deployment of NIDS. Nevertheless, current in-network NIDS are either model-specific or only apply to supervised models. In this paper, we propose Genos, a general in-network framework for unsupervised A-NIDS by rule extraction, which consists of a Model Compiler, a Model Interpreter, and a Model Debugger. Specifically, observing benign data are multimodal and usually located in multiple subspaces in the feature space, we utilize a divide-and-conquer approach for model-agnostic rule extraction. In the Model Compiler, we first propose a tree-based clustering algorithm to partition the feature space into subspaces, then design a decision boundary estimation mechanism to approximate the source model in each subspace. The Model Interpreter interprets predictions by important attributes to aid network operators in understanding the predictions. The Model Debugger conducts incremental updating to rectify errors by only fine-tuning rules on affected subspaces, thus reducing maintenance costs. We implement a prototype using physical hardware, and experiments demonstrate its superior performance of 100 Gbps throughput, great interpretability, and trivial updating overhead.
- [197] arXiv:2403.19253 [pdf, other]
-
Title: Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement LearningSubjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations, neglecting crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. Additionally, high computational demands for action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. LTS-CG leverages agents' historical observations to calculate an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, simultaneously capturing agent dependencies and relation uncertainty. The computational complexity of this procedure is related only to the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, ensuring a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Our results on the StarCraft II benchmark demonstrate LTS-CG's superior performance.
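The graph-sampling step described above can be sketched as follows: given a learned agent-pair probability matrix, draw a sparse adjacency matrix whose cost grows with the number of agents rather than with joint action pairs. The random probabilities below are stand-ins for the paper's learned matrix; this is not LTS-CG's implementation.

```python
import random

random.seed(0)
n_agents = 4

# Stand-in for the learned agent-pair probability matrix (no self-edges).
probs = [[0.0 if i == j else random.random() for j in range(n_agents)]
         for i in range(n_agents)]

# Sample one sparse coordination graph: edge (i, j) kept with prob. probs[i][j].
adjacency = [[1 if random.random() < probs[i][j] else 0 for j in range(n_agents)]
             for i in range(n_agents)]
```

In training, such a sampled graph would gate message passing between agents, so only the sampled edges exchange information at each step.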
- [198] arXiv:2403.19254 [pdf, other]
-
Title: Imperceptible Protection against Style Imitation from Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress in diffusion models has profoundly enhanced the fidelity of image generation. However, this has raised concerns about copyright infringement. While prior methods have introduced adversarial perturbations to prevent style imitation, most are accompanied by a degradation of the artwork's visual quality. Recognizing the importance of maintaining visual quality, we develop a visually improved protection method that preserves protection capability. To this end, we create a perceptual map to identify areas most sensitive to human eyes. We then adjust the protection intensity guided by an instance-aware refinement. We also integrate a perceptual constraints bank to further improve the imperceptibility. Results show that our method substantially elevates the quality of the protected image without compromising protection efficacy.
- [199] arXiv:2403.19257 [pdf, other]
-
Title: UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function ServingComments: 13 pages, 13 figures, IPDPS2024Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated function-as-a-service (FaaS) model to enable composition of distributed, scalable, and high-performance scientific workflows, and to support fine-grained function-level management. UniFaaS provides a unified programming interface to compose dynamic task graphs with transparent wide-area data management. UniFaaS exploits an observe-predict-decide approach to efficiently map workflow tasks to heterogeneous and dynamic target resources. We propose a dynamic heterogeneity-aware scheduling algorithm that employs a delay mechanism and a re-scheduling mechanism to accommodate dynamic resource capacity. Our experiments show that UniFaaS can efficiently execute workflows across computing resources with minimal scheduling overhead. We show that UniFaaS can improve the performance of a real-world drug screening workflow by as much as 22.99% when employing an additional 19.48% of resources, and a Montage workflow by 54.41% when employing an additional 47.83% of resources across multiple distributed clusters, in contrast to using a single cluster.
- [200] arXiv:2403.19259 [pdf, other]
-
Title: J-CRe3: A Japanese Conversation Dataset for Real-world Reference ResolutionAuthors: Nobuhiro Ueda, Hideko Habe, Yoko Matsui, Akishige Yuguchi, Seiya Kawano, Yasutomo Kawanishi, Sadao Kurohashi, Koichiro YoshinoComments: LREC-COLING 2024Subjects: Computation and Language (cs.CL)
Understanding expressions that refer to the physical world is crucial for human-assisting systems operating in the real world, such as robots that must perform the actions users expect. In real-world reference resolution, a system must ground the verbal information that appears in user interactions to the visual information observed in egocentric views. To this end, we propose a multimodal reference resolution task and construct a Japanese Conversation dataset for Real-world Reference Resolution (J-CRe3). Our dataset contains egocentric video and dialogue audio of real-world conversations between two people acting as a master and an assistant robot at home. The dataset is annotated with crossmodal tags between phrases in the utterances and the object bounding boxes in the video frames. These tags include indirect reference relations, such as predicate-argument structures and bridging references, as well as direct reference relations. We also constructed an experimental model and clarified the challenges in multimodal reference resolution tasks.
- [201] arXiv:2403.19260 [pdf, other]
-
Title: NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative DataAuthors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel FraibergerSubjects: Computation and Language (cs.CL)
To address the global issue of hateful content proliferating on online platforms, hate speech detection (HSD) models are typically developed on datasets collected in the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on curated samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce NaijaHate, the first dataset annotated for HSD that contains a representative sample of Nigerian tweets. We demonstrate that HSD evaluated on the biased datasets traditionally used in the literature largely overestimates real-world performance on representative data. We also propose NaijaXLM-T, a pretrained model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance. Finally, we show that in this context, a human-in-the-loop approach to content moderation in which humans review 1% of Nigerian tweets flagged as hateful would enable the moderation of 60% of all hateful content. Taken together, these results pave the way towards robust HSD systems and a better protection of social media users from hateful content in low-resource settings.
- [202] arXiv:2403.19265 [pdf, other]
-
Title: Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video ClipsSubjects: Computer Vision and Pattern Recognition (cs.CV)
Laparoscopic video tracking primarily focuses on two target types: surgical instruments and anatomy. The former could be used for skill assessment, while the latter is necessary for the projection of virtual overlays. Where instrument and anatomy tracking have often been considered two separate problems, in this paper, we propose a method for joint tracking of all structures simultaneously. Based on a single 2D monocular video clip, we train a neural field to represent a continuous spatiotemporal scene, used to create 3D tracks of all surfaces visible in at least one frame. Due to the small size of instruments, they generally cover a small part of the image only, resulting in decreased tracking accuracy. Therefore, we propose enhanced class weighting to improve the instrument tracks. We evaluate tracking on video clips from laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments. Additionally, we assess the quality of depth maps obtained from the method's scene reconstructions. We show that these pseudo-depths have comparable quality to a state-of-the-art pre-trained depth estimator. On laparoscopic videos in the SCARED dataset, the method predicts depth with an MAE of 2.9 mm and a relative error of 9.2%. These results show the feasibility of using neural fields for monocular 3D reconstruction of laparoscopic scenes.
- [203] arXiv:2403.19266 [pdf, other]
-
Title: On the Performance of Low-complexity Decoders of LDPC and Polar CodesComments: arXiv admin note: text overlap with arXiv:2012.13378 by other authorsSubjects: Information Theory (cs.IT)
Efficient decoding is crucial to high-throughput and low-power wireless communication scenarios. A theoretical analysis of the performance-complexity tradeoff toward low-complexity decoding is required for a better understanding of the fundamental limits in the above-mentioned scenarios. This study aims to explore the performance of decoders with complexity constraints. Specifically, we investigate the performance of LDPC codes with different numbers of belief-propagation iterations and the performance of polar codes with an SSC decoder. We found that the asymptotic error rates of both polar codes and LDPC codes are functions of complexity $T$ and code length $N$, in the form of $2^{-a2^{b\frac{T}{N}}}$, where $a$ and $b$ are constants that depend on channel and coding schemes. Our analysis reveals the different performance-complexity tradeoffs for LDPC and polar codes. The results indicate that if one aims to further enhance the decoding efficiency for LDPC codes, the key lies in how to efficiently pass messages on the factor graph. In terms of decoding efficiency, polar codes asymptotically outperform $(J, K)$-regular LDPC codes with a code rate $R \le 1-\frac{J(J-1)}{2^J+(J-1)}$ in the low-complexity regime ($T \le O(N \log N)$).
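The asymptotic form reported above, $2^{-a2^{b\frac{T}{N}}}$, can be evaluated directly to see its doubly exponential behavior in the normalized complexity $T/N$. The constants $a$ and $b$ below are illustrative placeholders, not values fitted for any particular channel or code.

```python
def error_rate(T, N, a=1.0, b=1.0):
    # Asymptotic error-rate form 2^(-a * 2^(b*T/N)) from the abstract;
    # a and b are scheme-dependent constants (illustrative values here).
    return 2.0 ** (-a * 2.0 ** (b * T / N))

# More decoding complexity per bit (larger T/N) drives the error rate
# down doubly exponentially.
low_complexity  = error_rate(T=1000, N=1000)
high_complexity = error_rate(T=4000, N=1000)
```

Note how at $T = 0$ with $a = 1$ the expression reduces to $2^{-1} = 0.5$, i.e. a decoder doing no work is no better than guessing, which is a useful sanity check on the form.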
- [204] arXiv:2403.19267 [pdf, other]
-
Title: MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical NeedsComments: Project website: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Conventional multi-agent simulators often assume perfect information and limitless capabilities, hindering the ecological validity of social interactions. We propose a multi-agent Minecraft simulator, MineLand, that bridges this gap by introducing limited multimodal senses and physical needs. Our simulator supports up to 48 agents with limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. This fosters dynamic and valid multi-agent interactions. We further introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.
- [205] arXiv:2403.19270 [pdf, other]
-
Title: sDPO: Don't Use Your Data All at OnceSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing them all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.
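The data-handling idea above can be sketched as partitioning the preference data into steps and, after each step, promoting the freshly trained model to serve as the next step's reference model. Training itself is elided; `dpo_train` and the step count are illustrative assumptions, not the paper's code.

```python
def stepwise_chunks(dataset, n_steps):
    # Partition the preference dataset into n_steps chunks (last takes remainder).
    size = len(dataset) // n_steps
    return [dataset[i * size:(i + 1) * size] for i in range(n_steps - 1)] + \
           [dataset[(n_steps - 1) * size:]]

preferences = list(range(10))   # stand-in for (prompt, chosen, rejected) triples
steps = stepwise_chunks(preferences, 3)

# Hypothetical training loop (names illustrative):
# reference_model = base_model
# for chunk in steps:
#     policy = dpo_train(policy, reference_model, chunk)
#     reference_model = policy   # the aligned model becomes the new reference
```

The key contrast with vanilla DPO is the moving reference model: each step regularizes against an already partially aligned model rather than the original base model.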
- [206] arXiv:2403.19271 [pdf, other]
-
Title: DeepSample: DNN sampling-based testing for operational accuracy assessmentComments: Accepted for publication at ICSE 2024, Lisbon, PortugalSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Deep Neural Networks (DNN) are core components of classification and regression tasks in many software systems. Companies incur high costs when testing DNNs with datasets representative of the inputs expected in operation, as these need to be manually labelled. The challenge is to select a representative set of test inputs as small as possible to reduce the labelling cost, while sufficing to yield unbiased high-confidence estimates of the expected DNN accuracy. At the same time, testers are interested in exposing as many DNN mispredictions as possible to improve the DNN, ending up in the need for techniques pursuing a threefold aim: small dataset size, trustworthy estimates, and misprediction exposure. This study presents DeepSample, a family of DNN testing techniques for cost-effective accuracy assessment based on probabilistic sampling. We investigate whether, to what extent, and under which conditions probabilistic sampling can help to tackle the outlined challenge. We implement five new sampling-based testing techniques, and perform a comprehensive comparison of these techniques and of three further state-of-the-art techniques for both DNN classification and regression tasks. Results serve as guidance for the best use of sampling-based testing for faithful and high-confidence estimates of DNN accuracy in operation at low cost.
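The core idea above, estimating operational accuracy from a small labelled sample rather than labelling the whole operational dataset, can be illustrated with simple random sampling. The population and sample size below are made up; DeepSample's five techniques use more refined (e.g. weighted) sampling schemes than this baseline.

```python
import random

random.seed(0)
# 1 = correct prediction, 0 = misprediction; true operational accuracy is 0.9.
population = [1] * 900 + [0] * 100

# Only the sampled inputs need manual labels, cutting labelling cost 20x here.
sample = random.sample(population, 50)
estimate = sum(sample) / len(sample)   # unbiased estimator of true accuracy
```

With simple random sampling the estimator is unbiased, and its standard error shrinks as the sample grows, which is the trustworthiness-versus-cost tradeoff the paper studies.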
- [207] arXiv:2403.19272 [pdf, other]
-
Title: Mil2: Efficient Cloth Simulation Using Non-distance Barriers and Subspace ReuseAuthors: Lei Lan, Zixuan Lu, Jingyi Long, Chun Yuan, Xuan Li, Xiaowei He, Huamin Wang, Chenfanfu Jiang, Yin YangSubjects: Graphics (cs.GR)
Mil2 pushes the performance of high-resolution cloth simulation, making the simulation interactive (in milliseconds) for models with one million degrees of freedom (DOFs) while keeping every triangle untangled. The guarantee of being penetration-free is inspired by the interior-point method, which converts the inequality constraints to barrier potentials. Nevertheless, we propose a major overhaul of this modality by defining a novel and simple barrier formulation which does not depend on the distance between mesh primitives. Such a non-distance barrier model allows a new way to integrate collision detection into the simulation pipeline. Another contributor to the performance boost comes from the so-called subspace reuse strategy. This is based on the observation that low-frequency strain vibrations are nearly orthogonal to the deformation induced by collisions or self-collisions, which is often of high frequency. Subspace reuse then takes care of low-frequency residuals, while high-frequency residuals can also be effectively smoothed by GPU-based iterative solvers. We show that our method outperforms existing fast cloth simulators by nearly one order of magnitude while keeping the entire simulation penetration-free and producing high-quality animations of high-resolution models.
- [208] arXiv:2403.19273 [pdf, other]
-
Title: A Machine Learning Approach for Crop Yield and Disease Prediction Integrating Soil Nutrition and Weather FactorsAuthors: Forkan Uddin Ahmed (1), Annesha Das (1), Md Zubair (1) ((1) Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chattogram, Bangladesh)Comments: This paper was presented to the IEEE conference, "2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS), 8-9 March, Dhaka, Bangladesh"Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The development of an intelligent agricultural decision-supporting system for crop selection and disease forecasting in Bangladesh is the main objective of this work. The economy of the nation depends heavily on agriculture. However, choosing crops with better production rates and efficiently controlling crop disease are obstacles that farmers have to face. These issues are addressed in this research by utilizing machine learning methods and real-world datasets. The recommended approach uses a variety of datasets on the production of crops, soil conditions, agro-meteorological regions, crop disease, and meteorological factors. These datasets offer insightful information on disease trends, soil nutrition demand of crops, and agricultural production history. By incorporating this knowledge, the model first recommends the list of primarily selected crops based on the soil nutrition of a particular user location. Then the predictions of meteorological variables like temperature, rainfall, and humidity are made using SARIMAX models. These weather predictions are then used to forecast the possibilities of diseases for the primary crops list by utilizing the support vector classifier. Finally, the developed model makes use of the decision tree regression model to forecast crop yield and provides a final crop list along with associated possible disease forecast. Utilizing the outcome of the model, farmers may choose the best productive crops as well as prevent crop diseases and reduce output losses by taking preventive actions. Consequently, planning and decision-making processes are supported and farmers can predict possible crop yields. Overall, by offering a detailed decision support system for crop selection and disease prediction, this work can play a vital role in advancing agricultural practices in Bangladesh.
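The first stage of the pipeline described above, shortlisting crops whose soil-nutrient demands are met at the user's location, can be sketched as a simple filter. The crop names and nutrient thresholds below are invented for illustration; the paper's full pipeline then layers SARIMAX weather forecasts, an SVM disease classifier, and a decision-tree yield regressor on top of this shortlist.

```python
# Hypothetical nutrient requirements (N, P, K) per crop; illustrative values.
crop_requirements = {
    "rice":  {"N": 80,  "P": 40, "K": 40},
    "wheat": {"N": 120, "P": 60, "K": 40},
    "jute":  {"N": 60,  "P": 30, "K": 30},
}

def shortlist(soil, requirements):
    # Keep crops whose every nutrient demand is covered by the soil reading.
    return [crop for crop, need in requirements.items()
            if all(soil.get(k, 0) >= v for k, v in need.items())]

soil_reading = {"N": 90, "P": 45, "K": 40}   # hypothetical user-location reading
candidates = shortlist(soil_reading, crop_requirements)
```

Downstream stages would then prune this candidate list further by forecast disease risk and predicted yield.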
- [209] arXiv:2403.19275 [pdf, other]
-
Title: Knowledge Boundary and Persona Dynamic Shape A Better Social Media AgentSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, there are still two key problems in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on current actions, which reduces the personalization and anthropomorphism of the agent. To solve the above problems, we construct the social media agent based on personalized knowledge and dynamic persona information. For personalized knowledge, we add external knowledge sources and match them with the persona information of agents, thereby giving the agent personalized world knowledge. For dynamic persona information, we use current action information to internally retrieve the persona information of the agent, thereby reducing the interference of diverse persona information on the current action. To make the agent suitable for social media, we design five basic modules for it: persona, planning, action, memory and reflection. To provide an interaction and verification environment for the agent, we build a social media simulation sandbox. In the experimental verification, automatic and human evaluations demonstrated the effectiveness of the agent we constructed.
- [210] arXiv:2403.19276 [pdf, ps, other]
-
Title: Enhanced Bayesian Personalized Ranking for Robust Hard Negative Sampling in Recommender SystemsComments: 9 pagesSubjects: Information Retrieval (cs.IR)
In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance recommendation model learning. However, the inadvertent selection of false negatives remains a major concern in hard negative sampling, as these false negatives can provide incorrect information and mislead the model learning. To date, only a small number of studies have been committed to solving the false negative problem, primarily focusing on designing sophisticated sampling algorithms to filter false negatives. In contrast, this paper shifts its focus to refining the loss function. We find that the original Bayesian Personalized Ranking (BPR), initially designed for uniform negative sampling, is inadequate in adapting to hard sampling scenarios. Hence, we introduce an enhanced Bayesian Personalized Ranking objective, named Hard-BPR, which is specifically crafted for dynamic hard negative sampling to mitigate the influence of false negatives. This method is simple yet efficient for real-world deployment. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness and robustness of our approach, along with the enhanced ability to distinguish false negatives.
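For reference, the original BPR objective the paper starts from maximizes the log-sigmoid of the score margin between a positive item and a sampled negative. The sketch below shows only this standard loss; Hard-BPR's specific modification for hard negatives is not reproduced here.

```python
import math

def bpr_loss(pos_score, neg_score):
    # Standard BPR: -log(sigmoid(pos_score - neg_score)).
    margin = pos_score - neg_score
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

easy = bpr_loss(pos_score=2.0, neg_score=-1.0)  # large margin -> small loss
hard = bpr_loss(pos_score=2.0, neg_score=1.9)   # hard negative -> larger loss
```

The issue the abstract points to follows directly from this form: a false negative the user actually likes scores high, looks like a very hard negative, and therefore dominates the gradient with incorrect supervision.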
- [211] arXiv:2403.19278 [pdf, other]
-
Title: CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object DetectionComments: Accepted into CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods have been proposed to address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from the class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, especially where one class is a majority and the other minority, has a large impact on class bias. We propose Class-Aware Teacher (CAT) to address the class bias issue in the domain adaptation setting. In our work, we approximate the class relationships with our Inter-Class Relation module (ICRm) and exploit it to reduce the bias within the model. In this way, we are able to apply augmentations to highly related classes, both inter- and intra-domain, to boost the performance of minority classes while having minimal impact on majority classes. We further reduce the bias by implementing a class-relation weight to our classification loss. Experiments conducted on various datasets and ablation studies show that our method is able to address the class bias in the domain adaptation setting. On the Cityscapes to Foggy Cityscapes dataset, we attained a 52.5 mAP, a substantial improvement over the 51.2 mAP achieved by the state-of-the-art method.
- [212] arXiv:2403.19279 [pdf, other]
-
Title: Fine-Tuning Language Models with Reward Learning on PolicyComments: NAACL2024 Main Track Long PaperSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reinforcement learning from human feedback (RLHF) has emerged as an effective approach to aligning large language models (LLMs) with human preferences. RLHF contains three steps, i.e., human preference collection, reward learning, and policy optimization, which are usually performed serially. Despite its popularity, however, (fixed) reward models may become inaccurate off-distribution, since policy optimization continuously shifts the LLMs' data distribution. Repeatedly collecting new preference data from the latest LLMs may alleviate this issue, but unfortunately makes the resulting system more complicated and difficult to optimize. In this paper, we propose reward learning on policy (RLP), an unsupervised framework that refines a reward model using policy samples to keep it on-distribution. Specifically, an unsupervised multi-view learning method is introduced to learn robust representations of policy samples. Meanwhile, a synthetic preference generation approach is developed to simulate high-quality preference data with policy outputs. Extensive experiments on three benchmark datasets show that RLP consistently outperforms the state-of-the-art. Our code is available at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/rlp}.
- [213] arXiv:2403.19283 [pdf, other]
-
Title: Ungrammatical-syntax-based In-context Example Selection for Grammatical Error CorrectionComments: Accepted to NAACL 2024 Main ConferenceSubjects: Computation and Language (cs.CL)
In the era of large language models (LLMs), in-context learning (ICL) stands out as an effective prompting strategy that explores LLMs' potency across various tasks. However, applying LLMs to grammatical error correction (GEC) is still a challenging task. In this paper, we propose a novel ungrammatical-syntax-based in-context example selection strategy for GEC. Specifically, we measure similarity of sentences based on their syntactic structures with diverse algorithms, and identify optimal ICL examples sharing the most similar ill-formed syntax to the test input. Additionally, we carry out a two-stage process to further improve the quality of selection results. On benchmark English GEC datasets, empirical results show that our proposed ungrammatical-syntax-based strategies outperform commonly-used word-matching or semantics-based methods with multiple LLMs. This indicates that for a syntax-oriented task like GEC, paying more attention to syntactic information can effectively boost LLMs' performance. Our code will be publicly available after the publication of this paper.
- [214] arXiv:2403.19285 [pdf, other]
-
Title: Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine TranslationSubjects: Computation and Language (cs.CL)
In-context learning (ICL) is the trending prompting strategy in the era of large language models (LLMs), where a few examples are demonstrated to evoke LLMs' power for a given task. How to select informative examples remains an open issue. Previous works on in-context example selection for machine translation (MT) focus on superficial word-level features while ignoring deep syntax-level knowledge. In this paper, we propose a syntax-based in-context example selection method for MT, computing the syntactic similarity between dependency trees using Polynomial Distance. In addition, we propose an ensemble strategy combining examples selected by both word-level and syntax-level criteria. Experimental results on translation between English and six common languages indicate that syntax can effectively enhance ICL for MT, obtaining the highest COMET scores on 11 out of 12 translation directions.
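The selection idea above can be sketched by ranking candidate in-context examples by how closely their syntax matches the test input. Here syntax is crudely reduced to a set of dependency-arc labels with Jaccard overlap, a deliberate simplification; the paper instead compares full dependency trees using Polynomial Distance.

```python
def similarity(arcs_a, arcs_b):
    # Jaccard overlap of dependency-arc label sets (toy syntax proxy).
    a, b = set(arcs_a), set(arcs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

test_arcs = ["nsubj", "root", "obj", "det"]     # arcs of the test input
candidates = {
    "ex1": ["nsubj", "root", "obj", "det"],     # same structure as the input
    "ex2": ["root", "advmod"],                  # different structure
}

# Pick the candidate example with the most similar syntax for the ICL prompt.
best = max(candidates, key=lambda k: similarity(test_arcs, candidates[k]))
```

An ensemble, as the abstract proposes, would combine this syntax-level ranking with a word-level score (e.g. lexical overlap) before selecting the final prompt examples.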
- [215] arXiv:2403.19286 [pdf, other]
-
Title: Adaptive optimization of isogeometric multi-patch discretizations using artificial neural networksSubjects: Numerical Analysis (math.NA)
In isogeometric analysis, isogeometric function spaces are employed for accurately representing the solution to a partial differential equation (PDE) on a parameterized domain. They are generated from a tensor-product spline space by composing the basis functions with the inverse of the parameterization. Depending on the geometry of the domain and on the data of the PDE, the solution might not have maximum Sobolev regularity, leading to a reduced convergence rate. In this case, it is necessary to reduce the local mesh size close to the singularities. The classical approach is to perform adaptive h-refinement, which either leads to an unnecessarily large number of degrees of freedom or to a spline space that does not possess a tensor-product structure. Based on the concept of r-adaptivity, we present a novel approach for finding a suitable isogeometric function space for a given PDE without sacrificing the tensor-product structure of the underlying spline space. In particular, we use the fact that different reparameterizations of the same computational domain lead to different isogeometric function spaces while preserving the geometry. Starting from a multi-patch domain consisting of bilinearly parameterized patches, we aim to find the biquadratic multi-patch parameterization that leads to the isogeometric function space with the smallest best-approximation error of the solution. In order to estimate the location of the optimal control points, we employ a trained residual neural network that is applied to the graph surfaces of the approximated solution and its derivatives. In our experimental results, we observe that our new method yields a substantial reduction of the approximation error for different PDE problems on multi-patch domains.
- [216] arXiv:2403.19287 [pdf, other]
-
Title: CoderUJB: An Executable and Unified Java Benchmark for Practical Programming ScenariosComments: 11 pages, 4 figures, issta2024 acceptedSubjects: Software Engineering (cs.SE)
In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production. CoderUJB comprises 2,239 programming questions derived from 17 real open-source Java projects and spans five practical programming tasks. Our empirical study on this benchmark investigates the coding abilities of various open-source and closed-source LLMs, examining the effects of continued pre-training on code in specific programming languages and of instruction fine-tuning on their performance. The findings indicate that while LLMs exhibit strong potential, challenges remain, particularly in non-functional code generation (e.g., test generation and defect detection). Importantly, our results advise caution in applying programming-language-specific continued pre-training and instruction fine-tuning, as these techniques could hinder model performance on certain tasks, suggesting the need for more nuanced strategies. CoderUJB thus marks a significant step towards more realistic evaluations of programming capabilities in LLMs, and our study provides valuable insights for the future development of these models in software engineering.
- [217] arXiv:2403.19289 [pdf, other]
-
Title: Graph Neural Networks for Treatment Effect PredictionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with an extremely low experimental budget. The framework is flexible, since each step can be used separately with other models or policies. Experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underscoring the need for models that can generalize with limited labeled samples to reduce experimental risks.
- [218] arXiv:2403.19292 [pdf, other]
-
Title: Deep Learning-based Modulation Classification of Practical OFDM Signals for Spectrum SensingComments: 9 pages, 12 figuresSubjects: Networking and Internet Architecture (cs.NI)
In this study, the modulation of symbols on OFDM subcarriers is classified for transmissions following Wi-Fi 6 and 5G downlink specifications. First, our approach estimates the OFDM symbol duration and cyclic prefix length based on the cyclic autocorrelation function. We propose a feature extraction algorithm characterizing the modulation of OFDM signals, which includes removing the effects of a synchronization error. The obtained feature is converted into a 2D histogram of phase and amplitude, and this histogram is taken as input to a convolutional neural network (CNN)-based classifier. The classifier does not require prior knowledge of protocol-specific information such as the Wi-Fi preamble or the resource allocation of 5G physical channels. The classifier's performance, evaluated using synthetic and real-world measured over-the-air (OTA) datasets, reaches at least 97\% accuracy on OTA data when the SNR is above the value required for data transmission.
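The cyclic-prefix correlation idea behind the first step can be sketched in a few lines of NumPy: because the cyclic prefix duplicates the tail of each useful symbol, correlating the signal with a copy of itself delayed by the FFT length produces a clear peak at the true length. The candidate lengths below are hypothetical, and this illustrates the principle rather than the authors' estimator:

```python
import numpy as np

def estimate_fft_size(x, candidates=(32, 64, 128)):
    # Score each candidate useful-symbol length N by the normalized
    # correlation between x[n] and x[n + N]; the cyclic prefix makes
    # these sample pairs identical, so the true N scores highest.
    scores = {}
    for n in candidates:
        a, b = x[:-n], x[n:]
        scores[n] = abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(scores, key=scores.get)
```

In the same spirit, the location of the correlation peaks along the time axis reveals the symbol duration and hence the cyclic prefix length.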
- [219] arXiv:2403.19293 [pdf, other]
-
Title: Adaptive Preload Control of Cable-Driven Parallel Robots for Handling TaskComments: Submitted to "Annals of Scientific Society for Assembly, Handling and Industrial Robotics" (MHI2024 conference/colloquium)Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents a method for the dynamic adjustment of cable preloads based on the actuation redundancy of cable-driven parallel robots (CDPRs), which allows increasing or decreasing the platform stiffness depending on task requirements. This is achieved by computing preload parameters with an extended nullspace formulation of the kinematics. The method allows the operator to specify a defined preload within the operation space. The algorithms are implemented in a real-time environment, allowing the use of optimization in hybrid position-force control. To validate the effectiveness of this approach, a simulation study is performed and the obtained results are compared to existing methods. Furthermore, the method is investigated experimentally and compared with the conventional position-controlled operation of a cable robot. The results demonstrate the feasibility of adaptively adjusting cable preloads during platform motion and the manipulation of additional objects.
- [220] arXiv:2403.19294 [pdf, other]
-
Title: FlowDepth: Decoupling Optical Flow for Self-Supervised Monocular Depth EstimationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Self-supervised multi-frame methods have recently achieved promising results in depth estimation. However, these methods often suffer from mismatch problems due to moving objects, which break the static-scene assumption. Additionally, unfairness can occur when calculating photometric errors in high-frequency or low-texture regions of the images. To address these issues, existing approaches use additional semantic-prior black-box networks to separate moving objects and improve the model only at the loss level. In contrast, we propose FlowDepth, where a Dynamic Motion Flow Module (DMFM) decouples the optical flow with a mechanism-based approach and warps the dynamic regions, thus solving the mismatch problem. For the unfairness of photometric errors caused by high-frequency and low-texture regions, we use Depth-Cue-Aware Blur (DCABlur) at the input level and a cost-volume sparsity loss at the loss level, respectively. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms state-of-the-art methods.
- [221] arXiv:2403.19298 [pdf, other]
-
Title: A unified SHTC multiphase model of continuum mechanicsSubjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
In this paper, we present a unified nonequilibrium model of continuum mechanics for compressible multiphase flows. The model, which is formulated within the framework of Symmetric Hyperbolic Thermodynamically Compatible (SHTC) equations, can describe an arbitrary number of phases that can be heat-conducting inviscid or viscous fluids, as well as elastoplastic solids. The phases are allowed to have different velocities, pressures, temperatures, and shear stresses, while the material interfaces are treated as diffuse interfaces with the volume fraction playing the role of the interface field. To relate our model to other multiphase approaches, we reformulate the SHTC governing equations in terms of the phase state parameters and put them in the form of Baer-Nunziato-type models. It is this Baer-Nunziato form of the SHTC equations that is then solved numerically using a robust second-order path-conservative MUSCL-Hancock finite volume method on Cartesian meshes. Because the resulting governing equations are very challenging, we restrict our numerical examples to a simplified version of the model, focusing on the isentropic limit for three-phase mixtures. To address the stiffness of the relaxation source terms present in the model, the implemented scheme incorporates a semi-analytical time integration method specifically designed for the nonlinear stiff source terms governing strain relaxation. The validation process involves a wide range of benchmarks and several applications to compressible multiphase problems. Notably, results are presented for multiphase flows in all the relaxation limit cases of the model, including inviscid and viscous Newtonian fluids, as well as nonlinear hyperelastic and elastoplastic solids.
- [222] arXiv:2403.19299 [pdf, other]
-
Title: Post Quantum Cryptography & its Comparison with Classical CryptographySubjects: Cryptography and Security (cs.CR)
Cryptography plays a pivotal role in safeguarding sensitive information and facilitating secure communication. Classical cryptography relies on mathematical computations, whereas quantum cryptography operates on the principles of quantum mechanics, offering a new frontier in secure communication. Quantum cryptographic systems introduce novel dimensions to security, capable of detecting and thwarting eavesdropping attempts. By contrasting quantum cryptography with its classical counterpart, it becomes evident how quantum mechanics revolutionizes the landscape of secure communication.
- [223] arXiv:2403.19302 [pdf, other]
-
Title: Generate then Retrieve: Conversational Response Retrieval Using LLMs as Answer and Query GeneratorsSubjects: Information Retrieval (cs.IR)
Conversational information seeking (CIS) is a prominent area in information retrieval (IR) that focuses on developing interactive knowledge assistants. These systems must adeptly comprehend the user's information need within the conversational context and retrieve the relevant information. To this end, existing approaches model the user's information need with a single rewritten query and use this query for passage retrieval. In this paper, we propose three different methods for generating multiple queries to enhance retrieval. In these methods, we leverage the capabilities of large language models (LLMs) in understanding the user's information need and generating an appropriate response in order to generate multiple queries. We implement and evaluate the proposed models using various LLMs, including GPT-4 and Llama-2-chat, in zero-shot and few-shot settings. In addition, we propose a new benchmark for TREC iKAT based on GPT-3.5 judgments. Our experiments demonstrate the effectiveness of our proposed models on the TREC iKAT dataset.
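The generate-then-retrieve idea can be sketched as a small pipeline: first ask an LLM for a hypothetical answer, then mine that answer for several standalone queries and fuse the retrieved lists. The `llm` and `retrieve` callables and the prompts below are placeholder interfaces for illustration, not the paper's implementation:

```python
def generate_then_retrieve(conversation, llm, retrieve, n_queries=3):
    # Step 1: generate a hypothetical answer from the conversational context.
    answer = llm("Answer the user's final question:\n" + conversation)
    # Step 2: derive several standalone search queries from that answer.
    queries = [
        llm(f"Write standalone search query {i + 1} covering one fact in:\n{answer}")
        for i in range(n_queries)
    ]
    # Step 3: merge the ranked lists, keeping each passage's best rank.
    best = {}
    for q in queries:
        for rank, pid in enumerate(retrieve(q)):
            best[pid] = min(best.get(pid, rank), rank)
    return sorted(best, key=best.get)
```

The best-rank fusion in step 3 is one simple choice; any rank-fusion scheme (e.g., reciprocal rank fusion) could be substituted.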
- [224] arXiv:2403.19303 [pdf, ps, other]
-
Title: Developing generative AI chatbots conceptual framework for higher educationAuthors: Joshua Ebere ChukwuereComments: 28 pagesSubjects: Computers and Society (cs.CY)
This research explores the quickly changing field of generative artificial intelligence (GAI) chatbots in higher education, an industry that is undergoing major technological changes. AI chatbots, such as ChatGPT, HuggingChat, and Google Bard, are becoming more and more common in a variety of sectors, including education. Their acceptance is still in its early phases, with a variety of prospects and obstacles. However, their potential in higher education is particularly noteworthy, providing lecturers and students with affordable, individualized support. Creating a comprehensive framework to aid the usage of generative AI chatbots in higher education institutions (HEIs) is the aim of this project. The Chukwuere Generative AI Chatbots Acceptance Model (CGAICAM) is the result of this study's synthesis of elements from well-known frameworks, including the TAM, UTAUT2, TPB, and others along with variables like optimism, innovativeness, discomfort, insecurity, and others. Using a research method that encompasses a comprehensive analysis of extant literature from databases such as IEEE, ACM, ScienceDirect, and Google Scholar, the study aims to comprehend the implications of AI Chatbots on higher education and pinpoint critical elements for their efficacious implementation. Peer-reviewed English-language publications published between 2020 and 2023 with a focus on the use of AI chatbots in higher education were the main focus of the search criteria. The results demonstrate how much AI chatbots can do to improve student engagement, streamline the educational process, and support administrative and research duties. But there are also clear difficulties, such as unfavorable student sentiments, doubts about the veracity of material produced by AI, and unease and nervousness with new technologies.
- [225] arXiv:2403.19305 [pdf, other]
-
Title: MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text EvaluationComments: This paper has been ACCEPTED as a LONG PAPER presentation by DASFAA 2024 Industrial TrackSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of this generated text, especially open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluators. While using a single LLM as an evaluation agent shows potential, such evaluation suffers from significant uncertainty and instability. To address these issues, we propose MATEval, a multi-agent text evaluation framework in which all agents are played by LLMs such as GPT-4. The MATEval framework emulates human collaborative discussion, integrating multiple agents' interactions to evaluate open-ended text. Our framework incorporates self-reflection and Chain-of-Thought (CoT) strategies, along with feedback mechanisms, enhancing the depth and breadth of the evaluation process and guiding discussions towards consensus, while generating comprehensive evaluation reports that include error localization, error types, and scores. Experimental results show that our framework outperforms existing open-ended text evaluation methods and achieves the highest correlation with human evaluation, which confirms its effectiveness in addressing the uncertainty and instability of evaluating LLM-generated text. Furthermore, our framework significantly improves the efficiency of text evaluation and model iteration in industrial scenarios.
- [226] arXiv:2403.19306 [pdf, other]
-
Title: Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with pointsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, research on point-based weakly supervised object detection (PWSOD) methods in the field of computer vision has attracted growing attention. However, existing pseudo-label generation methods perform poorly on dense object detection tasks and when only a small amount of supervised annotation data is available. We regard the generation of weakly supervised pseudo labels as the result of the model's sparse output, and propose a method called Sparse Generation to make pseudo labels sparse. It constructs dense tensors through the relationship between the data and the detector model, optimizes three of their parameters, and obtains a sparse tensor via coordinated calculation, thereby indirectly obtaining higher-quality pseudo labels and addressing the model's density problem when only a small amount of supervised annotation data can be used. On two widely used open-source datasets (RSOD, SIMD) and a self-built dataset (Bullet-Hole), the experimental results show that the proposed method has a significant advantage in overall performance metrics compared with the state-of-the-art method.
- [227] arXiv:2403.19309 [pdf, other]
-
Title: Improving performance of contour integral-based nonlinear eigensolvers with infinite GMRESSubjects: Numerical Analysis (math.NA)
In this work, the infinite GMRES algorithm, recently proposed by Correnty et al., is employed in contour integral-based nonlinear eigensolvers, avoiding the computation of costly factorizations at each quadrature node when solving the linear systems. Several techniques are applied to make infinite GMRES memory-friendly, computationally efficient, and numerically stable in practice. More specifically, we analyze the relationship between polynomial eigenvalue problems and their scaled linearizations, and provide a novel weighting strategy that can significantly accelerate the convergence of infinite GMRES in this particular context. We also adapt the TOAR technique to infinite GMRES to reduce the memory footprint. Theoretical analysis and numerical experiments are provided to illustrate the efficiency of the proposed algorithm.
- [228] arXiv:2403.19310 [pdf, other]
-
Title: MRNaB: Mixed Reality-based Robot Navigation Interface using Optical-see-through MR-beaconSubjects: Robotics (cs.RO)
Recent advancements in robotics have led to the development of numerous interfaces to enhance the intuitiveness of robot navigation. However, the reliance on traditional 2D displays limits the simultaneous visualization of information. Mixed Reality (MR) technology addresses this issue by enhancing the dimensionality of information visualization, allowing users to perceive multiple pieces of information concurrently. This paper proposes MRNaB, a mixed reality-based robot navigation interface using an optical see-through MR-beacon: a novel approach that places an MR-beacon in the real-world environment to function as a signal transmitter for robot navigation. This MR-beacon is designed to be persistent, eliminating the need for repeated navigation inputs for the same location. Our system is built around four primary functions: "Add", "Move", "Delete", and "Select". These allow for the addition of an MR-beacon, moving its location, its deletion, and the selection of an MR-beacon for navigation, respectively. The effectiveness of the proposed method was validated through experiments comparing it with a traditional 2D system. As a result, MRNaB was shown to improve user navigation performance both subjectively and objectively. For additional material, please check: https://mertcookimg.github.io/mrnab
- [229] arXiv:2403.19314 [pdf, other]
-
Title: Total-Decom: Decomposed 3D Scene Reconstruction with Minimal InteractionComments: 8 pages, 7 figures, accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this paper, we present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction. Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition. Total-Decom requires minimal human annotations while providing users with real-time control over the granularity and quality of decomposition. We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing. The code is available at \href{https://github.com/CVMI-Lab/Total-Decom.git}{https://github.com/CVMI-Lab/Total-Decom.git}.
- [230] arXiv:2403.19316 [pdf, other]
-
Title: Hypergraph-based Multi-View Action Recognition using Event CamerasComments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2024)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Action recognition from video data forms a cornerstone with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in the exploitation of multi-view event data, particularly regarding challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. A vertex attention hypergraph propagation scheme is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset $\text{THU}^{\text{MV-EACT}}\text{-50}$, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
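The KNN-based hyperedge construction mentioned above can be illustrated with a minimal sketch: each segment (vertex) spawns a hyperedge containing itself and its nearest neighbours in feature space. This is an illustrative simplification with hypothetical parameters, not the HyperMV code:

```python
import numpy as np

def knn_hyperedges(features, k=2):
    # features: (n_vertices, dim) array of segment features.
    # Each vertex i forms one hyperedge: {i} plus its k nearest neighbours.
    n = len(features)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a vertex is not its own neighbour
    return [frozenset([i, *np.argsort(dist[i])[:k].tolist()]) for i in range(n)]
```

A rule-based strategy would instead group vertices by fixed criteria (e.g., all views of the same time segment into one hyperedge).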
- [231] arXiv:2403.19317 [pdf, other]
-
Title: Beyond Borders: Investigating Cross-Jurisdiction Transfer in Legal Case SummarizationComments: Accepted to NAACL 2024Subjects: Computation and Language (cs.CL)
Legal professionals face the challenge of managing an overwhelming volume of lengthy judgments, making automated legal case summarization crucial. However, prior approaches mainly focused on training and evaluating these models within the same jurisdiction. In this study, we explore the cross-jurisdictional generalizability of legal case summarization models. Specifically, we explore how to effectively summarize legal cases of a target jurisdiction where reference summaries are not available. In particular, we investigate whether supplementing models with unlabeled target-jurisdiction corpora and extractive silver summaries obtained from unsupervised algorithms on target data enhances transfer performance. Our comprehensive study on three datasets from different jurisdictions highlights the role of pre-training in improving transfer performance. We shed light on the pivotal influence of jurisdictional similarity in selecting optimal source datasets for effective transfer. Furthermore, our findings underscore that incorporating unlabeled target data yields improvements in general pre-trained models, with additional gains when silver summaries are introduced. This augmentation is especially valuable when dealing with extractive datasets and scenarios featuring limited alignment between source and target jurisdictions. Our study provides key insights for developing adaptable legal case summarization systems, transcending jurisdictional boundaries.
- [232] arXiv:2403.19318 [pdf, other]
-
Title: TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage ScenariosAuthors: Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, Jie TangComments: 30 pagesSubjects: Computation and Language (cs.CL)
We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to understand reasoning patterns more effectively as well as a cross-way validation strategy, ensuring the quality of the automatically generated data. To evaluate the performance of TableLLM, we have crafted a benchmark tailored to address both document and spreadsheet formats as well as constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction.
- [233] arXiv:2403.19319 [pdf, other]
-
Title: Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and GenerationAuthors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias NießnerSubjects: Computer Vision and Pattern Recognition (cs.CV)
We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings from a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting issues. In Mesh2NeRF, we propose an analytic solution to directly obtain ground-truth radiance fields from 3D meshes, characterizing the density field with an occupancy function featuring a defined surface thickness, and determining view-dependent color through a reflection function considering both the mesh and environment lighting. Mesh2NeRF extracts accurate radiance fields, which provide direct supervision for training generative NeRFs and single-scene representations. We validate the effectiveness of Mesh2NeRF across various tasks, achieving a noteworthy 3.12dB improvement in PSNR for view synthesis in single scene representation on the ABO dataset, a 0.69 PSNR enhancement in the single-view conditional generation of ShapeNet Cars, and notably improved mesh extraction from NeRF in the unconditional generation of Objaverse Mugs.
- [234] arXiv:2403.19322 [pdf, other]
-
Title: Plug-and-Play Grounding of Reasoning in Multimodal Large Language ModelsComments: 14 pages, 3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual reasoning. However, constrained by their non-lossless image tokenization, most MLLMs fall short of comprehensively capturing details of text and objects, especially in high-resolution images. To address this, we propose P2G, a novel framework for plug-and-play grounding of reasoning in MLLMs. Specifically, P2G exploits the tool-usage potential of MLLMs to employ expert agents to achieve on-the-fly grounding to critical visual and textual objects of the image, thus achieving deliberate reasoning via multimodal prompting. We further create P2GB, a benchmark aimed at assessing MLLMs' ability to understand inter-object relationships and text in challenging high-resolution images. Comprehensive experiments on visual reasoning tasks demonstrate the superiority of P2G. Notably, P2G achieves performance comparable to GPT-4V on P2GB with only a 7B backbone. Our work highlights the potential of plug-and-play grounding of reasoning and opens up a promising alternative beyond model scaling.
- [235] arXiv:2403.19326 [pdf, other]
-
Title: MedBN: Robust Test-Time Adaptation against Malicious Test SamplesComments: Accepted to CVPR 2024Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Test-time adaptation (TTA) has emerged as a promising solution to address performance decay due to unforeseen distribution shifts between training and test data. While recent TTA methods excel in adapting to test data variations, such adaptability exposes a model to vulnerability against malicious examples, an aspect that has received limited attention. Previous studies have uncovered security vulnerabilities within TTA even when a small proportion of the test batch is maliciously manipulated. In response to the emerging threat, we propose median batch normalization (MedBN), leveraging the robustness of the median for statistics estimation within the batch normalization layer during test-time inference. Our method is algorithm-agnostic, thus allowing seamless integration with existing TTA frameworks. Our experimental results on benchmark datasets, including CIFAR10-C, CIFAR100-C and ImageNet-C, consistently demonstrate that MedBN outperforms existing approaches in maintaining robust performance across different attack scenarios, encompassing both instant and cumulative attacks. Through extensive experiments, we show that our approach sustains the performance even in the absence of attacks, achieving a practical balance between robustness and performance.
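The core statistical idea, swapping the batch mean for the batch median inside the normalization layer, can be sketched as follows. The scale statistic here (median absolute deviation) is an assumption for illustration; the abstract only specifies that the median is used for statistics estimation:

```python
import numpy as np

def median_batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # x: (batch, features). Use the per-feature batch median as location
    # and the median absolute deviation (MAD) as a robust scale, so a few
    # maliciously large samples cannot skew the normalization statistics.
    med = np.median(x, axis=0)
    mad = np.median(np.abs(x - med), axis=0)  # assumed scale statistic
    return gamma * (x - med) / (mad + eps) + beta
```

Because the median has a breakdown point of 50%, the statistics remain stable as long as fewer than half of the test batch is manipulated, which matches the threat model of a small malicious proportion.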
- [236] arXiv:2403.19328 [pdf, ps, other]
-
Title: Complex generalized Gauss-Radau quadrature rules for Hankel transforms of integer orderComments: 24 pagesSubjects: Numerical Analysis (math.NA); Classical Analysis and ODEs (math.CA)
Complex Gaussian quadrature rules for oscillatory integral transforms have the advantage that they can achieve optimal asymptotic order. However, their existence for the Hankel transform can only be guaranteed when the order of the transform belongs to $[0,1/2]$. In this paper we consider the construction of generalized Gauss-Radau quadrature rules for the Hankel transform. We show that if certain value and derivative information is added at the left endpoint, then complex generalized Gauss-Radau quadrature rules for Hankel transforms of integer order can be constructed with theoretical guarantees. Orthogonal polynomials that are closely related to such quadrature rules are investigated, and their existence for even degrees is proved. Numerical experiments are presented to confirm our findings.
- [237] arXiv:2403.19329 [pdf, other]
-
Title: Simulating Relational Event Histories -- Why and How
Subjects: Social and Information Networks (cs.SI); Methodology (stat.ME)
Many important social phenomena result from repeated interactions among individuals over time, such as email exchanges in an organization or face-to-face interactions in a classroom. Insights into the mechanisms underlying the dynamics of these interactions can be achieved through simulations of networks on a fine temporal granularity. In this paper, we present statistical frameworks to simulate relational event networks under dyadic and actor-oriented relational event models. These simulators have broad applicability in temporal social network research, including model fit assessment, theory building, network intervention planning, prediction, and understanding the impact of network structures. We show this in three extensive applications. First, we show why simulation-based techniques are crucial for relational event model assessment, for example to investigate how past events affect future interactions in the network. Second, we demonstrate how simulation techniques contribute to a better understanding of the longevity of network interventions. Third, we show how simulation techniques are important when building and extending theories about social phenomena, such as understanding social identity dynamics using optimal distinctiveness theory.
- [238] arXiv:2403.19332 [pdf, other]
-
Title: Learning a Formally Verified Control Barrier Function in Stochastic Environment
Comments: 8 pages, 3 figures
Subjects: Robotics (cs.RO)
Safety is a fundamental requirement of control systems. Control Barrier Functions (CBFs) are proposed to ensure the safety of the control system by constructing safety filters or synthesizing control inputs. However, the safety guarantee and performance of safe controllers rely on the construction of valid CBFs. Inspired by the universal approximation property, CBFs can be represented by neural networks, known as neural CBFs (NCBFs). This paper presents an algorithm for synthesizing formally verified continuous-time neural Control Barrier Functions in stochastic environments in a single step. The proposed training process ensures efficacy across the entire state space with only a finite number of data points by constructing a sample-based learning framework for Stochastic Neural CBFs (SNCBFs). Our methodology eliminates the need for post hoc verification by enforcing Lipschitz bounds on the neural network, its Jacobian, and Hessian terms. We demonstrate the effectiveness of our approach through case studies on the inverted pendulum system and obstacle avoidance in autonomous driving, showcasing larger safe regions compared to baseline methods.
- [239] arXiv:2403.19334 [pdf, other]
-
Title: Test-Time Domain Generalization for Face Anti-Spoofing
Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs largely from the source distributions. Our insight is that testing data can serve as a valuable resource to enhance the generalizability beyond mere evaluation for DG FAS. In this paper, we introduce a novel Test-Time Domain Generalization (TTDG) framework for FAS, which leverages the testing data to boost the model's generalizability. Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space. In particular, we first introduce the innovative TTSP to project the styles of the arbitrarily unseen samples of the testing distribution to the known source space of the training distributions. We then design the efficient DSSS to synthesize diverse style shifts via learnable style bases with two specifically designed losses in a hyperspherical feature space. Our method eliminates the need for model updates at the test time and can be seamlessly integrated into not only the CNN but also ViT backbones. Comprehensive experiments on widely used cross-domain FAS benchmarks demonstrate our method's state-of-the-art performance and effectiveness.
- [240] arXiv:2403.19335 [pdf, other]
-
Title: KazSAnDRA: Kazakh Sentiment Analysis Dataset of Reviews and Attitudes
Subjects: Computation and Language (cs.CL)
This paper presents KazSAnDRA, a dataset developed for Kazakh sentiment analysis that is the first and largest publicly available dataset of its kind. KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and includes numerical ratings ranging from 1 to 5, providing a quantitative representation of customer attitudes. The study also pursued the automation of Kazakh sentiment classification through the development and evaluation of four machine learning models trained for both polarity classification and score classification. Experimental analysis included evaluation of the results considering both balanced and imbalanced scenarios. The most successful model attained an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets. The dataset and fine-tuned models are open access and available for download under the Creative Commons Attribution 4.0 International License (CC BY 4.0) through our GitHub repository.
- [241] arXiv:2403.19336 [pdf, other]
-
Title: IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate photo-realistic environments following natural language prompts from humans. Recent studies aim to handle this task by constructing a semantic spatial map representation of the environment, and then leveraging the strong reasoning ability of large language models to generate code for guiding robot navigation. However, these methods face limitations in instance-level and attribute-level navigation tasks because they cannot distinguish different instances of the same object. To address this challenge, we propose a new method, namely Instance-aware Visual Language Map (IVLMap), to empower the robot with instance-level and attribute-level semantic mapping. The map is autonomously constructed by fusing RGBD video data collected by the robot agent with specially designed natural language map indexing in the bird's-eye view; this indexing operates at the instance and attribute level. In particular, when integrated with a large language model, IVLMap demonstrates the capability to i) transform natural language into navigation targets with instance and attribute information, enabling precise localization, and ii) accomplish zero-shot end-to-end navigation tasks based on natural language commands. Extensive navigation experiments are conducted. Simulation results illustrate that our method can achieve an average improvement of 14.4\% in navigation accuracy. Code and demo are released at https://ivlmap.github.io/.
- [242] arXiv:2403.19339 [pdf, other]
-
Title: An Interactive Human-Machine Learning Interface for Collecting and Learning from Complex Annotations
Comments: 4 pages, 2 figures, Submitted to IJCAI 2024 Demonstration Track
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Human-Computer Interaction has been shown to improve machine learning systems by boosting model performance, accelerating learning, and building user confidence. In this work, we aim to alleviate the expectation that human annotators adapt to the constraints imposed by traditional labels by allowing extra flexibility in the form in which supervision information is collected. To this end, we propose a human-machine learning interface for binary classification tasks which enables human annotators to use counterfactual examples to complement standard binary labels as annotations for a dataset. Finally, we discuss the challenges in future extensions of this work.
- [243] arXiv:2403.19340 [pdf, other]
-
Title: Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Dataverse's block-based interface makes it easy to add custom processors, so users can readily and efficiently build their own ETL pipelines. We hope that Dataverse will serve as a vital tool for LLM development, and we open-source the entire library to welcome community contributions. Additionally, we provide a concise, two-minute video demonstration of our system, illustrating its capabilities and implementation.
- [244] arXiv:2403.19342 [pdf, other]
-
Title: An efficient multiscale multigrid preconditioner for Darcy flow in high-contrast media
Subjects: Numerical Analysis (math.NA)
In this paper, we develop a multigrid preconditioner to solve Darcy flow in highly heterogeneous porous media. The key component of the preconditioner is to construct a sequence of nested subspaces $W_{\mathcal{L}}\subset W_{\mathcal{L}-1}\subset\cdots\subset W_1=W_h$. An appropriate spectral problem is defined in the space of $W_{i-1}$, then the eigenfunctions of the spectral problems are utilized to form $W_i$. The preconditioner is applied to solve a positive semidefinite linear system which results from discretizing the Darcy flow equation with the lowest order Raviart-Thomas spaces and adopting a trapezoidal quadrature rule. Theoretical analysis and numerical investigations of this preconditioner will be presented. In particular, we will consider several typical highly heterogeneous permeability fields whose resolutions are up to $1024^3$ and examine the computational performance of the preconditioner in several aspects, such as strong scalability, weak scalability, and robustness against the contrast of the media. We also demonstrate an application of this preconditioner for solving a two-phase flow benchmark problem.
- [245] arXiv:2403.19344 [pdf, ps, other]
-
Title: Gain-Only Neural Operator Approximators of PDE Backstepping Controllers
Comments: Preprint submitted to ECC 2024 (full 8-page version containing proofs)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
For the recently introduced deep learning-powered approach to PDE backstepping control, we present an advancement applicable across all the results developed thus far: approximating the control gain function only (a function of one variable), rather than the entire kernel function of the backstepping transformation (a function of two variables). We introduce this idea on a couple of benchmark (unstable) PDEs, one hyperbolic and one parabolic. We alter the approach of quantifying the effect of the approximation error by replacing a backstepping transformation that employs the approximated kernel (suitable for adaptive control) with a transformation that employs the exact kernel (suitable for gain scheduling). A major simplification in the target system arises, with the perturbation due to the approximation shifting from the domain to the boundary condition. This results in a significant difference in the Lyapunov analysis, which nevertheless results in a guarantee of the stability being retained with the simplified approximation approach. The approach of approximating only the control gain function simplifies the operator being approximated and the training of its neural approximation, with an expected reduction in the neural network size. The price for the savings in approximation is paid through a somewhat more intricate Lyapunov analysis, in higher Sobolev spaces for some PDEs, as well as some restrictions on initial conditions that result from higher Sobolev spaces. While the proposed approach appears inapplicable to uses in adaptive control, it is almost certainly applicable in gain scheduling applications of neural operator-approximated PDE backstepping controllers.
- [246] arXiv:2403.19345 [pdf, ps, other]
-
Title: Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
With the rapid evolution of the Internet and the exponential proliferation of information, users encounter information overload and the conundrum of choice. Personalized recommendation systems play a pivotal role in alleviating this burden by aiding users in filtering and selecting information tailored to their preferences and requirements. Such systems not only enhance user experience and satisfaction but also furnish opportunities for businesses and platforms to augment user engagement, sales, and advertising efficacy. This paper undertakes a comparative analysis between the operational mechanisms of traditional e-commerce commodity classification systems and personalized recommendation systems. It delineates the significance and application of personalized recommendation systems across e-commerce, content information, and media domains. Furthermore, it delves into the challenges confronting personalized recommendation systems in e-commerce, including data privacy, algorithmic bias, scalability, and the cold start problem. Strategies to address these challenges are elucidated. Subsequently, the paper outlines a personalized recommendation system leveraging the BERT model and nearest neighbor algorithm, specifically tailored to address the exigencies of the eBay e-commerce platform. The efficacy of this recommendation system is substantiated through manual evaluation, and a practical application operational guide and structured output recommendation results are furnished to ensure the system's operability and scalability.
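The retrieval stage of an embedding-plus-nearest-neighbor recommender of the kind described can be sketched as follows. The item names and toy two-dimensional vectors are our illustrative stand-ins; the paper's system would use real BERT sentence embeddings.

```python
import math

def nearest_neighbors(query_vec, items, k=2):
    """Rank catalogue items by cosine similarity to a query embedding and
    return the top-k item names. items: list of (name, vector) pairs."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0
    ranked = sorted(items, key=lambda item: cos(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Items whose embeddings point in a similar direction are recommended first:
catalogue = [("sneakers", (1.0, 0.1)),
             ("novel", (0.0, 1.0)),
             ("running shoes", (0.9, 0.2))]
top = nearest_neighbors((1.0, 0.0), catalogue, k=2)
```

In production, the brute-force sort would be replaced by an approximate nearest-neighbor index, but the cosine-ranking logic is the same.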
- [247] arXiv:2403.19346 [pdf, other]
-
Title: Large Language Models Are Unconscious of Unreasonability in Math Problems
Comments: 12 pages, 4 figures
Subjects: Computation and Language (cs.CL)
Large language models (LLMs) demonstrate substantial capabilities in solving math problems. However, they tend to produce hallucinations when given questions containing unreasonable errors. In this paper, we study the behavior of LLMs when faced with unreasonable math problems and further explore their potential to address these problems. First, we construct the Unreasonable Math Problem (UMP) benchmark to examine the error detection ability of LLMs. Experiments show that LLMs are able to detect unreasonable errors, but still fail in generating non-hallucinatory content. To improve their ability to detect and correct errors, we further design a strategic prompt template called Critical Calculation and Conclusion (CCC). With CCC, LLMs can better self-evaluate and detect unreasonable errors in math questions, making them more reliable and safe in practical application scenarios.
- [248] arXiv:2403.19347 [pdf, other]
-
Title: Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors
Authors: Binzong Geng, Zhaoxin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo
Comments: Accepted by the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
With the rise of large language models (LLMs), recent works have leveraged LLMs to improve the performance of click-through rate (CTR) prediction. However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billions of users and items. To break through the efficiency barrier of LLMs, we propose Behavior Aggregated Hierarchical Encoding (BAHE) to enhance the efficiency of LLM-based CTR modeling. Specifically, BAHE proposes a novel hierarchical architecture that decouples the encoding of user behaviors from inter-behavior interactions. Firstly, to prevent computational redundancy from repeated encoding of identical user behaviors, BAHE employs the LLM's pre-trained shallow layers to extract embeddings of the most granular, atomic user behaviors from extensive user sequences and stores them in the offline database. Subsequently, the deeper, trainable layers of the LLM facilitate intricate inter-behavior interactions, thereby generating comprehensive user embeddings. This separation allows the learning of high-level user representations to be independent of low-level behavior encoding, significantly reducing computational complexity. Finally, these refined user embeddings, in conjunction with correspondingly processed item embeddings, are incorporated into the CTR model to compute the CTR scores. Extensive experimental results show that BAHE reduces training time and memory fivefold for CTR models using LLMs, especially with longer user sequences. BAHE has been deployed in a real-world system, allowing for daily updates of 50 million CTR data on 8 A100 GPUs, making LLMs practical for industrial CTR prediction.
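The deduplication idea behind BAHE — encode each distinct atomic behavior once with the frozen shallow layers, cache it offline, and let the deeper layers aggregate — can be sketched with toy stand-ins for the two encoder stages. All names, the hash-based embedding, and the mean pooling below are our assumptions, not the paper's implementation.

```python
def shallow_encode(behavior):
    """Stand-in for the LLM's frozen shallow layers: map one atomic
    behavior string to a small fixed vector (deterministic toy hash)."""
    return tuple((hash((behavior, k)) % 1000) / 1000.0 for k in range(4))

class BehaviorCache:
    """Offline store of atomic-behavior embeddings: each distinct behavior
    is encoded exactly once, however many user sequences contain it."""
    def __init__(self):
        self.store = {}
        self.encode_calls = 0

    def get(self, behavior):
        if behavior not in self.store:
            self.encode_calls += 1
            self.store[behavior] = shallow_encode(behavior)
        return self.store[behavior]

def user_embedding(sequence, cache):
    """Stand-in for the deeper trainable layers: aggregate the cached
    behavior embeddings (mean pooling here instead of attention)."""
    vecs = [cache.get(b) for b in sequence]
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))
```

With two users whose sequences share behaviors, the expensive encoder runs once per distinct behavior rather than once per occurrence, which is the source of the claimed training-time savings.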
- [249] arXiv:2403.19348 [pdf, ps, other]
-
Title: Efficient Anchor Point Deployment for Low Latency Connectivity in MEC-Assisted C-V2X Scenarios
Authors: Pablo Fondo-Ferreiro, Felipe Gil-Castiñeira, Francisco Javier González-Castaño, David Candal-Ventureira, Jonathan Rodriguez, Antonio J. Morgado, Shahid Mumtaz
Comments: Article published in IEEE Transactions on Vehicular Technology
Journal-ref: IEEE Transactions on Vehicular Technology, vol. 72, no. 12, pp. 16637 - 16649, December 2023
Subjects: Networking and Internet Architecture (cs.NI)
Next-generation cellular networks will play a key role in the evolution of different vertical industries. Low latency will be a major requirement in many related use cases. This requirement is especially challenging in scenarios with high mobility of end devices, such as vehicular communications. The Multi-Access Edge Computing (MEC) paradigm seeks to satisfy it. In this article we propose the dynamic deployment of anchor point network functions at edge locations and the assignment of terminals to these anchor points with the joint objective of minimizing communications latency and reducing network overhead. We formally define the problem as a multi-objective optimization and also propose a novel heuristic greedy algorithm for approximating the solution. This algorithm compares favorably with baseline and state-of-the-art strategies for latency minimization while reducing the overhead caused by network reconfigurations.
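The abstract does not spell out the heuristic, so the following is only a generic greedy-placement sketch under the stated latency objective, ignoring the reconfiguration-overhead term of the paper's multi-objective formulation; the function name and the latency-matrix interface are our assumptions.

```python
def greedy_anchor_placement(latency, k):
    """Greedily open k anchor-point locations. latency[e][t] is the latency
    from candidate edge location e to terminal t. Each step opens the
    location that most reduces total best-anchor latency; terminals are
    then assigned to their closest open anchor."""
    n_edges, n_terms = len(latency), len(latency[0])
    opened = []
    best = [float("inf")] * n_terms  # best latency seen so far per terminal

    for _ in range(k):
        def gain(e):
            # Total latency improvement if location e is opened now.
            return sum(max(best[t] - latency[e][t], 0.0) for t in range(n_terms))
        e_star = max((e for e in range(n_edges) if e not in opened), key=gain)
        opened.append(e_star)
        best = [min(best[t], latency[e_star][t]) for t in range(n_terms)]

    assignment = [min(opened, key=lambda e: latency[e][t]) for t in range(n_terms)]
    return opened, assignment
```

This is the standard greedy scheme for facility-placement-style objectives; a real MEC controller would also fold reassignment (handover) cost into `gain`.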
- [250] arXiv:2403.19352 [pdf, other]
-
Title: A diverse Multilingual News Headlines Dataset from around the World
Comments: Published in NAACL 2024 Proceedings (Short Paper track)
Subjects: Computation and Language (cs.CL)
Babel Briefings is a novel dataset featuring 4.7 million news headlines from August 2020 to November 2021, across 30 languages and 54 locations worldwide, with English translations of all articles included. Designed for natural language processing and media studies, it serves as a high-quality dataset for training or evaluating language models, as well as offering a simple, accessible collection of articles, for example to analyze global news coverage and cultural narratives. As a simple demonstration of the analyses facilitated by this dataset, we use a basic procedure with a TF-IDF weighted similarity metric to group articles into clusters about the same event. We then visualize the \emph{event signatures} of each event, showing which languages' articles appear over time, revealing intuitive features based on the proximity and unexpectedness of the event. The dataset is available on \href{https://www.kaggle.com/datasets/felixludos/babel-briefings}{Kaggle} and \href{https://huggingface.co/datasets/felixludos/babel-briefings}{HuggingFace} with accompanying \href{https://github.com/felixludos/babel-briefings}{GitHub} code.
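A minimal sketch of grouping headlines by TF-IDF weighted similarity follows. The abstract only names the metric; the greedy threshold clustering here is our illustrative stand-in, not the paper's procedure, and the threshold value is arbitrary.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight dicts for tokenized headlines (raw tf * log(N/df))."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def group_by_event(docs, threshold=0.15):
    """Greedy single pass: attach each headline to the first cluster whose
    seed is similar enough, otherwise start a new cluster."""
    vecs = tfidf_vectors(docs)
    clusters = []  # list of (seed_vector, member_indices)
    for i, v in enumerate(vecs):
        for seed, members in clusters:
            if cosine(seed, v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]
```

Headlines sharing distinctive (high-IDF) terms land in the same cluster, while headlines about unrelated events start new clusters.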
- [251] arXiv:2403.19353 [pdf, ps, other]
-
Title: A Software-Defined Networking Solution for Interconnecting Network Functions in Service-Based Architectures
Authors: Pablo Fondo-Ferreiro, Felipe Gil-Castiñeira, Francisco Javier González-Castaño, David Candal-Ventureira
Comments: Article published in IEEE Access
Journal-ref: IEEE Access, vol. 10, pp. 19905-19916, 2022
Subjects: Networking and Internet Architecture (cs.NI)
Mobile core networks handle critical control functions for delivering services in modern cellular networks. Traditional point-to-point architectures, where network functions are directly connected through standardized interfaces, are being replaced by service-based architectures (SBAs), where core functionalities are finer-grained microservices decoupled from the underlying infrastructure. In this way, network functions and services can be distributed, with scaling and fail-over mechanisms, and can be dynamically deployed, updated, or removed to support slicing. A myriad of network functions can be deployed or removed according to traffic flows, thereby increasing the complexity of connection management. In this context, 3GPP Release 16 defines the service communication proxy (SCP) as a unified communication interface for a set of network functions. In this paper, we propose a novel software-defined networking (SDN)-based solution that plays the same role for a service mesh architecture where network functions can be deployed anywhere in the infrastructure. We demonstrate its efficiency in comparison with alternative architectures.
- [252] arXiv:2403.19354 [pdf, other]
-
Title: AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4
Comments: 2nd place at SemEval-2024 Task 8, Subtask C, to appear in SemEval-2024 proceedings
Subjects: Computation and Language (cs.CL)
This paper describes AIpom, a system designed to detect a boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection). We propose a two-stage pipeline combining predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers. AIpom is ranked second on the leaderboard while achieving a Mean Absolute Error of 15.94. Ablation studies confirm the benefits of pipelining encoder and decoder models, particularly in terms of improved performance.
- [253] arXiv:2403.19355 [pdf, ps, other]
-
Title: Artificial Intelligence (AI) Based Prediction of Mortality, for COVID-19 Patients
Authors: Mahbubunnabi Tamal, Mohammad Marufur Rahman, Maryam Alhasim, Mobarak Al Mulhim, Mohamed Deriche
Comments: Submitted to Biocybernetics and Biomedical Engineering, 22 March, 2024
Subjects: Machine Learning (cs.LG)
For severely affected COVID-19 patients, it is crucial to identify high-risk patients and predict survival and the need for intensive care (ICU). Most of the proposed models are not well reported, making them less reproducible and prone to a high risk of bias, particularly in the presence of imbalanced data/classes. In this study, the performances of nine machine and deep learning algorithms in combination with two widely used feature selection methods were investigated to predict last status representing mortality, ICU requirement, and ventilation days. Fivefold cross-validation was used for training and validation purposes. To minimize bias, the training and testing sets were split maintaining similar distributions. Only 10 out of 122 features were found to be useful in prediction modelling, with the Acute kidney injury during hospitalization feature being the most important one. The algorithms' performances depend on feature numbers and data pre-processing techniques. LSTM performs the best in predicting last status and ICU requirement, with 90%, 92%, 86% and 95% accuracy, sensitivity, specificity, and AUC respectively. DNN performs the best in predicting ventilation days with 88% accuracy. Considering all the factors and limitations, including the absence of an exact time point of clinical onset, LSTM with carefully selected features can accurately predict last status and ICU requirement. Appropriate machine learning algorithms with carefully selected features and balanced data can accurately predict mortality, ICU requirement and ventilation support. Such a model can be very useful in emergencies and pandemics where prompt and precise
- [254] arXiv:2403.19356 [pdf, other]
-
Title: A robust two-level overlapping preconditioner for Darcy flow in high-contrast media
Subjects: Numerical Analysis (math.NA)
In this article, a two-level overlapping domain decomposition preconditioner is developed for solving linear algebraic systems obtained from simulating Darcy flow in high-contrast media. Our preconditioner starts at a mixed finite element method for discretizing the partial differential equation by Darcy's law with the no-flux boundary condition and is then followed by a velocity elimination technique to yield a linear algebraic system with only unknowns of pressure. Then, our main objective is to design a robust and efficient domain decomposition preconditioner for this system, which is accomplished by engineering a multiscale coarse space that is capable of characterizing high-contrast features of the permeability field. A generalized eigenvalue problem is solved in each non-overlapping coarse element in a communication-free manner to form the global solver, which is accompanied by local solvers originated from additive Schwarz methods but with a non-Galerkin discretization to derive the two-level preconditioner. We provide a rigorous analysis that indicates that the condition number of the preconditioned system could be bounded above with several assumptions. Extensive numerical experiments with various types of three-dimensional high-contrast models are exhibited. In particular, we study the robustness against the contrast of the media as well as the influences of numbers of eigenfunctions, oversampling sizes, and subdomain partitions on the efficiency of the proposed preconditioner. Besides, strong and weak scalability performances are also examined.
- [255] arXiv:2403.19358 [pdf, other]
-
Title: Risk prediction of pathological gambling on social media
Subjects: Computation and Language (cs.CL)
This paper addresses the problem of risk prediction on social media data, specifically focusing on the classification of Reddit users as having a pathological gambling disorder. To tackle this problem, this paper focuses on incorporating temporal and emotional features into the model. The preprocessing phase involves dealing with the time irregularity of posts by padding sequences. Two baseline architectures are used for preliminary evaluation: a BERT classifier on concatenated posts per user, and GRU with LSTM on sequential data. Experimental results demonstrate that the sequential models outperform the concatenation-based model. The experiments show that incorporating a time decay layer (TD) and passing the emotion classification layer (EmoBERTa) through LSTM improves the performance significantly. The addition of a self-attention layer did not significantly improve the performance of the model, though it provided easily interpretable attention scores. The developed architecture with the inclusion of EmoBERTa and TD layers achieved a high F1 score, beating existing benchmarks on the pathological gambling dataset. Future work may involve the early prediction of risk factors associated with pathological gambling disorder and testing models on other datasets. Overall, this research highlights the significance of the sequential processing of posts, including temporal and emotional features, to boost the predictive power, as well as adding an attention layer for interpretability.
- [256] arXiv:2403.19359 [pdf, ps, other]
-
Title: Coordinated Allocation of Radio Resources to Wi-Fi and Cellular Technologies in Shared Unlicensed Frequencies
Authors: David Candal-Ventureira, Francisco Javier González-Castaño, Felipe Gil-Castiñeira, Pablo Fondo-Ferreiro
Comments: Article published in IEEE Access
Journal-ref: IEEE Access, vol. 9, pp. 134435-134456, 2021
Subjects: Networking and Internet Architecture (cs.NI)
Wireless connectivity is essential for industrial production processes and workflow management. Moreover, the connectivity requirements of industrial devices, which are usually long-term investments, are diverse and require different radio interfaces. In this regard, the 3GPP has studied how to support heterogeneous radio access technologies (RATs) such as Wi-Fi and unlicensed cellular technologies in 5G core networks. In some cases, these technologies coexist in the same spectrum. Dynamic spectrum sharing (DSS), which has already been proven to increase spectrum efficiency in licensed bands, can also be applied to this scenario. In this paper, we propose two solutions for mobile network operators (MNOs) or service providers to dynamically divide (multiplex) the radio resources of a shared channel between a Wi-Fi basic service set (BSS) and one or several carriers of scheduled wireless networks, such as cellular technologies, with a configurable level of sharing granularity. These solutions do not require modifications to the current commercial off-the-shelf (COTS) end devices. We adapt the existing IEEE 802.11 procedures to notify the Wi-Fi stations that they must share channels with different access networks. We demonstrate that our dynamic sharing proposals are also advantageous over direct coexistence and evaluate each of them quantitatively and qualitatively to determine when one or the other is preferable. The evaluation is particularized for IEEE 802.11ac and long-term evolution (LTE) license assisted access (LAA), but the solutions can be easily extended to 5G new radio-unlicensed (5G NR-U) or to any other wireless technology in which the network side schedules end device transmissions.
- [257] arXiv:2403.19365 [pdf, other]
-
Title: EthioMT: Parallel Corpus for Low-resource Ethiopian Languages
Comments: Accepted at The Fifth workshop on Resources for African Indigenous Languages (RAIL) 2024 (LREC-COLING 2024)
Subjects: Computation and Language (cs.CL)
Recent research in natural language processing (NLP) has achieved impressive performance in tasks such as machine translation (MT), news classification, and question-answering in high-resource languages. However, the performance of MT leaves much to be desired for low-resource languages. This is due to the smaller size of available parallel corpora in these languages, if such corpora are available at all. NLP in Ethiopian languages suffers from the same issues due to the unavailability of publicly accessible datasets for NLP tasks, including MT. To help the research community and foster research for Ethiopian languages, we introduce EthioMT -- a new parallel corpus for 15 languages. We also create a new benchmark by collecting a dataset for better-researched languages in Ethiopia. We evaluate the newly collected corpus and the benchmark dataset for 23 Ethiopian languages using transformer and fine-tuning approaches.
- [258] arXiv:2403.19366 [pdf, other]
-
Title: Infrared Small Target Detection with Scale and Location Sensitivity
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recently, infrared small target detection (IRSTD) has been dominated by deep-learning-based methods. However, these methods mainly focus on the design of complex model structures to extract discriminative features, leaving the loss functions for IRSTD under-explored. For example, the widely used Intersection over Union (IoU) and Dice losses lack sensitivity to the scales and locations of targets, limiting the detection performance of detectors. In this paper, we focus on boosting detection performance with a more effective loss but a simpler model structure. Specifically, we first propose a novel Scale and Location Sensitive (SLS) loss to handle the limitations of existing losses: 1) for scale sensitivity, we compute a weight for the IoU loss based on target scales to help the detector distinguish targets with different scales; 2) for location sensitivity, we introduce a penalty term based on the center points of targets to help the detector localize targets more precisely. Then, we add a simple Multi-Scale Head to the plain U-Net (MSHNet). By applying SLS loss to each scale of the predictions, our MSHNet outperforms existing state-of-the-art methods by a large margin. In addition, the detection performance of existing detectors can be further improved when trained with our SLS loss, demonstrating the effectiveness and generalization of our SLS loss. The code is available at https://github.com/ying-fu/MSHNet.
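The two ingredients of an SLS-style loss — a scale-dependent weight on the IoU loss and a center-point penalty — can be sketched on binary pixel sets. The exact form of the scale weight and the normalization are our assumptions, not the paper's formulas.

```python
def sls_loss(pred, target, image_diag):
    """Scale- and Location-Sensitive loss sketched for a single target,
    with pred/target given as sets of (row, col) pixel coordinates.

    Scale term: IoU loss weighted by a size-dependent factor so small
    targets are not drowned out (the weight form here is an assumption).
    Location term: distance between predicted and true mask centroids,
    normalized by the image diagonal."""
    if not pred or not target:
        return 1.0  # degenerate case: maximal loss
    inter = len(pred & target)
    union = len(pred | target)
    iou_loss = 1.0 - inter / union
    scale_weight = 1.0 / (1.0 + len(target) ** 0.5)  # assumed scale factor

    def centroid(pixels):
        return (sum(r for r, _ in pixels) / len(pixels),
                sum(c for _, c in pixels) / len(pixels))

    (pr, pc), (tr, tc) = centroid(pred), centroid(target)
    loc_penalty = ((pr - tr) ** 2 + (pc - tc) ** 2) ** 0.5 / image_diag
    return scale_weight * iou_loss + loc_penalty
```

Unlike plain IoU, a prediction that misses the target entirely is penalized more the farther its centroid drifts, which is the location sensitivity the abstract describes.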
- [259] arXiv:2403.19368 [pdf, other]
-
Title: Cloudy with a Chance of Cyberattacks: Dangling Resources Abuse on Cloud Platforms
Authors: Jens Frieß (1 and 3), Tobias Gattermayer (1 and 2), Nethanel Gelernter (4), Haya Schulmann (1 and 5), Michael Waidner (1, 2 and 3) ((1) National Research Center for Applied Cybersecurity ATHENE, (2) Fraunhofer Institute for Secure Information Technology SIT, (3) Technische Universität Darmstadt, (4) IONIX, (5) Goethe-Universität Frankfurt)
Comments: 17 pages, 29 figures, to be published in NSDI'24: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
Recent works showed that it is feasible to hijack resources on cloud platforms. In such hijacks, attackers can take over released resources that belong to legitimate organizations. It was proposed that adversaries could abuse these resources to carry out attacks against customers of the hijacked services, e.g., through malware distribution. However, to date, no research has confirmed the existence of these attacks. We identify, for the first time, real-life hijacks of cloud resources. This yields a number of surprising and important insights. First, contrary to the previous assumption that attackers primarily target IP addresses, our findings reveal that the type of resource is not the main consideration in a hijack. Attackers focus on hijacking records that allow them to determine the resource by entering free text. The costs and overhead of hijacking such records are much lower than those of hijacking IP addresses, which are randomly selected from a large pool. Second, identifying hijacks poses a substantial challenge. Monitoring resource changes, e.g., changes in content, is insufficient, since such changes could also be legitimate. Retrospective analysis of digital assets to identify hijacks is also arduous due to the immense volume of data involved and the absence of indicators to search for. To address this challenge, we develop a novel approach that involves analyzing data from diverse sources to effectively differentiate between malicious and legitimate modifications. Our analysis has revealed 20,904 instances of hijacked resources on popular cloud platforms. While some hijacks are short-lived (up to 15 days), 1/3 persist for more than 65 days. We study how attackers abuse the hijacked resources and find that, in contrast to the threats considered in previous work, the majority of the abuse (75%) is blackhat search engine optimization.
- [260] arXiv:2403.19369 [pdf, other]
-
Title: RAIL: Robot Affordance Imagination with Large Language Models
Subjects: Robotics (cs.RO)
This paper introduces an automatic affordance reasoning paradigm tailored to minimal semantic inputs, addressing the critical challenges of classifying and manipulating unseen classes of objects in household settings. Inspired by human cognitive processes, our method integrates generative language models and physics-based simulators to foster analytical thinking and creative imagination of novel affordances. Structured with a tripartite framework consisting of analysis, imagination, and evaluation, our system "analyzes" the requested affordance names into interaction-based definitions, "imagines" the virtual scenarios, and "evaluates" the object affordance. If an object is recognized as possessing the requested affordance, our method also predicts the optimal pose for such functionality, and how a potential user can interact with it. Tuned on only a few synthetic examples across 3 affordance classes, our pipeline achieves a very high success rate on affordance classification and functional pose prediction of 8 classes of novel objects, outperforming learning-based baselines. Validation through real-robot manipulation experiments demonstrates the practical applicability of the imagined user interaction, showcasing the system's ability to independently conceptualize unseen affordances and interact with new objects and scenarios in everyday settings.
- [261] arXiv:2403.19371 [pdf, other]
-
Title: Cell Electropermeabilization Modeling via Multiple Traces Formulation and Time Semi-Implicit Coupling
Subjects: Computational Engineering, Finance, and Science (cs.CE)
We simulate the electrical response of multiple disjoint biological 3D cells in the electropermeabilization process. Instead of solving the boundary value problem in the volume, we reduce it to a system of boundary integral equations with nonlinear dynamics on the cell membranes by coupling the local Multiple Traces Formulation with a time semi-implicit scheme. Spatially, boundary unknowns are approximated by spherical harmonics, thereby allowing for spectral convergence rates for suitable time steps. Numerical results are provided to validate our claims.
- [262] arXiv:2403.19374 [pdf, other]
-
Title: A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system
Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou
Comments: 5 pages, 10 figures
Subjects: Emerging Technologies (cs.ET); Systems and Control (eess.SY)
We report a spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM)-based probabilistic binary neural network (PBNN) for resource-saving and hardware noise-tolerant computing applications. With the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. Meanwhile, the proposed compute-in-memory (CIM) architecture allows for the concurrent execution of the probabilistic vector-matrix multiplication (PVMM) and binarization. Furthermore, leveraging the effectiveness of random binary cells to propagate multi-bit probabilistic information, our SOT-MRAM-based PBNN system achieves a 97.78\% classification accuracy under a 7.01\% weight variation on the MNIST database through 10 sampling cycles, and the number of bit-level computation operations is reduced by a factor of 6.9 compared to that of the full-precision LeNet-5 network. Our work provides a compelling framework for the design of reliable neural networks tailored to applications with low power consumption and limited computational resources.
- [263] arXiv:2403.19375 [pdf, other]
-
Title: Multi-Agent Team Access Monitoring: Environments that Benefit from Target Information Sharing
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
Robotic access monitoring of multiple target areas has applications including checkpoint enforcement, surveillance and containment of fire and flood hazards. Monitoring access for a single target region has been successfully modeled as a minimum-cut problem. We generalize this model to support multiple target areas using two approaches: iterating on individual targets and examining the collections of targets holistically. Through simulation we measure the performance of each approach on different scenarios.
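The single-target minimum-cut model mentioned above is easy to make concrete. Below is a self-contained Edmonds-Karp sketch on a unit-capacity undirected graph: by max-flow/min-cut duality, the flow value equals the number of edges that must be guarded to separate an entry node from the target. This is a toy single-target version only; the multi-target generalizations the paper studies are not reproduced here.

```python
from collections import deque

def min_cut_size(n, edges, source, sink):
    """Edmonds-Karp max flow on a unit-capacity undirected graph.

    Returns the max-flow value, which equals the minimum number of edges
    whose removal disconnects `source` from `sink` (the min cut).
    """
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            return flow
        # Augment one unit of flow along the path found.
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```

For example, a single corridor between entry and target needs one guard, while two disjoint corridors need two.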
- [264] arXiv:2403.19376 [pdf, other]
-
Title: NIGHT -- Non-Line-of-Sight Imaging from Indirect Time of Flight Data
Comments: Submitted to ECCV 24, 17 pages, 6 figures, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
The acquisition of objects outside the Line-of-Sight of cameras is a very intriguing but also extremely challenging research topic. Recent works showed the feasibility of this idea exploiting transient imaging data produced by custom direct Time of Flight sensors. In this paper, for the first time, we tackle this problem using only data from an off-the-shelf indirect Time of Flight sensor without any further hardware requirement. We introduce a Deep Learning model able to re-frame the surfaces where light bounces happen as a virtual mirror. This modeling makes the task easier to handle and also facilitates the construction of annotated training data. From the obtained data it is possible to retrieve the depth information of the hidden scene. We also provide a first-in-its-kind synthetic dataset for the task and demonstrate the feasibility of the proposed idea on it.
- [265] arXiv:2403.19378 [pdf, other]
-
Title: Cleaning data with Swipe
Subjects: Databases (cs.DB)
The repair problem for functional dependencies is the problem where an input database needs to be modified such that all functional dependencies are satisfied and the difference with the original database is minimal. The output database is then called an optimal repair. If the allowed modifications are value updates, finding an optimal repair is NP-hard. A well-known approach to find approximations of optimal repairs builds a Chase tree in which each internal node resolves violations of one functional dependency and leaf nodes represent repairs. A key property of this approach is that controlling the branching factor of the Chase tree makes it possible to control the trade-off between repair quality and computational efficiency. In this paper, we explore an extreme variant of this idea in which the Chase tree has only one path. To construct this path, we first create a partition of attributes such that classes can be repaired sequentially. We repair each class only once and do so by fixing the order in which dependencies are repaired. This principle is called priority repairing and we provide a simple heuristic to determine priority. The algorithms for attribute partitioning and priority repair are combined in the Swipe algorithm. An empirical study on four real-life data sets shows that Swipe is in the range of one to three orders of magnitude faster than multi-sequence Chase-based approaches, whereas the quality of repairs is comparable or better. Moreover, we provide a scalability analysis of the Swipe algorithm.
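As a toy illustration of repairing one functional dependency by value updates (the actual Swipe algorithm adds attribute partitioning and priority ordering across many dependencies, which are not shown), a single FD X -> Y can be repaired by rewriting each X-group's Y values to the group's majority value:

```python
from collections import Counter, defaultdict

def repair_fd(rows, lhs, rhs):
    """Repair one functional dependency lhs -> rhs by value updates.

    Each group of rows agreeing on `lhs` gets its `rhs` rewritten to the
    group's most frequent value -- a minimal sketch of single-pass repair,
    not the Swipe algorithm itself.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in lhs)].append(i)
    repaired = [dict(row) for row in rows]
    for idxs in groups.values():
        majority = Counter(rows[i][rhs] for i in idxs).most_common(1)[0][0]
        for i in idxs:
            repaired[i][rhs] = majority
    return repaired
```

After repair, every `lhs` group has a single `rhs` value, so the dependency holds, and the number of changed cells per group is minimal for that group in isolation.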
- [266] arXiv:2403.19386 [pdf, other]
-
Title: PointCloud-Text Matching: Benchmark Datasets and a Baseline
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In this paper, we present and study a new instance-level retrieval task: PointCloud-Text Matching (PTM), which aims to find the exact cross-modal instance that matches a given point-cloud query or text query. PTM could be applied to various scenarios, such as indoor/urban-canyon localization and scene retrieval. However, there exists no suitable and targeted dataset for PTM in practice. Therefore, we construct three new PTM benchmark datasets, namely 3D2T-SR, 3D2T-NR, and 3D2T-QA. We observe that the data are challenging, with noisy correspondences, due to the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which make existing cross-modal matching methods ineffective for PTM. To tackle these challenges, we propose a PTM baseline, named Robust PointCloud-Text Matching method (RoMa). RoMa consists of two modules: a Dual Attention Perception module (DAP) and a Robust Negative Contrastive Learning module (RNCL). Specifically, DAP leverages token-level and feature-level attention to adaptively focus on useful local and global features, and aggregate them into common representations, thereby reducing the adverse impact of noise and ambiguity. To handle noisy correspondence, RNCL divides negative pairs, which are much less error-prone than positive pairs, into clean and noisy subsets, and assigns them forward and reverse optimization directions respectively, thus enhancing robustness against noisy correspondence. We conduct extensive experiments on our benchmarks and demonstrate the superiority of our RoMa.
- [267] arXiv:2403.19390 [pdf, other]
-
Title: Checkpoint Merging via Bayesian Optimization in LLM Pretraining
Authors: Deyuan Liu, Zecheng Wang, Bingning Wang, Weipeng Chen, Chunshan Li, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui
Subjects: Computation and Language (cs.CL)
The rapid proliferation of large language models (LLMs) such as GPT-4 and Gemini underscores the intense demand for resources during their training processes, posing significant challenges due to substantial computational and environmental costs. To alleviate this issue, we propose checkpoint merging in LLM pretraining. This method utilizes LLM checkpoints with shared training trajectories, and is rooted in an extensive search space exploration for the best merging weight via Bayesian optimization. Through various experiments, we demonstrate that: (1) Our proposed methodology exhibits the capacity to augment pretraining, presenting an opportunity akin to obtaining substantial benefits at minimal cost; (2) Our proposed methodology, despite requiring a given held-out dataset, still demonstrates robust generalization capabilities across diverse domains, a pivotal aspect in pretraining.
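The core operation, averaging checkpoints with searched weights, can be sketched as follows. Random search stands in for the Bayesian optimization used in the paper, and the checkpoint format (dicts of numpy parameter arrays) is an assumption for illustration:

```python
import numpy as np

def merge_checkpoints(checkpoints, weights):
    """Weighted average of model checkpoints (dicts of parameter arrays)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return {name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
            for name in checkpoints[0]}

def search_merge_weight(checkpoints, held_out_loss, n_trials=50, seed=0):
    """Search merging weights that minimize a held-out loss.

    Random Dirichlet sampling stands in for the Bayesian optimization of
    the paper; `held_out_loss` is any callable scoring a merged checkpoint.
    """
    rng = np.random.default_rng(seed)
    best_w, best_loss = None, float("inf")
    for _ in range(n_trials):
        w = rng.dirichlet(np.ones(len(checkpoints)))
        loss = held_out_loss(merge_checkpoints(checkpoints, w))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

In this sketch, the searched merge typically beats either endpoint checkpoint whenever the held-out optimum lies between them.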
- [268] arXiv:2403.19394 [pdf, ps, other]
-
Title: Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software
Authors: Britta U. Westner, Daniel R. McCloy, Eric Larson, Alexandre Gramfort, Daniel S. Katz, Arfon M. Smith, invited co-signees
Subjects: Computers and Society (cs.CY); Other Quantitative Biology (q-bio.OT)
Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain - or to generate such data if we work with simulations. In recent years there has been a shift toward relying on free, open-source scientific software (FOSSS) for neuroscience data analysis (Poldrack et al., 2019), in line with the broader open science movement in academia (McKiernan et al., 2016) and wider industry trends (Eghbal, 2016). Importantly, FOSSS is typically developed by working scientists (not professional software developers), which sets up a precarious situation given the nature of the typical academic workplace (wherein academics, especially in their early careers, are on short, fixed-term contracts). In this paper, we will argue that the existing ecosystem of neuroscientific open source software is brittle, and discuss why and how the neuroscience community needs to come together to ensure a healthy growth of our software landscape to the benefit of all.
- [269] arXiv:2403.19398 [pdf, other]
-
Title: Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
Authors: Elizaveta Artser, Anastasiia Birillo, Yaroslav Golubev, Maria Tigina, Hieke Keuning, Nikolay Vyahhi, Timofey Bryksin
Comments: 7 pages, 4 figures
Subjects: Software Engineering (cs.SE)
In many MOOCs, whenever a student completes a programming task, they can see previous solutions of other students to find potentially different ways of solving the problem and learn new coding constructs. However, a lot of MOOCs simply show the most recent solutions, disregarding their diversity or quality.
To solve this novel problem, we adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform. However, due to the tool's inner algorithm, it fully processed only 46 out of 867 studied tasks. Therefore, we developed our own tool called Rhubarb. This tool first standardizes solutions that are algorithmically the same, then calculates the structure-aware edit distance between them, and then applies clustering. Finally, it selects one example from each of the largest clusters, taking into account their code quality. Rhubarb was able to handle all 867 tasks successfully.
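A greatly simplified sketch of the distance-then-cluster step: token-level `difflib.SequenceMatcher` similarity stands in for Rhubarb's structure-aware edit distance, the standardization of algorithmically identical solutions is omitted, and the greedy threshold clustering is an assumption rather than the tool's actual clustering method.

```python
from difflib import SequenceMatcher

def edit_similarity(a, b):
    """Token-level similarity (in [0, 1]) between two solution strings."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def cluster_solutions(solutions, threshold=0.8):
    """Greedy single-pass clustering by pairwise similarity.

    A toy stand-in for Rhubarb's pipeline: each solution joins the first
    cluster whose representative it resembles closely enough, otherwise
    it starts a new cluster.
    """
    clusters = []
    for sol in solutions:
        for cluster in clusters:
            if edit_similarity(sol, cluster[0]) >= threshold:
                cluster.append(sol)
                break
        else:
            clusters.append([sol])
    return clusters
```

One representative per large cluster (picked, e.g., by code quality) would then be shown to students, mirroring the selection step described above.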
We compared approaches on a set of 59 tasks that both tools could process. Eight experts rated the selected solutions based on diversity, code quality, and usefulness. The default platform approach of selecting recent submissions received on average 3.12 out of 5, JPlag - 3.77, Rhubarb - 3.50. Since in a real MOOC it is imperative to process every task, we created a system that uses JPlag on the 5.3% of tasks it fully processes and Rhubarb on the remaining 94.7%.
- [270] arXiv:2403.19399 [pdf, ps, other]
-
Title: KazParC: Kazakh Parallel Corpus for Machine Translation
Subjects: Computation and Language (cs.CL)
We introduce KazParC, a parallel corpus designed for machine translation across Kazakh, English, Russian, and Turkish. The first and largest publicly available corpus of its kind, KazParC contains a collection of 371,902 parallel sentences covering different domains and developed with the assistance of human translators. Our research efforts also extend to the development of a neural machine translation model nicknamed Tilmash. Remarkably, the performance of Tilmash is on par with, and in certain instances, surpasses that of industry giants, such as Google Translate and Yandex Translate, as measured by standard evaluation metrics, such as BLEU and chrF. Both KazParC and Tilmash are openly available for download under the Creative Commons Attribution 4.0 International License (CC BY 4.0) through our GitHub repository.
- [271] arXiv:2403.19401 [pdf, ps, other]
-
Title: Hardness of Learning Boolean Functions from Label Proportions
Comments: 17 pages. Conference version of this paper appeared in FSTTCS 2023
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
In recent years the framework of learning from label proportions (LLP) has been gaining importance in machine learning. In this setting, the training examples are aggregated into subsets or bags and only the average label per bag is available for learning an example-level predictor. This generalizes traditional PAC learning which is the special case of unit-sized bags. The computational learning aspects of LLP were studied in recent works (Saket, NeurIPS'21; Saket, NeurIPS'22) which showed algorithms and hardness for learning halfspaces in the LLP setting. In this work we focus on the intractability of LLP learning Boolean functions. Our first result shows that given a collection of bags of size at most $2$ which are consistent with an OR function, it is NP-hard to find a CNF of constantly many clauses which satisfies any constant-fraction of the bags. This is in contrast with the work of (Saket, NeurIPS'21) which gave a $(2/5)$-approximation for learning ORs using a halfspace. Thus, our result provides a separation between constant clause CNFs and halfspaces as hypotheses for LLP learning ORs.
Next, we prove the hardness of satisfying more than $1/2 + o(1)$ fraction of such bags using a $t$-DNF (i.e. DNF where each term has $\leq t$ literals) for any constant $t$. In usual PAC learning such a hardness was known (Khot-Saket, FOCS'08) only for learning noisy ORs. We also study the learnability of parities and show that it is NP-hard to satisfy more than $(q/2^{q-1} + o(1))$-fraction of $q$-sized bags which are consistent with a parity using a parity, while a random parity based algorithm achieves a $(1/2^{q-2})$-approximation.
- [272] arXiv:2403.19405 [pdf, other]
-
Title: Tabular Learning: Encoding for Entity and Context Embeddings
Authors: Fredy Reusser
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Examining the effect of different encoding techniques on entity and context embeddings, the goal of this work is to challenge commonly used Ordinal encoding for tabular learning. Applying different preprocessing methods and network architectures over several datasets resulted in a benchmark on how the encoders influence the learning outcome of the networks. By keeping the test, validation and training data consistent, results have shown that ordinal encoding is not the most suited encoder for categorical data in terms of preprocessing the data and subsequently classifying the target variable correctly. A better outcome was achieved by encoding the features based on string similarities, computing a similarity matrix as input for the network. This is the case for both entity and context embeddings, where the transformer architecture showed improved performance for Ordinal and Similarity encoding with regard to multi-label classification tasks.
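The similarity-matrix idea can be illustrated with character trigrams: each categorical string is encoded as its vector of similarities to every category. The specific measure below (Jaccard over padded trigrams) is an assumption for illustration, not necessarily the paper's choice.

```python
def ngrams(s, n=3):
    """Set of character n-grams of a string, padded with spaces."""
    s = f" {s} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity_encode(values, categories=None):
    """Encode categorical strings as rows of similarities to each category.

    A minimal sketch of similarity-based encoding as an alternative to
    ordinal codes: similar category strings get similar vectors.
    """
    categories = categories or sorted(set(values))
    grams = {c: ngrams(c) for c in categories}

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    return [[jaccard(ngrams(v), grams[c]) for c in categories] for v in values]
```

Unlike ordinal codes, "apple" and "apples" land close together in this representation while "orange" stays distant, which is the property the encoding exploits.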
- [273] arXiv:2403.19407 [pdf, other]
-
Title: Towards Temporally Consistent Referring Video Object Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%).
- [274] arXiv:2403.19412 [pdf, other]
-
Title: A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Event cameras exhibit remarkable attributes such as high dynamic range, asynchronicity, and low latency, making them highly suitable for vision tasks that involve high-speed motion in challenging lighting conditions. These cameras implicitly capture movement and depth information in events, making them appealing sensors for Camera Pose Relocalization (CPR) tasks. Nevertheless, existing CPR networks based on events neglect the pivotal fine-grained temporal information in events, resulting in unsatisfactory performance. Moreover, energy efficiency is further compromised by the use of excessively complex models, hindering efficient deployment on edge devices. In this paper, we introduce PEPNet, a simple and effective point-based network designed to regress six degrees of freedom (6-DOFs) event camera poses. We rethink the relationship between the event camera and CPR tasks, leveraging the raw Point Cloud directly as network input to harness the high-temporal resolution and inherent sparsity of events. PEPNet is adept at abstracting the spatial and implicit temporal features through hierarchical structure and explicit temporal features by Attentive Bi-directional Long Short-Term Memory (A-Bi-LSTM). By employing a carefully crafted lightweight design, PEPNet delivers state-of-the-art (SOTA) performance on both indoor and outdoor datasets with meager computational resources. Specifically, PEPNet attains a significant 38% and 33% performance improvement on the random split IJRR and M3ED datasets, respectively. Moreover, the lightweight design version PEPNet$_{tiny}$ accomplishes results comparable to the SOTA while employing a mere 0.5% of the parameters.
- [275] arXiv:2403.19414 [pdf, other]
-
Title: BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Medical dialogue generation (MDG) has gained increasing attention due to its substantial practical value. Previous works typically employ a sequence-to-sequence framework to generate medical responses by modeling dialogue context as sequential text with annotated medical entities. While these methods have been successful in generating fluent responses, they fail to provide process explanations of reasoning and require extensive entity annotation. To address these limitations, we propose Bootstrap Prompting for Explicit Reasoning in MDG (BP4ER), which explicitly models MDG's multi-step reasoning process and iteratively enhances it. We employ a least-to-most prompting strategy to guide a large language model (LLM) in explicit reasoning, breaking down MDG into simpler sub-questions. These sub-questions build on answers from previous ones. Additionally, we introduce two distinct bootstrapping techniques for prompting, which autonomously correct errors and facilitate the LLM's explicit reasoning. This approach eliminates the need for entity annotation and increases the transparency of the MDG process by explicitly generating the intermediate reasoning chain. The experimental findings on the two public datasets indicate that BP4ER outperforms state-of-the-art methods in terms of both objective and subjective evaluation metrics.
- [276] arXiv:2403.19417 [pdf, other]
-
Title: OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Comments: To appear in CVPR 2024. 26 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. In pursuit of constructing the complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task. OAKINK2 adopts an object-centric perspective for decoding the complex tasks, treating them as a sequence of object affordance fulfillments. The first level, Affordance, outlines the functionalities that objects in the scene can afford; the second level, Primitive Task, describes the minimal interaction units that humans interact with the object to achieve its affordance; and the third level, Complex Task, illustrates how Primitive Tasks are composed and interdependent. The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands and various interacting objects. This extensive collection supports applications such as interaction reconstruction and motion synthesis. Based on the 3-level abstraction of OAKINK2, we explore a task-oriented framework for Complex Task Completion (CTC). CTC aims to generate a sequence of bimanual manipulation to achieve task objectives. Within the CTC framework, we employ Large Language Models (LLMs) to decompose the complex task objectives into sequences of Primitive Tasks and have developed a Motion Fulfillment Model that generates bimanual hand motion for each Primitive Task. OAKINK2 datasets and models are available at https://oakink.net/v2.
- [277] arXiv:2403.19418 [pdf, other]
-
Title: Constants of Motion for Conserved and Non-conserved Dynamics
Authors: Michael F. Zimmer
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
This paper begins with a dynamical model that was obtained by applying a machine learning technique (FJet) to time-series data; this dynamical model is then analyzed with Lie symmetry techniques to obtain constants of motion. This analysis is performed on both the conserved and non-conserved cases of the 1D and 2D harmonic oscillators. For the 1D oscillator, constants are found in the cases where the system is underdamped, overdamped, and critically damped. The novel existence of such a constant for a non-conserved model is interpreted as a manifestation of the conservation of energy of the {\em total} system (i.e., oscillator plus dissipative environment). For the 2D oscillator, constants are found for the isotropic and anisotropic cases, including when the frequencies are incommensurate; it is also generalized to arbitrary dimensions. In addition, a constant is identified which generalizes angular momentum for all ratios of the frequencies. The approach presented here can produce {\em multiple} constants of motion from a {\em single}, generic data set.
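As a concrete illustration of the kind of invariant discussed for the non-conserved (damped) case, the following is a standard textbook result for the linearly damped 1D oscillator, not necessarily the expression derived via the FJet analysis in the paper:

```latex
% Damped 1D harmonic oscillator: \ddot{x} + 2\gamma \dot{x} + \omega_0^2 x = 0.
% A time-dependent constant of motion (direct differentiation along solutions
% gives dI/dt = 0), which reduces to the usual energy as \gamma \to 0:
I(x, \dot{x}, t) = e^{2\gamma t}\left( \dot{x}^2 + 2\gamma\, x \dot{x} + \omega_0^2 x^2 \right),
\qquad \frac{dI}{dt} = 0 .
```

The exponential prefactor absorbs the energy lost to the environment, matching the interpretation above of a non-conserved constant as total-system energy conservation.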
- [278] arXiv:2403.19419 [pdf, other]
-
Title: Fairness in Ranking: Robustness through Randomization without the Protected Attribute
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
There has been great interest in fairness in machine learning, especially in relation to classification problems. In ranking-related problems, such as in online advertising, recommender systems, and HR automation, much work on fairness remains to be done. Two complications arise: first, the protected attribute may not be available in many applications. Second, there are multiple measures of fairness of rankings, and optimization-based methods utilizing a single measure of fairness of rankings may produce rankings that are unfair with respect to other measures. In this work, we propose a randomized method for post-processing rankings, which does not require the availability of the protected attribute. In an extensive numerical study, we show the robustness of our method with respect to P-Fairness and its effectiveness with respect to Normalized Discounted Cumulative Gain (NDCG) from the baseline ranking, improving on previously proposed methods.
- [279] arXiv:2403.19421 [pdf, other]
-
Title: Scaling up ridge regression for brain encoding in a massive individual fMRI dataset
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
Brain encoding with neuroimaging data is an established analysis aimed at predicting human brain activity directly from complex stimuli features such as movie frames. Typically, these features are the latent space representation from an artificial neural network, and the stimuli are image, audio, or text inputs. Ridge regression is a popular prediction model for brain encoding due to its good out-of-sample generalization performance. However, training a ridge regression model can be highly time-consuming when dealing with large-scale deep functional magnetic resonance imaging (fMRI) datasets that include many space-time samples of brain activity. This paper evaluates different parallelization techniques to reduce the training time of brain encoding with ridge regression on the CNeuroMod Friends dataset, one of the largest deep fMRI resources currently available. With multi-threading, our results show that the Intel Math Kernel Library (MKL) significantly outperforms the OpenBLAS library, being 1.9 times faster using 32 threads on a single machine. We then evaluated the Dask multi-CPU implementation of ridge regression readily available in scikit-learn (MultiOutput), and we proposed a new "batch" version of Dask parallelization, motivated by a time complexity analysis. In line with our theoretical analysis, MultiOutput parallelization was found to be impractical, i.e., slower than multi-threading on a single machine. In contrast, the Batch-MultiOutput regression scaled well across compute nodes and threads, providing speed-ups of up to 33 times with 8 compute nodes and 32 threads compared to a single-threaded scikit-learn execution. Batch parallelization using Dask thus emerges as a scalable approach for brain encoding with ridge regression on high-performance computing systems using scikit-learn and large fMRI datasets.
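Part of why batching over output columns is cheap is that the regularized Gram matrix is shared by all targets, so it can be factored once and reused per batch. The numpy sketch below is a simplified single-machine analogue of that idea (no Dask, no distributed execution, and not the paper's actual implementation):

```python
import numpy as np

def batched_ridge(X, Y, alpha=1.0, batch_size=1000):
    """Multi-output ridge regression solved in batches of target columns.

    Factors (X^T X + alpha * I) once via Cholesky; each batch of output
    columns then only needs two triangular solves, which is what makes
    batching over many targets (e.g. fMRI voxels) cheap.
    """
    n_features = X.shape[1]
    # Shared by every output column: factor the regularized Gram matrix once.
    A = X.T @ X + alpha * np.eye(n_features)
    factor = np.linalg.cholesky(A)
    coefs = np.empty((n_features, Y.shape[1]))
    for start in range(0, Y.shape[1], batch_size):
        B = X.T @ Y[:, start:start + batch_size]
        z = np.linalg.solve(factor, B)        # solve L z = B
        coefs[:, start:start + batch_size] = np.linalg.solve(factor.T, z)  # L^T w = z
    return coefs
```

With negligible regularization and noiseless targets, the recovered coefficients match the generating weights, batch by batch.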
- [280] arXiv:2403.19423 [pdf, other]
-
Title: Echo-chambers and Idea Labs: Communication Styles on Twitter
Subjects: Social and Information Networks (cs.SI); Computation and Language (cs.CL)
This paper investigates the communication styles and structures of Twitter (X) communities within the vaccination context. While mainstream research primarily focuses on the echo-chamber phenomenon, wherein certain ideas are reinforced and participants are isolated from opposing opinions, this study reveals the presence of diverse communication styles across various communities. In addition to the communities exhibiting echo-chamber behavior, this research uncovers communities with distinct communication patterns. By shedding light on the nuanced nature of communication within social networks, this study emphasizes the significance of understanding the diversity of perspectives within online communities.
- [281] arXiv:2403.19424 [pdf, other]
-
Title: The Role of Syntactic Span Preferences in Post-Hoc Explanation Disagreement
Comments: Long paper accepted to LREC-Coling 2024 main conference. Please cite the conference proceedings version when available
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Post-hoc explanation methods are an important tool for increasing model transparency for users. Unfortunately, the currently used methods for attributing token importance often yield diverging patterns. In this work, we study potential sources of disagreement across methods from a linguistic perspective. We find that different methods systematically select different classes of words and that methods that agree most with other methods and with humans display similar linguistic preferences. Token-level differences between methods are smoothed out if we compare them on the syntactic span level. We also find higher agreement across methods by estimating the most important spans dynamically instead of relying on a fixed subset of size $k$. We systematically investigate the interaction between $k$ and spans and propose an improved configuration for selecting important tokens.
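The dynamic span selection described above can be illustrated with a short sketch (all names, scores, and the coverage criterion here are hypothetical, not the paper's actual procedure): syntactic spans are ranked by their summed token attribution and selected until a target share of the total attribution is covered, rather than taking a fixed top-$k$ tokens.

```python
def select_spans(token_scores, spans, mass=0.8):
    """Pick the most important syntactic spans dynamically.

    token_scores: per-token attribution scores (non-negative here for
    simplicity); spans: list of (start, end) token index pairs.
    Spans are ranked by summed attribution and kept until `mass` of the
    total attribution is covered, instead of a fixed top-k of tokens.
    """
    total = sum(token_scores)
    scored = sorted(spans, key=lambda s: -sum(token_scores[s[0]:s[1]]))
    chosen, covered = [], 0.0
    for span in scored:
        if covered >= mass * total:
            break
        chosen.append(span)
        covered += sum(token_scores[span[0]:span[1]])
    return sorted(chosen)

# Toy attributions for a five-token sentence, with hypothetical spans
scores = [0.05, 0.40, 0.30, 0.20, 0.05]
spans = [(0, 2), (2, 4), (4, 5)]  # e.g. noun phrase, verb phrase, adverb
print(select_spans(scores, spans, mass=0.8))  # → [(0, 2), (2, 4)]
```

Note how the number of selected spans adapts to how concentrated the attribution mass is, which is the property the fixed-$k$ comparison lacks.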
- [282] arXiv:2403.19428 [pdf, other]
-
Title: Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality
Comments: Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
While burst LR images are useful for improving the SR image quality compared with a single LR image, prior SR networks accepting the burst LR images are trained in a deterministic manner, which is known to produce a blurry SR image. In addition, it is difficult to perfectly align the burst LR images, making the SR image more blurry. Since such blurry images are perceptually degraded, we aim to reconstruct the sharp high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods using the diffusion model are not properly optimized for the burst SR task. Specifically, the reverse process starting from a random sample is not optimized for image enhancement and restoration methods, including burst SR. In our proposed method, on the other hand, burst LR features are used to reconstruct the initial burst SR image that is fed into an intermediate step in the diffusion model. This reverse process from the intermediate step 1) skips diffusion steps for reconstructing the global structure of the image and 2) focuses on steps for refining detailed textures. Our experimental results demonstrate that our method can improve the scores of the perceptual quality metrics. Code: https://github.com/placerkyo/BSRD
- [283] arXiv:2403.19432 [pdf, other]
-
Title: Uncovering Misattributed Suicide Causes through Annotation Inconsistency Detection in Death Investigation Notes
Authors: Song Wang, Yiliang Zhou, Ziqiang Han, Cui Tao, Yunyu Xiao, Ying Ding, Joydeep Ghosh, Yifan Peng
Comments: 19 pages, 6 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns and causes of death. Recent studies have suggested that annotation inconsistencies exist within the NVDRS and can lead to erroneous suicide-cause attributions. We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify problematic instances. We analyzed 267,804 suicide death incidents between 2003 and 2020 from the NVDRS. Our results showed that incorporating the target state's data into training the suicide-crisis classifier brought an increase of 5.4% to the F-1 score on the target state's test set and a decrease of 1.1% on other states' test sets. To conclude, we demonstrated the annotation inconsistencies in the NVDRS's death investigation notes, identified problematic instances, evaluated the effectiveness of correcting them, and proposed an NLP-based improvement solution.
- [284] arXiv:2403.19433 [pdf, other]
-
Title: Puzzle game: Prediction and Classification of Wordle Solution Words
Comments: 17 pages, 15 figures, MCM/ICM 2023 Award winning Paper
Subjects: Numerical Analysis (math.NA)
With the popularity of the Wordle game launched by the New York Times, more and more players are getting involved in this challenging game. Submitting the correct answer not only requires luck but is also influenced by various attributes of the word.
For question 1, we first preprocessed the original data by removing and replacing abnormal entries. We then established an ARIMA-based prediction model for the number of reported results, with the model parameters determined as p=0, d=1, q=1. The model gave [20337, 21673] as the prediction interval for the number of reported results on March 1, 2023. We then selected the frequency of word usage (FREQ), the information entropy of the word (WIE), and the number of repeated letters contained in the word (NRE) as attributes of the word, and performed a correlation analysis between these three attributes and the seven percentages of tries. The results showed that FREQ was positively correlated with the number of tries, while WIE and NRE were negatively correlated with the number of tries.
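A one-step forecast from an ARIMA(0,1,1) model like the one fitted above can be sketched in a few lines: the differenced series follows an MA(1), so residuals are built recursively and the next difference is forecast from the last residual. (This is a toy illustration only; the series and the MA coefficient `theta` below are made up, not the paper's fitted values, and no prediction interval is computed.)

```python
def arima_011_forecast(series, theta):
    """One-step forecast from an ARIMA(0,1,1) model with known MA coefficient.

    The differenced series d_t = x_t - x_{t-1} follows an MA(1):
    d_t = eps_t + theta * eps_{t-1}.  Residuals are built recursively
    (conditional on eps_0 = 0), and the forecast of d_{n+1} is theta * eps_n,
    so x_hat_{n+1} = x_n + theta * eps_n.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    eps = 0.0
    for d in diffs:
        eps = d - theta * eps  # innovation at this step
    return series[-1] + theta * eps

# A toy reported-results series; theta is illustrative, not fitted.
counts = [25000, 24000, 23500, 23200, 23000]
print(round(arima_011_forecast(counts, theta=0.4), 1))  # → 22961.6
```

In practice one would fit `theta` by maximum likelihood (e.g. with a statistics package) and derive the prediction interval from the residual variance; the recursion above only shows where the point forecast comes from.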
For problem 2, we established a regression model based on the XGBoost algorithm for predicting the distribution of the reported results, using the three attributes selected for problem 1 to build seven regression models for the seven different tries, named XGB1 - XGB7. Since the percentage of solutions in one try was small and XGB1's predictions were poor, we used the mean of the one-try data (0.5) as XGB1's prediction; XGB2 - XGB7 achieved accuracies of 85.67%, 83.23%, 80.34%, 78.77%, 79.89%, and 84.63%, respectively, for an overall accuracy of 82.1%. The percentages associated with (1, 2, 3, 4, 5, 6, X) tries for "EERIE" were predicted to be 0.5, 2.3, 13.8, 21.7, 29.4, 22.3, and 10.
Due to length limitations, we will not continue to display more content.
- [285] arXiv:2403.19435 [pdf, other]
-
Title: BAMM: Bidirectional Autoregressive Motion Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generating human motion from text has been dominated by denoising motion models either through diffusion or generative masking process. However, these models face great limitations in usability by requiring prior knowledge of the motion length. Conversely, autoregressive motion models address this limitation by adaptively predicting motion endpoints, at the cost of degraded generation quality and editing capabilities. To address these challenges, we propose Bidirectional Autoregressive Motion Model (BAMM), a novel text-to-motion generation framework. BAMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into discrete tokens in latent space, and (2) a masked self-attention transformer that autoregressively predicts randomly masked tokens via a hybrid attention masking strategy. By unifying generative masked modeling and autoregressive modeling, BAMM captures rich and bidirectional dependencies among motion tokens, while learning the probabilistic mapping from textual inputs to motion outputs with dynamically-adjusted motion sequence length. This feature enables BAMM to simultaneously achieve high-quality motion generation with enhanced usability and built-in motion editability. Extensive experiments on HumanML3D and KIT-ML datasets demonstrate that BAMM surpasses current state-of-the-art methods in both qualitative and quantitative measures.
- [286] arXiv:2403.19436 [pdf, other]
-
Title: "At the end of the day, I am accountable": Gig Workers' Self-Tracking for Multi-Dimensional Accountability Management
Comments: Accepted to CHI 2024
Subjects: Human-Computer Interaction (cs.HC)
Tracking is inherent in and central to the gig economy. Platforms track gig workers' performance through metrics such as acceptance rate and punctuality, while gig workers themselves engage in self-tracking. Although prior research has extensively examined how gig platforms track workers through metrics -- with some studies briefly acknowledging the phenomenon of self-tracking among workers -- there is a dearth of studies that explore how and why gig workers track themselves. To address this, we conducted 25 semi-structured interviews, revealing how gig workers use self-tracking to manage accountabilities to themselves and external entities across three identities: the holistic self, the entrepreneurial self, and the platformized self. We connect our findings to neoliberalism, through which we contextualize gig workers' self-accountability and the invisible labor of self-tracking. We further discuss how self-tracking mitigates information and power asymmetries in gig work and offer design implications to support gig workers' multi-dimensional self-tracking.
- [287] arXiv:2403.19438 [pdf, other]
-
Title: SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact of scaling up the quantity of generative data on the performance of downstream perception models and find that enhancing data diversity plays a crucial role in effectively scaling generative data production. Therefore, we have developed a novel model equipped with a subject control mechanism, which allows the generative model to leverage diverse external data sources for producing varied and useful data. Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field.
- [288] arXiv:2403.19441 [pdf, other]
-
Title: A Novel Stochastic Transformer-based Approach for Post-Traumatic Stress Disorder Detection using Audio Recording of Clinical Interviews
Journal-ref: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (2023) 700-705
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Post-traumatic stress disorder (PTSD) is a mental disorder that can develop after witnessing or experiencing extremely traumatic events. PTSD can affect anyone, regardless of ethnicity or culture. An estimated one in every eleven people will experience PTSD during their lifetime. The Clinician-Administered PTSD Scale (CAPS) and the PTSD Check List for Civilians (PCL-C) interviews are gold standards in the diagnosis of PTSD. These questionnaires can, however, be fooled by the subject's responses. This work proposes a deep learning-based approach that achieves state-of-the-art performance for PTSD detection using audio recordings during clinical interviews. Our approach is based on MFCC low-level features extracted from audio recordings of clinical interviews, followed by deep high-level learning using a Stochastic Transformer. Our proposed approach achieves state-of-the-art performance with an RMSE of 2.92 on the eDAIC dataset thanks to the stochastic depth, stochastic deep learning layers, and stochastic activation function.
- [289] arXiv:2403.19442 [pdf, other]
-
Title: Exploiting Individual Graph Structures to Enhance Ecological Momentary Assessment (EMA) Forecasting
Comments: 9 pages, 3 figures, 2024 IEEE 40th International Conference on Data Engineering Workshops
Subjects: Machine Learning (cs.LG)
In the evolving field of psychopathology, the accurate assessment and forecasting of data derived from Ecological Momentary Assessment (EMA) is crucial. EMA offers contextually rich psychopathological measurements over time, which in practice yield Multivariate Time Series (MTS) data. Many challenges therefore arise in the analysis, both from the temporal complexities inherent in emotional, behavioral, and contextual EMA data and from their inter-dependencies. To address both of these aspects, this research investigates the performance of Recurrent and Temporal Graph Neural Networks (GNNs). Overall, GNNs, by incorporating additional information from graphs reflecting the inner relationships between the variables, notably enhance the results, decreasing the Mean Squared Error (MSE) to 0.84 compared to the baseline LSTM model at 1.02. The effect of constructing graphs with different characteristics on GNN performance is also explored. Additionally, GNN-learned graphs, which are dynamically refined during the training process, were evaluated and showed similarly good performance. Graph learning thus also proved promising for other GNN methods, potentially refining the pre-defined graphs.
- [290] arXiv:2403.19443 [pdf, other]
-
Title: Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Subjects: Computation and Language (cs.CL)
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values. This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO). By analyzing the stability and robustness of RLHF and DPO, we propose MPO (Mixed Preference Optimization), a novel method that mitigates the weaknesses of both approaches. Specifically, we propose a two-stage training procedure: first train DPO on an easy dataset, and then perform RLHF on a difficult set with DPO model being the reference model. Here, the easy and difficult sets are constructed by a well-trained reward model that splits response pairs into those with large gaps of reward (easy), and those with small gaps (difficult). The first stage allows us to obtain a relatively optimal policy (LLM) model quickly, whereas the second stage refines LLM with online RLHF, thus mitigating the distribution shift issue associated with DPO. Experiments are conducted on two public alignment datasets, namely HH-RLHF and TLDR, demonstrating the effectiveness of MPO, both in terms of GPT4 and human evaluation.
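The reward-gap split that feeds the two training stages can be sketched in a few lines (a minimal illustration; the triple format, threshold, and names below are our own assumptions, not the paper's implementation): pairs with a large reward gap go to the easy set for DPO, and pairs with a small gap go to the difficult set for the RLHF stage.

```python
def split_by_reward_gap(pairs, gap_threshold):
    """Partition preference pairs into 'easy' and 'difficult' sets.

    pairs: list of (prompt, chosen_reward, rejected_reward) triples scored
    by a reward model.  A large reward gap means the preference is clear-cut
    (easy, used for the DPO stage); a small gap means it is ambiguous
    (difficult, used for the subsequent RLHF stage).
    """
    easy, difficult = [], []
    for prompt, r_chosen, r_rejected in pairs:
        gap = r_chosen - r_rejected
        (easy if gap >= gap_threshold else difficult).append(prompt)
    return easy, difficult

# Toy reward-model scores; the threshold is illustrative.
pairs = [("p1", 2.0, -1.5), ("p2", 0.9, 0.7), ("p3", 1.8, 0.1), ("p4", 0.4, 0.3)]
easy, difficult = split_by_reward_gap(pairs, gap_threshold=1.0)
print(easy, difficult)  # → ['p1', 'p3'] ['p2', 'p4']
```

The split makes the two-stage procedure concrete: stage one trains DPO on `easy`, and stage two runs RLHF on `difficult` with the stage-one model as the reference.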
- [291] arXiv:2403.19444 [pdf, other]
-
Title: Transparent and Clinically Interpretable AI for Lung Cancer Detection in Chest X-Rays
Comments: 12 pages, 10 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The rapidly advancing field of Explainable Artificial Intelligence (XAI) aims to tackle the issue of trust regarding the use of complex black-box deep learning models in real-world applications. Existing post-hoc XAI techniques have recently been shown to have poor performance on medical data, producing unreliable explanations which are infeasible for clinical use. To address this, we propose an ante-hoc approach based on concept bottleneck models which introduces for the first time clinical concepts into the classification pipeline, allowing the user valuable insight into the decision-making process. On a large public dataset of chest X-rays and associated medical reports, we focus on the binary classification task of lung cancer detection. Our approach yields improved classification performance in lung cancer detection when compared to baseline deep learning models (F1 > 0.9), while also generating clinically relevant and more reliable explanations than existing techniques. We evaluate our approach against post-hoc image XAI techniques LIME and SHAP, as well as CXR-LLaVA, a recent textual XAI tool which operates in the context of question answering on chest X-rays.
- [292] arXiv:2403.19446 [pdf, other]
-
Title: EDA-Driven Preprocessing for SAT Solving
Authors: Zhengyuan Shi, Tiebing Tang, Sadaf Khan, Hui-Ling Zhen, Mingxuan Yuan, Zhufei Chu, Qiang Xu
Subjects: Logic in Computer Science (cs.LO)
Effective formulation of problems into Conjunctive Normal Form (CNF) is critical in modern Boolean Satisfiability (SAT) solving for optimizing solver performance. Addressing the limitations of existing methods, our Electronic Design Automation (EDA)-driven preprocessing framework introduces a novel methodology for preparing SAT instances, leveraging both circuit and CNF formats for enhanced flexibility and efficiency. Central to our approach is the integration of a new logic synthesis technique, guided by a reinforcement learning agent, and a novel cost-customized LUT mapping strategy, enabling efficient handling of diverse SAT challenges. By transforming the SAT competition benchmarks into circuit instances, our framework demonstrates substantial performance improvements, as evidenced by a 52.42% reduction on average compared to solving directly. Moreover, our framework achieves a remarkable 96.14% runtime reduction on average for a set of logic equivalence checking problems that exhibit inherent circuit structures. These results highlight the effectiveness and versatility of our approach in handling both CNF and circuit instances. The code is available at https://github.com/cure-lab/EDA4SAT.
- [293] arXiv:2403.19449 [pdf, other]
-
Title: O-RAN for Energy-Efficient Serving Cluster Formulation in User-Centric Cell-Free MMIMO
Comments: Accepted for presentation during The 2nd Workshop on Next-generation Open and Programmable Radio Access Networks (NG-OPERA), organized in conjunction with IEEE International Conference on Computer Communications, May 20, 2024
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
The 6G Massive Multiple-Input Multiple-Output (MMIMO) networks can follow the so-called User-Centric Cell-Free (UCCF) architecture, where a single user is served by multiple Access Points (APs) coordinated by the Central Processing Unit (CPU). In this paper, we propose how O-RAN functionalities, i.e., rApp-xApp pair, can be used for energy-efficient Serving Cluster Formulation (SCF). Simulation studies show up to 37\% gain in Energy Efficiency (EE) of the proposed solution over the state-of-the-art Network-Centric (NC) designs.
- [294] arXiv:2403.19454 [pdf, other]
-
Title: JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
Comments: LREC-COLING2024
Subjects: Computation and Language (cs.CL)
Document question answering is the task of answering questions on given documents such as reports, slides, pamphlets, and websites, and it is a truly demanding task, as paper and electronic documents are so common in our society. It is known to be quite challenging because it requires not only text understanding but also understanding of figures and tables, and hence visual question answering (VQA) methods are often examined in addition to textual approaches. We introduce Japanese Document Question Answering (JDocQA), a large-scale document-based QA dataset essentially requiring both visual and textual information to answer questions, which comprises 5,504 documents in PDF format and 11,600 annotated question-and-answer instances in Japanese. Each QA instance includes references to the document pages and bounding boxes for the answer clues. We incorporate multiple categories of questions and unanswerable questions from the document for realistic question-answering applications. We empirically evaluate the effectiveness of our dataset with text-based large language models (LLMs) and multimodal models. Incorporating unanswerable questions in finetuning may also contribute to harnessing so-called hallucination generation.
- [295] arXiv:2403.19456 [pdf, other]
-
Title: Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization
Authors: Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Oliver Deussen, Weiming Dong, Jintao Li, Tong-Yee Lee
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Multimedia (cs.MM)
Personalized generation paradigms empower designers to customize visual intellectual properties with the help of textual descriptions by tuning or adapting pre-trained text-to-image models on a few images. Recent works explore approaches for concurrently customizing both content and detailed visual style appearance. However, these existing approaches often generate images where the content and style are entangled. In this study, we reconsider the customization of content and style concepts from the perspective of parameter space construction. Unlike existing methods that utilize a shared parameter space for content and style, we propose a learning framework that separates the parameter space to facilitate individual learning of content and style, thereby enabling disentangled content and style. To achieve this goal, we introduce "partly learnable projection" (PLP) matrices to separate the original adapters into divided sub-parameter spaces. We propose "break-for-make" customization learning pipeline based on PLP, which is simple yet effective. We break the original adapters into "up projection" and "down projection", train content and style PLPs individually with the guidance of corresponding textual prompts in the separate adapters, and maintain generalization by employing a multi-correspondence projection learning strategy. Based on the adapters broken apart for separate training content and style, we then make the entity parameter space by reconstructing the content and style PLPs matrices, followed by fine-tuning the combined adapter to generate the target object with the desired appearance. Experiments on various styles, including textures, materials, and artistic style, show that our method outperforms state-of-the-art single/multiple concept learning pipelines in terms of content-style-prompt alignment.
- [296] arXiv:2403.19457 [pdf, other]
-
Title: Transmissive RIS Transmitter Enabled Spatial Modulation for MIMO Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, we propose a novel transmissive reconfigurable intelligent surface (TRIS) transmitter-enabled spatial modulation (SM) multiple-input multiple-output (MIMO) system. In the transmission phase, a column-wise activation strategy is implemented for the TRIS panel, where the specific column elements are activated per time slot. Concurrently, the receiver employs the maximum likelihood detection technique. Based on this, for the transmit signals, we derive the closed-form expressions for the upper bounds of the average bit error probability (ABEP) of the proposed scheme from different perspectives, employing both vector-based and element-based approaches. Furthermore, we provide the asymptotic closed-form expressions for the ABEP of the TRIS-SM scheme, as well as the diversity gain. To improve the performance of the proposed TRIS-SM system, we optimize ABEP with a fixed data rate. Additionally, we provide lower bounds to simplify the computational complexity of improved TRIS-SM scheme. The Monte Carlo simulation method is used to validate the theoretical derivations exhaustively. The results demonstrate that the proposed TRIS-SM scheme can achieve better ABEP performance compared to the conventional SM scheme. Furthermore, the improved TRIS-SM scheme outperforms the TRIS-SM scheme in terms of reliability.
- [297] arXiv:2403.19459 [pdf, other]
-
Title: NeuroLGP-SM: A Surrogate-assisted Neuroevolution Approach using Linear Genetic Programming
Comments: Accepted in "International Conference on Optimization and Learning (OLA), Dubrovnik, Croatia, 2024", 13 pages, 4 figures, 1 table
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)
Evolutionary algorithms are increasingly recognised as a viable computational approach for the automated optimisation of deep neural networks (DNNs) within artificial intelligence. This method extends to the training of DNNs, an approach known as neuroevolution. However, neuroevolution is an inherently resource-intensive process, with certain studies reporting the consumption of thousands of GPU days for refining and training a single DNN network. To address the computational challenges associated with neuroevolution while still attaining good DNN accuracy, surrogate models emerge as a pragmatic solution. Despite their potential, the integration of surrogate models into neuroevolution is still in its early stages, hindered by factors such as the effective use of high-dimensional data and the representation employed in neuroevolution. In this context, we address these challenges by employing a suitable representation based on Linear Genetic Programming, denoted as NeuroLGP, and leveraging Kriging Partial Least Squares. The amalgamation of these two techniques culminates in our proposed methodology known as the NeuroLGP-Surrogate Model (NeuroLGP-SM). For comparison purposes, we also implement a baseline approach incorporating a repair mechanism, a common practice in neuroevolution. Notably, the baseline approach surpasses the renowned VGG-16 model in accuracy. Given the computational intensity inherent in DNN operations, a single run is typically the norm. To evaluate the efficacy of our proposed approach, we conducted 96 independent runs. Significantly, our methodologies consistently outperform the baseline, with the SM model demonstrating superior accuracy or comparable results to the NeuroLGP approach. Noteworthy is the additional advantage that the SM approach exhibits a 25% reduction in computational requirements, further emphasising its efficiency for neuroevolution.
- [298] arXiv:2403.19460 [pdf, other]
-
Title: RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable action space of RiEMann facilitates the addition of custom equivariant actions such as the direction of turning the faucet, which makes articulated object manipulation possible for RiEMann. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors on predicted poses (reduced by 68.6%), and achieves a 5.4 frames per second (FPS) network inference speed. Code and video results are available at https://riemann-web.github.io/.
- [299] arXiv:2403.19461 [pdf, other]
-
Title: Learning Sampling Distribution and Safety Filter for Autonomous Driving with VQ-VAE and Differentiable Optimization
Subjects: Robotics (cs.RO)
Sampling trajectories from a distribution followed by ranking them based on a specified cost function is a common approach in autonomous driving. Typically, the sampling distribution is hand-crafted (e.g., a Gaussian or a grid). Recently, there have been efforts towards learning the sampling distribution through generative models such as the Conditional Variational Autoencoder (CVAE). However, these approaches fail to capture the multi-modality of the driving behaviour due to the Gaussian latent prior of the CVAE. Thus, in this paper, we re-imagine the distribution learning through vector quantized variational autoencoder (VQ-VAE), whose discrete latent-space is well equipped to capture multi-modal sampling distribution. The VQ-VAE is trained with demonstration data of optimal trajectories. We further propose a differentiable optimization based safety filter to minimally correct the VQ-VAE-sampled trajectories to ensure collision avoidance. We use backpropagation through the optimization layers in a self-supervised learning set-up to learn good initialization and optimal parameters of the safety filter. We perform extensive comparisons with a state-of-the-art CVAE-based baseline in dense and aggressive traffic scenarios and show a reduction of up to 12 times in collision rate while being competitive in driving speeds.
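The discrete latent space that lets a VQ-VAE capture multi-modal distributions comes from nearest-neighbor codebook lookup, which can be sketched in NumPy (an illustration of standard vector quantization, not the paper's architecture; the codebook and latents below are toy values):

```python
import numpy as np

def vq_quantize(latents, codebook):
    """Map continuous latents to their nearest codebook entries.

    latents: (n, d) encoder outputs; codebook: (k, d) learned code vectors.
    Returns discrete indices and the quantized vectors.  Sampling from the
    latent space then amounts to drawing discrete indices, which is what
    allows a VQ-VAE to represent multi-modal trajectory distributions.
    """
    # Squared distances between every latent and every code: shape (n, k)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy 2-D codebook and encoder outputs
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]])
idx, quantized = vq_quantize(latents, codebook)
print(idx.tolist())  # → [0, 1, 2]
```

During training the quantization step is non-differentiable, so VQ-VAEs typically use a straight-through gradient estimator plus codebook and commitment losses; the lookup above only shows the forward pass.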
- [300] arXiv:2403.19462 [pdf, other]
-
Title: Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)
This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation while having strong performance in complementary parts of the state space. The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space. We propose a simple imitation learning based algorithm, show a sample complexity bound on its accuracy, and prove that the algorithm is minimax optimal by showing a matching lower bound. Further, we apply the algorithm in the setting of machine learning guided compiler optimization to learn policies for inlining programs with the objective of creating a small binary. We demonstrate that we can learn a policy that outperforms an initial policy learned via standard RL through a few iterations of our approach.
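The goal of matching the best combination of baselines can be illustrated with a toy aggregation step (our own simplification, assuming tabular states and per-step returns; the paper's algorithm is more general and comes with sample complexity guarantees): for each state, imitate the action taken by whichever baseline achieved the highest return from that state.

```python
def best_baseline_actions(trajectories):
    """Build an imitation dataset that follows the best baseline per state.

    trajectories: mapping baseline_name -> list of (state, action, return)
    steps.  For each state visited by several baselines, keep the action of
    the baseline whose trajectory achieved the highest return from there.
    """
    best = {}  # state -> (return, action)
    for policy, steps in trajectories.items():
        for state, action, ret in steps:
            if state not in best or ret > best[state][0]:
                best[state] = (ret, action)
    return {s: a for s, (r, a) in best.items()}

# Two toy baselines, each strong in a different part of the state space
trajs = {
    "pi_1": [("s0", "left", 5.0), ("s1", "left", 1.0)],
    "pi_2": [("s0", "right", 2.0), ("s1", "right", 4.0)],
}
print(best_baseline_actions(trajs))  # → {'s0': 'left', 's1': 'right'}
```

The resulting state-to-action map mixes the baselines per region of the state space, which is exactly the combined policy a single baseline cannot match on its own.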
- [301] arXiv:2403.19467 [pdf, other]
-
Title: Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce an innovative task focused on human communication, aiming to generate 3D holistic human motions for both speakers and listeners. Central to our approach is the incorporation of factorization to decouple audio features and the combination of textual semantic information, thereby facilitating the creation of more realistic and coordinated movements. We separately train VQ-VAEs with respect to the holistic motions of both speaker and listener. We consider the real-time mutual influence between the speaker and the listener and propose a novel chain-like transformer-based auto-regressive model specifically designed to characterize real-world communication scenarios effectively, which can generate the motions of both the speaker and the listener simultaneously. These designs ensure that the results we generate are both coordinated and diverse. Our approach demonstrates state-of-the-art performance on two benchmark datasets. Furthermore, we introduce the HoCo holistic communication dataset, which is a valuable resource for future research. Our HoCo dataset and code will be released for research purposes upon acceptance.
- [302] arXiv:2403.19470 [pdf, other]
-
Title: Deep decomposition method for the limited aperture inverse obstacle scattering problem
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP)
In this paper, we consider a deep learning approach to the limited aperture inverse obstacle scattering problem. It is well known that traditional deep learning relies solely on data, which may limit its performance for the inverse problem when only indirect observation data and a physical model are available. A fundamental question arises in light of these limitations: is it possible to enable deep learning to work on inverse problems without labeled data and to be aware of what it is learning? This work proposes a deep decomposition method (DDM) for such purposes, which does not require ground truth labels. It accomplishes this by providing physical operators associated with the scattering model to the neural network architecture. Additionally, a deep learning based data completion scheme is implemented in DDM to prevent distorting the solution of the inverse problem for limited aperture data. Furthermore, apart from addressing the ill-posedness imposed by the inverse problem itself, DDM is a physics-aware machine learning technique that can have interpretability property. The convergence result of DDM is theoretically proven. Numerical experiments are presented to demonstrate the validity of the proposed DDM even when the incident and observation apertures are extremely limited.
- [303] arXiv:2403.19473 [pdf, other]
-
Title: Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Implicit neural representation (INR), in combination with geometric rendering, has recently been employed in real-time dense RGB-D SLAM. Despite active research endeavors, the field lacks a unified protocol for fair evaluation, impeding its evolution. In this work, we establish, to our knowledge, the first open-source benchmark framework to evaluate the performance of a wide spectrum of commonly used INRs and rendering functions for mapping and localization. The goal of our benchmark is to 1) gain an intuition of how different INRs and rendering functions impact mapping and localization and 2) establish a unified evaluation protocol w.r.t. the design choices that may impact mapping and localization. With the framework, we conduct a large suite of experiments, offering various insights into choosing the INRs and geometric rendering functions: for example, the dense feature grid outperforms other INRs (e.g. tri-plane and hash grid), even when geometric and color features are jointly encoded for memory efficiency. To extend the findings into the practical scenario, a hybrid encoding strategy is proposed to bring the best of the accuracy and completion from the grid-based and decomposition-based INRs. We further propose explicit hybrid encoding for high-fidelity dense grid mapping to comply with RGB-D SLAM systems that put a premium on robustness and computation efficiency.
- [304] arXiv:2403.19474 [pdf, other]
-
Title: SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Comments: 16 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Scene graphs have recently been introduced into 3D spatial understanding as a comprehensive representation of the scene. The alignment between 3D scene graphs is the first step of many downstream tasks such as scene graph aided point cloud registration, mosaicking, overlap checking, and robot navigation. In this work, we treat 3D scene graph alignment as a partial graph-matching problem and propose to solve it with a graph neural network. We reuse the geometric features learned by a point cloud registration method and associate the clustered point-level geometric features with the node-level semantic feature via our designed feature fusion module. Partial matching is enabled by using a learnable method to select the top-k similar node pairs. Subsequent downstream tasks such as point cloud registration are achieved by running a pre-trained registration network within the matched regions. We further propose a point-matching rescoring method that uses the node-wise alignment of the 3D scene graph to reweight the matching candidates from a pre-trained point cloud registration method. This reduces false point correspondences, especially in low-overlap cases. Experiments show that our method improves the alignment accuracy by 10-20% in low-overlap and random transformation scenarios and outperforms existing work in multiple downstream tasks.
- [305] arXiv:2403.19475 [pdf, other]
-
Title: A theoretical framework for the design and analysis of computational thinking problems in education
Authors: Giorgia Adorni, Alberto Piatti, Engin Bumbacher, Lucio Negrini, Francesco Mondada, Dorit Assaf, Francesca Mangili, Luca Gambardella
Subjects: Human-Computer Interaction (cs.HC)
The field of computational thinking education has grown in recent years as researchers and educators have sought to develop and assess students' computational thinking abilities. While much of the research in this area has focused on defining computational thinking, the competencies it involves and how to assess them in teaching and learning contexts, this work takes a different approach. We provide a more situated perspective on computational thinking, focusing on the types of problems that require computational thinking skills to be solved and the features that support these processes. We develop a framework for analysing existing computational thinking problems in an educational context. We conduct a comprehensive literature review to identify prototypical activities from areas where computational thinking is typically pursued in education. We identify the main components and characteristics of these activities, along with their influence on activating computational thinking competencies. The framework provides a catalogue of computational thinking skills that can be used to understand the relationship between problem features and competencies activated. This study contributes to the field of computational thinking education by offering a tool for evaluating and revising existing problems to activate specific skills and for assisting in designing new problems that target the development of particular competencies. The results of this study may be of interest to researchers and educators working in computational thinking education.
- [306] arXiv:2403.19480 [pdf, ps, other]
-
Title: $H$-Consistency Guarantees for Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a detailed study of $H$-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds. This generalization proves essential for analyzing $H$-consistency bounds specific to regression. Next, we prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss, under the assumption of a symmetric distribution and a bounded hypothesis set. This includes positive results for the Huber loss, all $\ell_p$ losses, $p \geq 1$, the squared $\epsilon$-insensitive loss, as well as a negative result for the $\epsilon$-insensitive loss used in squared Support Vector Regression (SVR). We further leverage our analysis of $H$-consistency for regression and derive principled surrogate losses for adversarial regression (Section 5). This readily establishes novel algorithms for adversarial regression, for which we report favorable experimental results in Section 6.
- [307] arXiv:2403.19482 [pdf, other]
-
Title: The linear sampling method for data generated by small random scatterers
Subjects: Numerical Analysis (math.NA)
We present an extension of the linear sampling method for solving the sound-soft inverse scattering problem in two dimensions with data generated by randomly distributed small scatterers. The theoretical justification of our novel sampling method is based on a rigorous asymptotic model, a modified Helmholtz--Kirchhoff identity, and our previous work on the linear sampling method for random sources. Our numerical implementation incorporates boundary elements, Singular Value Decomposition, Tikhonov regularization, and Morozov's discrepancy principle. We showcase the robustness and accuracy of our algorithms with a series of numerical experiments.
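The regularization pipeline named above (Singular Value Decomposition, Tikhonov regularization, Morozov's discrepancy principle) is standard in inverse scattering. A minimal NumPy sketch of that step, under our own notation and unrelated to the authors' boundary-element code:

```python
import numpy as np

def tikhonov_svd(A, b, alpha):
    """Tikhonov-regularized solution of min ||Ax - b||^2 + alpha^2 ||x||^2 via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Filter factors damp the contribution of small singular values.
    filt = s / (s ** 2 + alpha ** 2)
    return Vt.T @ (filt * (U.T @ b))

def morozov_alpha(A, b, noise_level, alphas):
    """Pick the largest alpha whose residual stays within the noise level:
    a discrete stand-in for Morozov's discrepancy principle."""
    for alpha in sorted(alphas, reverse=True):
        x = tikhonov_svd(A, b, alpha)
        if np.linalg.norm(A @ x - b) <= noise_level:
            return alpha, x
    a_min = min(alphas)
    return a_min, tikhonov_svd(A, b, a_min)
```

In the linear sampling method proper, `A` would be the discretized far-field (or near-field) operator and `b` the test-function data for each sampling point; here both are left abstract.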
- [308] arXiv:2403.19484 [pdf, other]
-
Title: Improved Genetic Algorithm Based on Greedy and Simulated Annealing Ideas for Vascular Robot Ordering Strategy
Comments: 17 pages
Subjects: Neural and Evolutionary Computing (cs.NE)
This study presents a comprehensive approach for optimizing the acquisition, utilization, and maintenance of ABLVR vascular robots in healthcare settings. Medical robotics, particularly in vascular treatments, necessitates precise resource allocation and optimization due to the complex nature of robot and operator maintenance. Traditional heuristic methods, though intuitive, often fail to achieve global optimization. To address these challenges, this research introduces a novel strategy, combining mathematical modeling, a hybrid genetic algorithm, and ARIMA time series forecasting. Considering the dynamic healthcare environment, our approach includes a robust resource allocation model for robotic vessels and operators. We incorporate the unique requirements of the adaptive learning process for operators and the maintenance needs of robotic components. The hybrid genetic algorithm, integrating simulated annealing and greedy approaches, efficiently solves the optimization problem. Additionally, ARIMA time series forecasting predicts the demand for vascular robots, further enhancing the adaptability of our strategy. Experimental results demonstrate the superiority of our approach in terms of optimization, transparency, and convergence speed over other state-of-the-art methods.
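The combination of greedy selection with a simulated-annealing acceptance rule inside a genetic loop can be sketched on a toy minimization problem. All operators, schedules, and parameters below are illustrative assumptions, not the paper's ABLVR model:

```python
import math
import random

def hybrid_ga(fitness, init_pop, generations=100, temp=1.0, cooling=0.95):
    """Toy hybrid GA (minimization): greedy truncation selection with elitism,
    one-point crossover, Gaussian mutation, and a simulated-annealing
    acceptance test when an offspring is worse than the current worst."""
    pop = sorted(init_pop, key=fitness)
    for _ in range(generations):
        parents = pop[: max(2, len(pop) // 2)]      # greedy: keep the better half
        children = [pop[0]]                         # elitism: the best survives
        while len(children) < len(pop):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = [g + random.gauss(0.0, 0.1)     # Gaussian mutation
                     for g in a[:cut] + b[cut:]]    # one-point crossover
            delta = fitness(child) - fitness(pop[-1])
            # SA acceptance: always keep improvements, sometimes keep worse ones.
            if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-12)):
                children.append(child)
            else:
                children.append(pop[-1])
        pop = sorted(children, key=fitness)
        temp *= cooling                             # cooling schedule
    return pop[0]
```

The simulated-annealing test lets the population occasionally accept worse offspring early on (high temperature), which is the standard remedy for the premature convergence that a purely greedy GA exhibits.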
- [309] arXiv:2403.19485 [pdf, other]
-
Title: Adaptive resolution of fine scales in modes of microstructured optical fibers
Subjects: Numerical Analysis (math.NA); Optics (physics.optics)
An adaptive algorithm for computing eigenmodes and propagation constants of optical fibers is proposed. The algorithm is built using a dual-weighted residual error estimator. The residuals are based on the eigensystem for leaky hybrid modes obtained from Maxwell equations truncated to a finite domain after a transformation by a perfectly matched layer. The adaptive algorithm is then applied to compute practically interesting modes for multiple fiber microstructures. Emerging microstructured optical fibers are characterized by complex geometrical features in their transverse cross-section. Their leaky modes, useful for confining and propagating light in their cores, often exhibit fine scale features. The adaptive algorithm automatically captures these features without any expert input. The results also show that confinement losses of these modes are captured accurately on the adaptively found meshes.
- [310] arXiv:2403.19489 [pdf, other]
-
Title: Evolving Assembly Code in an Adversarial Environment
Comments: 9 pages, 5 figures, 6 listings
Subjects: Neural and Evolutionary Computing (cs.NE)
In this work, we evolve assembly code for the CodeGuru competition. The competition's goal is to create a survivor -- an assembly program that runs the longest in shared memory, by resisting attacks from adversary survivors and finding their weaknesses. For evolving top-notch solvers, we specify a Backus Normal Form (BNF) for the assembly language and synthesize the code from scratch using Genetic Programming (GP). We evaluate the survivors by running CodeGuru games against human-written winning survivors. Our evolved programs found weaknesses in the programs they were trained against and utilized them. In addition, we compare our approach with a Large Language Model, demonstrating that the latter cannot generate a survivor that can win at any competition. This work has important applications for cyber-security, as we utilize evolution to detect weaknesses in survivors. The assembly BNF is domain-independent; thus, by modifying the fitness function, it can detect code weaknesses and help fix them. Finally, the CodeGuru competition offers a novel platform for analyzing GP and code evolution in adversarial environments. To support further research in this direction, we provide a thorough qualitative analysis of the evolved survivors and the weaknesses found.
- [311] arXiv:2403.19490 [pdf, other]
-
Title: Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Structural model pruning is a prominent approach used for reducing the computational cost of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained devices. Yet, the majority of proposed ideas require a pretrained model before pruning, which is costly to secure. In this paper, we propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models. The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers, and the resulting model's accuracy serves as its reward. We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy, and we regularize the model's weights to align with the structure selected by the agent. The evolving model's weights result in a dynamic reward function for the agent, which prevents the use of prominent episodic RL methods with a stationary environment assumption for our purpose. We address this challenge by designing a mechanism to model the complex changing dynamics of the reward function and provide a representation of it to the RL agent. To do so, we take a learnable embedding for each training epoch and employ a recurrent model to calculate a representation of the changing environment. We train the recurrent model and embeddings using a decoder model to reconstruct observed rewards. Such a design empowers our agent to effectively leverage episodic observations along with the environment representations to learn a proper policy to determine performant sub-networks of the CNN model. Our extensive experiments on CIFAR-10 and ImageNet using ResNets and MobileNets demonstrate the effectiveness of our method.
- [312] arXiv:2403.19492 [pdf, other]
-
Title: Segmentation tool for images of cracks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Safety-critical infrastructures, such as bridges, are periodically inspected to check for existing damage, such as fatigue cracks and corrosion, and to guarantee the safe use of the infrastructure. Visual inspection is the most frequent type of general inspection, despite the fact that its detection capability is rather limited, especially for fatigue cracks. Machine learning algorithms can be used for augmenting the capability of classical visual inspection of bridge structures, however, the implementation of such an algorithm requires a massive annotated training dataset, which is time-consuming to produce. This paper proposes a semi-automatic crack segmentation tool that eases the manual segmentation of cracks on images needed to create a training dataset for a machine learning algorithm. Also, it can be used to measure the geometry of the crack. This tool makes use of an image processing algorithm, which was initially developed for the analysis of vascular systems on retinal images. The algorithm relies on a multi-orientation wavelet transform, which is applied to the image to construct the so-called "orientation scores", i.e. a modified version of the image. Afterwards, the filtered orientation scores are used to formulate an optimal path problem that identifies the crack. The globally optimal path between manually selected crack endpoints is computed, using a state-of-the-art geometric tracking method. The pixel-wise segmentation is done afterwards using the obtained crack path. The proposed method outperforms fully automatic methods and shows potential to be an adequate alternative to the manual data annotation.
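The "globally optimal path between manually selected crack endpoints" step can be illustrated with a plain Dijkstra search over a pixel cost grid, where low cost marks likely crack pixels. Note this is only a simplified stand-in: the paper's geometric tracker operates in the lifted orientation-score domain, not directly on the image.

```python
import heapq

def optimal_path(cost, start, end):
    """Globally optimal path on a 2D cost grid via Dijkstra's algorithm.
    `cost` is a list of rows of per-pixel costs (low cost = likely crack);
    `start` and `end` are (row, col) endpoints chosen by the annotator."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]  # accumulate pixel cost along the path
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk predecessors back from the endpoint to recover the path.
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

In the actual tool, the cost image would come from the filtered orientation scores, so the path is penalized for leaving the locally dominant crack orientation.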
- [313] arXiv:2403.19494 [pdf, ps, other]
-
Title: Regression with Multi-Expert Deferral
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. While this problem has received significant attention in classification contexts, it presents unique challenges in regression due to the infinite and continuous nature of the label space. In this work, we introduce a novel framework of regression with deferral, which involves deferring the prediction to multiple experts. We present a comprehensive analysis for both the single-stage scenario, where there is simultaneous learning of predictor and deferral functions, and the two-stage scenario, which involves a pre-trained predictor with a learned deferral function. We introduce new surrogate loss functions for both scenarios and prove that they are supported by $H$-consistency bounds. These bounds provide consistency guarantees that are stronger than Bayes consistency, as they are non-asymptotic and hypothesis set-specific. Our framework is versatile, applying to multiple experts, accommodating any bounded regression losses, addressing both instance-dependent and label-dependent costs, and supporting both single-stage and two-stage methods. A by-product is that our single-stage formulation includes the recent regression with abstention framework (Cheng et al., 2023) as a special case, where only a single expert, the squared loss and a label-independent cost are considered. Minimizing our proposed loss functions directly leads to novel algorithms for regression with deferral. We report the results of extensive experiments showing the effectiveness of our proposed algorithms.
- [314] arXiv:2403.19495 [pdf, other]
-
Title: CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors: Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas Chandra, Nima Khademi Kalantari
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured, point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
- [315] arXiv:2403.19497 [pdf, other]
-
Title: Surface-based parcellation and vertex-wise analysis of ultra high-resolution ex vivo 7 tesla MRI in neurodegenerative diseases
Authors: Pulkit Khandelwal, Michael Tran Duong, Constanza Fuentes, Amanda Denning, Winifred Trotman, Ranjit Ittyerah, Alejandra Bahena, Theresa Schuck, Marianna Gabrielyan, Karthik Prabhakaran, Daniel Ohm, Gabor Mizsei, John Robinson, Monica Munoz, John Detre, Edward Lee, David Irwin, Corey McMillan, M. Dylan Tisdall, Sandhitsu Das, David Wolk, Paul A. Yushkevich
Comments: Under review at MICCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Magnetic resonance imaging (MRI) is the standard modality to understand human brain structure and function in vivo (antemortem). Decades of research in human neuroimaging has led to the widespread development of methods and tools to provide automated volume-based segmentations and surface-based parcellations, which help localize brain functions to specialized anatomical regions. Recently, ex vivo (postmortem) imaging of the brain has opened up avenues to study brain structure at sub-millimeter ultra high-resolution, revealing details not possible to observe with in vivo MRI. Unfortunately, there has been limited methodological development in ex vivo MRI, primarily due to a lack of datasets and the limited number of centers with such imaging resources. Therefore, in this work, we present a one-of-its-kind dataset of 82 ex vivo T2w whole brain hemisphere MRIs at 0.3 mm isotropic resolution spanning Alzheimer's disease and related dementias. We adapted and developed a fast and easy-to-use automated surface-based pipeline to parcellate, for the first time, ultra high-resolution ex vivo brain tissue at the native subject space resolution using the Desikan-Killiany-Tourville (DKT) brain atlas. This allows us to perform vertex-wise analysis in the template space and thereby link morphometry measures with pathology measurements derived from histology. We will open-source our dataset, docker container, Jupyter notebooks with a ready-to-use, out-of-the-box set of tools, and command line options to advance ex vivo MRI clinical brain imaging research on the project webpage.
- [316] arXiv:2403.19499 [pdf, other]
-
Title: Client-supervised Federated Learning: Towards One-model-for-all Personalization
Subjects: Machine Learning (cs.LG)
Personalized Federated Learning (PerFL) is a new machine learning paradigm that delivers personalized models for diverse clients under federated learning settings. Most PerFL methods require extra learning processes on a client to adapt a globally shared model to the client-specific personalized model using its own local data. However, the model adaptation process in PerFL is still an open challenge at the stage of model deployment and test time. This work tackles the challenge by proposing a novel federated learning framework that learns only one robust global model achieving competitive performance against personalized models on unseen/test clients in the FL system. Specifically, we design a new Client-Supervised Federated Learning (FedCS) method to unravel clients' bias on instances' latent representations so that the global model can learn both client-specific and client-agnostic knowledge. Experiments show that FedCS can learn a robust FL global model for the changing data distributions of unseen/test clients. FedCS's global model can be directly deployed to test clients while achieving performance comparable to other personalized FL methods that require model adaptation.
- [317] arXiv:2403.19500 [pdf, other]
-
Title: Tensor Network-Constrained Kernel Machines as Gaussian Processes
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Tensor Networks (TNs) have recently been used to speed up kernel machines by constraining the model weights, yielding exponential computational and storage savings. In this paper we prove that the outputs of Canonical Polyadic Decomposition (CPD) and Tensor Train (TT)-constrained kernel machines recover a Gaussian Process (GP), which we fully characterize, when placing i.i.d. priors over their parameters. We analyze the convergence of both CPD and TT-constrained models, and show how TT yields models exhibiting more GP behavior compared to CPD, for the same number of model parameters. We empirically observe this behavior in two numerical experiments where we respectively analyze the convergence to the GP and the performance at prediction. We thereby establish a connection between TN-constrained kernel machines and GPs.
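To see where the computational savings come from, consider the CPD case: the weight tensor is a sum of rank-one terms and is never formed explicitly, so the model output reduces to products of per-mode inner products. A minimal sketch under our own notation and shapes (not the authors' code):

```python
import numpy as np

def cpd_model_output(features, factors):
    """Output of a CPD-constrained model (illustrative sketch).
    With W = sum_r w_1^(r) x ... x w_D^(r) (outer products), the prediction
    f = <W, phi_1 x ... x phi_D> collapses to sum_r prod_d <w_d^(r), phi_d>,
    so W is never materialized.
    `features`: D per-mode feature vectors; `factors`: D matrices of shape (M, R)."""
    out = np.ones(factors[0].shape[1])       # one accumulator per rank term
    for phi_d, W_d in zip(features, factors):
        out *= phi_d @ W_d                   # per-mode inner products, rank-wise
    return float(out.sum())
```

Placing i.i.d. priors on the factor entries, as the paper does, makes this output a random function of the features; the paper's result is that in a suitable limit this random function converges to a fully characterized Gaussian Process, with TT constraints approaching GP behavior faster than CPD at equal parameter count.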
- [318] arXiv:2403.19501 [pdf, other]
-
Title: RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Authors: Ming Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang
Comments: CVPR2024, Project website: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most human pose estimation (HPE) datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding, we present RELI11D, a high-quality multimodal human motion dataset that comprises LiDAR, an IMU system, an RGB camera, and an event camera. It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos, and event streams. Through extensive experiments, we demonstrate that RELI11D presents considerable challenges and opportunities, as it contains many rapid and complex motions that require precise localization. To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes the LiDAR point cloud, event stream, and RGB through our cross-attention fusion strategy. We show that LEIR exhibits promising results for rapid motions and daily motions, and that utilizing the characteristics of multiple modalities can indeed improve HPE performance. Both the dataset and source code will be released publicly to the research community, fostering collaboration and enabling further exploration in this field.
- [319] arXiv:2403.19506 [pdf, ps, other]
-
Title: LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae
Comments: 3 pages, accepted to CHI2024 workshop "Challenges and Opportunities of LLM-Based Synthetic Personae and Data in HCI"
Subjects: Human-Computer Interaction (cs.HC)
This position paper argues that large language models (LLMs) constitute promising yet underutilized academic reading companions capable of enhancing learning. We detail an exploratory study examining Claude.ai from Anthropic, an LLM-based interactive assistant that helps students comprehend complex qualitative literature content. The study compares quantitative survey data and qualitative interviews assessing outcomes between a control group and an experimental group leveraging Claude.ai over a semester across two graduate courses. Initial findings demonstrate tangible improvements in reading comprehension and engagement among participants using the AI agent versus unsupported independent study. However, there is potential for overreliance and ethical considerations that warrant continued investigation. By documenting an early integration of an LLM reading companion into an educational context, this work contributes pragmatic insights to guide development of synthetic personae supporting learning. Broader impacts compel policy and industry actions to uphold responsible design in order to maximize benefits of AI integration while prioritizing student wellbeing.
- [320] arXiv:2403.19507 [pdf, other]
-
Title: SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations
Authors: Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji
Comments: The Twelfth International Conference on Learning Representations
Subjects: Machine Learning (cs.LG)
We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally misaligned features in skip connections, which limits the model's performance. To address this limitation, we propose SineNet, consisting of multiple sequentially connected U-shaped network blocks, referred to as waves. In SineNet, high-resolution features are evolved progressively through multiple stages, thereby reducing the amount of misalignment within each stage. We furthermore analyze the role of skip connections in enabling both parallel and sequential processing of multi-scale information. Our method is rigorously tested on multiple PDE datasets, including the Navier-Stokes equations and shallow water equations, showcasing the advantages of our proposed approach over conventional U-Nets with a comparable parameter budget. We further demonstrate that increasing the number of waves in SineNet while maintaining the same number of parameters leads to a monotonically improved performance. The results highlight the effectiveness of SineNet and the potential of our approach in advancing the state-of-the-art in neural PDE solver design. Our code is available as part of AIRS (https://github.com/divelab/AIRS).
- [321] arXiv:2403.19509 [pdf, ps, other]
-
Title: Phonetic Segmentation of the UCLA Phonetics Lab Archive
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Research in speech technologies and comparative linguistics depends on access to diverse and accessible speech data. The UCLA Phonetics Lab Archive is one of the earliest multilingual speech corpora, with long-form audio recordings and phonetic transcriptions for 314 languages (Ladefoged et al., 2009). Recently, 95 of these languages were time-aligned with word-level phonetic transcriptions (Li et al., 2021). Here we present VoxAngeles, a corpus of audited phonetic transcriptions and phone-level alignments of the UCLA Phonetics Lab Archive, which uses the 95-language CMU re-release as our starting point. VoxAngeles also includes word- and phone-level segmentations from the original UCLA corpus, as well as phonetic measurements of word and phone durations, vowel formants, and vowel f0. This corpus enhances the usability of the original data, particularly for quantitative phonetic typology, as demonstrated through a case study of vowel intrinsic f0. We also discuss the utility of the VoxAngeles corpus for general research and pedagogy in crosslinguistic phonetics, as well as for low-resource and multilingual speech technologies. VoxAngeles is free to download and use under a CC-BY-NC 4.0 license.
- [322] arXiv:2403.19510 [pdf, other]
-
Title: On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks
Subjects: Cryptography and Security (cs.CR)
Recent studies reveal that local differential privacy (LDP) protocols are vulnerable to data poisoning attacks where an attacker can manipulate the final estimate on the server by leveraging the characteristics of LDP and sending carefully crafted data from a small fraction of controlled local clients. This vulnerability raises concerns regarding the robustness and reliability of LDP in hostile environments.
In this paper, we conduct a systematic investigation of the robustness of state-of-the-art LDP protocols for numerical attributes, i.e., categorical frequency oracles (CFOs) with binning and consistency, and distribution reconstruction. We evaluate protocol robustness through an attack-driven approach and propose new metrics for cross-protocol attack gain measurement. The results indicate that Square Wave and CFO-based protocols in the Server setting are more robust against the attack compared to the CFO-based protocols in the User setting. Our evaluation also unfolds new relationships between LDP security and its inherent design choices. We found that the hash domain size in local-hashing-based LDP has a profound impact on protocol robustness beyond the well-known effect on utility. Further, we propose a zero-shot attack detection method that leverages the rich reconstructed distribution information. The experiments show that our detection significantly improves on existing methods and effectively identifies data manipulation in challenging scenarios.
- [323] arXiv:2403.19511 [pdf, ps, other]
-
Title: Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data
Authors: Shan Chen, Jack Gallifant, Marco Guevara, Yanjun Gao, Majid Afshar, Timothy Miller, Dmitriy Dligach, Danielle S. Bitterman
Comments: submitted to review
Subjects: Computation and Language (cs.CL)
Generative models have shown potential for producing data at scale. This study explores the enhancement of clinical natural language processing performance by utilizing synthetic data generated from advanced language models. Promising results show feasible applications in such a high-stakes domain.
- [324] arXiv:2403.19514 [pdf, other]
-
Title: CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network
Comments: Accepted by IJCAI 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem with missing views, has received growing research interest. Although a series of methods have been proposed to address this issue, the following problems still exist: 1) Almost all of the existing methods are based on shallow models, which makes it difficult to obtain discriminative common representations. 2) These methods are generally sensitive to noise or outliers, since negative samples are weighted equally with important ones. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating view-specific deep encoders and a graph embedding strategy into a single framework. Moreover, inspired by human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy that selects the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms state-of-the-art incomplete multi-view clustering methods.
- [325] arXiv:2403.19517 [pdf, other]
-
Title: XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Comments: Accepted to CVPR 2024. Project page: xscalenvs.github.io/
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing representations based on explicit surfaces suffer from limited discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to their dispersed weight distribution and surface ambiguity. In light of these challenges, we introduce the hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold, thus effectively representing highly detailed contents independent of the discretization resolution. We also introduce a novel dataset, GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS that is 40% lower than the prior state-of-the-art on the challenging GigaNVS benchmark. Please see our project page at: xscalenvs.github.io.
- [326] arXiv:2403.19521 [pdf, other]
-
Title: Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this paper, we deeply explore the mechanisms employed by Transformer-based language models in factual recall tasks. In zero-shot scenarios, given a prompt like "The capital of France is," task-specific attention heads extract the topic entity, such as "France," from the context and pass it to subsequent MLPs to recall the required answer, such as "Paris." We introduce a novel analysis method that decomposes the outputs of the MLP into human-understandable components. Through this method, we quantify the function of the MLP layer following these task-specific heads. In the residual stream, it either erases or amplifies the information originating from individual heads. Moreover, it generates a component that redirects the residual stream towards the direction of its expected answer. These zero-shot mechanisms are also employed in few-shot scenarios. Additionally, we observed a widespread anti-overconfidence mechanism in the final layer of models, which suppresses correct predictions. We mitigate this suppression by leveraging our interpretation to improve factual recall performance. Our interpretations have been evaluated across various language models, from the GPT-2 family to 1.3B OPT, and across tasks covering different domains of factual knowledge.
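Decomposing an intermediate output into token-aligned components can be approximated outside the paper's framework with a logit-lens-style projection onto the unembedding matrix. A minimal sketch (the function and toy matrices below are hypothetical illustrations, not the paper's code):

```python
import numpy as np

def project_onto_vocab(residual, unembedding, top_k=3):
    """Project a residual-stream vector onto token unembedding directions
    (logit-lens style). `unembedding` has one row per vocabulary token."""
    logits = unembedding @ residual            # score of each token direction
    order = np.argsort(logits)[::-1][:top_k]   # indices of the top-k tokens
    return list(order), logits[order]
```

With a one-hot unembedding this simply ranks the residual's coordinates, which makes the idea easy to sanity-check on toy data.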
- [327] arXiv:2403.19522 [pdf, other]
-
Title: Model Stock: All we need is just a few fine-tuned models
Comments: Code at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve the final weights, yet yields superior accuracy. Drawing on key insights into the weight space of fine-tuned models, we uncover a strong link between performance and proximity to the center of the weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our layer-wise weight averaging technique surpasses state-of-the-art model-averaging methods such as Model Soup while utilizing only two fine-tuned models. This strategy is aptly coined Model Stock, highlighting its reliance on a minimal number of models to derive a more optimized averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based on pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on standard benchmarks, all while adding barely any extra computational demand. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock.
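The layer-wise merging of two fine-tuned models toward a center-close weight can be sketched as averaging each layer and interpolating toward the pre-trained anchor. The fixed global ratio `t` below is an illustrative assumption; the paper derives a per-layer ratio from weight-space geometry:

```python
import numpy as np

def model_stock_average(w_pretrained, w_ft1, w_ft2, t=0.5):
    """Layer-wise merge of two fine-tuned weight dicts: average each layer,
    then interpolate toward the pre-trained anchor with ratio t (assumed
    fixed here for illustration)."""
    merged = {}
    for name, w0 in w_pretrained.items():
        center = (w_ft1[name] + w_ft2[name]) / 2.0   # midpoint of the two runs
        merged[name] = t * center + (1.0 - t) * w0   # pull toward pre-trained
    return merged
```

Each layer is handled independently, mirroring the layer-wise averaging described above.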
- [328] arXiv:2403.19526 [pdf, ps, other]
-
Title: Logic and Languages of Higher-Dimensional Automata
Comments: Submission to DLT24, 12 pages + references + appendix
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
In this paper we study finite higher-dimensional automata (HDAs) from the logical point of view. Languages of HDAs are sets of finite bounded-width interval pomsets with interfaces (iiPoms<=k) closed under order extension. We prove that languages of HDAs are MSO-definable. For the converse, we show that the order extensions of MSO-definable sets of iiPoms<=k are languages of HDAs. As a consequence, unlike the case of all pomsets, order extension of MSO-definable sets of iiPoms<=k is also MSO-definable.
- [329] arXiv:2403.19527 [pdf, other]
-
Title: Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Comments: Accepted to CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Category-level 6D object pose estimation aims to estimate the rotation, translation and size of unseen instances within specific categories. In this area, dense correspondence-based methods have achieved leading performance. However, they do not explicitly consider the local and global geometric information of different instances, resulting in poor generalization ability to unseen instances with significant shape variations. To deal with this problem, we propose a novel Instance-Adaptive and Geometric-Aware Keypoint Learning method for category-level 6D object pose estimation (AG-Pose), which includes two key designs: (1) The first design is an Instance-Adaptive Keypoint Detection module, which can adaptively detect a set of sparse keypoints for various instances to represent their geometric structures. (2) The second design is a Geometric-Aware Feature Aggregation module, which can efficiently integrate the local and global geometric information into keypoint features. These two modules can work together to establish robust keypoint-level correspondences for unseen instances, thus enhancing the generalization ability of the model. Experimental results on CAMERA25 and REAL275 datasets show that the proposed AG-Pose outperforms state-of-the-art methods by a large margin without category-specific shape priors.
- [330] arXiv:2403.19530 [pdf, other]
-
Title: Detecting Financial Bots on the Ethereum Blockchain
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
The integration of bots in Distributed Ledger Technologies (DLTs) fosters efficiency and automation. However, their use is also associated with predatory trading and market manipulation, and can pose threats to system integrity. It is therefore essential to understand the extent of bot deployment in DLTs; despite this, current detection systems are predominantly rule-based and lack flexibility. In this study, we present a novel approach that utilizes machine learning for the detection of financial bots on the Ethereum platform. First, we systematize existing scientific literature and collect anecdotal evidence to establish a taxonomy for financial bots, comprising 7 categories and 24 subcategories. Next, we create a ground-truth dataset consisting of 133 human and 137 bot addresses. Third, we employ both unsupervised and supervised machine learning algorithms to detect bots deployed on Ethereum. The highest-performing clustering algorithm is a Gaussian Mixture Model with an average cluster purity of 82.6%, while the highest-performing model for binary classification is a Random Forest with an accuracy of 83%. Our machine learning-based detection mechanism contributes to understanding the Ethereum ecosystem dynamics by providing additional insights into the current bot landscape.
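The cluster-purity figure quoted above is a standard external clustering metric: each predicted cluster contributes the count of its majority ground-truth label, and the total is divided by the number of points. A minimal sketch (not the authors' code):

```python
from collections import Counter

def cluster_purity(labels_true, labels_pred):
    """Average cluster purity: sum over predicted clusters of the majority
    ground-truth label count, divided by the total number of points."""
    clusters = {}
    for y, c in zip(labels_true, labels_pred):
        clusters.setdefault(c, []).append(y)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels_true)
```

A purity of 82.6% means that, on average, about 83% of each cluster's members share its dominant human/bot label.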
- [331] arXiv:2403.19531 [pdf, other]
-
Title: SecGraph: Towards SGX-based Efficient and Confidentiality-Preserving Graph Search
Comments: This paper has been accepted by DASFAA 2024
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Social and Information Networks (cs.SI)
Compared with traditional relational and XML models, graphs have greater expressive power and are widely studied for various search scenarios. Today, many graph search services are deployed on third-party servers, which relieves users of the burdens of maintaining large-scale graphs and huge computation costs. Nevertheless, outsourcing graph search services to a third-party server may invade users' privacy. PeGraph was recently proposed to achieve encrypted search over social graphs. The main idea of PeGraph is to maintain two data structures, XSet and TSet, motivated by the OXT technology, to support encrypted conjunctive search. However, PeGraph still has some limitations. First, PeGraph suffers from high communication and computation costs in search operations. Second, PeGraph cannot support encrypted search over dynamic graphs. In this paper, we propose SecGraph, an SGX-based efficient and confidentiality-preserving graph search scheme that supports insertion and deletion operations. We first design a new proxy-token generation method to reduce the communication cost. Then, we design an LDCF-encoded XSet based on the Logarithmic Dynamic Cuckoo Filter to reduce the computation cost. Finally, we design a new dynamic version of TSet, named Twin-TSet, to enable encrypted search over dynamic graphs. We have demonstrated the confidentiality-preservation property of SecGraph through rigorous security analysis. Experiment results show that SecGraph yields up to a 208x improvement in search time compared with PeGraph, and the communication cost in PeGraph is up to 540x larger than that in SecGraph.
- [332] arXiv:2403.19534 [pdf, other]
-
Title: Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance
Comments: 22 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Prior studies have made significant progress in image inpainting guided by either text or subject image. However, the research on editing with their combined guidance is still in the early stages. To tackle this challenge, we present LAR-Gen, a novel approach for image inpainting that enables seamless inpainting of masked scene images, incorporating both the textual prompts and specified subjects. Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence. The process involves (i) Locate: concatenating the noise with masked scene image to achieve precise regional editing, (ii) Assign: employing decoupled cross-attention mechanism to accommodate multi-modal guidance, and (iii) Refine: using a novel RefineNet to supplement subject details. Additionally, to address the issue of scarce training data, we introduce a novel data construction pipeline. This pipeline extracts substantial pairs of data consisting of local text prompts and corresponding visual instances from a vast image dataset, leveraging publicly available large models. Extensive experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency. Project page can be found at \url{https://ali-vilab.github.io/largen-page/}.
- [333] arXiv:2403.19539 [pdf, other]
-
Title: De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Authors: Yuzheng Wang, Dingkang Yang, Zhaoyu Chen, Yang Liu, Siao Liu, Wenqiang Zhang, Lihua Zhang, Lizhe Qi
Comments: Accepted by CVPR24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Data-Free Knowledge Distillation (DFKD) is a promising task that trains high-performance small models for actual deployment without relying on the original training data. Existing methods commonly avoid relying on private data by utilizing synthetic or sampled data. However, a long-overlooked issue is the severe distribution shift between the substitute data and the original data, which manifests as huge differences in image quality and class proportions. These harmful shifts are essentially a confounder that causes significant performance bottlenecks. To tackle this issue, this paper proposes a novel causal-inference perspective to disentangle student models from the impact of such shifts. By designing a customized causal graph, we first reveal the causalities among the variables in the DFKD task. Subsequently, we propose a Knowledge Distillation Causal Intervention (KDCI) framework based on the backdoor adjustment to de-confound the confounder. KDCI can be flexibly combined with most existing state-of-the-art baselines. Experiments in combination with six representative DFKD methods demonstrate the effectiveness of our KDCI, which helps existing methods under almost all settings, \textit{e.g.}, improving the baseline by up to 15.54\% accuracy on the CIFAR-100 dataset.
- [334] arXiv:2403.19540 [pdf, ps, other]
-
Title: A third-order low-regularity trigonometric integrator for the semilinear Klein-Gordon equation
Subjects: Numerical Analysis (math.NA)
In this paper, we propose and analyze a novel third-order low-regularity trigonometric integrator for the semilinear Klein-Gordon equation in the $d$-dimensional space with $d=1,2,3$. The integrator is constructed by making full use of Duhamel's formula and the twisted-function technique applied to the trigonometric integrals. Rigorous error estimates are presented, and the proposed method is shown to have third-order accuracy in the energy space under a weak regularity requirement in $H^{2}\times H^{1}$. A numerical experiment shows that the proposed third-order low-regularity integrator is much more accurate than well-known third-order exponential integrators for approximating the Klein-Gordon equation with nonsmooth solutions.
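For context, trigonometric integrators of this kind start from Duhamel's formula for the semilinear Klein-Gordon equation $u_{tt}-\Delta u+u=f(u)$; in standard notation, with $\langle\nabla\rangle=(1-\Delta)^{1/2}$,
\[
u(t)=\cos(t\langle\nabla\rangle)\,u(0)+\langle\nabla\rangle^{-1}\sin(t\langle\nabla\rangle)\,\partial_t u(0)+\int_0^t \langle\nabla\rangle^{-1}\sin\big((t-s)\langle\nabla\rangle\big)\,f(u(s))\,\mathrm{d}s,
\]
which the scheme discretizes after twisting out the oscillatory factors (the precise third-order construction is given in the paper).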
- [335] arXiv:2403.19545 [pdf, other]
-
Title: Lamarckian Inheritance Improves Robot Evolution in Dynamic Environments
Comments: Nature. arXiv admin note: substantial text overlap with arXiv:2309.13099; text overlap with arXiv:2303.12594, arXiv:2309.14387
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
This study explores the integration of a Lamarckian system into evolutionary robotics (ER), comparing it with the traditional Darwinian model across various environments. By adopting Lamarckian principles, where robots inherit learned traits, alongside Darwinian learning without inheritance, we investigate adaptation in dynamic settings. Our research, conducted in six distinct environmental setups, demonstrates that Lamarckian systems outperform Darwinian ones in adaptability and efficiency, particularly in challenging conditions. Our analysis highlights the critical role of the interplay between controller \& morphological evolution and environment adaptation, with parent-offspring similarities and comparisons of newborns \& survivors before and after learning providing insights into the effectiveness of trait inheritance. Our findings suggest Lamarckian principles could significantly advance autonomous system design, highlighting the potential for more adaptable and robust robotic solutions in complex, real-world applications. These theoretical insights were validated using real physical robots, bridging the gap between simulation and practical application.
- [336] arXiv:2403.19546 [pdf, other]
-
Title: Croissant: A Metadata Format for ML-Ready Datasets
Authors: Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Steffen Vogler, Carole-Jean Wu
Comments: Preprint. Contributors listed in alphabetical order
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)
Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
- [337] arXiv:2403.19548 [pdf, other]
-
Title: WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models
Comments: NAACL 2024 (Findings)
Subjects: Computation and Language (cs.CL)
Watermarking generative-AI systems, such as LLMs, has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks. Although current approaches have demonstrated that small, context-dependent shifts in the word distributions can be used to apply and detect watermarks, there has been little work in analyzing the impact that these perturbations have on the quality of generated texts. Balancing high detectability with minimal performance degradation is crucial in terms of selecting the appropriate watermarking setting; therefore this paper proposes a simple analysis framework where comparative assessment, a flexible NLG evaluation framework, is used to assess the quality degradation caused by a particular watermark setting. We demonstrate that our framework provides easy visualization of the quality-detection trade-off of watermark settings, enabling a simple solution to find an LLM watermark operating point that provides a well-balanced performance. This approach is applied to two different summarization systems and a translation system, enabling cross-model analysis for a task, and cross-task analysis.
- [338] arXiv:2403.19549 [pdf, other]
-
Title: GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly utilized grid-based neural implicit encodings and/or struggle to efficiently realize global map and pose consistency. To this end, we propose an efficient RGB-only dense SLAM system using a flexible neural point cloud scene representation that adapts to keyframe poses and depth updates, without needing costly backpropagation. Another critical challenge of RGB-only SLAM is the lack of geometric priors. To alleviate this issue, with the aid of a monocular depth estimator, we introduce a novel DSPO layer for bundle adjustment which optimizes the pose and depth of keyframes along with the scale of the monocular depth. Finally, our system benefits from loop closure and online global bundle adjustment, and performs better than, or competitively with, existing dense neural RGB SLAM methods in tracking, mapping and rendering accuracy on the Replica, TUM-RGBD and ScanNet datasets. The source code will be made available.
- [339] arXiv:2403.19554 [pdf, other]
-
Title: Cross-Attention is Not Always Needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition
Comments: Accepted at IEEE ICME2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In video-based emotion recognition, audio and visual modalities are often expected to have a complementary relationship, which is widely explored using cross-attention. However, they may also exhibit weak complementary relationships, resulting in poor representations of audio-visual features, thus degrading the performance of the system. To address this issue, we propose Dynamic Cross-Attention (DCA) that can dynamically select cross-attended or unattended features on the fly based on their strong or weak complementary relationship with each other, respectively. Specifically, a simple yet efficient gating layer is designed to evaluate the contribution of the cross-attention mechanism and choose cross-attended features only when they exhibit a strong complementary relationship, otherwise unattended features. We evaluate the performance of the proposed approach on the challenging RECOLA and Aff-Wild2 datasets. We also compare the proposed approach with other variants of cross-attention and show that the proposed model consistently improves the performance on both datasets.
- [340] arXiv:2403.19556 [pdf, other]
-
Title: Expectation Maximization Aided Modified Weighted Sequential Energy Detector for Distributed Cooperative Spectrum Sensing
Subjects: Systems and Control (eess.SY)
Distributed cooperative spectrum sensing usually involves a group of unlicensed secondary users (SUs) collaborating to detect the primary user (PU) in the channel, and thereby opportunistically utilize it without causing interference to the PU. Conventional energy detector (ED) based spectrum sensing ignores the dynamic nature of the PU by using the energy statistic only from the present sensing interval for PU detection. However, for a dynamic PU, previous studies have shown that improved detection capabilities can be achieved by aggregating both present and past energy samples in a test statistic. To this end, a weighted sequential energy detector (WSED) has been proposed, but it aggregates all the collected energy samples over an observation window; for a highly dynamic PU, that means also combining outdated samples in the test statistic. In this paper, we propose a modified WSED (mWSED) that uses the PU state information over the window to aggregate only the highly correlated energy samples in its test statistic. In practice, since the PU states are a priori unknown, we also develop a joint expectation-maximization and Viterbi (EM-Viterbi) algorithm based scheme to iteratively estimate the states from the energy samples collected over the window. The estimated states are then used in mWSED to compute its test statistic, and the algorithm is referred to here as EM-mWSED. Simulation results demonstrate the state-estimation performance of EM-Viterbi and the PU detection performance of EM-mWSED. The results show that, for both highly dynamic and slowly time-varying PUs, these algorithms outperform the ED and WSED at PU detection, and their performance improves by increasing the average number of neighbors per SU in the network, or by increasing the SNR or the number of samples per energy statistic.
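As a toy sketch of the state-selective aggregation idea (not the paper's exact statistic), one can average only the energy samples whose estimated PU state matches the most recent one, discarding outdated samples; the function and its 0/1 state encoding are illustrative assumptions:

```python
def mwsed_statistic(energies, states):
    """Toy state-selective test statistic: average only the energy samples
    whose estimated PU state (0 = idle, 1 = active, illustrative encoding)
    matches the most recent state, discarding outdated samples."""
    current = states[-1]
    selected = [e for e, s in zip(energies, states) if s == current]
    return sum(selected) / len(selected)
```

In the paper the states come from the EM-Viterbi estimator rather than being known, and the aggregation is weighted rather than a plain mean.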
- [341] arXiv:2403.19559 [pdf, other]
-
Title: Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Comments: Accepted at NAACL 2024 (main conference)
Subjects: Computation and Language (cs.CL)
Hate speech detection models are only as good as the data they are trained on. Datasets sourced from social media suffer from systematic gaps and biases, leading to unreliable models with simplistic decision boundaries. Adversarial datasets, collected by exploiting model weaknesses, promise to fix this problem. However, adversarial data collection can be slow and costly, and individual annotators have limited creativity. In this paper, we introduce GAHD, a new German Adversarial Hate speech Dataset comprising ca.\ 11k examples. During data collection, we explore new strategies for supporting annotators, to create more diverse adversarial examples more efficiently and provide a manual analysis of annotator disagreements for each strategy. Our experiments show that the resulting dataset is challenging even for state-of-the-art hate speech detection models, and that training on GAHD clearly improves model robustness. Further, we find that mixing multiple support strategies is most advantageous. We make GAHD publicly available at https://github.com/jagol/gahd.
- [342] arXiv:2403.19560 [pdf, other]
-
Title: Exploring Communication Dynamics: Eye-tracking Analysis in Pair Programming of Computer Science Education
Subjects: Human-Computer Interaction (cs.HC)
Pair programming is widely recognized as an effective educational tool in computer science that promotes collaborative learning and mirrors real-world work dynamics. However, communication breakdowns within pairs significantly challenge this learning process. In this study, we use eye-tracking data recorded during pair programming sessions to study communication dynamics between various pair programming roles across different student, expert, and mixed group cohorts containing 19 participants. By combining eye-tracking data analysis with focus group interviews and questionnaires, we provide insights into communication's multifaceted nature in pair programming. Our findings highlight distinct eye-tracking patterns indicating changes in communication skills across group compositions, with participants prioritizing code exploration over communication, especially during challenging tasks. Further, students showed a preference for pairing with experts, emphasizing the importance of understanding group formation in pair programming scenarios. These insights emphasize the importance of understanding group dynamics and enhancing communication skills through pair programming for successful outcomes in computer science education.
- [343] arXiv:2403.19561 [pdf, other]
-
Title: Self-Improved Learning for Scalable Neural Combinatorial Optimization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The end-to-end neural combinatorial optimization (NCO) method shows promising performance in solving complex combinatorial optimization problems without the need for expert design. However, existing methods struggle with large-scale problems, hindering their practical applicability. To overcome this limitation, this work proposes a novel Self-Improved Learning (SIL) method for better scalability of neural combinatorial optimization. Specifically, we develop an efficient self-improved mechanism that enables direct model training on large-scale problem instances without any labeled data. Powered by an innovative local reconstruction approach, this method can iteratively generate better solutions by itself as pseudo-labels to guide efficient model training. In addition, we design a linear complexity attention mechanism for the model to efficiently handle large-scale combinatorial problem instances with low computation overhead. Comprehensive experiments on the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 100K nodes in both uniform and real-world distributions demonstrate the superior scalability of our method.
- [344] arXiv:2403.19570 [pdf, other]
-
Title: GrINd: Grid Interpolation Network for Scattered Observations
Subjects: Machine Learning (cs.LG)
Predicting the evolution of spatiotemporal physical systems from sparse and scattered observational data poses a significant challenge in various scientific domains. Traditional methods rely on dense grid-structured data, limiting their applicability in scenarios with sparse observations. To address this challenge, we introduce GrINd (Grid Interpolation Network for Scattered Observations), a novel network architecture that leverages the high performance of grid-based models by mapping scattered observations onto a high-resolution grid using a Fourier Interpolation Layer. In the high-resolution space, a NeuralPDE-class model predicts the system's state at future timepoints using differentiable ODE solvers and fully convolutional neural networks parametrizing the system's dynamics. We empirically evaluate GrINd on the DynaBench benchmark dataset, comprising six different physical systems observed at scattered locations, demonstrating its state-of-the-art performance compared to existing models. GrINd offers a promising approach for forecasting physical systems from sparse, scattered observational data, extending the applicability of deep learning methods to real-world scenarios with limited data availability.
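A toy 1D stand-in for mapping scattered observations onto a grid with a Fourier basis is a least-squares fit of a truncated Fourier series, evaluated on a regular grid (the function name and basis size are illustrative; GrINd's learned layer differs in detail):

```python
import numpy as np

def fourier_interpolate_1d(x_obs, y_obs, n_modes=3, n_grid=64):
    """Least-squares fit of a truncated Fourier basis on [0, 1] to scattered
    observations, then evaluation on a regular grid."""
    def basis(x):
        cols = [np.ones_like(x)]
        for k in range(1, n_modes + 1):
            cols.append(np.cos(2 * np.pi * k * x))
            cols.append(np.sin(2 * np.pi * k * x))
        return np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(basis(x_obs), y_obs, rcond=None)
    grid = np.linspace(0.0, 1.0, n_grid)
    return grid, basis(grid) @ coef
```

Because the fit is linear in the coefficients, signals that lie in the basis (e.g. a single sine mode) are recovered essentially exactly from scattered samples.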
- [345] arXiv:2403.19572 [pdf, other]
-
Title: Swarm Characteristics Classification Using Neural Networks
Subjects: Machine Learning (cs.LG)
Understanding the characteristics of swarming autonomous agents is critical for defense and security applications. This article presents a study on using supervised neural network time series classification (NN TSC) to predict key attributes and tactics of swarming autonomous agents for military contexts. Specifically, NN TSC is applied to infer two binary attributes - communication and proportional navigation - which combine to define four mutually exclusive swarm tactics. We identify a gap in literature on using NNs for swarm classification and demonstrate the effectiveness of NN TSC in rapidly deducing intelligence about attacking swarms to inform counter-maneuvers. Through simulated swarm-vs-swarm engagements, we evaluate NN TSC performance in terms of observation window requirements, noise robustness, and scalability to swarm size. Key findings show NNs can predict swarm behaviors with 97% accuracy using short observation windows of 20 time steps, while also demonstrating graceful degradation down to 80% accuracy under 50% noise, as well as excellent scalability to swarm sizes from 10 to 100 agents. These capabilities are promising for real-time decision-making support in defense scenarios by rapidly inferring insights about swarm behavior.
- [346] arXiv:2403.19577 [pdf, other]
-
Title: A Public and Reproducible Assessment of the Topics API on Real Data
Comments: Accepted at SecWeb 2024: Workshop on Designing Security for the Web
Subjects: Cryptography and Security (cs.CR)
The Topics API for the web is Google's privacy-enhancing alternative to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about the capability of Topics to trade off both utility and privacy. The central point of contention is largely the realism of the datasets used in these analyses and their reproducibility: researchers use data collected from a small sample of users or generate synthetic datasets, while Google's results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest publicly available dataset of real browsing histories. First, we measure how unique and stable real users' interests are over time. Then, we evaluate whether Topics can be used to fingerprint users from these real browsing traces by adapting methodologies from prior privacy studies. Finally, we call on web actors to perform and enable reproducible evaluations by releasing anonymized distributions. We find that 46%, 55%, and 60% of the 1207 users in the dataset are uniquely re-identified across websites after only 1, 2, and 3 observations of their topics by advertisers, respectively. This paper shows on real data that Topics does not provide the same privacy guarantees to all users, further highlighting the need for public and reproducible evaluations of the claims made by new web proposals.
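The uniqueness-based re-identification measure can be sketched as the fraction of users whose observed topic set is unique in the population (a simplification of the paper's cross-site methodology; the function name is illustrative):

```python
from collections import Counter

def reidentification_rate(user_topics):
    """Fraction of users whose observed topic set is unique in the
    population -- a simplified stand-in for cross-site re-identification."""
    keys = [frozenset(t) for t in user_topics]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)
```

Users who share a topic set with at least one other user are not counted, which is why the rate grows as more observations make topic sets more distinctive.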
- [347] arXiv:2403.19578 [pdf, other]
-
Title: Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We show that off-the-shelf text-based Transformers, with no additional training, can perform few-shot in-context visual imitation learning, mapping visual observations to action sequences that emulate the demonstrator's behaviour. We achieve this by transforming visual observations (inputs) and trajectories of actions (outputs) into sequences of tokens that a text-pretrained Transformer (GPT-4 Turbo) can ingest and generate, via a framework we call Keypoint Action Tokens (KAT). Despite being trained only on language, we show that these Transformers excel at translating tokenised visual keypoint observations into action trajectories, performing on par or better than state-of-the-art imitation learning (diffusion policies) in the low-data regime on a suite of real-world, everyday tasks. Rather than operating in the language domain as is typical, KAT leverages text-based Transformers to operate in the vision and action domains to learn general patterns in demonstration data for highly efficient imitation learning, indicating promising new avenues for repurposing natural language models for embodied tasks. Videos are available at https://www.robot-learning.uk/keypoint-action-tokens.
- [348] arXiv:2403.19579 [pdf, other]
-
Title: The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation
Comments: 8 pages, 4 figures, IEEE WCCI 2024 Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The pursuit of learning robust representations without human supervision is a longstanding challenge. The recent advancements in self-supervised contrastive learning approaches have demonstrated high performance across various representation learning challenges. However, current methods depend on the random transformation of training examples, which in some cases yields unrepresentative positive pairs that can have a large impact on learning. This limitation not only impedes the convergence of the learning process but also degrades the robustness of the learnt representations, and it requires larger batch sizes to compensate for such bad batches. This paper attempts to alleviate the influence of false positive and false negative pairs by employing pairwise similarity calculations through the Fr\'echet ResNet Distance (FRD), thereby obtaining robust representations from unlabelled data. The effectiveness of the proposed method is substantiated by empirical results, where a linear classifier trained on self-supervised contrastive representations achieved an impressive 87.74\% top-1 accuracy on STL10 and 99.31\% on the Flower102 dataset. These results emphasize the potential of the proposed approach in pushing the boundaries of the state-of-the-art in self-supervised contrastive learning, particularly for image classification tasks.
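For intuition, the Fréchet distance between two Gaussian feature distributions has a closed form; the sketch below restricts it to diagonal covariances so no matrix square root is needed (the actual FRD operates on full ResNet feature statistics, which is our reading of the abstract, not its stated implementation):

```python
import numpy as np

def frechet_diag(mu1, var1, mu2, var2):
    """Squared Frechet distance between N(mu1, diag(var1)) and
    N(mu2, diag(var2)); with diagonal covariances the trace term
    reduces to a sum over per-dimension variances."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)

# Identical distributions are at distance zero; distance grows as means drift.
print(frechet_diag([0, 0], [1, 1], [0, 0], [1, 1]))  # 0.0
print(frechet_diag([0, 0], [1, 1], [3, 4], [1, 1]))  # 25.0
```

A pair of augmented views whose feature statistics are far apart under this distance is a candidate false positive, which is the signal the batch-curation idea exploits.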
- [349] arXiv:2403.19580 [pdf, other]
-
Title: OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across different data modalities, and the absence of a unified architecture have impeded progress towards the goal of universality. In this paper, we propose \textbf{OV-Uni3DETR}, a unified open-vocabulary 3D detector via cycle-modality propagation. Compared with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1) Open-vocabulary 3D detection: During training, it leverages various accessible data, especially extensive 2D detection images, to boost training diversity. During inference, it can detect both seen and unseen classes. 2) Modality unifying: It seamlessly accommodates input data from any given modality, effectively addressing scenarios involving disparate modalities or missing sensor information, thereby supporting test-time modality switching. 3) Scene unifying: It provides a unified multi-modal model architecture for diverse scenes collected by distinct sensors. Specifically, we propose the cycle-modality propagation, aimed at propagating knowledge bridging 2D and 3D modalities, to support the aforementioned functionalities. 2D semantic knowledge from large-vocabulary learning guides novel class discovery in the 3D domain, and 3D geometric knowledge provides localization supervision for 2D detection images. OV-Uni3DETR achieves state-of-the-art performance across various scenarios, surpassing existing methods by more than 6\% on average. Its performance using only RGB images is on par with or even surpasses that of previous point cloud based methods. Code and pre-trained models will be released later.
- [350] arXiv:2403.19584 [pdf, other]
-
Title: Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation
Authors: Zhongliang Zhou, Jielu Zhang, Zihan Guan, Mengxuan Hu, Ni Lao, Lan Mu, Sheng Li, Gengchen Mai
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Geolocating precise locations from images presents a challenging problem in computer vision and information retrieval. Traditional methods typically employ either classification, which divides the Earth's surface into grid cells and classifies images accordingly, or retrieval, which identifies locations by matching images with a database of image-location pairs. However, classification-based approaches are limited by the cell size and cannot yield precise predictions, while retrieval-based systems usually suffer from poor search quality and inadequate coverage of the global landscape at varied scales and aggregation levels. To overcome these drawbacks, we present Img2Loc, a novel system that redefines image geolocalization as a text generation task. This is achieved using cutting-edge large multi-modality models like GPT4V or LLaVA with retrieval-augmented generation. Img2Loc first employs CLIP-based representations to generate an image-based coordinate query database. It then uniquely combines the query results with the images themselves, forming elaborate prompts customized for LMMs. When tested on benchmark datasets such as Im2GPS3k and YFCC4k, Img2Loc not only surpasses the performance of previous state-of-the-art models but does so without any model training.
- [351] arXiv:2403.19586 [pdf, other]
-
Title: TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering
Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined, which enables us to render the 2D DSA image at that specific moment. Additionally, we introduce a smoothing loss term in the loss function to mitigate the overfitting issues that may arise when the model deals with sparse-view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art reconstruction quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. The code will be publicly available.
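The per-Gaussian opacity offset table can be pictured as linear interpolation over keyframed offsets; the key times, offset values, and clipping in this sketch are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def opacity_at(t, key_times, opacity_offsets, base_opacity):
    """Opacity of one Gaussian at time t: its static base opacity plus
    a time-varying offset linearly interpolated from a per-Gaussian table."""
    offset = np.interp(t, key_times, opacity_offsets)
    return float(np.clip(base_opacity + offset, 0.0, 1.0))

# A Gaussian whose radiance rises as contrast agent fills the vessel, then fades.
key_times = [0.0, 0.5, 1.0]
offsets = [-0.3, 0.4, -0.3]
print(opacity_at(0.25, key_times, offsets, base_opacity=0.5))  # ~0.55
```

Storing only a small offset table per Gaussian, instead of a full time-varying radiance model, is what keeps the storage overhead low.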
- [352] arXiv:2403.19588 [pdf, other]
-
Title: DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Comments: Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals their underrated effectiveness over predominant ResNet-style architectures. We believe DenseNets' potential was overlooked because outdated training methods and traditional design elements did not fully reveal their capabilities. Our pilot study shows dense connections through concatenation are strong, demonstrating that DenseNets can be revitalized to compete with modern architectures. We methodically refine suboptimal components - architectural adjustments, block redesign, and improved training recipes - towards widening DenseNets and boosting memory efficiency while keeping concatenation shortcuts. Our models, employing simple architectural elements, ultimately surpass Swin Transformer, ConvNeXt, and DeiT-III - key architectures in the residual learning lineage. Furthermore, our models exhibit near state-of-the-art performance on ImageNet-1K, competing with the most recent models, and perform well on downstream tasks: ADE20k semantic segmentation and COCO object detection/instance segmentation. Finally, we provide empirical analyses that uncover the merits of concatenation over additive shortcuts, steering a renewed preference towards DenseNet-style designs. Our code is available at https://github.com/naver-ai/rdnet.
- [353] arXiv:2403.19589 [pdf, other]
-
Title: TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Authors: Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao
Comments: Code, data, and models are publicly available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D dense captioning stands as a cornerstone in achieving a comprehensive understanding of 3D scenes through natural language. It has recently witnessed remarkable achievements, particularly in indoor settings. However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the \textbf{domain gap} between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the \textbf{lack of data} with comprehensive box-caption pair annotations specifically tailored for outdoor scenes. To this end, we introduce the new task of outdoor 3D dense captioning. As input, we assume a LiDAR point cloud and a set of RGB images captured by the panoramic camera rig. The expected output is a set of object boxes with captions. To tackle this task, we propose the TOD3Cap network, which leverages the BEV representation to generate object box proposals and integrates Relation Q-Former with LLaMA-Adapter to generate rich captions for these objects. We also introduce the TOD3Cap dataset, the largest one to our knowledge for 3D dense captioning in outdoor scenes, which contains 2.3M descriptions of 64.3K outdoor objects from 850 scenes. Notably, our TOD3Cap network can effectively localize and caption 3D objects in outdoor scenes, which outperforms baseline methods by a significant margin (+9.6 CIDEr@0.5IoU). Code, data, and models are publicly available at https://github.com/jxbbb/TOD3Cap.
- [354] arXiv:2403.19591 [pdf, other]
-
Title: Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Authors: Pingcheng Dong, Yonghao Tan, Dong Zhang, Tianwei Ni, Xuejiao Liu, Yu Liu, Peng Luo, Luhong Liang, Shih-Yang Liu, Xijie Huang, Huaiyu Zhu, Yun Pan, Fengwei An, Kwang-Ting Cheng
Comments: 61st ACM/IEEE Design Automation Conference (DAC) 2024
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Neural and Evolutionary Computing (cs.NE)
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require hardware-unfriendly high-precision arithmetic such as FP/INT 32 and do not consider integer-only INT quantization. This paper proposes a genetic LUT-approximation algorithm, named GQA-LUT, that can automatically determine the parameters with quantization awareness. The results demonstrate that GQA-LUT achieves negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Moreover, the proposed GQA-LUT enables an INT8-based LUT approximation that achieves area savings of 81.3~81.7% and a power reduction of 79.3~80.2% compared to the high-precision FP/INT 32 alternatives. Code is available at https://github.com/PingchengDong/GQA-LUT.
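The underlying piece-wise linear LUT approximation can be sketched as follows; here the breakpoints are uniform rather than genetically searched, the segment count is an arbitrary choice, and GELU stands in for an arbitrary Transformer non-linearity:

```python
import math

def gelu(x):
    """Exact GELU, the non-linearity we approximate."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def build_lut(fn, lo, hi, segments):
    """Precompute (start, slope, intercept) per uniform segment of fn on [lo, hi]."""
    xs = [lo + (hi - lo) * i / segments for i in range(segments + 1)]
    lut = []
    for x0, x1 in zip(xs, xs[1:]):
        k = (fn(x1) - fn(x0)) / (x1 - x0)
        lut.append((x0, k, fn(x0) - k * x0))
    return lut

def lut_eval(lut, x):
    """Evaluate the piece-wise linear approximation at x; inputs below the
    range reuse the first segment's line."""
    for x0, k, b in reversed(lut):
        if x >= x0:
            return k * x + b
    x0, k, b = lut[0]
    return k * x + b

lut = build_lut(gelu, -4.0, 4.0, 32)
err = max(abs(lut_eval(lut, -4 + 8 * i / 1000) - gelu(-4 + 8 * i / 1000))
          for i in range(1001))
print(f"max abs error with 32 segments: {err:.4f}")
```

At inference only a compare, a multiply, and an add remain per activation, which is why the hardware cost depends mainly on the precision of the stored slopes and intercepts.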
- [355] arXiv:2403.19593 [pdf, other]
-
Title: Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Building on the momentum of image generation diffusion models, there is an increasing interest in video-based diffusion models. However, video generation poses greater challenges due to its higher-dimensional nature, the scarcity of training data, and the complex spatiotemporal relationships involved. Image generation models, due to their extensive data requirements, have already strained computational resources to their limits. There have been instances of these models reproducing elements from the training samples, leading to concerns and even legal disputes over sample replication. Video diffusion models, which operate with even more constrained datasets and are tasked with generating both spatial and temporal content, may be more prone to replicating samples from their training sets. Compounding the issue, these models are often evaluated using metrics that inadvertently reward replication. In our paper, we present a systematic investigation into the phenomenon of sample replication in video diffusion models. We scrutinize various recent diffusion models for video synthesis, assessing their tendency to replicate spatial and temporal content in both unconditional and conditional generation scenarios. Our study identifies strategies that are less likely to lead to replication. Furthermore, we propose new evaluation strategies that take replication into account, offering a more accurate measure of a model's ability to generate the original content.
- [356] arXiv:2403.19595 [pdf, other]
-
Title: Situation Awareness for Driver-Centric Driving Style Adaptation
Comments: 14 pages, 6 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
There is evidence that the driving style of an autonomous vehicle is important to increase the acceptance and trust of the passengers. The driving situation has been found to have a significant influence on human driving behavior. However, current driving style models only partially incorporate driving environment information, limiting the alignment between an agent and the given situation. Therefore, we propose a situation-aware driving style model based on different visual feature encoders pretrained on fleet data, as well as driving behavior predictors, which are adapted to the driving style of a specific driver. Our experiments show that the proposed method outperforms static driving styles significantly and forms plausible situation clusters. Furthermore, we found that feature encoders pretrained on our dataset lead to more precise driving behavior modeling. In contrast, feature encoders pretrained in supervised and unsupervised fashion on different data sources lead to more specific situation clusters, which can be utilized to constrain and control the driving style adaptation for specific situations. Moreover, in a real-world setting, where driving style adaptation happens iteratively, we found that the MLP-based behavior predictors achieve good performance initially but suffer from catastrophic forgetting. In contrast, behavior predictors based on situation-dependent statistics can learn iteratively from continuous data streams by design. Overall, our experiments show that important information for driving behavior prediction is contained within the visual feature encoder. The dataset is publicly available at huggingface.co/datasets/jHaselberger/SADC-Situation-Awareness-for-Driver-Centric-Driving-Style-Adaptation.
- [357] arXiv:2403.19596 [pdf, other]
-
Title: LocCa: Visual Pretraining with Location-aware Captioners
Authors: Bo Wan, Michael Tschannen, Yongqin Xian, Filip Pavetic, Ibrahim Alabdulmohsin, Xiao Wang, André Susano Pinto, Andreas Steiner, Lucas Beyer, Xiaohua Zhai
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image captioning has been shown to be an effective pretraining method similar to contrastive pretraining. However, the incorporation of location-aware information into visual pretraining remains an area with limited research. In this paper, we propose a simple visual pretraining method with location-aware captioners (LocCa). LocCa uses a simple image captioning task interface to teach a model to read out rich information, i.e. bounding box coordinates and captions, conditioned on the image pixel input. Thanks to the multitask capabilities of an encoder-decoder architecture, we show that an image captioner can easily handle multiple tasks during pretraining. Our experiments demonstrate that LocCa outperforms standard captioners significantly on localization downstream tasks while maintaining comparable performance on holistic tasks.
- [358] arXiv:2403.19600 [pdf, other]
-
Title: Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Authors: Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an innovative inter-class data augmentation method known as Diff-Mix (https://github.com/Zhicaiwww/Diff-Mix), which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios, including few-shot, conventional, and long-tail classifications for domain-specific datasets.
- [359] arXiv:2403.19602 [pdf, other]
-
Title: Behavior Trees in Industrial Applications: A Case Study in Underground Explosive Charging
Authors: Mattias Hallen (1), Matteo Iovino (2), Shiva Sander-Tavallaey (2), Christian Smith (3) ((1) ABB Mining R&D, Umeå, Sweden, (2) ABB Corporate Research, Västerås, Sweden, (3) Division of Robotics, Perception and Learning, KTH - Royal Institute of Technology, Stockholm, Sweden)
Subjects: Robotics (cs.RO)
In industrial applications Finite State Machines (FSMs) are often used to implement decision making policies for autonomous systems. In recent years, the use of Behavior Trees (BT) as an alternative policy representation has gained considerable attention. The benefits of using BTs over FSMs are modularity and reusability, enabling a system that is easy to extend and modify. However, few studies on successful implementations of BTs for industrial applications have been published. This paper contributes the lessons learned from implementing BTs in a complex industrial use case, where a robotic system assembles explosive charges and places them in holes on the rock face. The main result of the paper is that even if it is possible to model the entire system as a BT, combining BTs with FSMs can increase the readability and maintainability of the system. The benefit of such a combination is especially notable in the use case studied in this paper, where the full system cannot run autonomously and human supervision and feedback are needed.
- [360] arXiv:2403.19603 [pdf, other]
-
Title: Semantic Map-based Generation of Navigation Instructions
Comments: 5 pages, 2 figures, 3 tables (13 pages, 3 figures, 5 tables including references and appendices), accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We are interested in the generation of navigation instructions, either in their own right or as training material for robotic navigation tasks. In this paper, we propose a new approach to navigation instruction generation by framing the problem as an image captioning task using semantic maps as visual input. Conventional approaches employ a sequence of panorama images to generate navigation instructions. Semantic maps abstract away from visual details and fuse the information in multiple panorama images into a single top-down representation, thereby reducing the computational complexity of processing the input. We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions. Our initial investigations show promise in using semantic maps for instruction generation instead of a sequence of panorama images, but there is vast scope for improvement. We release the code for data preparation and model training at https://github.com/chengzu-li/VLGen.
- [361] arXiv:2403.19607 [pdf, other]
-
Title: SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects
Authors: Avinash Ummadisingu, Jongkeum Choi, Koki Yamane, Shimpei Masuda, Naoki Fukaya, Kuniyuki Takahashi
Comments: 8 pages. An accompanying video is available at this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Acquiring accurate depth information of transparent objects using off-the-shelf RGB-D cameras is a well-known challenge in Computer Vision and Robotics. Depth estimation/completion methods are typically employed and trained on datasets with quality depth labels acquired from either simulation, additional sensors or specialized data collection setups and known 3D models. However, acquiring reliable depth information for datasets at scale is not straightforward, limiting training scalability and generalization. Neural Radiance Fields (NeRFs) are learning-free approaches and have demonstrated wide success in novel view synthesis and shape recovery. However, heuristics and controlled environments (lights, backgrounds, etc) are often required to accurately capture specular surfaces. In this paper, we propose using Visual Foundation Models (VFMs) for segmentation in a zero-shot, label-free way to guide the NeRF reconstruction process for these objects via the simultaneous reconstruction of semantic fields and extensions to increase robustness. Our proposed method Segmentation-AIDed NeRF (SAID-NeRF) shows significant performance on depth completion datasets for transparent objects and robotic grasping.
- [362] arXiv:2403.19611 [pdf, ps, other]
-
Title: Nearest Neighbor Classification for Classical Image Upsampling
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Given a set of ordered pixel data in the form of an image, our goal is to upsample the data such that: (1) the resulting resolution is improved by some factor; (2) the final result passes the human test, having added new, believable, and realistic information and detail to the image; and (3) the time complexity of upscaling remains relatively close to that of lossy upscaling implementations.
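For reference, the classical nearest-neighbour upscaling baseline this work builds on is a few lines of numpy:

```python
import numpy as np

def nn_upsample(img, factor):
    """Upsample an image by an integer factor: each output pixel copies
    the value of its nearest source pixel."""
    rows = np.arange(img.shape[0] * factor) // factor
    cols = np.arange(img.shape[1] * factor) // factor
    return img[np.ix_(rows, cols)]

img = np.array([[1, 2],
                [3, 4]])
print(nn_upsample(img, 2))
```

Each source pixel becomes a `factor`-by-`factor` block, so resolution grows without inventing new detail; the paper's stated goal is to improve on this while staying close to its time complexity.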
- [363] arXiv:2403.19612 [pdf, other]
-
Title: ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance poses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with the convolutional operation inherently invariant to local spatial pattern orientations using the Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts - up to 1000 times fewer in the case of MedMNIST. Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPONet.
- [364] arXiv:2403.19615 [pdf, other]
-
Title: SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing
Authors: Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of testing frequency can effectively match the Gaussian scale, thus making the Gaussian primitive distribution remain consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Our codes, data and models are available at https://github.com/zsy1987/SA-GS.
- [365] arXiv:2403.19616 [pdf, ps, other]
-
Title: Feedback Optimization of Incentives for Distribution Grid Services
Subjects: Systems and Control (eess.SY)
Energy prices and net power injection limitations regulate the operations in distribution grids and typically ensure that operational constraints are met. Nevertheless, unexpected or prolonged abnormal events could undermine the grid's functioning. During contingencies, customers could contribute effectively to sustaining the network by providing services. This paper proposes an incentive mechanism that promotes users' active participation by essentially altering the energy pricing rule. The incentives are modeled via a linear function whose parameters can be computed by the system operator (SO) by solving an optimization problem. Feedback-based optimization algorithms are then proposed to seek optimal incentives by leveraging measurements from the grid, even in the case when the SO does not have a full grid and customer information. Numerical simulations on a standard testbed validate the proposed approach.
- [366] arXiv:2403.19620 [pdf, other]
-
Title: Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models
Comments: Preprint. The Version of Record of this contribution is to be published in the proceedings of the 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) 2024
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Generative Adversarial Networks (GANs) have shown great success in generating high quality images and are thus used as one of the main approaches to generate art images. However, usually the image generation process involves sampling from the latent space of the learned art representations, allowing little control over the output. In this work, we first employ GANs that are trained to produce creative images using an architecture known as Creative Adversarial Networks (CANs), then, we employ an evolutionary approach to navigate within the latent space of the models to discover images. We use automatic aesthetic and collaborative interactive human evaluation metrics to assess the generated images. In the human interactive evaluation case, we propose a collaborative evaluation based on the assessments of several participants. Furthermore, we also experiment with an intelligent mutation operator that aims to improve the quality of the images through local search based on an aesthetic measure. We evaluate the effectiveness of this approach by comparing the results produced by the automatic and collaborative interactive evolution. The results show that the proposed approach can generate highly attractive art images when the evolution is guided by collaborative human feedback.
- [367] arXiv:2403.19622 [pdf, other]
-
Title: RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng
Comments: 24 pages, 12 figures, 6 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, making it possible to generalize on novel robotic tasks in a composable manner. Despite the promising future, the community is not yet adequately prepared for composable generalization agents, particularly due to the lack of primitive-level real-world robotic datasets. In this paper, we propose a primitive-level robotic dataset, namely RH20T-P, which contains about 33000 video clips covering 44 diverse and complicated robotic tasks. Each clip is manually annotated according to a set of meticulously designed primitive skills, facilitating the future development of composable generalization agents. To validate the effectiveness of RH20T-P, we also construct a potential and scalable agent based on RH20T-P, called RA-P. Equipped with two planners specialized in task decomposition and motion planning, RA-P can adapt to novel physical skills through composable generalization. Our website and videos can be found at https://sites.google.com/view/rh20t-primitive/main. Dataset and code will be made available soon.
- [368] arXiv:2403.19625 [pdf, other]
-
Title: Top-$k$ Classification and Cardinality-Aware Prediction
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee consistency in relation to the hypothesis set $H$, providing stronger guarantees than Bayes-consistency due to their non-asymptotic and hypothesis-set specific nature. To address the trade-off between accuracy and cardinality $k$, we further introduce cardinality-aware loss functions through instance-dependent cost-sensitive learning. For these functions, we derive cost-sensitive comp-sum and constrained surrogate losses, establishing their $H$-consistency bounds and Bayes-consistency. Minimizing these losses leads to new cardinality-aware algorithms for top-$k$ classification. We report the results of extensive experiments on CIFAR-100, ImageNet, CIFAR-10, and SVHN datasets demonstrating the effectiveness and benefit of these algorithms.
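For concreteness, the top-$k$ zero-one loss that these $H$-consistency bounds are stated with respect to can be sketched as follows (an illustrative reading of the task definition, not the paper's code):

```python
import numpy as np

def top_k_loss(scores, y, k):
    """Zero-one top-k loss: 0 if the true label y is among the k
    highest-scoring classes, 1 otherwise."""
    top_k = np.argsort(scores)[::-1][:k]
    return 0.0 if y in top_k else 1.0

scores = np.array([0.1, 0.5, 0.3, 0.1])  # class 1 ranked first, class 2 second
print(top_k_loss(scores, y=2, k=1))  # -> 1.0
print(top_k_loss(scores, y=2, k=2))  # -> 0.0
```

Surrogate losses are needed precisely because this loss is discontinuous in the scores; the paper's bounds relate the surrogate's excess risk to the excess risk of this target loss.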
- [369] arXiv:2403.19629 [pdf, other]
-
Title: Metric Learning from Limited Pairwise Preference Comparisons
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study metric learning from preference comparisons under the ideal point model, in which a user prefers an item over another if it is closer to their latent ideal item. These items are embedded into $\mathbb{R}^d$ equipped with an unknown Mahalanobis distance shared across users. While recent work shows that it is possible to simultaneously recover the metric and ideal items given $\mathcal{O}(d)$ pairwise comparisons per user, in practice we often have a limited budget of $o(d)$ comparisons. We study whether the metric can still be recovered, even though learning individual ideal items is no longer possible. We show that in general, $o(d)$ comparisons reveal no information about the metric, even with infinitely many users. However, when comparisons are made over items that exhibit low-dimensional structure, each user can contribute to learning the metric restricted to a low-dimensional subspace, so that the metric can be jointly identified. We present a divide-and-conquer approach that achieves this, and provide theoretical recovery guarantees and empirical validation.
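The ideal point model that generates the comparisons can be sketched in a few lines (toy data for illustration; the metric and points are assumptions, not the authors' setup):

```python
import numpy as np

def prefers(x_i, x_j, ideal, M):
    """Under the ideal point model, a user prefers item x_i over x_j
    iff x_i is closer to the user's latent ideal point under the
    shared Mahalanobis metric M (a positive-definite matrix)."""
    dist = lambda a: (a - ideal) @ M @ (a - ideal)
    return dist(x_i) < dist(x_j)

ideal = np.zeros(2)
x_i, x_j = np.array([1.0, 0.0]), np.array([0.0, 1.2])
print(prefers(x_i, x_j, ideal, np.eye(2)))            # True: x_i is closer in Euclidean distance
print(prefers(x_i, x_j, ideal, np.diag([3.0, 1.0])))  # False: the metric stretches axis 0
```

The example shows why the metric matters: the same pair of items can be ranked either way depending on $M$, which is what makes joint recovery of metric and ideal points from comparisons possible at all.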
- [370] arXiv:2403.19631 [pdf, other]
-
Title: Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge updates, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when dealing with multi-hop questions since they require LLMs to update and integrate multiple knowledge pieces relevant to the questions. To tackle the problem, we propose the Retrieval-Augmented model Editing (RAE) framework tailored for multi-hop question answering. RAE first retrieves edited facts and then refines the language model through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that na\"ive similarity-based searches might miss. Additionally, our framework incorporates a pruning strategy to eliminate redundant information from the retrieved facts, which enhances the editing accuracy and mitigates the hallucination problem. Our framework is supported by theoretical justification for its fact retrieval efficacy. Finally, comprehensive evaluation across various LLMs validates RAE's ability in providing accurate answers with updated knowledge.
- [371] arXiv:2403.19632 [pdf, other]
-
Title: GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction.
- [372] arXiv:2403.19633 [pdf, other]
-
Title: Lane-Change in Dense Traffic with Model Predictive Control and Neural Networks
Authors: Sangjae Bae, David Isele, Alireza Nakhaei, Peng Xu, Alexandre Miranda Anon, Chiho Choi, Kikuo Fujimura, Scott Moura
Journal-ref: IEEE Transactions on Control Systems Technology, Volume 31, Issue 2, March 2023
Subjects: Systems and Control (eess.SY)
This paper presents an online smooth-path lane-change control framework. We focus on dense traffic where inter-vehicle space gaps are narrow, and cooperation with surrounding drivers is essential to achieve the lane-change maneuver. We propose a two-stage control framework that harmonizes Model Predictive Control (MPC) with Generative Adversarial Networks (GAN) by utilizing driving intentions to generate smooth lane-change maneuvers. To improve performance in practice, the system is augmented with an adaptive safety boundary and a Kalman Filter to mitigate sensor noise. Simulation studies are conducted at different levels of traffic density and of cooperativeness among other drivers. The simulation results support the effectiveness, driving comfort, and safety of the proposed method.
- [373] arXiv:2403.19634 [pdf, ps, other]
-
Title: Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2
Comments: LIA system description for the Short Duration Speaker Verification (SdSv) challenge 2020 Task 2
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
The SdSv challenge Task 2 provided an opportunity to assess the efficiency and robustness of modern text-independent speaker verification systems. It also made it possible to test new approaches capable of addressing the main issues of this challenge (duration, language, ...). This paper describes the contributions of our laboratory to the speaker recognition field. These contributions highlight two further challenges in addition to short duration and language: the mismatch between enrollment and test data, and that between subsets of the evaluation trial dataset. The proposed approaches experimentally show their relevance and efficiency on the SdSv evaluation, and could be of interest in many real-life applications.
- [374] arXiv:2403.19638 [pdf, other]
-
Title: Siamese Vision Transformers are Scalable Audio-visual Learners
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Traditional audio-visual methods rely on independent audio and visual backbones, which is costly and not scalable. In this work, we investigate using an audio-visual siamese network (AVSiam) for efficient and scalable audio-visual pretraining. Our framework uses a single shared vision transformer backbone to process audio and visual inputs, improving its parameter efficiency, reducing the GPU memory footprint, and allowing us to scale our method to larger datasets and model sizes. We pretrain our model using a contrastive audio-visual matching objective with a multi-ratio random masking scheme, which enables our model to process larger audio-visual instance batches, helpful for contrastive learning. Unlike prior audio-visual methods, our method can robustly handle audio, visual, and audio-visual inputs with a single shared ViT backbone. Furthermore, despite using the shared backbone for both modalities, AVSiam achieves competitive or even better results than prior methods on AudioSet and VGGSound for audio-visual classification and retrieval. Our code is available at https://github.com/GenjiB/AVSiam
- [375] arXiv:2403.19639 [pdf, other]
-
Title: Linear Programming in Isabelle/HOL
Authors: Julian Parsert
Subjects: Logic in Computer Science (cs.LO)
Linear programming describes the problem of optimising a linear objective function over a set of constraints on its variables. In this paper we present a solver for linear programs implemented in the proof assistant Isabelle/HOL. This allows formally proving its soundness, termination, and other properties. We base these results on a previous formalisation of the simplex algorithm which does not take optimisation problems into account. Using the weak duality theorem of linear programming we obtain an algorithm for solving linear programs. Using Isabelle's code generation mechanism we can generate an external solver for linear programs.
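The weak duality theorem invoked here is the standard statement: for a primal problem $\max\, c^T x$ subject to $Ax \le b$, $x \ge 0$, and its dual $\min\, b^T y$ subject to $A^T y \ge c$, $y \ge 0$, any pair of feasible solutions $x, y$ satisfies

```latex
c^T x \;\le\; (A^T y)^T x \;=\; y^T A x \;\le\; y^T b \;=\; b^T y
```

so every feasible dual solution upper-bounds the primal objective, and a feasible pair with equal objective values certifies optimality of both. This is the bound that lets the simplex-based feasibility checker be turned into an optimisation procedure.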
- [376] arXiv:2403.19645 [pdf, other]
-
Title: GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The rapid advancement in image generation models has predominantly been driven by diffusion models, which have demonstrated unparalleled success in generating high-fidelity, diverse images from textual prompts. Despite their success, diffusion models encounter substantial challenges in the domain of image editing, particularly in executing disentangled edits-changes that target specific attributes of an image while leaving irrelevant parts untouched. In contrast, Generative Adversarial Networks (GANs) have been recognized for their success in disentangled edits through their interpretable latent spaces. We introduce GANTASTIC, a novel framework that takes existing directions from pre-trained GAN models-representative of specific, controllable attributes-and transfers these directions into diffusion-based models. This novel approach not only maintains the generative quality and diversity that diffusion models are known for but also significantly enhances their capability to perform precise, targeted image edits, thereby leveraging the best of both worlds.
- [377] arXiv:2403.19646 [pdf, other]
-
Title: Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, necessitating precise and comprehensive interpretation methodologies. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, leading to the emergence of remote sensing image change interpretation (RSICI) as a significant research focus. Current RSICI technology encompasses change detection and change captioning, each with its limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent which integrates a multi-level change interpretation (MCI) model as eyes and a large language model (LLM) as the brain. Our Change-Agent can follow user instructions to deliver comprehensive change interpretation and insightful analysis, such as change detection and change captioning, change object counting, change cause analysis, etc. Our proposed MCI model contains two branches for pixel-level change detection and semantic-level change captioning, in which multiple Bi-temporal Iterative Interaction (BI3) layers utilize Local Perception Enhancement (LPE) and Global Difference Fusion Attention (GDFA) modules to enhance the model's discriminative feature representation capabilities. To train the MCI model, we build the LEVIR-MCI dataset with change masks and captions of bi-temporal images. Extensive experiments demonstrate the effectiveness of the proposed change interpretation model and highlight the promising potential of our Change-Agent in facilitating comprehensive and intelligent interpretation of surface changes. We will make our dataset and the codebase of the change interpretation model and Change-Agent publicly available to facilitate future research at https://github.com/Chen-Yang-Liu/Change-Agent
- [378] arXiv:2403.19647 [pdf, other]
-
Title: Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors.
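The feature-ablation step behind SHIFT can be sketched as follows (a hypothetical minimal version; the decoder matrix and toy activations are assumptions for illustration, not the authors' code): sparse-feature activations a human judges task-irrelevant are zeroed before reconstructing the model's hidden state.

```python
import numpy as np

def shift_ablate(feature_acts, decoder, irrelevant_idx):
    """Zero out human-flagged task-irrelevant sparse features, then
    reconstruct the hidden activation from the remaining features."""
    acts = feature_acts.copy()
    acts[:, irrelevant_idx] = 0.0
    return acts @ decoder  # (batch, n_features) @ (n_features, d_model)

acts = np.array([[1.0, 2.0, 3.0]])
decoder = np.eye(3)  # identity decoder just for the toy example
print(shift_ablate(acts, decoder, [1]))  # -> [[1. 0. 3.]]
```

Because each feature is individually interpretable, a human can decide which indices belong in `irrelevant_idx`; that judgment is what distinguishes this from ordinary pruning of polysemantic units.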
- [379] arXiv:2403.19648 [pdf, other]
-
Title: Human-compatible driving partners through data-regularized self-play reinforcement learning
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
A central challenge for autonomous vehicles is coordinating with humans. Therefore, incorporating realistic human agents is essential for scalable training and evaluation of autonomous driving systems in simulation. Simulation agents are typically developed by imitating large-scale, high-quality datasets of human driving. However, pure imitation learning agents empirically have high collision rates when executed in a multi-agent closed-loop setting. To build agents that are realistic and effective in closed-loop settings, we propose Human-Regularized PPO (HR-PPO), a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy. In contrast to prior work, our approach is RL-first and only uses 30 minutes of imperfect human demonstrations. We evaluate agents in a large set of multi-agent traffic scenes. Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%. At the same time, the agents drive in a human-like manner, as measured by their similarity to existing human driving logs. We also find that HR-PPO agents show considerable improvements on proxy measures for coordination with human driving, particularly in highly interactive scenarios. We open-source our code and trained agents at https://github.com/Emerge-Lab/nocturne_lab and provide demonstrations of agent behaviors at https://sites.google.com/view/driving-partners.
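The human-regularized objective can be sketched as a per-step penalty on deviation from the reference policy (illustrative only; the exact penalty form and coefficient are assumptions, not the paper's implementation):

```python
import numpy as np

def hr_reward(env_reward, pi_agent, pi_human, lam=0.1):
    """Task reward minus a small penalty for deviating from a human
    reference policy, here a KL divergence between the agent's and the
    human policy's action distributions at the current state."""
    kl = float(np.sum(pi_agent * np.log(pi_agent / pi_human)))
    return env_reward - lam * kl

uniform = np.array([0.5, 0.5])
print(hr_reward(1.0, uniform, uniform))                    # no deviation -> 1.0
print(hr_reward(1.0, np.array([0.9, 0.1]), uniform) < 1.0) # deviation is penalized -> True
```

Self-play then optimizes this regularized reward, so agents pursue their goals while staying close to human-like behavior.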
- [380] arXiv:2403.19649 [pdf, other]
-
Title: GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Comments: Project Page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Human hands possess the dexterity to interact with diverse objects such as grasping specific parts of the objects and/or approaching them from desired directions. More importantly, humans can grasp objects of any shape without object-specific skills. Recent works synthesize grasping motions following single objectives such as a desired approach heading direction or a grasping area. Moreover, they usually rely on expensive 3D hand-object data during training and inference, which limits their capability to synthesize grasping motions for unseen objects at scale. In this paper, we unify the generation of hand-object grasping motions across multiple motion objectives, diverse object shapes and dexterous hand morphologies in a policy learning framework GraspXL. The objectives are composed of the graspable area, heading direction during approach, wrist rotation, and hand position. Without requiring any 3D hand-object interaction data, our policy trained with 58 objects can robustly synthesize diverse grasping motions for more than 500k unseen objects with a success rate of 82.2%. At the same time, the policy adheres to objectives, which enables the generation of diverse grasps per object. Moreover, we show that our framework can be deployed to different dexterous hands and work with reconstructed or generated objects. We quantitatively and qualitatively evaluate our method to show the efficacy of our approach. Our model and code will be available.
- [381] arXiv:2403.19651 [pdf, other]
-
Title: MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Comments: Work in progress
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multimedia (cs.MM)
Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent work leverages text instructions to allow users to more freely express their search intents. However, existing work primarily focuses on image pairs that are visually similar and/or can be characterized by a small set of pre-defined relations. The core thesis of this paper is that text instructions can enable retrieving images with richer relations beyond visual similarity. To show this, we introduce MagicLens, a series of self-supervised image retrieval models that support open-ended instructions. MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., inside view of), and we can make those implicit relations explicit by synthesizing instructions via large multimodal models (LMMs) and large language models (LLMs). Trained on 36.7M (query image, instruction, target image) triplets with rich semantic relations mined from the web, MagicLens achieves comparable or better results on eight benchmarks of various image retrieval tasks than prior state-of-the-art (SOTA) methods. Remarkably, it outperforms the previous SOTA with a 50X smaller model size on multiple benchmarks. Additional human analyses on a 1.4M-image unseen corpus further demonstrate the diversity of search intents supported by MagicLens.
- [382] arXiv:2403.19652 [pdf, other]
-
Title: InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers high-level control over interaction semantics, it cannot grasp the intricacies of low-level interaction dynamics. To overcome this issue, we further introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner. We apply InterDreamer to the BEHAVE and CHAIRS datasets, and our comprehensive experimental analysis demonstrates its capability to generate realistic and coherent interaction sequences that seamlessly align with the text directives.
- [383] arXiv:2403.19653 [pdf, other]
-
Title: Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond
Comments: Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Modern text-to-image (T2I) diffusion models can generate images with remarkable realism and creativity. These advancements have sparked research in fake image detection and attribution, yet prior studies have not fully explored the practical and scientific dimensions of this task. In addition to attributing images to 12 state-of-the-art T2I generators, we provide extensive analyses on what inference stage hyperparameters and image modifications are discernible. Our experiments reveal that initialization seeds are highly detectable, along with other subtle variations in the image generation process to some extent. We further investigate what visual traces are leveraged in image attribution by perturbing high-frequency details and employing mid-level representations of image style and structure. Notably, altering high-frequency information causes only slight reductions in accuracy, and training an attributor on style representations outperforms training on RGB images. Our analyses underscore that fake images are detectable and attributable at more levels of visual granularity than previously explored.
- [384] arXiv:2403.19654 [pdf, other]
-
Title: RSMamba: Remote Sensing Image Classification with State Space Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of Convolutional Neural Networks (CNNs) and Transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this paper, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates an efficient, hardware-aware design known as the Mamba. It integrates the advantages of both a global receptive field and linear modeling complexity. To overcome the limitation of the vanilla Mamba, which can only model causal sequences and is not adaptable to two-dimensional image data, we propose a dynamic multi-path activation mechanism to augment Mamba's capacity to model non-causal data. Notably, RSMamba maintains the inherent modeling mechanism of the vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets. This indicates that RSMamba holds significant potential to function as the backbone of future visual foundation models. The code will be available at \url{https://github.com/KyanChen/RSMamba}.
- [385] arXiv:2403.19655 [pdf, other]
-
Title: GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling
Authors: Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
3D Gaussian Splatting (GS) has achieved considerable improvements over Neural Radiance Fields in terms of 3D fitting fidelity and rendering speed. However, this unstructured representation with scattered Gaussians poses a significant challenge for generative modeling. To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generative modeling. We achieve this by first proposing a modified densification-constrained GS fitting algorithm that yields high-quality fitting results using a fixed number of free Gaussians, and then re-arranging the Gaussians into a predefined voxel grid via Optimal Transport. The structured grid representation allows us to use a standard 3D U-Net as our backbone for diffusion generative modeling without elaborate designs. Extensive experiments conducted on ShapeNet and OmniObject3D show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a powerful and versatile 3D representation.
Cross-lists for Fri, 29 Mar 24
- [386] arXiv:2311.18438 (cross-list from math.OC) [pdf, other]
-
Title: Solution-Set Geometry and Regularization Path of a Nonconvexly Regularized Convex Sparse Model
Comments: 53 pages, 10 figures. Submitted to journal
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST)
The generalized minimax concave (GMC) penalty is a nonconvex sparse regularizer which can preserve the overall-convexity of the regularized least-squares problem. In this paper, we focus on a significant instance of the GMC model termed scaled GMC (sGMC), and present various notable findings on its solution-set geometry and regularization path. Our investigation indicates that while the sGMC penalty is a nonconvex extension of the LASSO penalty (i.e., the $\ell_1$-norm), the sGMC model preserves many celebrated properties of the LASSO model, hence can serve as a less biased surrogate of LASSO without losing its advantages. Specifically, for a fixed regularization parameter $\lambda$, we show that the solution-set geometry, solution uniqueness and sparseness of the sGMC model can be characterized in a similar elegant way to the LASSO model (see, e.g., Osborne et al. 2000, R. J. Tibshirani 2013). For a varying $\lambda$, we prove that the sGMC solution set is a continuous polytope-valued mapping of $\lambda$. Most noticeably, our study indicates that similar to LASSO, the minimum $\ell_2$-norm regularization path of the sGMC model is continuous and piecewise linear in $\lambda$. Based on these theoretical results, an efficient regularization path algorithm is proposed for the sGMC model, extending the well-known least angle regression (LARS) algorithm for LASSO. We prove the correctness and finite termination of the proposed algorithm under a mild assumption, and confirm its correctness-in-general-situation, efficiency, and practical utility through numerical experiments. Many results in this study also contribute to the theoretical research of LASSO.
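The LASSO properties being generalized here are easiest to see in the orthonormal-design special case, where the solution is given coordinate-wise by soft thresholding and is visibly piecewise linear in $\lambda$ (a standard textbook illustration of the path structure, not the sGMC algorithm itself):

```python
import numpy as np

def soft_threshold(z, lam):
    """Coordinate-wise LASSO solution for an orthonormal design:
    beta_j = sign(z_j) * max(|z_j| - lambda, 0),
    which is piecewise linear in lambda."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([3.0, -1.5, 0.2])
for lam in (0.0, 1.0, 2.0):
    print(lam, soft_threshold(z, lam))
# at lambda = 1.0 the solution is [2.0, -0.5, 0.0]: small coefficients vanish first
```

The sGMC analogue replaces this shrinkage with the less biased GMC penalty while, per the paper, keeping continuity and piecewise linearity of the minimum $\ell_2$-norm path, which is what makes a LARS-style path algorithm possible.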
- [387] arXiv:2403.18822 (cross-list from q-fin.TR) [pdf, ps, other]
-
Title: Enhancing Financial Data Visualization for Investment Decision-Making
Comments: 5 pages, 10 figures
Subjects: Trading and Market Microstructure (q-fin.TR); Machine Learning (cs.LG)
Navigating the intricate landscape of financial markets requires adept forecasting of stock price movements. This paper delves into the potential of Long Short-Term Memory (LSTM) networks for predicting stock dynamics, with a focus on discerning nuanced rise and fall patterns. Leveraging a dataset from the New York Stock Exchange (NYSE), the study incorporates multiple features to enhance LSTM's capacity in capturing complex patterns. Visualization of key attributes, such as opening, closing, low, and high prices, aids in unraveling subtle distinctions crucial for comprehensive market understanding. The meticulously crafted LSTM input structure, inspired by established guidelines, incorporates both price and volume attributes over a 25-day time step, enabling the model to capture temporal intricacies. A comprehensive methodology, including hyperparameter tuning with Grid Search, Early Stopping, and Callback mechanisms, leads to a remarkable 53% improvement in predictive accuracy. The study concludes with insights into model robustness, contributions to financial forecasting literature, and a roadmap for real-time stock market prediction. The amalgamation of LSTM networks, strategic hyperparameter tuning, and informed feature selection presents a potent framework for advancing the accuracy of stock price predictions, contributing substantively to financial time series forecasting discourse.
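The 25-day windowed input structure described above can be sketched as a standard sliding-window transform (the column layout and target choice are illustrative assumptions, not the paper's exact preprocessing):

```python
import numpy as np

def make_windows(series, window=25):
    """Build (samples, time_steps, features) LSTM inputs from a
    multivariate price/volume series using a 25-day window.
    The target here is the value in column 0 on the following day
    (treating column 0 as the closing price is an assumption)."""
    X, y = [], []
    for t in range(len(series) - window):
        X.append(series[t:t + window])
        y.append(series[t + window, 0])
    return np.array(X), np.array(y)

data = np.random.rand(100, 5)  # e.g. open, close, low, high, volume
X, y = make_windows(data)
print(X.shape, y.shape)  # -> (75, 25, 5) (75,)
```

Each sample thus carries 25 consecutive days of all features, which is what lets the LSTM capture the temporal patterns the abstract refers to.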
- [388] arXiv:2403.18823 (cross-list from q-fin.GN) [pdf, other]
-
Title: Artificial Intelligence-based Analysis of Change in Public Finance between US and International Markets
Authors: Kapil Panda
Comments: 5 pages, 2 figures
Subjects: General Finance (q-fin.GN); Computational Engineering, Finance, and Science (cs.CE); Portfolio Management (q-fin.PM)
Public finances are one of the fundamental mechanisms of economic governance that refer to the financial activities and decisions made by government entities to fund public services, projects, and operations through assets. In today's globalized landscape, even subtle shifts in one nation's public debt landscape can have significant impacts on that of international finances, necessitating a nuanced understanding of the correlations between international and national markets to help investors make informed investment decisions. Therefore, by leveraging the capabilities of artificial intelligence, this study utilizes neural networks to depict the correlations between US and International Public Finances and predict the changes in international public finances based on the changes in US public finances. With the neural network model achieving a commendable Mean Squared Error (MSE) value of 2.79, it is able to affirm a discernible correlation and also plot the effect of US market volatility on international markets. To further test the accuracy and significance of the model, an economic analysis was conducted that aimed to correlate the changes seen by the results of the model with historical stock market changes. This model demonstrates significant potential for investors to predict changes in international public finances based on signals from US markets, marking a significant stride in comprehending the intricacies of global public finances and the role of artificial intelligence in decoding its multifaceted patterns for practical forecasting.
- [389] arXiv:2403.18826 (cross-list from q-bio.QM) [pdf, ps, other]
-
Title: SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model
Authors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan
Comments: 23 pages, 6 figures
Subjects: Quantitative Methods (q-bio.QM); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM-dPCR, a novel self-supervised learning-based pipeline that enables real-time and high-throughput absolute quantification of biological samples. Leveraging the zero-shot SAM model, SAM-dPCR efficiently analyzes diverse microreactors with over 97.7% accuracy within a rapid processing time of 3.16 seconds. By utilizing commonly available lab fluorescence microscopes, SAM-dPCR facilitates the quantification of sample concentrations. The accuracy of SAM-dPCR is validated by the strong linear relationship observed between known and inferred sample concentrations. Additionally, SAM-dPCR demonstrates versatility through comprehensive verification using various samples and reactor morphologies. This accessible, cost-effective tool transcends the limitations of traditional detection methods or fully supervised AI models, marking the first application of SAM in nucleic acid detection or molecular diagnostics. By eliminating the need for annotated training data, SAM-dPCR holds great application potential for nucleic acid quantification in resource-limited settings.
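Once the segmentation step has counted positive and total partitions, absolute quantification in dPCR follows standard Poisson statistics (the partition count and volume below are made-up example numbers, not figures from the paper):

```python
import math

def dpcr_concentration(n_positive, n_total, partition_volume_ul):
    """Standard dPCR Poisson quantification: mean copies per partition
    is lambda = -ln(1 - p), where p is the fraction of positive
    partitions; dividing by the partition volume (in microliters)
    gives copies per microliter."""
    p = n_positive / n_total
    lam = -math.log(1.0 - p)
    return lam / partition_volume_ul

# e.g. 5,000 positive of 20,000 partitions of 8.5e-4 uL each
print(round(dpcr_concentration(5000, 20000, 8.5e-4), 1))  # -> 338.4 copies/uL
```

The $-\ln(1-p)$ correction accounts for partitions that received more than one target molecule, which is why absolute quantification needs no standard curve.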
- [390] arXiv:2403.18829 (cross-list from q-bio.NC) [pdf, ps, other]
-
Title: A Primer on Gibsonian InformationComments: 48 pages, 10 figures, 2 tablesSubjects: Neurons and Cognition (q-bio.NC); Human-Computer Interaction (cs.HC)
Across the scientific literature, information measurement in the nervous system is posed as a problem of information processing internal to the brain by constructs such as neuronal populations, sensory surprise, or cognitive models. Application of information theory in the nervous system has focused on measuring phenomena such as capacity and integration. Yet the ecological perspective suggests that information is a product of active perception and interactions with the environment. Here, we propose Gibsonian Information (GI), relevant to both the study of cognitive agents and single cell systems that exhibit cognitive behaviors. We propose a formal model of GI that characterizes how agents extract environmental information in a dynamic fashion. GI demonstrates how sensory information guides information processing within individual nervous system representations of motion and continuous multisensory integration, as well as representations that guide collective behaviors. GI is useful for understanding first-order sensory inputs in terms of agent interactions with naturalistic contexts and simple internal representations and can be extended to cybernetic or symbolic representations. Statistical affordances, or clustered information in the form of spatiotemporally dependent perceptual input, facilitate extraction of GI from the environment. As a quantitative accounting of perceptual information, GI provides a means to measure a generalized indicator of nervous system input and can be characterized by three scenarios: disjoint distributions, contingent action, and coherent movement. By applying this framework to a variety of specific contexts, including a four-channel model of multisensory embodiment, we demonstrate how GI is essential to understanding the full scope of cognitive information processing.
- [391] arXiv:2403.18831 (cross-list from q-fin.TR) [pdf, other]
-
Title: DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market SimulationsAuthors: Armand Mihai CismaruComments: 11 pages, 9 png figures, uses apalike.sty and SCITEPRESS.sty, to be published in the proceedings of ICAART 2024Journal-ref: In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3, ISBN 978-989-758-680-4, ISSN 2184-433X, pages 412-421 (2024)Subjects: Trading and Market Microstructure (q-fin.TR); Artificial Intelligence (cs.AI)
In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historic stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX's capability to rival, and in many instances, surpass, the performance of public-domain traders, including those that outclass human traders, emphasising the efficiency of simple models, as this is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging "black-box" Deep Learning systems to create more efficient financial markets.
- [392] arXiv:2403.18833 (cross-list from eess.SP) [pdf, ps, other]
-
Title: A New Method for Sensorless Estimation of the Speed and Position in Brushed DC Motors Using Support Vector MachinesAuthors: Ernesto Vazquez-Sanchez, Jaime Gomez-Gil, Jose-Carlos Gamazo-Real, Jose Fernando Diez-HigueraJournal-ref: IEEE Transactions on Industrial Electronics, 2012, vol. 59, no. 3, pp. 1397-1408, ISSN 0278-0046Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Systems and Control (eess.SY)
Currently, for many applications, it is necessary to know the speed and position of motors. This can be achieved using mechanical sensors coupled to the motor shaft or using sensorless techniques. The sensorless techniques in brushed dc motors can be classified into two types: 1) techniques based on the dynamic brushed dc motor model and 2) techniques based on the ripple component of the current. This paper presents a new method, based on the ripple component, for speed and position estimation in brushed dc motors, using support vector machines. The proposed method only measures the current and detects the pulses in this signal. The motor speed is estimated by using the inverse distance between the detected pulses, and the position is estimated by counting all detected pulses. The ability to detect ghost pulses and to discard false pulses is the main advantage of this method over other sensorless methods. The performed tests on two fractional horsepower brushed dc motors indicate that the method works correctly in a wide range of speeds and situations, in which the speed is constant or varies dynamically.
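The abstract's two estimation rules are simple enough to sketch directly: speed from the inverse spacing between detected ripple pulses, position from the cumulative pulse count. This is an illustrative sketch of those rules only (the SVM-based pulse detection and ghost/false-pulse handling are omitted); the function name and segment count are assumptions.

```python
# Hypothetical helper, not the paper's code: a brushed DC motor with K
# commutator segments produces K ripple pulses per revolution.
def speed_and_position(pulse_times, segments_per_rev):
    """Speed in rev/s from inverse inter-pulse spacing; position in
    revolutions from the cumulative pulse count."""
    speeds = []
    for t_prev, t_next in zip(pulse_times, pulse_times[1:]):
        dt = t_next - t_prev
        speeds.append(1.0 / (segments_per_rev * dt))   # rev/s
    position = len(pulse_times) / segments_per_rev      # revolutions elapsed
    return speeds, position

# 8 segments, one pulse every 5 ms -> 25 rev/s (1500 rpm).
times = [0.005 * k for k in range(9)]
speeds, pos = speed_and_position(times, 8)
print(round(speeds[0], 1), pos)
```

The paper's contribution lies in detecting the pulses robustly (including ghost pulses) before these rules are applied.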
- [393] arXiv:2403.18837 (cross-list from econ.GN) [pdf, ps, other]
-
Title: Repetitive Dilemma Games in Distribution Information Using Interplay of Droop Quota: Meek's Method in Impact of Maximum Compensation and Minimum Cost Routes in Information Role of Marginal Contribution in Two-Sided Matching MarketsAuthors: Yasuko KawahataComments: Wallace's Law, Droop Quota, Meek's Method, Marginal Contribution, Two-Sided Matching Market, Repetitive Dilemma Game, Maximum Compensation Problem, Minimum Cost Pathways, Fake News, Fact-Checking, Information Market EquilibriumSubjects: General Economics (econ.GN); Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH); Physics and Society (physics.soc-ph)
This paper is a preliminary report of the research plan and a digest of the results and discussions. This research note explores the complex dynamics of fake news dissemination and fact-checking costs within the framework of information markets and analyzes the equilibrium between supply and demand using the concepts of droop quotas, Meek's method, and marginal contributions. By adopting a two-sided matching market perspective, we delve into scenarios in which markets are stable under the influence of fake news perceived as truth and those in which credibility prevails. Through the application of iterated dilemma game theory, we investigate the strategic choices of news providers affected by the costs associated with spreading fake news and fact-checking efforts. We further examine the maximum reward problem and strategies to minimize the cost path for spreading fake news, and consider a nuanced understanding of market segmentation into "cheap" and "premium" segments based on the nature of the information being spread. Our analysis uses mathematical models and computational processes to identify stable equilibrium points that ensure market stability in the face of deceptive information practices and provide insight into effective strategies to enhance the informational health of the market. Through this comprehensive approach, this paper aims for a more truthful and reliable perspective from which to observe information markets.
- [394] arXiv:2403.18839 (cross-list from q-fin.TR) [pdf, other]
-
Title: Long Short-Term Memory Pattern Recognition in Currency TradingAuthors: Jai PalComments: 10 Pages, 8 Figures, 4 ListingsSubjects: Trading and Market Microstructure (q-fin.TR); Machine Learning (cs.LG)
This study delves into the analysis of financial markets through the lens of Wyckoff Phases, a framework devised by Richard D. Wyckoff in the early 20th century. Focusing on the accumulation pattern within the Wyckoff framework, the research explores the phases of trading range and secondary test, elucidating their significance in understanding market dynamics and identifying potential trading opportunities. By dissecting the intricacies of these phases, the study sheds light on the creation of liquidity through market structure, offering insights into how traders can leverage this knowledge to anticipate price movements and make informed decisions. The effective detection and analysis of Wyckoff patterns necessitate robust computational models capable of processing complex market data, with spatial data best analyzed using Convolutional Neural Networks (CNNs) and temporal data through Long Short-Term Memory (LSTM) models. The creation of training data involves the generation of swing points, representing significant market movements, and filler points, introducing noise and enhancing model generalization. Activation functions, such as the sigmoid function, play a crucial role in determining the output behavior of neural network models. The results of the study demonstrate the remarkable efficacy of deep learning models in detecting Wyckoff patterns within financial data, underscoring their potential for enhancing pattern recognition and analysis in financial markets. In conclusion, the study highlights the transformative potential of AI-driven approaches in financial analysis and trading strategies, with the integration of AI technologies shaping the future of trading and investment practices.
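One concrete data-preparation step the abstract names is the generation of "swing points", i.e. significant local extrema in the price series. A hedged sketch of that step alone (the CNN/LSTM models, filler points, and Wyckoff phase labels are omitted; the function name and window size are assumptions):

```python
# Hypothetical sketch, not the paper's pipeline: label swing highs/lows as
# local extrema over a +/- `window` bar neighbourhood.
def swing_points(prices, window=2):
    """Return indices of local highs and lows in a price series."""
    highs, lows = [], []
    for i in range(window, len(prices) - window):
        seg = prices[i - window : i + window + 1]
        if prices[i] == max(seg):
            highs.append(i)
        elif prices[i] == min(seg):
            lows.append(i)
    return highs, lows

prices = [1.0, 1.2, 1.1, 1.4, 1.3, 1.0, 1.1, 1.5, 1.2]
highs, lows = swing_points(prices)
print(highs, lows)
```

Sequences of such labelled points (plus noise-injecting filler points) would then form the training windows for the LSTM.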
- [395] arXiv:2403.18840 (cross-list from hep-th) [pdf, other]
-
Title: Feynman Diagrams as Computational GraphsAuthors: Pengcheng Hou, Tao Wang, Daniel Cerkoney, Xiansheng Cai, Zhiyi Li, Youjin Deng, Lei Wang, Kun ChenSubjects: High Energy Physics - Theory (hep-th); Strongly Correlated Electrons (cond-mat.str-el); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); Computational Physics (physics.comp-ph)
We propose a computational graph representation of high-order Feynman diagrams in Quantum Field Theory (QFT), applicable to any combination of spatial, temporal, momentum, and frequency domains. Utilizing the Dyson-Schwinger and parquet equations, our approach effectively organizes these diagrams into a fractal structure of tensor operations, significantly reducing computational redundancy. This approach not only streamlines the evaluation of complex diagrams but also facilitates an efficient implementation of the field-theoretic renormalization scheme, crucial for enhancing perturbative QFT calculations. Key to this advancement is the integration of Taylor-mode automatic differentiation, a key technique employed in machine learning packages to compute higher-order derivatives efficiently on computational graphs. To operationalize these concepts, we develop a Feynman diagram compiler that optimizes diagrams for various computational platforms, utilizing machine learning frameworks. Demonstrating this methodology's effectiveness, we apply it to the three-dimensional uniform electron gas problem, achieving unprecedented accuracy in calculating the quasiparticle effective mass at metal density. Our work demonstrates the synergy between QFT and machine learning, establishing a new avenue for applying AI techniques to complex quantum many-body problems.
- [396] arXiv:2403.18850 (cross-list from q-bio.NC) [pdf, other]
-
Title: Are Colors Quanta of Light for Human Vision? A Quantum Cognition Study of Visual PerceptionAuthors: Jonito Aerts ArguëllesComments: 28 pages, 4 figures. arXiv admin note: text overlap with arXiv:2208.03726Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
We study the phenomenon of categorical perception within the quantum measurement process. The mechanism underlying this phenomenon consists in dilating stimuli being perceived to belong to different categories and contracting stimuli being perceived to belong to the same category. We show that, due to the naturally different way in determining the distance between pure states compared to the distance between density states, the phenomenon of categorical perception is rooted in the structure of the quantum measurement process itself. We apply our findings to the situation of visual perception of colors and argue that it is possible to consider colors as light quanta for human visual perception in a similar way as photons are light quanta for physical measurements of light frequencies. In our approach we see perception as a complex encounter between the existing physical reality, the stimuli, and the reality expected by the perceiver, resulting in the experience of the percepts. We investigate what that means for the situation of two colors, which we call Light and Dark, given our findings on categorical perception within the quantum measurement process.
- [397] arXiv:2403.18853 (cross-list from physics.soc-ph) [pdf, other]
-
Title: Spatio-seasonal risk assessment of upward lightning at tall objects using meteorological reanalysis dataAuthors: Isabell Stucke, Deborah Morgenstern, Georg J. Mayr, Thorsten Simon, Achim Zeileis, Gerhard Diendorfer, Wolfgang Schulz, Hannes PichlerSubjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Applications (stat.AP)
This study investigates lightning at tall objects and evaluates the risk of upward lightning (UL) over the eastern Alps and its surrounding areas. While uncommon, UL poses a threat, especially to wind turbines, as the long-duration current of UL can cause significant damage. Current risk assessment methods overlook the impact of meteorological conditions, potentially underestimating UL risks. Therefore, this study employs random forests, a machine learning technique, to analyze the relationship between UL measured at Gaisberg Tower (Austria) and $35$ larger-scale meteorological variables. Of these, the larger-scale upward velocity, wind speed and direction at 10 meters, and cloud physics variables contribute most information. The random forests predict the risk of UL across the study area at a 1 km$^2$ resolution. Strong near-surface winds combined with upward deflection by elevated terrain increase UL risk. The diurnal cycle of the UL risk as well as high-risk areas shift seasonally. They are concentrated north/northeast of the Alps in winter due to prevailing northerly winds, and expand southward, impacting northern Italy in the transitional and summer months. The model performs best in winter, with the highest predicted UL risk coinciding with observed peaks in measured lightning at tall objects. The highest concentration is north of the Alps, where most wind turbines are located, leading to an increase in overall lightning activity. Comprehensive meteorological information is essential for UL risk assessment, as lightning densities are a poor indicator of lightning at tall objects.
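A minimal sketch of the modelling step, assuming a scikit-learn random forest and purely synthetic stand-ins for the meteorological predictors the abstract names (upward velocity, 10 m wind speed and direction); the labelling rule and all parameters are assumptions, not the study's:

```python
# Hypothetical sketch, not the study's code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([
    rng.normal(size=n),             # stand-in: larger-scale upward velocity
    rng.gamma(2.0, 3.0, size=n),    # stand-in: 10 m wind speed
    rng.uniform(0, 360, size=n),    # stand-in: 10 m wind direction
])
# Toy labelling rule: strong near-surface wind plus upward motion -> UL event.
y = ((X[:, 1] > 6.0) & (X[:, 0] > 0.0)).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
risk = clf.predict_proba(X)[:, 1]   # per-cell risk, analogous to the 1 km^2 maps
print(risk.shape)
```

In the study, `predict_proba`-style risk values over a grid of meteorological reanalysis cells would yield the seasonal risk maps described.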
- [398] arXiv:2403.18864 (cross-list from physics.ao-ph) [pdf, other]
-
Title: Interpretable Machine Learning for Weather and Climate Prediction: A SurveyAuthors: Ruyi Yang, Jingyu Hu, Zihao Li, Jianli Mu, Tingzhao Yu, Jiangjiang Xia, Xuhong Li, Aritra Dasgupta, Haoyi XiongComments: 26 pages, 5 figuresSubjects: Atmospheric and Oceanic Physics (physics.ao-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Advanced machine learning models have recently achieved high predictive accuracy for weather and climate prediction. However, these complex models often lack inherent transparency and interpretability, acting as "black boxes" that impede user trust and hinder further model improvements. As such, interpretable machine learning techniques have become crucial in enhancing the credibility and utility of weather and climate modeling. In this survey, we review current interpretable machine learning approaches applied to meteorological predictions. We categorize methods into two major paradigms: 1) Post-hoc interpretability techniques that explain pre-trained models, such as perturbation-based, game theory based, and gradient-based attribution methods. 2) Designing inherently interpretable models from scratch using architectures like tree ensembles and explainable neural networks. We summarize how each technique provides insights into the predictions, uncovering novel meteorological relationships captured by machine learning. Lastly, we discuss research challenges around achieving deeper mechanistic interpretations aligned with physical principles, developing standardized evaluation benchmarks, integrating interpretability into iterative model development workflows, and providing explainability for large foundation models.
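For concreteness, here is a generic sketch of one post-hoc, perturbation-based technique the survey covers, permutation importance: shuffle one feature and measure the drop in model skill. The model and data are synthetic placeholders, not anything from the survey.

```python
# Generic permutation-importance sketch on a stand-in predictor.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.1 * X[:, 2]   # feature 1 is deliberately irrelevant

def model(Z):
    # Stand-in for a trained predictor (here it matches the generating rule).
    return 2.0 * Z[:, 0] + 0.1 * Z[:, 2]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

base = mse(model(X), y)
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's link to y
    importances.append(mse(model(Xp), y) - base)

print([round(v, 3) for v in importances])
```

The irrelevant feature gets zero importance; the dominant feature gets by far the largest. Game-theory-based (Shapley) and gradient-based attributions answer the same question with different machinery.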
- [399] arXiv:2403.18873 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Predicting risk of cardiovascular disease using retinal OCT imagingAuthors: Cynthia Maldonado-Garcia, Rodrigo Bonazzola, Enzo Ferrante, Thomas H Julian, Panagiotis I Sergouniotis, Nishant Ravikumara, Alejandro F FrangiComments: 18 pages for main manuscript, 7 figures, 2 pages for appendix and preprint for a journalSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We investigated the potential of optical coherence tomography (OCT) as an additional imaging technique to predict future cardiovascular disease (CVD). We utilised a self-supervised deep learning approach based on Variational Autoencoders (VAE) to learn low-dimensional representations of high-dimensional 3D OCT images and to capture distinct characteristics of different retinal layers within the OCT image. A Random Forest (RF) classifier was subsequently trained using the learned latent features and participant demographic and clinical data, to differentiate between patients at risk of CVD events (MI or stroke) and non-CVD cases. Our predictive model, trained on multimodal data, was assessed based on its ability to correctly identify individuals likely to suffer from a CVD event (MI or stroke) within a 5-year interval after image acquisition. Our self-supervised VAE feature selection and multimodal Random Forest classifier differentiate between patients at risk of future CVD events and the control group with an AUC of 0.75, outperforming the clinically established QRISK3 score (AUC = 0.597). The choroidal layer visible in OCT images was identified as an important predictor of future CVD events using a novel approach to model explainability. Retinal OCT imaging provides a cost-effective and non-invasive alternative to predict the risk of cardiovascular disease and is readily accessible in optometry practices and hospitals.
- [400] arXiv:2403.18901 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Toward Low-latency Iterative Decoding of QLDPC Codes Under Circuit-Level NoiseComments: 8+4 pages, 7 figures. The source code for the simulations in this work is available online at this http URLSubjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
We introduce a sliding window decoder based on belief propagation (BP) with guided decimation for the purposes of decoding quantum low-density parity-check codes in the presence of circuit-level noise. Windowed decoding keeps the decoding complexity reasonable when, as is typically the case, repeated rounds of syndrome extraction are required to decode. Within each window, we employ several rounds of BP with decimation of the variable node that we expect to be the most likely to flip in each round. Furthermore, we employ ensemble decoding to keep both decimation options (guesses) open in a small number of chosen rounds. We term the resulting decoder BP with guided decimation guessing (GDG). Applied to bivariate bicycle codes, GDG achieves a similar logical error rate as BP with an additional OSD post-processing stage (BP+OSD) and combination-sweep of order 10. For a window size of three syndrome cycles, a multi-threaded CPU implementation of GDG achieves a worst-case decoding latency of 3ms per window for the [[144,12,12]] code.
- [401] arXiv:2403.18963 (cross-list from quant-ph) [pdf, other]
-
Title: Using Quantum Computing to Infer Dynamic Behaviors of Biological and Artificial Neural NetworksAuthors: Gabriel A. SilvaSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
The exploration of new problem classes for quantum computation is an active area of research. An essentially completely unexplored topic is the use of quantum algorithms and computing to explore and ask questions \textit{about} the functional dynamics of neural networks. This is a component of the still-nascent topic of applying quantum computing to the modeling and simulations of biological and artificial neural networks. In this work, we show how a carefully constructed set of conditions can use two foundational quantum algorithms, Grover and Deutsch-Josza, in such a way that the output measurements admit an interpretation that guarantees we can infer whether a simple representation of a neural network (which applies to both biological and artificial networks) after some period of time has the potential to continue sustaining dynamic activity, or whether the dynamics are guaranteed to stop, either through 'epileptic' dynamics or quiescence.
- [402] arXiv:2403.18967 (cross-list from math.OC) [pdf, ps, other]
-
Title: Stabilization of linear Port-Hamiltonian Descriptor Systems via Output FeedbackSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
The structure-preserving stabilization of (possibly non-regular) linear port-Hamiltonian descriptor (pHDAE) systems by output feedback is discussed. While for general descriptor systems the characterization of when there exist output feedbacks that lead to an asymptotically stable closed-loop system is a very hard and partially open problem, for systems in pHDAE representation this problem can be completely solved. Necessary and sufficient conditions are presented that guarantee that there exists a proportional output feedback such that the resulting closed-loop port-Hamiltonian descriptor system is (robustly) asymptotically stable. For this, it is also necessary that the output feedback make the problem regular and of index at most one. A complete characterization of when this is possible is presented as well.
- [403] arXiv:2403.18980 (cross-list from math.CO) [pdf, other]
-
Title: A census of graph-drawing algorithms based on generalized transversal structuresSubjects: Combinatorics (math.CO); Computational Geometry (cs.CG)
We define graph drawing algorithms which simultaneously generalize several classical ones. More precisely, we consider the following algorithms:
(a) Fusy's algorithm for the straight-line grid drawing of planar triangulations, based on transversal structures,
(b) Barrière and Huemmer's algorithm for the straight-line grid drawing of planar quadrangulations, based on separating decompositions,
(c) He's algorithm for the orthogonal drawing of 3-valent planar maps, based on transversal structures,
(d) Bernardi & Fusy's algorithm for the orthogonal drawing of 4-valent planar maps, based on 2-orientations.
We present an algorithm generalizing (a) and (b) which produces a straight-line grid drawing for planar maps with faces of degree at most 4, and we present an algorithm generalizing (c) and (d) which produces an orthogonal drawing for planar maps with vertices of degree at most 4. Our two algorithms are based on a class of combinatorial structures called grand-Schnyder woods.
- [404] arXiv:2403.18994 (cross-list from stat.ML) [pdf, other]
-
Title: Causal-StoNet: Causal Inference for High-Dimensional Complex DataSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medicine, econometrics, and social science. However, the existing methods for causal inference are frequently developed under the assumption that the data dimension is low or that the underlying data generation process is linear or approximately linear. To address these challenges, this paper proposes a novel causal inference approach for dealing with high-dimensional complex data. The proposed approach is based on deep learning techniques, including sparse deep learning theory and stochastic neural networks, that have been developed in recent literature. By using these techniques, the proposed approach can address both the high dimensionality and unknown data generation process in a coherent way. Furthermore, the proposed approach can also be used when missing values are present in the datasets. Extensive numerical studies indicate that the proposed approach outperforms existing ones.
- [405] arXiv:2403.19003 (cross-list from math.DS) [pdf, other]
-
Title: Finding Birkhoff Averages via Adaptive FilteringSubjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA)
In many applications, one is interested in classifying trajectories of Hamiltonian systems as invariant tori, islands, or chaos. The convergence rate of ergodic Birkhoff averages can be used to categorize these regions, but many iterations of the return map are needed to implement this directly. Recently, it has been shown that a weighted Birkhoff average can be used to accelerate the convergence, resulting in a useful method for categorizing trajectories.
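A hedged sketch of the standard weighted Birkhoff average this paragraph describes (the paper's accelerated Birkhoff RRE variant is not reproduced), applied to the Chirikov standard map with the usual smooth exponential bump weights; the observable and parameters are illustrative choices:

```python
# Illustrative sketch only: weighted Birkhoff average along a standard-map orbit.
import math

def standard_map(x, p, k=0.5):
    p_new = p + (k / (2 * math.pi)) * math.sin(2 * math.pi * x)
    x_new = (x + p_new) % 1.0
    return x_new, p_new

def weighted_birkhoff(f, x, p, n=1000, k=0.5):
    """Weighted time average of observable f along an orbit of length n."""
    num = den = 0.0
    for i in range(1, n):
        t = i / n
        w = math.exp(-1.0 / (t * (1.0 - t)))   # C^infinity bump weight
        num += w * f(x, p)
        den += w
        x, p = standard_map(x, p, k)
    return num / den

# Average of cos(2*pi*x) along an orbit started at (0.1, 0.2).
avg = weighted_birkhoff(lambda x, p: math.cos(2 * math.pi * x), 0.1, 0.2)
print(avg)
```

On invariant tori these weighted averages converge super-polynomially fast in n, which is the property exploited for classifying trajectories.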
In this paper, we show how a modified version of the reduced rank extrapolation method (named Birkhoff RRE) can also be used to find optimal weights for the weighted average with a single linear least-squares solve. Using these, we classify trajectories with fewer iterations of the map than the standard weighted Birkhoff average. Furthermore, for the islands and invariant circles, a subsequent eigenvalue problem gives the number of islands and the rotation number. Using these numbers, we find Fourier parameterizations of invariant circles and islands. We show examples of Birkhoff RRE on the standard map and on magnetic field line dynamics.
- [406] arXiv:2403.19011 (cross-list from q-bio.QM) [pdf, other]
-
Title: Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic ModelsSubjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to heterogeneity of data types and mixed-sequence types contained in variable length sequences. In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between diagnoses, laboratory tests, neurological assessments, and medications. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The results are evaluated against held-out data for predicting the length of sequences and presence of Intensive Care Unit (ICU) in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.
- [407] arXiv:2403.19055 (cross-list from math.SP) [pdf, other]
-
Title: Computing the spectrum and pseudospectrum of infinite-volume operators from local patchesSubjects: Spectral Theory (math.SP); Mathematical Physics (math-ph); Numerical Analysis (math.NA)
We show how the spectrum of normal discrete short-range infinite-volume operators can be approximated with two-sided error control using only data from finite-sized local patches. As a corollary, we prove the computability of the spectrum of such infinite-volume operators with the additional property of finite local complexity and provide an explicit algorithm. Such operators appear in many applications, e.g. as discretizations of differential operators on unbounded domains or as so-called tight-binding Hamiltonians in solid state physics. For a large class of such operators, our result allows for the first time to establish computationally also the absence of spectrum, i.e. the existence and the size of spectral gaps. We extend our results to the $\varepsilon$-pseudospectrum of non-normal operators, proving that also the pseudospectrum of such operators is computable.
- [408] arXiv:2403.19070 (cross-list from math.AP) [pdf, ps, other]
-
Title: Stability of solutions of the porous medium equation with growth with respect to the diffusion exponentSubjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Quantitative Methods (q-bio.QM)
We consider a macroscopic model for the growth of living tissues incorporating pressure-driven dispersal and pressure-modulated proliferation. Assuming a power-law relation between the mechanical pressure and the cell density, the model can be expressed as the porous medium equation with a growth term. We prove H\"older continuous dependence of the solutions of the model on the diffusion exponent. The main difficulty lies in the degeneracy of the porous medium equations at vacuum. To deal with this issue, we first regularise the equation by shifting the initial data away from zero and then optimise the stability estimate derived in the regular setting.
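For orientation, a hedged reconstruction of the model class described; the paper's exact notation may differ. With cell density $\rho$, mechanical pressure $p$, and pressure-modulated growth rate $G$:

```latex
% Assumed notation: rho = cell density, p = pressure, G = growth rate,
% gamma > 0 the power-law exponent of the pressure law.
\partial_t \rho = \nabla \cdot \left( \rho \, \nabla p \right) + \rho \, G(p),
\qquad p = \rho^{\gamma}.
% Since \nabla\cdot(\rho\nabla\rho^{\gamma}) = \tfrac{\gamma}{\gamma+1}\,\Delta\rho^{\gamma+1},
% this is the porous medium equation with a growth term:
\partial_t \rho = \frac{\gamma}{\gamma+1} \, \Delta \rho^{\gamma+1} + \rho \, G(p).
```

The stability result concerns the dependence of solutions on the diffusion exponent $\gamma$ (equivalently, on $m = \gamma + 1$ in the porous medium form), which degenerates at vacuum $\rho = 0$.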
- [409] arXiv:2403.19088 (cross-list from math.OC) [pdf, other]
-
Title: A Framework for Time-Varying Optimization via Derivative EstimationSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
Optimization algorithms have a rich and fundamental relationship with ordinary differential equations given by their continuous-time limits. When the cost function varies with time -- typically in response to a dynamically changing environment -- online optimization becomes a continuous-time trajectory tracking problem. To accommodate these time variations, one typically requires some inherent knowledge about their nature, such as a time derivative.
In this paper, we propose a novel construction and analysis of a continuous-time derivative estimation scheme based on "dirty-derivatives", and show how it naturally interfaces with continuous-time optimization algorithms using the language of ISS (Input-to-State Stability). More generally, we show how a simple Lyapunov redesign technique leads to provable suboptimality guarantees when composing this estimator with any well-behaved optimization algorithm for time-varying costs.
- [410] arXiv:2403.19097 (cross-list from math.AT) [pdf, other]
-
Title: Topological Optimal Transport for Geometric Cycle MatchingComments: Comments are welcome!Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Metric Geometry (math.MG)
Topological data analysis is a powerful tool for describing topological signatures in real world data. An important challenge in topological data analysis is matching significant topological signals across distinct systems. In geometry and probability theory, optimal transport formalises notions of distance and matchings between distributions and structured objects. We propose to combine these approaches, constructing a mathematical framework for optimal transport-based matchings of topological features. Building upon recent advances in the domains of persistent homology and optimal transport for hypergraphs, we develop a transport-based methodology for topological data processing. We define measure topological networks, which integrate both geometric and topological information about a system, introduce a distance on the space of these objects, and study its metric properties, showing that it induces a geodesic metric space of non-negative curvature. The resulting Topological Optimal Transport (TpOT) framework provides a transport model on point clouds that minimises topological distortion while simultaneously yielding a geometrically informed matching between persistent homology cycles.
- [411] arXiv:2403.19099 (cross-list from quant-ph) [pdf, other]
-
Title: Optimizing Quantum Convolutional Neural Network Architectures for Arbitrary Data DimensionAuthors: Changwon Lee, Israel F. Araujo, Dongha Kim, Junghan Lee, Siheon Park, Ju-Young Ryu, Daniel K. ParkComments: 17 pages, 7 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
Quantum convolutional neural networks (QCNNs) represent a promising approach in quantum machine learning, paving new directions for both quantum and classical data analysis. This approach is particularly attractive due to its feasibility and the absence of the barren plateau problem, a fundamental challenge in training quantum neural networks (QNNs). However, a limitation arises when applying QCNNs to classical data. The network architecture is most natural when the number of input qubits is a power of two, as this number is halved in each pooling layer. The number of input qubits determines the dimension (i.e., the number of features) of the input data that can be processed, restricting the applicability of QCNN algorithms to real-world data. To address this issue, we propose a QCNN architecture capable of handling arbitrary input data dimensions while optimizing the allocation of quantum resources such as ancillary qubits and quantum gates. This optimization is not only important for minimizing computational resources, but also essential in noisy intermediate-scale quantum (NISQ) computing, as the size of the quantum circuits that can be executed reliably is limited. Through numerical simulations, we benchmarked the classification performance of various QCNN architectures handling arbitrary input data dimensions on the MNIST and Breast Cancer datasets. The results validate that the proposed QCNN architecture achieves excellent classification performance with minimal resource overhead, providing an optimal solution when reliable quantum computation is constrained by noise and imperfections.
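For intuition on the resource-allocation issue, a back-of-the-envelope layout for a QCNN on arbitrary input dimension can be computed from the amplitude-encoding qubit count and a pooling schedule. This is a hedged sketch: the zero-padding and ceil-halving rules below are generic assumptions, not the paper's optimized allocation of ancillary qubits and gates.

```python
import math

def qcnn_layout(num_features: int):
    """Back-of-the-envelope qubit plan for a QCNN on arbitrary input size.

    Amplitude encoding packs d features into ceil(log2(d)) qubits, padding
    the feature vector with zeros up to the next power of two.  Each pooling
    layer then discards roughly half of the remaining qubits; with an odd
    count we keep the extra qubit (a ceil-halving schedule, assumed here).
    """
    n = max(1, math.ceil(math.log2(num_features)))
    padding = 2 ** n - num_features
    layers = [n]
    while layers[-1] > 1:
        layers.append(math.ceil(layers[-1] / 2))
    return {"qubits": n, "zero_padding": padding, "qubits_per_layer": layers}
```

For a 10-dimensional input this gives 4 qubits with 6 padded amplitudes and layer widths 4, 2, 1, which makes concrete why non-power-of-two dimensions waste resources without a tailored architecture.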
- [412] arXiv:2403.19127 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Decentralizing Coherent Joint Transmission Precoding via Fast ADMM with Deterministic EquivalentsSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Inter-cell interference (ICI) suppression is critical for multi-cell multi-user networks. In this paper, we investigate advanced precoding techniques for coordinated multi-point (CoMP) with downlink coherent joint transmission, an effective approach for ICI suppression. Different from the centralized precoding schemes that require frequent information exchange among the cooperating base stations, we propose a decentralized scheme to minimize the total power consumption. In particular, based on the covariance matrices of global channel state information, we estimate the ICI bounds via the deterministic equivalents and decouple the original design problem into sub-problems, each of which can be solved in a decentralized manner. To solve the sub-problems at each base station, we develop a low-complexity solver based on the alternating direction method of multipliers (ADMM) in conjunction with the convex-concave procedure (CCCP). Simulation results demonstrate the effectiveness of our proposed decentralized precoding scheme, which achieves performance similar to the optimal centralized precoding scheme. Besides, our proposed ADMM solver can substantially reduce the computational complexity, while maintaining outstanding performance.
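The decentralized pattern at the heart of such schemes can be illustrated with a minimal consensus-ADMM skeleton: each node solves a small local problem using only its private data, and only local iterates (never the private data) are exchanged during coordination. The toy quadratic costs below are our own illustration of the message structure, not the paper's CCCP-based precoding solver.

```python
import numpy as np

def consensus_admm(local_targets, rho=1.0, iters=100):
    """Consensus ADMM for min_x sum_i 0.5*||x - a_i||^2, where a_i is
    private to node i.  Each node keeps a local copy x_i and a scaled dual
    u_i, constrained to agree with a shared variable z."""
    a = np.asarray(local_targets, dtype=float)   # one private target per node
    x = np.zeros_like(a)                         # local copies
    u = np.zeros_like(a)                         # scaled dual variables
    z = np.zeros(a.shape[1])                     # shared consensus variable
    for _ in range(iters):
        # local step at node i: argmin 0.5||x - a_i||^2 + (rho/2)||x - z + u_i||^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # coordination step: only x_i + u_i is exchanged, not a_i
        z = (x + u).mean(axis=0)
        # dual ascent on the consensus constraint
        u = u + x - z
    return z

z = consensus_admm([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
```

For this separable quadratic the consensus variable converges to the average of the private targets, which is the global minimizer.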
- [413] arXiv:2403.19175 (cross-list from cond-mat.stat-mech) [pdf, other]
-
Title: Toward Practical Benchmarks of Ising Machines: A Case Study on the Quadratic Knapsack ProblemComments: 25 pagesSubjects: Statistical Mechanics (cond-mat.stat-mech); Emerging Technologies (cs.ET)
Combinatorial optimization has wide applications from industry to natural science. Ising machines bring an emerging computing paradigm for efficiently solving a combinatorial optimization problem by searching for a ground state of a given Ising model. Current cutting-edge Ising machines achieve fast sampling of near-optimal solutions of the max-cut problem. However, for problems with additional constraints, their advantages have hardly been shown, owing to difficulties in handling the constraints. The performance of Ising machines on such problems depends heavily on how constraints are encoded into penalties, and the optimal choice is non-trivial. In this work, we focus on benchmarks of Ising machines on the quadratic knapsack problem (QKP). To bring out their practical performance, we propose to exploit the problem structure when using Ising machines. Specifically, we apply fast two-stage post-processing to the outputs of Ising machines, which makes handling the constraint easier. Simulations on medium-sized test instances show that the proposed method substantially improves the solving performance of Ising machines and that the improvement is robust to the choice of encoding method. We evaluate an Ising machine called the Amplify Annealing Engine with the proposed method and find that it achieves results comparable with existing heuristics.
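A minimal version of constraint-aware post-processing for the QKP might look as follows. The repair and improvement rules used here (drop the selected item with the worst marginal-profit-to-weight ratio until feasible, then greedily add fitting items) are our own stand-ins for the paper's two-stage procedure.

```python
import numpy as np

def repair_and_improve(x, P, w, C):
    """Two-stage post-processing of a candidate QKP solution
    (maximize x^T P x subject to w . x <= C, x binary).  The exact rules
    of the paper are not reproduced; this is an illustrative heuristic."""
    x = np.array(x, dtype=int)
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float)

    def marginal(i):
        """Profit change from toggling item i against the current selection."""
        sel = np.flatnonzero(x)
        return P[i, i] + sum(P[i, j] + P[j, i] for j in sel if j != i)

    while w @ x > C:  # stage 1: repair until the knapsack constraint holds
        sel = np.flatnonzero(x)
        x[min(sel, key=lambda i: marginal(i) / w[i])] = 0
    while True:       # stage 2: greedily add items that fit and help
        slack = C - w @ x
        cands = [i for i in np.flatnonzero(x == 0)
                 if w[i] <= slack and marginal(i) > 0]
        if not cands:
            break
        x[max(cands, key=marginal)] = 1
    return x
```

Because an Ising machine's raw sample may violate the knapsack constraint (depending on the penalty encoding), a cheap repair-then-improve pass of this kind always returns a feasible, locally improved solution.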
- [414] arXiv:2403.19180 (cross-list from eess.SP) [pdf, ps, other]
-
Title: A Multi-hop Secure UWOC assisted Local Area Network for UIoT and Underwater MonitoringSubjects: Signal Processing (eess.SP); Emerging Technologies (cs.ET)
The underwater environment is substantially less explored than the Earth's surface, largely due to the lack of robust underwater communication infrastructure. For Internet of Underwater Things connectivity, underwater wireless optical communication can play a vital role compared to conventional radio frequency communication, owing to its longer range, high data rate, low latency, and unregulated bandwidth. This study proposes an underwater wireless optical communication driven local area network (UWOC LAN), comprised of multiple network nodes with optical transceivers. Moreover, the temperature sensor data is encapsulated with an individual authentication identity to enhance the security of the framework at the user end. The proposed system is evaluated in a specially designed 4-meter water tank. The evaluation shows that the system can transmit underwater temperature data reliably in real time. The proposed secure UWOC LAN is tested over a communication range of 16 meters by incorporating multi-hop connectivity to monitor the underwater environment.
- [415] arXiv:2403.19203 (cross-list from eess.IV) [pdf, other]
-
Title: Single-Shared Network with Prior-Inspired Loss for Parameter-Efficient Multi-Modal Imaging Skin Lesion ClassificationComments: This paper have submitted to Journal for reviewSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In this study, we introduce a multi-modal approach that efficiently integrates multi-scale clinical and dermoscopy features within a single network, thereby substantially reducing model parameters. The proposed method includes three novel fusion schemes.
Firstly, unlike current methods that usually employ two individual models for the clinical and dermoscopy modalities, we verified that multi-modal features can be learned by sharing the encoder parameters while keeping individual modal-specific classifiers.
Secondly, the shared cross-attention module can replace the individual one to efficiently interact between two modalities at multiple layers.
Thirdly, different from current methods that equally optimize dermoscopy and clinical branches, inspired by prior knowledge that dermoscopy images play a more significant role than clinical images, we propose a novel biased loss. This loss guides the single-shared network to prioritize dermoscopy information over clinical information, implicitly learning a better joint feature representation for the modal-specific task.
Extensive experiments on a well-recognized Seven-Point Checklist (SPC) dataset and a collected dataset demonstrate the effectiveness of our method on both CNN and Transformer structures. Furthermore, our method exhibits superiority in both accuracy and model parameters compared to currently advanced methods.
- [416] arXiv:2403.19244 (cross-list from physics.chem-ph) [pdf, ps, other]
-
Title: The role of chemo-mechanical modelling in the development of battery technology -- a perspectiveSubjects: Chemical Physics (physics.chem-ph); Materials Science (cond-mat.mtrl-sci); Computational Engineering, Finance, and Science (cs.CE); Applied Physics (physics.app-ph)
In the race to reduce global CO2 emissions and achieve net-zero, chemomechanics must play a critical role in the technological development of current and next-generation batteries to improve their energy storage capabilities and their lifetime. Many degradation processes arise through mechanics via the development of diffusion-induced stress and volumetric strains within the various constituent materials in a battery. From particle cracking in lithium-ion batteries to lithium dendrite-based fracture of solid electrolytes in solid-state batteries, it is clear that significant barriers exist in the development of these energy storage systems, where chemomechanics plays a central part. To accelerate technological and scientific advances in this area, multi-scale and highly coupled multiphysics modelling must be carried out that includes mechanics-based phenomena. In this perspective article, we provide an introduction to chemomechanical modelling, the various physical problems that it addresses, and the issues that need to be resolved in order to expand its use within the field of battery technology.
- [417] arXiv:2403.19251 (cross-list from quant-ph) [pdf, other]
-
Title: Arbitrary State Transition of Open Qubit System Based on Switching ControlComments: 12 pages, 7 figuresSubjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
We present a switching control strategy based on Lyapunov control for arbitrary state transitions in open qubit systems. With coherent vector representation, we propose a switching control strategy, which can prevent the state of the qubit from entering invariant sets and singular value sets, effectively driving the system ultimately to a sufficiently small neighborhood of target states. In comparison to existing works, this control strategy relaxes the strict constraints on system models imposed by special target states. Furthermore, we identify conditions under which the open qubit system achieves finite-time stability (FTS) and finite-time contractive stability (FTCS), respectively. This represents a critical improvement in quantum state transitions, especially considering the asymptotic stability of arbitrary target states is unattainable in open quantum systems. The effectiveness of our proposed method is convincingly demonstrated through its application in a qubit system affected by various types of decoherence, including amplitude, dephasing and polarization decoherence.
- [418] arXiv:2403.19262 (cross-list from eess.SP) [pdf, other]
-
Title: Removing the need for ground truth UWB data collection: self-supervised ranging error correction using deep reinforcement learningComments: 11 pages, 8 figures and 4 tablesSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Indoor positioning using UWB technology has gained interest due to its centimeter-level accuracy potential. However, multipath effects and non-line-of-sight conditions cause ranging errors between anchors and tags. Existing approaches for mitigating these ranging errors rely on collecting large labeled datasets, making them impractical for real-world deployments. This paper proposes a novel self-supervised deep reinforcement learning approach that does not require labeled ground truth data. A reinforcement learning agent uses the channel impulse response as a state and predicts corrections to minimize the error between corrected and estimated ranges. The agent learns, self-supervised, by iteratively improving corrections that are generated by combining the predictability of trajectories with filtering and smoothing. Experiments on real-world UWB measurements demonstrate comparable performance to state-of-the-art supervised methods, overcoming data dependency and lack of generalizability limitations. This makes self-supervised deep reinforcement learning a promising solution for practical and scalable UWB-ranging error correction.
- [419] arXiv:2403.19274 (cross-list from math.DS) [pdf, other]
-
Title: Extracting coherent sets in aperiodically driven flows from generators of Mather semigroupsSubjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Coherent sets are time-dependent regions in the physical space of nonautonomous flows that exhibit little mixing with their neighborhoods, robustly under small random perturbations of the flow. They thus characterize the global long-term transport behavior of the system. We propose a framework to extract such time-dependent families of coherent sets for nonautonomous systems with an ergodic driving dynamics and (small) Brownian noise in physical space. Our construction involves the assembly and analysis of an operator on functions over the augmented space of the associated skew product that, for each fixed state of the driving, propagates distributions on the corresponding physical-space fibre according to the dynamics. This time-dependent operator has the structure of a semigroup (it is called the Mather semigroup), and we show that a spectral analysis of its generator allows for a trajectory-free computation of coherent families, simultaneously for all states of the driving. Additionally, for quasi-periodically driven torus flows, we propose a tailored Fourier discretization scheme for this generator and demonstrate our method by means of three examples of two-dimensional flows.
- [420] arXiv:2403.19300 (cross-list from math.PR) [pdf, other]
-
Title: Random Multi-Type Spanning Forests for Synchronization on Sparse GraphsSubjects: Probability (math.PR); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
Random diffusions are a popular tool in Monte Carlo estimation, with well-established algorithms such as Walk-on-Spheres (WoS) going back several decades. In this work, we introduce diffusion estimators for the problems of angular synchronization and smoothing on graphs, in the presence of a rotation associated to each edge. Unlike classical WoS algorithms, these estimators allow for global estimations by propagating along the branches of multi-type spanning forests, and we show that they can outperform standard numerical-linear-algebra solvers in challenging instances, depending on the topology and density of the graph.
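For readers unfamiliar with WoS, the classical estimator for the Laplace equation can be sketched in a few lines: each walk jumps to a uniform point on the largest circle inscribed at its current position and stops once it is within a small shell of the boundary, where the boundary data is sampled. This is the textbook algorithm the abstract refers to, not the multi-type spanning forest estimators introduced in the paper.

```python
import numpy as np

def walk_on_spheres(p, g, dist_to_boundary, eps=1e-3, n_walks=4000, seed=0):
    """Classical 2-D Walk-on-Spheres estimator of u(p) = E[g(X_tau)], where
    X is Brownian motion stopped on the boundary and u solves the Laplace
    equation with boundary data g."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_walks):
        x = np.array(p, dtype=float)
        while (r := dist_to_boundary(x)) >= eps:
            # jump to a uniform point on the largest inscribed circle
            theta = rng.uniform(0.0, 2.0 * np.pi)
            x = x + r * np.array([np.cos(theta), np.sin(theta)])
        total += g(x)  # sample the boundary data at the exit point
    return total / n_walks

# Unit disk with boundary data g(x, y) = x; the harmonic extension is u = x,
# so the estimate at (0.3, 0.2) should be close to 0.3.
est = walk_on_spheres(
    p=(0.3, 0.2),
    g=lambda x: x[0],
    dist_to_boundary=lambda x: 1.0 - float(np.linalg.norm(x)),
)
```

Note that each walk only ever evaluates the distance-to-boundary oracle, which is what makes WoS mesh-free; the forest-based estimators of the paper trade this pointwise character for global estimates on graphs.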
- [421] arXiv:2403.19324 (cross-list from math.OC) [pdf, other]
-
Title: Rapid nonlinear convex guidance via overparameterized monomial coordinates and fundamental solution expansionsComments: 35 pages, 14 figuresSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper introduces a framework by which the nonlinear trajectory optimization problem is posed as a path-planning problem in a space liberated of dynamics. In this space, general state constraints for continuous and impulsive control problems are encoded as linear constraints on the native overparameterized variables. This framework is enabled by nonlinear expansion in the vicinity of a reference in terms of fundamental solutions and a minimal nonlinear basis of mixed monomials in problem initial conditions. The former can be computed using state transition tensors, differential algebra, or analytic approaches, and the latter is computed analytically. Nonlinear guidance schemes are proposed taking advantage of this framework, including a successive convex programming scheme for delta-V minimizing trajectory optimization. This work enables a stable and highly rapid nonlinear guidance implementation without the need for collocation or real-time integration.
- [422] arXiv:2403.19379 (cross-list from eess.SP) [pdf, other]
-
Title: Optimal Pilot Design for OTFS in Linear Time-Varying ChannelsComments: 13 pages, 8 figures, submitted to IEEE Transactions on Wireless CommunicationsSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This paper investigates the positioning of the pilot symbols, as well as the power distribution between the pilot and the communication symbols in the OTFS modulation scheme. We analyze the pilot placements that minimize the mean squared error (MSE) in estimating the channel taps. In addition, we optimize the average channel capacity by adjusting the power balance. We show that this leads to a significant increase in average capacity. The results provide valuable guidance for designing the OTFS parameters to achieve maximum capacity. Numerical simulations are performed to validate the findings.
- [423] arXiv:2403.19381 (cross-list from stat.ML) [pdf, other]
-
Title: On Uncertainty Quantification for Near-Bayes Optimal AlgorithmsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Bayesian modelling allows for the quantification of predictive uncertainty, which is crucial in safety-critical applications. Yet for many machine learning (ML) algorithms, it is difficult to construct or implement their Bayesian counterpart. In this work, we present a promising approach to address this challenge, based on the hypothesis that commonly used ML algorithms are efficient across a wide variety of tasks and may thus be near Bayes-optimal w.r.t. an unknown task distribution. We prove that it is possible to recover the Bayesian posterior defined by the task distribution, which is unknown but optimal in this setting, by building a martingale posterior using the algorithm. We further propose a practical uncertainty quantification method that applies to general ML algorithms. Experiments based on a variety of non-NN and NN algorithms demonstrate the efficacy of our method.
- [424] arXiv:2403.19388 (cross-list from math.CO) [pdf, ps, other]
-
Title: Cosystolic Expansion of Sheaves on Posets with Applications to Good 2-Query LTCs and Lifted CodesComments: This subsumes sections 1-8 of arXiv:2208.01778. Preliminary version. Final version will appear soonSubjects: Combinatorics (math.CO); Computational Complexity (cs.CC); Information Theory (cs.IT)
We study sheaves on posets, showing that cosystolic expansion of such sheaves can be derived from local expansion conditions of the sheaf and the poset (typically a high dimensional expander). When the poset at hand is a cell complex, a sheaf on it may be thought of as generalizing coefficient groups used for defining homology and cohomology, by letting the coefficient group vary along the cell complex. Previous works established local criteria for cosystolic expansion only for simplicial complexes and with respect to constant coefficients. Cosystolic expansion of sheaves is related to property testing. We use this relation and our local criterion for cosystolic expansion to give two applications to locally testable codes (LTCs).
First, we show the existence of good $2$-query LTCs. These codes are related to the recent good $q$-query LTCs of Dinur et al. and Panteleev-Kalachev, being the former's so-called line codes, but we get them from a new, more illuminating perspective, namely, by realizing them as cocycle codes of sheaves over posets. We then derive their good properties directly from our criterion for cosystolic expansion.
Second, we give a local criterion for a lifted code (with some auxiliary structure) to be locally testable. This improves on a previous work of Dikstein et al., where it was shown that one can obtain local testability of lifted codes from a mixture of local and global conditions.
- [425] arXiv:2403.19415 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Brain-Shift: Unsupervised Pseudo-Healthy Brain Synthesis for Novel Biomarker Extraction in Chronic Subdural HematomaSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Chronic subdural hematoma (cSDH) is a common neurological condition characterized by the accumulation of blood between the brain and the dura mater. This accumulation of blood can exert pressure on the brain, potentially leading to fatal outcomes. Treatment options for cSDH are limited to invasive surgery or non-invasive management. Traditionally, the midline shift, hand-measured by experts from an ideal sagittal plane, and the hematoma volume have been the primary metrics for quantifying and analyzing cSDH. However, these approaches do not quantify the local 3D brain deformation caused by cSDH. We propose a novel method using anatomy-aware unsupervised diffeomorphic pseudo-healthy synthesis to generate brain deformation fields. The deformation fields derived from this process are utilized to extract biomarkers that quantify the shift in the brain due to cSDH. We use CT scans of 121 patients for training and validation of our method and find that our metrics allow the identification of patients who require surgery. Our results indicate that automatically obtained brain deformation fields might contain prognostic value for personalized cSDH treatment. Our implementation is available at github.com/Barisimre/brain-morphing
- [426] arXiv:2403.19425 (cross-list from eess.IV) [pdf, ps, other]
-
Title: A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES ChallengeAuthors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, Hyun-su Jeong, Chi-ho Yoon, Chulhong Kim, Jiayu Huo, Sebastien Ourselin, Rachel Sparks, Albert Clèrigues, Arnau Oliver, Xavier Lladó, Liam Chalcroft, Ioannis Pappas, Jeroen Bertels, Ewout Heylen, Juliette Moreau, Nima Hatami, Carole Frindel, Abdul Qayyum, Moona Mazher, Domenec Puig, Shao-Chieh Lin, Chun-Jung Juan, Tianxi Hu, Lyndon Boone, Maged Goubran, Yi-Jui Liu, Susanne Wegener, et al. (7 additional authors not shown)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemic stroke from various medical centers, facilitating the development of a wide range of cutting-edge segmentation algorithms by the research community. Through collaboration with leading teams, we combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions. Our ensemble model achieved superior ischemic lesion detection and segmentation accuracy on our internal test set compared to individual algorithms. This accuracy generalized well across diverse image and disease variables. Furthermore, the model excelled in extracting clinical biomarkers. Notably, in a Turing-like test, neuroradiologists consistently preferred the algorithm's segmentations over manual expert efforts, highlighting increased comprehensiveness and precision. Validation using a real-world external dataset (N=1686) confirmed the model's generalizability. The algorithm's outputs also demonstrated strong correlations with clinical scores (admission NIHSS and 90-day mRS) on par with or exceeding expert-derived results, underlining its clinical relevance. This study offers two key findings. First, we present an ensemble algorithm (https://github.com/Tabrisrei/ISLES22_Ensemble) that detects and segments ischemic stroke lesions on DWI across diverse scenarios on par with expert (neuro)radiologists. Second, we show the potential for biomedical challenge outputs to extend beyond the challenge's initial objectives, demonstrating their real-world clinical applicability.
- [427] arXiv:2403.19426 (cross-list from physics.geo-ph) [pdf, other]
-
Title: A multi-step calibration strategy for reliable parameter determination of salt rock mechanics constitutive modelsAuthors: Hermínio T. Honório, Maartje Houben, Kevin Bisdom, Arjan van der Linden, Karin de Borst, Lambertus J. Sluys, Hadi HajibeygiComments: 27 pages, 23 figuresSubjects: Geophysics (physics.geo-ph); Numerical Analysis (math.NA)
Renewable hydrogen storage in salt caverns requires fast injection and production rates to cope with the imbalance between energy production and consumption. Such operational conditions raise concerns about the mechanical stability of salt caverns. Choosing an appropriate constitutive model for salt mechanics is an important step in investigating this issue, and many constitutive models with several parameters have been presented in the literature. However, a robust calibration strategy to reliably determine which model and which parameter set represent a given rock, based on stress-strain data, remains an unsolved challenge. For the first time in the community, we present a multi-step strategy to determine a single parameter set based on many deformation datasets for salt rocks. Towards this end, we first develop a comprehensive constitutive model able to capture all relevant nonlinear deformation physics of transient, reverse, and steady-state creep. The determination of the single set of representative material parameters is achieved by framing the calibration process as an optimization problem, for which the global PSO algorithm is employed. Dynamic data integration is achieved by a multi-step calibration strategy in which experiments are included one at a time, as they become available. Additionally, our calibration strategy is made flexible to account for mild heterogeneity between rock samples, resulting in a single set of parameters that is representative of the deformation datasets. In the absence of a rigorous mathematical analysis and of sufficient relevant experimental datasets, we consider a wide range of synthetic experimental data, inspired by the existing sparse data in the literature. Our performance analyses show that the proposed calibration strategy is robust and that accuracy improves as more experiments are included for calibration.
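A minimal particle swarm optimizer of the kind used for such calibration problems can be sketched as follows. The swarm hyperparameters and the toy creep-like misfit below are illustrative assumptions, not the paper's constitutive model or settings.

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global particle swarm optimizer: particles track their own
    best position (pbest) and the swarm's best (g), with inertia w and
    attraction coefficients c1, c2 (generic defaults assumed here)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)            # keep particles inside bounds
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, float(pbest_f.min())

# Toy "calibration": recover (a, b) of y = a * (1 - exp(-b * t)) from
# synthetic creep-like data by minimizing the squared misfit.
t = np.linspace(0.0, 5.0, 50)
y_obs = 2.0 * (1.0 - np.exp(-0.8 * t))
misfit = lambda p: float(np.sum((p[0] * (1.0 - np.exp(-p[1] * t)) - y_obs) ** 2))
params, err = pso_minimize(misfit, bounds=[(0.1, 5.0), (0.1, 5.0)])
```

The same pattern extends to the multi-step setting of the paper by enlarging the misfit with each newly available experiment and re-running (or warm-starting) the swarm.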
- [428] arXiv:2403.19448 (cross-list from math.OC) [pdf, other]
-
Title: Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy GradientsComments: 27 pages, 4 figures, under reviewSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method, which is based on the Fisher information matrix of the state-action distributions and has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. We therefore study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program, which improves existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.
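On the probability simplex with a linear potential c, the Fisher-Rao gradient flow takes the familiar replicator form, and a short forward-Euler sketch exhibits the linear convergence to the optimal vertex that results of this kind quantify. The step size and iteration count below are illustrative choices, not the paper's analysis.

```python
import numpy as np

def fisher_rao_flow(c, x0, dt=0.01, steps=2000):
    """Fisher-Rao gradient flow of the linear program max <c, x> over the
    probability simplex.  In coordinates it is the replicator equation
        dx_i/dt = x_i * (c_i - <c, x>),
    which from an interior start converges to the optimal vertex at a
    linear rate set by the gap between the best and second-best entry of c.
    (Forward Euler; dt small enough to keep the iterates positive.)"""
    x = np.asarray(x0, dtype=float)
    c = np.asarray(c, dtype=float)
    for _ in range(steps):
        x = x + dt * x * (c - c @ x)
        x = x / x.sum()  # guard against floating-point drift off the simplex
    return x

x_star = fisher_rao_flow(c=[1.0, 3.0, 2.0], x0=[1 / 3, 1 / 3, 1 / 3])
```

Here the mass ratio between two coordinates evolves like exp((c_i - c_j) t), so the flow concentrates on the best coordinate exponentially fast, mirroring the geometry-dependent rates discussed in the abstract.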
- [429] arXiv:2403.19455 (cross-list from math.OC) [pdf, ps, other]
-
Title: Stabilization of a Class of Large-Scale Systems of Linear Hyperbolic PDEs via Continuum Approximation of Exact Backstepping KernelsComments: 16 pages, 6 figures, submitted to IEEE Transactions on Automatic ControlSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We establish that stabilization of a class of linear, hyperbolic partial differential equations (PDEs) with a large (nevertheless finite) number of components, can be achieved via employment of a backstepping-based control law, which is constructed for stabilization of a continuum version (i.e., as the number of components tends to infinity) of the PDE system. This is achieved by proving that the exact backstepping kernels, constructed for stabilization of the large-scale system, can be approximated (in certain sense such that exponential stability is preserved) by the backstepping kernels constructed for stabilization of a continuum version (essentially an infinite ensemble) of the original PDE system. The proof relies on construction of a convergent sequence of backstepping kernels that is defined such that each kernel matches the exact backstepping kernels (derived based on the original, large-scale system), in a piecewise constant manner with respect to an ensemble variable; while showing that they satisfy the continuum backstepping kernel equations. We present a numerical example that reveals that complexity of computation of stabilizing backstepping kernels may not scale with the number of components of the PDE state, when the kernels are constructed on the basis of the continuum version, in contrast to the case in which they are constructed on the basis of the original, large-scale system. In addition, we formally establish the connection between the solutions to the large-scale system and its continuum counterpart. Thus, this approach can be useful for design of computationally tractable, stabilizing backstepping-based control laws for large-scale PDE systems.
- [430] arXiv:2403.19508 (cross-list from eess.IV) [pdf, other]
-
Title: Debiasing Cardiac Imaging with Controlled Latent Diffusion ModelsAuthors: Grzegorz Skorupko, Richard Osuala, Zuzanna Szafranowska, Kaisar Kushibar, Nay Aung, Steffen E Petersen, Karim Lekadir, Polyxeni GkontraSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The progress in deep learning solutions for disease diagnosis and prognosis based on cardiac magnetic resonance imaging is hindered by highly imbalanced and biased training data. To address this issue, we propose a method to alleviate imbalances inherent in datasets through the generation of synthetic data based on sensitive attributes such as sex, age, body mass index, and health condition. We adopt ControlNet based on a denoising diffusion probabilistic model to condition on text assembled from patient metadata and cardiac geometry derived from segmentation masks using a large-cohort study, specifically, the UK Biobank. We assess our method by evaluating the realism of the generated images using established quantitative metrics. Furthermore, we conduct a downstream classification task aimed at debiasing a classifier by rectifying imbalances within underrepresented groups through synthetically generated samples. Our experiments demonstrate the effectiveness of the proposed approach in mitigating dataset imbalances, such as the scarcity of younger patients or individuals with normal BMI level suffering from heart failure. This work represents a major step towards the adoption of synthetic data for the development of fair and generalizable models for medical classification tasks. Notably, we conduct all our experiments using a single, consumer-level GPU to highlight the feasibility of our approach within resource-constrained environments. Our code is available at https://github.com/faildeny/debiasing-cardiac-mri.
- [431] arXiv:2403.19512 (cross-list from quant-ph) [pdf, ps, other]
-
Title: Quantum Realization of the Finite Element MethodSubjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA)
This paper presents a quantum algorithm for the solution of prototypical second-order linear elliptic partial differential equations discretized by $d$-linear finite elements on Cartesian grids of a bounded $d$-dimensional domain. An essential step in the construction is a BPX preconditioner, which transforms the linear system into a sufficiently well-conditioned one, making it amenable to quantum computation. We provide a constructive proof demonstrating that our quantum algorithm can compute suitable functionals of the solution to a given tolerance $\texttt{tol}$ with a complexity linear in $\texttt{tol}^{-1}$ for a fixed dimension $d$, neglecting logarithmic terms. This complexity is proportional to that of its one-dimensional counterpart and improves on previous quantum algorithms by a factor of order $\texttt{tol}^{-2}$. We also detail the design and implementation of a quantum circuit capable of executing our algorithm, and present simulator results that support the quantum feasibility of the finite element method in the near future, paving the way for quantum computing approaches to a wide range of PDE-related challenges.
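The role of the preconditioner can be motivated classically: without it, the stiffness matrix of the finite element discretization has a condition number that grows like $h^{-2}$ under mesh refinement, which is exactly the growth a BPX-type preconditioner removes. The 1D sketch below only exhibits that growth; it is an assumption-free classical illustration and does not show the paper's quantum construction.

```python
import numpy as np

def fem_1d_stiffness(n):
    """Stiffness matrix of piecewise-linear FEM for -u'' = f on (0, 1)
    with n interior nodes (the 1D case of d-linear elements)."""
    h = 1.0 / (n + 1)
    return (np.diag(2 * np.ones(n))
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h

# The condition number roughly quadruples each time the mesh is refined
# by a factor of two, i.e., it grows like h^{-2}.
for n in (8, 16, 32):
    print(n, np.linalg.cond(fem_1d_stiffness(n)))
```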
- [432] arXiv:2403.19516 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph ClusteringSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Statistics Theory (math.ST)
This paper studies the directed graph clustering problem through the lens of statistics, formulating clustering as the estimation of underlying communities in the directed stochastic block model (DSBM). We perform maximum likelihood estimation (MLE) on the DSBM and thereby ascertain the most probable community assignment given the observed graph structure. Beyond the statistical point of view, we further establish the equivalence between this MLE formulation and a novel flow optimization heuristic, which jointly considers two important directed graph statistics: edge density and edge orientation. Building on this new formulation of directed clustering, we introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite-programming-based clustering algorithm. We provide a theoretical upper bound on the number of misclustered vertices of the spectral clustering algorithm using tools from matrix perturbation theory. We compare, both quantitatively and qualitatively, our proposed algorithms with existing directed clustering methods on both synthetic and real-world data, further grounding our theoretical contributions.
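The idea of exploiting edge orientation spectrally can be sketched with the Hermitian matrix $H = i(A - A^\top)$, whose imaginary structure records edge direction, on a toy two-community digraph. This is a simplified illustration in the spirit of a spectral approach to directed clustering; the paper's exact construction and its guarantees differ.

```python
import numpy as np

def hermitian_spectral_clustering(A):
    """Two-way directed clustering sketch: eigendecompose the Hermitian
    matrix H = i(A - A^T) and split vertices by the phase of their entry
    in a dominant eigenvector (relative to vertex 0, which removes the
    arbitrary global phase). Assumes vertex 0 has a nonzero entry."""
    H = 1j * (A - A.T)
    w, V = np.linalg.eigh(H)              # Hermitian eigendecomposition
    top = V[:, np.argmax(np.abs(w))]      # eigenvector of largest |eigenvalue|
    rel = np.angle(top * np.conj(top[0])) # phase relative to vertex 0
    return (np.abs(rel) > np.pi / 4).astype(int)

# Toy graph: two communities of 4 vertices, all edges oriented from
# community 0 to community 1 (an extreme DSBM-like orientation pattern).
A = np.zeros((8, 8))
A[:4, 4:] = 1.0
labels = hermitian_spectral_clustering(A)
```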
- [433] arXiv:2403.19605 (cross-list from stat.ME) [pdf, other]
-
Title: Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free PredictionComments: 27 pages, 10 figuresSubjects: Methodology (stat.ME); Machine Learning (cs.LG)
Decision-making pipelines are generally characterized by tradeoffs among various risk functions. It is often desirable to manage such tradeoffs in a data-adaptive manner. As we demonstrate, if this is done naively, state-of-the-art uncertainty quantification methods can lead to significant violations of putative risk guarantees.
To address this issue, we develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively. Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
To illustrate the benefits of our approach, we carry out numerical experiments on synthetic data and the large-scale vision dataset MS-COCO.
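The threshold-selection setting can be sketched as follows: pick the largest threshold whose empirical risk (a monotone function of the threshold, here a toy miss rate) stays below a level alpha on one sample, then evaluate it on fresh data, since evaluating on the same data that chose the threshold is optimistic. This is a simplified illustration, not the authors' procedure.

```python
import numpy as np

def calibrate_threshold(scores, labels, alpha):
    """Largest threshold whose toy risk -- the fraction of samples that
    are positive yet score below the threshold -- is at most alpha on
    the given data. The risk is monotone nondecreasing in the threshold,
    so a descending scan returns the largest valid choice."""
    for t in np.sort(np.unique(scores))[::-1]:
        if np.mean((labels == 1) & (scores < t)) <= alpha:
            return t
    return -np.inf

rng = np.random.default_rng(0)
# Calibration sample: scores correlated with a binary label.
labels = rng.integers(0, 2, size=500)
scores = labels + rng.normal(0, 1, size=500)
t = calibrate_threshold(scores, labels, alpha=0.1)

# Honest check on a held-out sample; reusing the calibration sample to
# assess the risk of the adaptively chosen t would be optimistic.
fresh_labels = rng.integers(0, 2, size=500)
fresh_scores = fresh_labels + rng.normal(0, 1, size=500)
miss_fresh = np.mean((fresh_labels == 1) & (fresh_scores < t))
```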
Replacements for Fri, 29 Mar 24
- [434] arXiv:1903.06811 (replaced) [pdf, other]
-
Title: Multi-camera calibration with pattern rigs, including for non-overlapping cameras: CALICOComments: 11 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [435] arXiv:2006.14774 (replaced) [pdf, other]
-
Title: Co-Designing Statistical MIMO Radar and In-band Full-Duplex Multi-User MIMO Communications -- Part I: Signal ProcessingComments: 23 pages, 5 figures, 2 tablesSubjects: Signal Processing (eess.SP); Information Retrieval (cs.IR)
- [436] arXiv:2007.15776 (replaced) [pdf, other]
-
Title: Random Vector Functional Link Networks for Function Approximation on ManifoldsComments: 37 pages, 1 figureSubjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)
- [437] arXiv:2010.05330 (replaced) [pdf, other]
-
Title: Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLUComments: Accepted to the EMNLP 2020 conference (long paper). V2 has minor updates, see note in last pageSubjects: Computation and Language (cs.CL)
- [438] arXiv:2101.10300 (replaced) [pdf, other]
-
Title: Channel Estimation via Successive Denoising in MIMO OFDM Systems: A Reinforcement Learning ApproachAuthors: Myeung Suk Oh, Seyyedali Hosseinalipour, Taejoon Kim, Christopher G. Brinton, David J. LoveComments: This paper has been published in the proceedings of 2021 IEEE International Conference on Communications (ICC)Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
- [439] arXiv:2107.07264 (replaced) [pdf, other]
-
Title: Automatic Resource Allocation in Business Processes: A Systematic Literature SurveySubjects: Software Engineering (cs.SE)
- [440] arXiv:2108.03018 (replaced) [pdf, other]
-
Title: Conditional Separation as a Binary Relation. A Coq Assisted ProofSubjects: Discrete Mathematics (cs.DM)
- [441] arXiv:2109.11729 (replaced) [pdf, other]
-
Title: Optimal error bounds in the absence of constraint qualifications with applications to the $p$-cones and beyondComments: 37 pages, comments welcome. To appear at Mathematics of Operations ResearchSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
- [442] arXiv:2110.00504 (replaced) [pdf, other]
-
Title: Adwords with Unknown Budgets and BeyondAuthors: Rajan UdwaniComments: To appear in Management ScienceSubjects: Data Structures and Algorithms (cs.DS)
- [443] arXiv:2202.04291 (replaced) [pdf, other]
-
Title: L2B: Learning to Bootstrap Robust Models for Combating Label NoiseAuthors: Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei XingComments: CVPR 2024; code is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [444] arXiv:2204.07839 (replaced) [pdf, ps, other]
-
Title: Dependence Logics in Temporal SettingsSubjects: Logic in Computer Science (cs.LO); Logic (math.LO)
- [445] arXiv:2204.08989 (replaced) [pdf, other]
-
Title: Efficient Deep Learning-based Estimation of the Vital Signs on SmartphonesComments: 10 pages, 8 figures, 11 tablesSubjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [446] arXiv:2204.11970 (replaced) [pdf, other]
-
Title: Visual Acuity Prediction on Real-Life Patient Data Using a Machine Learning Based Multistage SystemAuthors: Tobias Schlosser, Frederik Beuth, Trixy Meyer, Arunodhayan Sampath Kumar, Gabriel Stolze, Olga Furashova, Katrin Engelmann, Danny KowerkoComments: Preprint for journal Scientific Reports (Springer)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [447] arXiv:2206.13690 (replaced) [pdf, other]
-
Title: Supervised Semantic Similarity-based Conflict Detection Algorithm: S3CDASubjects: Software Engineering (cs.SE)
- [448] arXiv:2208.11274 (replaced) [pdf, other]
-
Title: Revisiting Code Search in a Two-Stage ParadigmComments: Accepted by WSDM 2023Subjects: Software Engineering (cs.SE)
- [449] arXiv:2209.03919 (replaced) [pdf, other]
-
Title: Bi-objective Ranking and Selection Using Stochastic KrigingComments: 33 pages, 14 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [450] arXiv:2211.00499 (replaced) [pdf, ps, other]
-
Title: A combination technique for optimal control problems constrained by random PDEsComments: 29 pages, 4 figuresSubjects: Numerical Analysis (math.NA)
- [451] arXiv:2211.01579 (replaced) [pdf, other]
-
Title: Data-free Defense of Black Box Models Against Adversarial AttacksComments: CVPR Workshop (Under Review)Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
- [452] arXiv:2211.08360 (replaced) [pdf, other]
-
Title: An environmental disturbance observer framework for autonomous surface vesselsJournal-ref: Ocean Engineering, Volume 285, Part 2, 1 October 2023, 115412Subjects: Systems and Control (eess.SY)
- [453] arXiv:2211.13912 (replaced) [pdf, ps, other]
-
Title: Enhancing Recommender Systems: A Strategy to Mitigate False Negative ImpactComments: 9 pages, 16 figuresSubjects: Information Retrieval (cs.IR)
- [454] arXiv:2211.14361 (replaced) [pdf, other]
-
Title: gatekeeper: Online Safety Verification and Control for Nonlinear Systems in Dynamic EnvironmentsComments: Accepted at IROS 2023, 8 pages, 4 figures, Conditional Accept at IEEE T-ROSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [455] arXiv:2212.00851 (replaced) [pdf, ps, other]
-
Title: SOLD: Sinhala Offensive Language DatasetAuthors: Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi, Lasitha Uyangodage, Marcos ZampieriComments: Accepted to Language Resources and Evaluation, SpringerSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
- [456] arXiv:2212.08635 (replaced) [pdf, other]
-
Title: Self-Prompting Large Language Models for Zero-Shot Open-Domain QAComments: NAACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [457] arXiv:2212.08686 (replaced) [pdf, other]
-
Title: Evaluating Step-by-Step Reasoning through Symbolic VerificationComments: NAACL-Findings, 2024Subjects: Computation and Language (cs.CL)
- [458] arXiv:2301.13375 (replaced) [pdf, other]
-
Title: Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness GuaranteesComments: Transactions on Machine Learning Research (TMLR), 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
- [459] arXiv:2302.03357 (replaced) [pdf, other]
-
Title: Towards Enhancing Time Series Contrastive Learning: A Dynamic Bad Pair Mining ApproachComments: ICLR 2024 Camera Ready (this https URL)Subjects: Machine Learning (cs.LG)
- [460] arXiv:2302.03788 (replaced) [pdf, other]
-
Title: Toward a Theory of Causation for Interpreting Neural Code ModelsAuthors: David N. Palacio, Alejandro Velasco, Nathan Cooper, Alvaro Rodriguez, Kevin Moran, Denys PoshyvanykComments: Accepted to appear in IEEE Transactions on Software EngineeringSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
- [461] arXiv:2303.05699 (replaced) [pdf, other]
-
Title: Feature Unlearning for Pre-trained GANs and VAEsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [462] arXiv:2303.08737 (replaced) [pdf, other]
-
Title: Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022Authors: Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje HenterComments: The first three authors made equal contributions and share joint first authorship. Accepted for publication in the ACM Transactions on Graphics (TOG). Please see this https URL for all challenge materials. arXiv admin note: text overlap with arXiv:2208.10441Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
- [463] arXiv:2303.14694 (replaced) [pdf, other]
-
Title: A stability theorem for bigraded persistence barcodesComments: 20 pagesSubjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Machine Learning (cs.LG); Combinatorics (math.CO); Metric Geometry (math.MG)
- [464] arXiv:2304.05684 (replaced) [pdf, other]
-
Title: InterGen: Diffusion-based Multi-human Motion Generation under Complex InteractionsComments: accepted by IJCV 2024Journal-ref: Int J Comput Vis (2024) 1-21Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [465] arXiv:2304.09224 (replaced) [pdf, other]
-
Title: Quantum machine learning for image classificationComments: 13 pages, 10 figures, 1 tableJournal-ref: Mach. Learn.: Sci. Technol. 5(1), 015040 (2024)Subjects: Quantum Physics (quant-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [466] arXiv:2304.09704 (replaced) [pdf, other]
-
Title: Learnable Earth Parser: Discovering 3D Prototypes in Aerial ScansSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [467] arXiv:2304.11342 (replaced) [pdf, ps, other]
-
Title: NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic NavigationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [468] arXiv:2305.09535 (replaced) [src]
-
Title: What's the Problem, Linda? The Conjunction Fallacy as a Fairness ProblemAuthors: Jose Alvarez ColmenaresComments: Large portions of this paper were used for another paper with a different direction. Therefore, this paper became an early draft of the other paper, which is why I am removing it until it can stand by itselfSubjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
- [469] arXiv:2305.12612 (replaced) [pdf, other]
-
Title: PrOnto: Language Model Evaluations for 859 LanguagesAuthors: Luke GesslerComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [470] arXiv:2305.18034 (replaced) [pdf, other]
-
Title: A Corpus for Sentence-level Subjectivity Detection on English News ArticlesAuthors: Francesco Antici, Andrea Galassi, Federico Ruggeri, Katerina Korre, Arianna Muti, Alessandra Bardi, Alice Fedotova, Alberto Barrón-CedeñoComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [471] arXiv:2305.19507 (replaced) [pdf, other]
-
Title: Manifold Constraint Regularization for Remote Sensing Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
- [472] arXiv:2306.02240 (replaced) [pdf, other]
-
Title: ProTeCt: Prompt Tuning for Taxonomic Open Set ClassificationComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [473] arXiv:2306.03258 (replaced) [pdf, other]
-
Title: LipVoicer: Generating Speech from Silent Videos Guided by Lip ReadingComments: ICLR 2024Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [474] arXiv:2306.03799 (replaced) [pdf, other]
-
Title: Prompt Space Optimizing Few-shot Reasoning Success with Large Language ModelsAuthors: Fobo Shi, Peijun Qing, Dong Yang, Nan Wang, Youbo Lei, Haonan Lu, Xiaodong Lin, Duantengchuan LiComments: Natural language processing (NLP)Subjects: Computation and Language (cs.CL)
- [475] arXiv:2306.10367 (replaced) [pdf, other]
-
Title: Query2GMM: Learning Representation with Gaussian Mixture Model for Reasoning over Knowledge GraphsComments: 10 pages, 4 figures, accepted by WWW 2024Subjects: Information Retrieval (cs.IR)
- [476] arXiv:2306.12627 (replaced) [pdf, other]
-
Title: Targeted collapse regularized autoencoder for anomaly detection: black hole at the centerAuthors: Amin Ghafourian, Huanyi Shui, Devesh Upadhyay, Rajesh Gupta, Dimitar Filev, Iman Soltani BozchalooiComments: 18 pages, 4 figures, 8 tablesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
- [477] arXiv:2306.15116 (replaced) [pdf, other]
-
Title: Streaming quantum gate set tomography using the extended Kalman filterComments: Revised to version that appeared in the conferenceJournal-ref: in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Bellevue, WA, USA, 2023 pp. 1401-1411Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
- [478] arXiv:2306.15865 (replaced) [pdf, other]
-
Title: Differentially Private Distributed Estimation and LearningComments: Accepted for publication at IISE Transactions (Special issue on Federated, Distributed Learning and Analytics)Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Systems and Control (eess.SY); Statistics Theory (math.ST); Applications (stat.AP); Machine Learning (stat.ML)
- [479] arXiv:2306.15867 (replaced) [pdf, other]
-
Title: Convergence analysis of a weak Galerkin finite element method on a Shishkin mesh for a singularly perturbed fourth-order problem in 2DSubjects: Numerical Analysis (math.NA)
- [480] arXiv:2306.16324 (replaced) [pdf, other]
-
Title: DoseDiff: Distance-aware Diffusion Model for Dose Prediction in RadiotherapySubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [481] arXiv:2306.17061 (replaced) [pdf, other]
-
Title: RowPress: Amplifying Read Disturbance in Modern DRAM ChipsAuthors: Haocong Luo, Ataberk Olgun, A. Giray Yağlıkçı, Yahya Can Tuğrul, Steve Rhyner, Meryem Banu Cavlak, Joël Lindegger, Mohammad Sadrosadati, Onur MutluComments: Extended version of the paper "RowPress: Amplifying Read Disturbance in Modern DRAM Chips" at the 50th Annual International Symposium on Computer Architecture (ISCA), 2023Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
- [482] arXiv:2306.17077 (replaced) [pdf, other]
-
Title: RAPGen: An Approach for Fixing Code Inefficiencies in Zero-ShotSubjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
- [483] arXiv:2306.17563 (replaced) [pdf, other]
-
Title: Large Language Models are Effective Text Rankers with Pairwise Ranking PromptingAuthors: Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael BenderskyComments: Accepted to NAACL 2024. Corrected results of RankT5 on TREC-DL19Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [484] arXiv:2307.01362 (replaced) [pdf, other]
-
Title: Direct Superpoints Matching for Robust Point Cloud RegistrationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [485] arXiv:2307.01579 (replaced) [src]
-
Title: With Trail to Follow: Measurements of Real-world Non-fungible Token Phishing Attacks on EthereumComments: The content of the article has undergone major changesSubjects: Cryptography and Security (cs.CR)
- [486] arXiv:2307.02192 (replaced) [pdf, other]
-
Title: The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal VerificationAuthors: Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C. Cordeiro, Vasileios MavroeidisJournal-ref: PROMISE 2023: Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering December 2023 Pages 33 to 43Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
- [487] arXiv:2307.02496 (replaced) [pdf, ps, other]
-
Title: Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error DiffusionComments: Accepted for Oral presentation at WCIPT11 (11th World Congress on Industrial Process Tomography)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [488] arXiv:2307.03683 (replaced) [pdf, other]
-
Title: Differentiable Turbulence: Closure as a partial differential equation constrained optimizationSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
- [489] arXiv:2307.04132 (replaced) [pdf, other]
-
Title: Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Symbolic Computation (cs.SC)
- [490] arXiv:2307.04977 (replaced) [pdf, other]
-
Title: Model-Driven Sensing-Node Selection and Power Allocation for Tracking Maneuvering Targets in Perceptive Mobile NetworksSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- [491] arXiv:2307.05352 (replaced) [pdf, other]
-
Title: Leveraging Variational Autoencoders for Parameterized MMSE EstimationComments: 12 pages, 12 figuresSubjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (stat.ML)
- [492] arXiv:2307.10924 (replaced) [pdf, other]
-
Title: Intrinsic Image Decomposition Using Point Cloud RepresentationComments: Code: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [493] arXiv:2308.02018 (replaced) [pdf, ps, other]
-
Title: Gradual Sensitivity TypingComments: Paper SubmissionSubjects: Programming Languages (cs.PL)
- [494] arXiv:2308.08453 (replaced) [pdf, other]
-
Title: Tightest Admissible Shortest PathComments: arXiv admin note: text overlap with arXiv:2208.11489Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM)
- [495] arXiv:2308.09911 (replaced) [pdf, other]
-
Title: Noisy-Correspondence Learning for Text-to-Image Person Re-identificationSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [496] arXiv:2308.10226 (replaced) [pdf, other]
-
Title: Machine Learning-Powered Combinatorial Clock AuctionComments: AAAI 2024 (8 pages + appendix)Journal-ref: Proceedings of the AAAI Conference on Artificial Intelligence, 38(9) (2024) 9891-9900Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [497] arXiv:2308.10757 (replaced) [pdf, other]
-
Title: To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation SkillsComments: Accepted v. of IJCNN 2023 publication. Funded by the Horizon Europe project TERAIS (G.A. 101079338), the UKRI Node on Trust (EP/V026682/1), the EU projects TRAINCREASE and MUSAE, and the US project THRIVE++. Cite: this https URL Code: this https URL Data: this https URL 10 pages, 8 Figures, 3 TablesJournal-ref: 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-10Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [498] arXiv:2308.16682 (replaced) [pdf, other]
-
Title: DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive DiffusionComments: accepted at CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [499] arXiv:2309.00475 (replaced) [pdf, ps, other]
-
Title: Effective equation solving, constraints and growth in virtually abelian groupsComments: 28 pagesSubjects: Group Theory (math.GR); Discrete Mathematics (cs.DM); Formal Languages and Automata Theory (cs.FL)
- [500] arXiv:2309.01898 (replaced) [pdf, other]
-
Title: Safe Legged Locomotion using Collision Cone Control Barrier Functions (C3BFs)Comments: 5 Pages, 5 Figures. Updated citation. arXiv admin note: substantial text overlap with arXiv:2303.15871Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [501] arXiv:2309.02156 (replaced) [pdf, ps, other]
-
Title: Subspace Acceleration for a Sequence of Linear Systems and Application to Plasma SimulationSubjects: Numerical Analysis (math.NA); Plasma Physics (physics.plasm-ph)
- [502] arXiv:2309.02605 (replaced) [pdf, ps, other]
-
Title: A pragma based C++ framework for hybrid quantum/classical computationSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Programming Languages (cs.PL)
- [503] arXiv:2309.03008 (replaced) [pdf, other]
-
Title: Sparse 3D Reconstruction via Object-Centric Ray SamplingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [504] arXiv:2309.11076 (replaced) [pdf, other]
-
Title: Symbolic Regression on Sparse and Noisy Data with Gaussian ProcessesComments: Submitted to CDC 2024Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
- [505] arXiv:2309.13302 (replaced) [pdf, other]
-
Title: Gaining the Sparse Rewards by Exploring Lottery Tickets in Spiking Neural NetworkComments: This paper is under submissionSubjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV)
- [506] arXiv:2309.13610 (replaced) [pdf, other]
-
Title: VisionKG: Unleashing the Power of Visual Datasets via Knowledge GraphAuthors: Jicheng Yuan, Anh Le-Tuan, Manh Nguyen-Duc, Trung-Kien Tran, Manfred Hauswirth, Danh Le-PhuocComments: Accepted at ESWC 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [507] arXiv:2309.13863 (replaced) [pdf, other]
-
Title: SuPerPM: A Large Deformation-Robust Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation DataAuthors: Shan Lin, Albert J. Miao, Ali Alabiad, Fei Liu, Kaiyuan Wang, Jingpei Lu, Florian Richter, Michael C. YipSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [508] arXiv:2309.14274 (replaced) [pdf, ps, other]
-
Title: Analysis and Experimental Validation of the WPT Efficiency of the Both-Sides Retrodirective SystemComments: This current version has been submitted to the Space Solar Power and Wireless Transmission on February 19, 2024 for possible publication. Compared to the previous version, this version is a major revision discussing existing works more thoroughly in relation to the proposed idea and also adding more detail to the experiment setup so that it can be reproducedSubjects: Systems and Control (eess.SY)
- [509] arXiv:2309.17053 (replaced) [pdf, ps, other]
-
Title: On the Power of the Weisfeiler-Leman Test for Graph Motif ParametersSubjects: Machine Learning (cs.LG)
- [510] arXiv:2310.00926 (replaced) [pdf, other]
-
Title: Integration of Graph Neural Network and Neural-ODEs for Tumor Dynamic PredictionSubjects: Machine Learning (cs.LG)
- [511] arXiv:2310.01201 (replaced) [pdf, other]
-
Title: SWoTTeD: An Extension of Tensor Decomposition to Temporal PhenotypingSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [512] arXiv:2310.02861 (replaced) [pdf, ps, other]
-
Title: Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly DetectionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [513] arXiv:2310.03394 (replaced) [pdf, other]
-
Title: Kinodynamic Motion Planning for a Team of Multirotors Transporting a Cable-Suspended Payload in Cluttered EnvironmentsComments: Submitted to IROS, 2024Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)
- [514] arXiv:2310.04190 (replaced) [pdf, other]
-
Title: On the Two Sides of Redundancy in Graph Neural NetworksSubjects: Machine Learning (cs.LG)
- [515] arXiv:2310.08080 (replaced) [pdf, ps, other]
-
Title: RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray ProjectionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [516] arXiv:2310.08471 (replaced) [pdf, other]
-
Title: WinSyn: A High Resolution Testbed for Synthetic DataComments: cvpr versionSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [517] arXiv:2310.10136 (replaced) [pdf, other]
-
Title: Mata, a Fast and Simple Finite Automata Library (Technical Report)Authors: David Chocholatý, Tomáš Fiedor, Vojtěch Havlena, Lukáš Holík, Martin Hruška, Ondřej Lengál, Juraj SíčSubjects: Formal Languages and Automata Theory (cs.FL)
- [518] arXiv:2310.10500 (replaced) [pdf, other]
-
Title: Few-Shot Learning Patterns in Financial Time-Series for Trend-Following StrategiesComments: minor editsSubjects: Trading and Market Microstructure (q-fin.TR); Machine Learning (cs.LG); Portfolio Management (q-fin.PM)
- [519] arXiv:2310.12387 (replaced) [pdf, other]
-
Title: Learning to Optimise Climate Sensor Placement using a TransformerSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [520] arXiv:2310.14344 (replaced) [pdf, other]
-
Title: What's in a Prior? Learned Proximal Networks for Inverse ProblemsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [521] arXiv:2310.15301 (replaced) [pdf, other]
-
Title: ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's DiseaseAuthors: Xiaomin Ouyang, Xian Shuai, Yang Li, Li Pan, Xifan Zhang, Heming Fu, Xinyan Wang, Shihua Cao, Jiang Xin, Hazel Mok, Zhenyu Yan, Doris Sau Fung Yu, Timothy Kwok, Guoliang XingSubjects: Machine Learning (cs.LG)
- [522] arXiv:2310.19056 (replaced) [pdf, other]
-
Title: MILL: Mutual Verification with Large Language Models for Zero-Shot Query ExpansionAuthors: Pengyue Jia, Yiding Liu, Xiangyu Zhao, Xiaopeng Li, Changying Hao, Shuaiqiang Wang, Dawei YinComments: Accepted to NAACL 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [523] arXiv:2310.19654 (replaced) [pdf, other]
-
Title: MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrievalSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [524] arXiv:2311.01959 (replaced) [pdf, ps, other]
-
Title: A First Order Method for Linear Programming Parameterized by Circuit ImbalanceSubjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS)
- [525] arXiv:2311.01990 (replaced) [pdf, other]
-
Title: Conditions on Preference Relations that Guarantee the Existence of Optimal PoliciesComments: v2: replaced with accepted AISTATS 2024 version, containing a new summary figure and one extra example. Results and conclusions are unchangedSubjects: Machine Learning (cs.LG)
- [526] arXiv:2311.03164 (replaced) [pdf, other]
-
Title: A contract negotiation scheme for safety verification of interconnected systemsSubjects: Systems and Control (eess.SY)
- [527] arXiv:2311.03284 (replaced) [pdf, other]
-
Title: Safe Collective Control under Noisy Inputs and Competing Constraints via Non-Smooth Barrier FunctionsComments: Accepted to the 2024 European Control Conference. See Section VI.B (in particular, Theorem 1, Proposition 2, and Remark 2) for updates incorporating new results (from Reference 3) on almost-sure safety of ZCBFsSubjects: Systems and Control (eess.SY); Robotics (cs.RO)
- [528] arXiv:2311.03573 (replaced) [pdf, ps, other]
-
Title: Smart Blockchain Networks: Revolutionizing Donation Tracking in the Web 3.0Subjects: Cryptography and Security (cs.CR)
- [529] arXiv:2311.04830 (replaced) [pdf, other]
-
Title: Real-Time Recurrent Reinforcement LearningComments: 14 pages, 9 figures, includes AppendixSubjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
- [530] arXiv:2311.06958 (replaced) [src]
-
Title: Towards probabilistic Weather Forecasting with Conditioned Spatio-Temporal Normalizing FlowsAuthors: Christina WinklerComments: Wrong version, will upload a new oneSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [531] arXiv:2311.07039 (replaced) [pdf, other]
-
Title: Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints and Arbitrary Terminal States (Extended Version)Subjects: Systems and Control (eess.SY)
- [532] arXiv:2311.09336 (replaced) [pdf, other]
-
Title: LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable FeedbackAuthors: Wenda Xu, Daniel Deutsch, Mara Finkelstein, Juraj Juraska, Biao Zhang, Zhongtao Liu, William Yang Wang, Lei Li, Markus FreitagComments: Accepted to NAACL 2024Subjects: Computation and Language (cs.CL)
- [533] arXiv:2311.09363 (replaced) [pdf, other]
-
Title: Investigating the Emergent Audio Classification Ability of ASR Foundation Models
Comments: NAACL 2024 (main conference)
Subjects: Computation and Language (cs.CL)
- [534] arXiv:2311.09519 (replaced) [pdf, other]
-
Title: Leveraging Code to Improve In-context Learning for Semantic Parsing
Comments: Accepted to NAACL 2024
Subjects: Computation and Language (cs.CL)
- [535] arXiv:2311.09682 (replaced) [pdf, other]
-
Title: MacGyver: Are Large Language Models Creative Problem Solvers?
Authors: Yufei Tian, Abhilasha Ravichander, Lianhui Qin, Ronan Le Bras, Raja Marjieh, Nanyun Peng, Yejin Choi, Thomas L. Griffiths, Faeze Brahman
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [536] arXiv:2311.10522 (replaced) [pdf, other]
-
Title: Enhancing Object Coherence in Layout-to-Image Synthesis
Comments: GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [537] arXiv:2311.11016 (replaced) [pdf, other]
-
Title: SNI-SLAM: Semantic Neural Implicit SLAM
Authors: Siting Zhu, Guangming Wang, Hermann Blum, Jiuming Liu, Liang Song, Marc Pollefeys, Hesheng Wang
Comments: Accepted to CVPR 2024
Subjects: Robotics (cs.RO)
- [538] arXiv:2311.11278 (replaced) [pdf, other]
-
Title: Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [539] arXiv:2311.11908 (replaced) [pdf, other]
-
Title: Continual Learning: Applications and the Road Forward
Authors: Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, Gido M. van de Ven
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [540] arXiv:2311.13099 (replaced) [pdf, other]
-
Title: PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
- [541] arXiv:2311.13120 (replaced) [pdf, other]
-
Title: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Authors: Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie
Comments: Accepted to CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [542] arXiv:2311.13628 (replaced) [pdf, other]
-
Title: Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
Comments: 34 pages, 10 figures, published as conference paper at ICLR 2024, and accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [543] arXiv:2311.14097 (replaced) [pdf, other]
-
Title: ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Authors: Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
Comments: To appear in CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [544] arXiv:2311.15596 (replaced) [pdf, other]
-
Title: EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [545] arXiv:2311.15977 (replaced) [pdf, other]
-
Title: Text2Loc: 3D Point Cloud Localization from Natural Language
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [546] arXiv:2311.16473 (replaced) [pdf, other]
-
Title: GS-IR: 3D Gaussian Splatting for Inverse Rendering
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [547] arXiv:2311.16516 (replaced) [pdf, other]
-
Title: Segment Every Out-of-Distribution Object
Comments: 20 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [548] arXiv:2311.17048 (replaced) [pdf, other]
-
Title: Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [549] arXiv:2311.17112 (replaced) [pdf, other]
-
Title: Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Comments: Accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [550] arXiv:2311.17113 (replaced) [pdf, other]
-
Title: Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [551] arXiv:2311.17216 (replaced) [pdf, other]
-
Title: Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [552] arXiv:2311.18331 (replaced) [pdf, other]
-
Title: MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [553] arXiv:2312.00096 (replaced) [pdf, other]
-
Title: OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Comments: Technical report. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [554] arXiv:2312.02051 (replaced) [pdf, other]
-
Title: TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Comments: CVPR 2024 camera-ready version, code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [555] arXiv:2312.02069 (replaced) [pdf, other]
-
Title: GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Authors: Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [556] arXiv:2312.02137 (replaced) [pdf, other]
-
Title: MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Authors: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar
Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [557] arXiv:2312.03160 (replaced) [pdf, other]
-
Title: HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Authors: Haithem Turki, Vasu Agrawal, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder, Deva Ramanan, Michael Zollhöfer, Christian Richardt
Comments: CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
- [558] arXiv:2312.03441 (replaced) [pdf, other]
-
Title: UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Authors: Jialong Zuo, Hanyu Zhou, Ying Nie, Feng Zhang, Tianyu Guo, Nong Sang, Yunhe Wang, Changxin Gao
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [559] arXiv:2312.03596 (replaced) [pdf, other]
-
Title: MMM: Generative Masked Motion Model
Comments: accepted to CVPR
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [560] arXiv:2312.03795 (replaced) [pdf, other]
-
Title: AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Authors: Xinzhou Wang, Yikai Wang, Junliang Ye, Zhengyi Wang, Fuchun Sun, Pengkun Liu, Ling Wang, Kai Sun, Xintong Wang, Bin He
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [561] arXiv:2312.04021 (replaced) [pdf, other]
-
Title: A Study on the Calibration of In-context Learning
Authors: Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [562] arXiv:2312.05995 (replaced) [pdf, other]
-
Title: From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [563] arXiv:2312.06153 (replaced) [pdf, other]
-
Title: Open Datasheets: Machine-readable Documentation for Open Datasets and Responsible AI Assessments
Authors: Anthony Cintron Roman, Jennifer Wortman Vaughan, Valerie See, Steph Ballard, Jehu Torres, Caleb Robinson, Juan M. Lavista Ferres
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
- [564] arXiv:2312.07330 (replaced) [pdf, other]
-
Title: Learned representation-guided diffusion models for large-image generation
Authors: Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [565] arXiv:2312.07360 (replaced) [pdf, other]
-
Title: Boosting Latent Diffusion with Flow Matching
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [566] arXiv:2312.08280 (replaced) [pdf, other]
-
Title: New High-Order Numerical Methods for Hyperbolic Systems of Nonlinear PDEs with Uncertainties
Authors: Alina Chertock, Michael Herty, Arsen S. Iskhakov, Safa Janajra, Alexander Kurganov, Maria Lukacova-Medvidova
Subjects: Numerical Analysis (math.NA)
- [567] arXiv:2312.10370 (replaced) [pdf, other]
-
Title: Do Similar Entities have Similar Embeddings?
Comments: Accepted at ESWC 2024
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [568] arXiv:2312.11034 (replaced) [pdf, other]
-
Title: Appeal: Allow Mislabeled Samples the Chance to be Rectified in Partial Label Learning
Comments: Under review. An extended version of 2024 AAAI oral paper "Partial Label Learning with a Partner"
Subjects: Machine Learning (cs.LG)
- [569] arXiv:2312.11598 (replaced) [pdf, other]
-
Title: SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Comments: Accepted by CVPR 2024. Camera ready version. Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [570] arXiv:2312.12566 (replaced) [pdf, other]
-
Title: Johnsen-Rahbek Capstan Clutch: A High Torque Electrostatic Clutch
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [571] arXiv:2312.13102 (replaced) [pdf, other]
-
Title: SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Authors: Li Ma, Vasu Agrawal, Haithem Turki, Changil Kim, Chen Gao, Pedro Sander, Michael Zollhöfer, Christian Richardt
Comments: Accepted to CVPR2024, Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [572] arXiv:2312.13936 (replaced) [pdf, other]
-
Title: GVE-Leiden: Fast Leiden Algorithm for Community Detection in Shared Memory Setting
Authors: Subhajit Sahu
Comments: 12 pages, 10 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2312.04876
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
- [573] arXiv:2312.14810 (replaced) [pdf, other]
- [574] arXiv:2401.00365 (replaced) [pdf, other]
-
Title: HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Authors: Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji
Comments: 34 pages with 17 figures, accepted for TMLR
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [575] arXiv:2401.01286 (replaced) [pdf, other]
-
Title: A Comprehensive Study of Knowledge Editing for Large Language Models
Authors: Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni, Siyuan Cheng, Ziwen Xu, Xin Xu, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Lei Liang, Zhiqiang Zhang, Xiaowei Zhu, Jun Zhou, Huajun Chen
Comments: Ongoing work; 52 pages, 282 citations; benchmark is available at this https URL; code is available at this https URL; paper list is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [576] arXiv:2401.03707 (replaced) [pdf, other]
-
Title: FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Comments: CVPR2024 (camera-ready version). The last two authors are co-corresponding authors. Please visit our project page at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [577] arXiv:2401.06877 (replaced) [pdf, other]
-
Title: Promptly Predicting Structures: The Return of Inference
Comments: 18 pages, 13 figures. Accepted to NAACL'2024 (Main)
Subjects: Computation and Language (cs.CL)
- [578] arXiv:2401.09210 (replaced) [pdf, other]
-
Title: Narratives of Collective Action in YouTube's Discourse on Veganism
Comments: 15 pages, 7 figures, 7 tables. Accepted at ICWSM 2024
Subjects: Computers and Society (cs.CY); Physics and Society (physics.soc-ph)
- [579] arXiv:2401.09268 (replaced) [pdf, other]
-
Title: Chemically Motivated Simulation Problems are Efficiently Solvable by a Quantum Computer
Authors: Philipp Schleich, Lasse Bjørn Kristensen, Jorge A. Campos Gonzalez Angulo, Davide Avagliano, Mohsen Bagherimehrab, Abdulrahman Aldossary, Christoph Gorgulla, Joe Fitzsimons, Alán Aspuru-Guzik
Comments: 12 pages, 4 figures
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Chemical Physics (physics.chem-ph)
- [580] arXiv:2401.10746 (replaced) [pdf, other]
-
Title: A Systematic Evaluation of Euclidean Alignment with Deep Learning for EEG Decoding
Comments: 14 pages and 10 figures
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [581] arXiv:2401.11874 (replaced) [pdf, other]
-
Title: Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Comments: Submitted to Pattern Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [582] arXiv:2401.12214 (replaced) [pdf, other]
-
Title: Quality-Aware Hydraulic Control in Drinking Water Networks via Controllability Proxies
Subjects: Systems and Control (eess.SY)
- [583] arXiv:2401.16456 (replaced) [pdf, other]
-
Title: SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [584] arXiv:2401.17959 (replaced) [pdf, ps, other]
-
Title: University Students Motives and Challenges in Utilising Institutional Repository Resources
Subjects: Digital Libraries (cs.DL)
- [585] arXiv:2402.01088 (replaced) [pdf, other]
-
Title: The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games
Comments: 31 pages, 23 figures
Subjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
- [586] arXiv:2402.01786 (replaced) [pdf, other]
-
Title: COA-GPT: Generative Pre-trained Transformers for Accelerated Course of Action Development in Military Operations
Comments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [587] arXiv:2402.02160 (replaced) [pdf, other]
-
Title: Data Poisoning for In-context Learning
Subjects: Cryptography and Security (cs.CR)
- [588] arXiv:2402.04061 (replaced) [pdf, other]
-
Title: TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments
Authors: Jumman Hossain, Abu-Zaher Faridee, Nirmalya Roy, Jade Freeman, Timothy Gregory, Theron T. Trout
Comments: Paper under review for IROS 2024
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
- [589] arXiv:2402.04944 (replaced) [pdf, other]
-
Title: Elastic Analysis of Augmented Curves and Constrained Surfaces
Authors: Esfandiar Nava-Yazdani
Subjects: Differential Geometry (math.DG); Numerical Analysis (math.NA)
- [590] arXiv:2402.05608 (replaced) [pdf, other]
-
Title: Scalable Diffusion Models with State Space Backbone
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [591] arXiv:2402.06501 (replaced) [pdf, other]
-
Title: Scalable Interactive Machine Learning for Future Command and Control
Authors: Anna Madison, Ellen Novoseller, Vinicius G. Goecks, Benjamin T. Files, Nicholas Waytowich, Alfred Yu, Vernon J. Lawhern, Steven Thurman, Christopher Kelshaw, Kaleb McDowell
Comments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
- [592] arXiv:2402.06988 (replaced) [pdf, ps, other]
-
Title: Three Subtyping Algorithms for Binary Session Types and their Complexity Analyses (full version)
Comments: 14 pages, 5 figures. Full version of a paper submitted to PLACES 2024
Subjects: Programming Languages (cs.PL)
- [593] arXiv:2402.07946 (replaced) [pdf, ps, other]
-
Title: Re-Envisioning Command and Control
Comments: Accepted at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Panel, IST-205-RSY - the ICMCIS, held in Koblenz, Germany, 23-24 April 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [594] arXiv:2402.08403 (replaced) [pdf, other]
-
Title: LLMs and the Human Condition
Authors: Peter Wallis
Comments: 3rd draft includes Roger's comments. Added images of Sagrada Familia and termite mounds. Target is IVA in 2024
Subjects: Computation and Language (cs.CL)
- [595] arXiv:2402.08436 (replaced) [pdf, other]
-
Title: The current state of security -- Insights from the German software industry
Comments: 36 pages, 19 figures
Subjects: Cryptography and Security (cs.CR)
- [596] arXiv:2402.08714 (replaced) [pdf, other]
-
Title: PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Comments: CVPR 2024. Project page: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [597] arXiv:2402.10251 (replaced) [pdf, other]
-
Title: Brant-2: Foundation Model for Brain Signals
Comments: 14 pages, 7 figures
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [598] arXiv:2402.11454 (replaced) [pdf, other]
-
Title: An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm
Authors: Subhajit Sahu
Comments: 15 pages, 7 figures, 1 table. arXiv admin note: text overlap with arXiv:2312.13936
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI)
- [599] arXiv:2402.11549 (replaced) [pdf, other]
-
Title: Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
Comments: Updated to the current version
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [600] arXiv:2402.11567 (replaced) [pdf, ps, other]
-
Title: Saturability of the Quantum Cramér-Rao Bound in Multiparameter Quantum Estimation at the Single-Copy Level
Authors: Hendra I. Nurdin
Comments: 10 pages, no figures. Partly different approach from v2 but yielding the same conclusions and a strengthened result. Theorem 6 now states necessary and sufficient conditions. To appear in IEEE Control Systems Letters. Comments are welcome!
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
- [601] arXiv:2402.11815 (replaced) [pdf, other]
-
Title: HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?
Comments: Camera Ready Version - Accepted in SemEval 2024 (Colocated with NAACL 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [602] arXiv:2402.12373 (replaced) [pdf, other]
-
Title: LTL learning on GPUs
Comments: 27 pages
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)
- [603] arXiv:2402.13492 (replaced) [pdf, other]
-
Title: Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models
Comments: NAACL 2024 (main)
Subjects: Computation and Language (cs.CL)
- [604] arXiv:2402.14490 (replaced) [pdf, other]
-
Title: Imbalanced Data Clustering using Equilibrium K-Means
Authors: Yudong He
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- [605] arXiv:2402.14556 (replaced) [pdf, other]
-
Title: Quantum computing in civil engineering: Potentials and Limitations
Comments: accepted at ICCCBE 2024
Subjects: Emerging Technologies (cs.ET)
- [606] arXiv:2402.16105 (replaced) [pdf, other]
-
Title: Informed Meta-Learning
Subjects: Machine Learning (cs.LG)
- [607] arXiv:2402.17058 (replaced) [pdf, other]
-
Title: An Analysis of Capacity-Distortion Trade-Offs in Memoryless ISAC Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- [608] arXiv:2402.17729 (replaced) [pdf, other]
-
Title: Towards Fairness-Aware Adversarial Learning
Comments: This work will appear in the CVPR 2024 conference proceedings
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [609] arXiv:2402.19161 (replaced) [pdf, other]
-
Title: MemoNav: Working Memory Model for Visual Navigation
Comments: Accepted to CVPR 2024. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [610] arXiv:2402.19212 (replaced) [pdf, ps, other]
-
Title: Deep Reinforcement Learning: A Convex Optimization Approach
Authors: Ather Gattami
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
- [611] arXiv:2402.19470 (replaced) [pdf, other]
-
Title: Towards Generalizable Tumor Synthesis
Comments: The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [612] arXiv:2403.01121 (replaced) [pdf, other]
-
Title: OpenGraph: Towards Open Graph Foundation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
- [613] arXiv:2403.02472 (replaced) [pdf, other]
-
Title: OffLanDat: A Community Based Implicit Offensive Language Dataset Generated by Large Language Model Through Prompt Engineering
Authors: Amit Das, Mostafa Rahgouy, Dongji Feng, Zheng Zhang, Tathagata Bhattacharya, Nilanjana Raychawdhary, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals
Subjects: Computation and Language (cs.CL)
- [614] arXiv:2403.04260 (replaced) [pdf, other]
-
Title: Can Small Language Models be Good Reasoners for Sequential Recommendation?
Authors: Yuling Wang, Changxin Tian, Binbin Hu, Yanhua Yu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Liang Pang, Xiao Wang
Comments: Accepted by TheWebConf (WWW) 2024
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [615] arXiv:2403.05369 (replaced) [pdf, ps, other]
-
Title: Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [616] arXiv:2403.05950 (replaced) [pdf, ps, other]
-
Title: Classifying Objects in 3D Point Clouds Using Recurrent Neural Network: A GRU LSTM Hybrid Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [617] arXiv:2403.06436 (replaced) [pdf, ps, other]
-
Title: Designing a K-state P-bit Engine
Subjects: Emerging Technologies (cs.ET); Optimization and Control (math.OC); Applied Physics (physics.app-ph)
- [618] arXiv:2403.07728 (replaced) [pdf, other]
-
Title: CAP: A General Algorithm for Online Selective Conformal Prediction with FCR Control
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
- [619] arXiv:2403.07820 (replaced) [pdf, ps, other]
-
Title: The Variant of Designated Verifier Signature Scheme with Message Recovery
Comments: 11 pages
Subjects: Cryptography and Security (cs.CR)
- [620] arXiv:2403.08059 (replaced) [pdf, other]
-
Title: FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation
Authors: Benjamin D. Killeen, Liam J. Wang, Han Zhang, Mehran Armand, Russell H. Taylor, Dave Dreizin, Greg Osgood, Mathias Unberath
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [621] arXiv:2403.09412 (replaced) [pdf, other]
-
Title: OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [622] arXiv:2403.09856 (replaced) [pdf, other]
-
Title: A Tale of Two Communities: Exploring Academic References on Stack Overflow
Comments: Accepted for publication in The Web Conference (WWW) 2024, Short Paper Track
Subjects: Computers and Society (cs.CY); Software Engineering (cs.SE)
- [623] arXiv:2403.10667 (replaced) [pdf, other]
-
Title: Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Authors: Tianxin Wei, Bowen Jin, Ruirui Li, Hansi Zeng, Zhengyang Wang, Jianhui Sun, Qingyu Yin, Hanqing Lu, Suhang Wang, Jingrui He, Xianfeng Tang
Comments: ICLR 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
- [624] arXiv:2403.11505 (replaced) [pdf, other]
-
Title: COVID-19 detection from pulmonary CT scans using a novel EfficientNet with attention mechanism
Authors: Ramy Farag, Parth Upadhyay, Yixiang Gao, Jacket Demby, Katherin Garces Montoya, Seyed Mohamad Ali Tousi, Gbenga Omotara, Guilherme DeSouza
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [625] arXiv:2403.11624 (replaced) [pdf, other]
-
Title: Dual-Channel Multiplex Graph Neural Networks for Recommendation
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [626] arXiv:2403.11687 (replaced) [pdf, other]
-
Title: Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates
Comments: Removed the assumption on the conservative derivative of the fixed point map having a product structure: the product of partial conservative derivatives is not conservative in general
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
- [627] arXiv:2403.11742 (replaced) [pdf, other]
-
Title: Accelerating Model Predictive Control for Legged Robots through Distributed Optimization
Subjects: Robotics (cs.RO)
- [628] arXiv:2403.11956 (replaced) [pdf, other]
-
Title: Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment
Authors: Tengchuan Kou, Xiaohong Liu, Zicheng Zhang, Chunyi Li, Haoning Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [629] arXiv:2403.12031 (replaced) [pdf, other]
-
Title: RouterBench: A Benchmark for Multi-LLM Routing System
Authors: Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [630] arXiv:2403.12659 (replaced) [pdf, other]
-
Title: Graph Neural Networks for Carbon Dioxide Adsorption Prediction in Aluminium-Exchanged Zeolites
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
- [631] arXiv:2403.12847 (replaced) [pdf, other]
-
Title: Policy Bifurcation in Safe Reinforcement Learning
Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li
Subjects: Machine Learning (cs.LG)
- [632] arXiv:2403.12984 (replaced) [pdf, other]
-
Title: When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings
Comments: 7 pages, 2 figures, 5 tables, Accepted (invited to present) to The Second Tiny Papers Track at ICLR 2024 (this https URL)
Journal-ref: The Second Tiny Papers Track at {ICLR} 2024, Tiny Papers @ {ICLR} 2024, Vienna Austria, May 11, 2024
Subjects: Biomolecules (q-bio.BM); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [633] arXiv:2403.13171 (replaced) [pdf, other]
-
Title: LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Authors: Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng
Comments: CVPR
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [634] arXiv:2403.13230 (replaced) [pdf, other]
-
Title: BFT-PoLoc: A Byzantine Fortified Trigonometric Proof of Location Protocol using Internet Delays
Subjects: Networking and Internet Architecture (cs.NI)
- [635] arXiv:2403.14302 (replaced) [pdf, other]
-
Title: SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Comments: To be published in the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [636] arXiv:2403.14403 (replaced) [pdf, other]
-
Title: Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [637] arXiv:2403.14472 (replaced) [pdf, other]
-
Title: Detoxifying Large Language Models via Knowledge Editing
Authors: Mengru Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen
Comments: Ongoing work. Project website: this https URL. Due to the specificity of the knowledge editing setting, we revise Tables 1 and 3 to present a fair comparison of experimental results. More experimental results will be updated soon
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [638] arXiv:2403.14680 (replaced) [pdf, ps, other]
-
Title: Trust in AI: Progress, Challenges, and Future Directions
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
- [639] arXiv:2403.14760 (replaced) [pdf, other]
-
Title: Can 3D Vision-Language Models Truly Understand Natural Language?
Comments: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [640] arXiv:2403.15022 (replaced) [pdf, other]
-
Title: Insights into the Lottery Ticket Hypothesis and Iterative Magnitude Pruning
Subjects: Machine Learning (cs.LG)
- [641] arXiv:2403.15136 (replaced) [pdf, other]
-
Title: Mixed finite element methods for linear Cosserat equations
Comments: A typo was corrected, a broken citation was fixed, and a footnote was added to BGG sequences
Subjects: Numerical Analysis (math.NA)
- [642] arXiv:2403.15268 (replaced) [pdf, other]
-
Title: Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models
Subjects: Computation and Language (cs.CL)
- [643] arXiv:2403.15321 (replaced) [pdf, other]
-
Title: Visual Highlighting for Situated Brushing and Linking
Comments: To appear in EuroVis 2024
Subjects: Human-Computer Interaction (cs.HC)
- [644] arXiv:2403.15456 (replaced) [pdf, other]
-
Title: WoLF: Large Language Model Framework for CXR Understanding
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [645] arXiv:2403.15563 (replaced) [pdf, other]
-
Title: Sparse additive function decompositions facing basis transforms
Comments: 46 pages, 10 figures, 8 tables
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
- [646] arXiv:2403.15634 (replaced) [pdf, ps, other]
-
Title: An Interactive Decision-Support Dashboard for Optimal Hospital Capacity Management
Subjects: Computers and Society (cs.CY)
- [647] arXiv:2403.15905 (replaced) [pdf, other]
-
Title: Towards Low-Energy Adaptive Personalization for Resource-Constrained Devices
Comments: Accepted to The 4th Workshop on Machine Learning and Systems (EuroMLSys '24)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [648] arXiv:2403.15931 (replaced) [pdf, other]
-
Title: X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [649] arXiv:2403.15955 (replaced) [pdf, other]
-
Title: Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [650] arXiv:2403.15981 (replaced) [pdf, other]
-
Title: Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [651] arXiv:2403.16002 (replaced) [pdf, other]
-
Title: SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Authors: Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [652] arXiv:2403.16097 (replaced) [pdf, other]
-
Title: Can Language Models Pretend Solvers? Logic Code Simulation with LLMs
Comments: 12 pages, 8 figures
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
- [653] arXiv:2403.16169 (replaced) [pdf, other]
-
Title: Gaze-guided Hand-Object Interaction Synthesis: Benchmark and Method
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [654] arXiv:2403.16341 (replaced) [pdf, other]
-
Title: NonlinearSolve.jl: High-Performance and Robust Solvers for Systems of Nonlinear Equations in Julia
Authors: Avik Pal, Flemming Holtorf, Axel Larsson, Torkel Loman, Utkarsh, Frank Schäefer, Qingyu Qu, Alan Edelman, Chris Rackauckas
Subjects: Numerical Analysis (math.NA)
- [655] arXiv:2403.16385 (replaced) [pdf, other]
-
Title: Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
- [656] arXiv:2403.16451 (replaced) [pdf, other]
-
Title: DeepMachining: Online Prediction of Machining Errors of Lathe Machines
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [657] arXiv:2403.16535 (replaced) [pdf, other]
-
Title: Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot
Authors: Zifan Wang, Yufei Jia, Lu Shi, Haoyu Wang, Haizhou Zhao, Xueyang Li, Jinni Zhou, Jun Ma, Guyue Zhou
Subjects: Robotics (cs.RO)
- [658] arXiv:2403.16591 (replaced) [pdf, other]
-
Title: Deciphering the Interplay between Local Differential Privacy, Average Bayesian Privacy, and Maximum Bayesian Privacy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [659] arXiv:2403.16712 (replaced) [pdf, ps, other]
-
Title: Chase Termination Beyond Polynomial Time
Comments: Technical report of our PODS'24 paper
Subjects: Databases (cs.DB); Logic in Computer Science (cs.LO)
- [660] arXiv:2403.16797 (replaced) [pdf, other]
-
Title: Privacy Preservation by Intermittent Transmission in Cooperative LQG Control Systems
Subjects: Systems and Control (eess.SY)
- [661] arXiv:2403.16898 (replaced) [pdf, other]
-
Title: Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Comments: Adjusted the format so that the layout looks better
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
- [662] arXiv:2403.17210 (replaced) [pdf, other]
-
Title: CADGL: Context-Aware Deep Graph Learning for Predicting Drug-Drug Interactions
Comments: 8 pages, 4 figures; in review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Biomolecules (q-bio.BM); Molecular Networks (q-bio.MN)
- [663] arXiv:2403.17458 (replaced) [pdf, ps, other]
-
Title: Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice
Authors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, Jin B. Hong
Comments: 10 pages
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [664] arXiv:2403.17608 (replaced) [pdf, other]
-
Title: Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [665] arXiv:2403.17633 (replaced) [pdf, other]
-
Title: UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [666] arXiv:2403.17645 (replaced) [pdf, ps, other]
-
Title: DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
Subjects: Computation and Language (cs.CL)
- [667] arXiv:2403.17675 (replaced) [pdf, other]
-
Title: Chattering Phenomena in Time-Optimal Control for High-Order Chain-of-Integrators Systems with Full State Constraints
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
- [668] arXiv:2403.17740 (replaced) [pdf, other]
-
Title: All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction
Comments: 14 pages, 9 figures
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
- [669] arXiv:2403.17752 (replaced) [pdf, other]
-
Title: Can multiple-choice questions really be useful in detecting the abilities of LLMs?
Subjects: Computation and Language (cs.CL)
- [670] arXiv:2403.17919 (replaced) [pdf, other]
-
Title: LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC)
- [671] arXiv:2403.18018 (replaced) [pdf, other]
-
Title: DORE: A Dataset For Portuguese Definition Generation
Comments: Accepted to LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [672] arXiv:2403.18025 (replaced) [pdf, other]
-
Title: Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
Comments: Paper already accepted for publication at the NAACL 2024 conference (main conference paper)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
- [673] arXiv:2403.18028 (replaced) [pdf, other]
-
Title: Predicting Species Occurrence Patterns from Partial Observations
Comments: Tackling Climate Change with Machine Learning workshop at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Populations and Evolution (q-bio.PE)
- [674] arXiv:2403.18159 (replaced) [pdf, other]
-
Title: Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Authors: Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague
Comments: Accepted at Practical ML for Low Resource Settings Workshop at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [675] arXiv:2403.18314 (replaced) [pdf, other]
-
Title: Chinese Offensive Language Detection: Current Status and Future Directions
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [676] arXiv:2403.18339 (replaced) [pdf, other]
-
Title: H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images
Comments: 10 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [677] arXiv:2403.18346 (replaced) [pdf, other]
-
Title: Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [678] arXiv:2403.18361 (replaced) [pdf, other]
-
Title: ViTAR: Vision Transformer with Any Resolution
Authors: Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [679] arXiv:2403.18479 (replaced) [pdf, other]
-
Title: Lightweight Embeddings for Graph Collaborative Filtering
Comments: Accepted by SIGIR '24
Subjects: Information Retrieval (cs.IR)
- [680] arXiv:2403.18605 (replaced) [pdf, other]
-
Title: FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
Comments: Our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [681] arXiv:2403.18621 (replaced) [pdf, other]
-
Title: Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects
Comments: Submitted to IEEE Transactions on Vehicular Technology
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- [682] arXiv:2403.18703 (replaced) [pdf, other]
-
Title: FPGA-Based Neural Thrust Controller for UAVs
Authors: Sharif Azem, David Scheunert, Mengguang Li, Jonas Gehrunger, Kai Cui, Christian Hochberger, Heinz Koeppl
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
- [683] arXiv:2403.18755 (replaced) [pdf, other]
-
Title: Many-Objective Evolutionary Influence Maximization: Balancing Spread, Budget, Fairness, and Time
Comments: To appear in Genetic and Evolutionary Computation Conference (GECCO '24 Companion), July 14-18, 2024, Melbourne, VIC, Australia. ACM, New York, NY, USA
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
- [684] arXiv:2403.18807 (replaced) [pdf, other]
-
Title: ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)