Computer Science

New submissions

Submissions received from Fri 10 May 24 to Mon 13 May 24, announced Tue, 14 May 24

New submissions for Tue, 14 May 24 (showing first 500 of 634 entries)

[1] arXiv:2405.06643 [pdf, ps, other]: Title: Levels of AI Agents: from Rules to Large Language Models

Authors: Yu Huang

Subjects: Computation and Language (cs.CL)

AI agents are defined as artificial entities that perceive the environment, make decisions, and take actions. Inspired by the six levels of autonomous driving defined by the Society of Automotive Engineers, AI agents are likewise categorized by utility and strength into the following levels: L0, no AI, with tools that account for perception plus actions; L1, using rule-based AI; L2, replacing rule-based AI with IL/RL-based AI, with additional reasoning and decision making; L3, applying LLM-based AI instead of IL/RL-based AI, additionally setting up memory and reflection; L4, based on L3, facilitating autonomous learning and generalization; L5, based on L4, adding personality (emotion and character) and collaborative behavior among multiple agents.
[2] arXiv:2405.06646 [pdf, other]: Title: On-the-fly Learning to Transfer Motion Style with Diffusion Models: A Semantic Guidance Approach

Authors: Lei Hu, Zihao Zhang, Yongjing Ye, Yiwen Xu, Shihong Xia

Comments: 23 pages

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

In recent years, the emergence of generative models has spurred the development of human motion generation, among which the generation of stylized human motion has consistently been a focal point of research. The conventional approach to stylized human motion generation involves transferring the style from given style examples to new motions. Despite decades of research in human motion style transfer, it still faces three main challenges: 1) difficulty in decoupling motion content and style; 2) generalization to unseen motion styles; and 3) the requirement of a dedicated motion style dataset. To address these issues, we propose an on-the-fly human motion style transfer learning method based on the diffusion model, which can learn a style transfer model in a few minutes of fine-tuning to transfer an unseen style to diverse content motions. The key idea of our method is to consider the denoising process of the diffusion model as a motion translation process that learns the difference between the style-neutral motion pair, thereby avoiding the challenge of style and content decoupling. Specifically, given an unseen style example, we first generate the corresponding neutral motion through the proposed Style-Neutral Motion Pair Generation module. We then add noise to the generated neutral motion and denoise it to be close to the style example to fine-tune the style transfer diffusion model. We only need one style example and a text-to-motion dataset with predominantly neutral motion (e.g., HumanML3D). The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications.
[3] arXiv:2405.06650 [pdf, other]: Title: Large Language Models as Planning Domain Generators

Authors: James Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi

Comments: Published at ICAPS 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.
[4] arXiv:2405.06652 [pdf, ps, other]: Title: Large Language Model (LLM) AI text generation detection based on transformer deep learning algorithm

Authors: Yuhong Mo, Hao Qin, Yushan Dong, Ziyi Zhu, Zhenglin Li

Comments: 6 pages

Subjects: Computation and Language (cs.CL)

In this paper, a tool for detecting LLM-generated text is developed based on the Transformer model, aiming to improve the accuracy of AI text generation detection and provide a reference for subsequent research. First, the text is Unicode normalised and converted to lowercase; characters other than letters and punctuation marks are removed with regular expressions; spaces are added around punctuation marks; leading and trailing spaces are removed; consecutive ellipses are replaced with single spaces; and the text is joined using the specified delimiter. Next, non-alphabetic characters and extra whitespace are removed, multiple consecutive whitespace characters are replaced with a single space, and the text is again converted to lowercase. The deep learning model combines layers such as LSTM, Transformer and CNN for text classification or sequence labelling tasks. Results on the training and validation sets show that the model loss decreases from 0.127 to 0.005 and accuracy increases from 94.96% to 99.8%, indicating that the model has good detection and classification ability for AI-generated text. The test set confusion matrix and accuracy show that the model has 99% prediction accuracy for AI-generated text, with a precision of 0.99, a recall of 1, and an F1 score of 0.99, achieving very high classification accuracy. Looking forward, it has the prospect of wide application in the field of AI text detection.
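The cleaning steps listed in the abstract above could be sketched roughly as follows. This is an illustration only, not the authors' code: the function name `normalize_text`, the choice of NFKC normalisation, the set of punctuation marks kept, and the exact regular expressions are all assumptions.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical cleaning step; the regexes are illustrative assumptions."""
    # Unicode-normalise (NFKC is an assumed choice) and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace runs of two or more dots (ellipses) with a single space.
    text = re.sub(r"\.{2,}", " ", text)
    # Keep only ASCII letters, a small set of punctuation marks, and whitespace.
    text = re.sub(r"[^a-z.,!?;:'\s]", "", text)
    # Put spaces around the main punctuation marks.
    text = re.sub(r"([.,!?;:])", r" \1 ", text)
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Hello,   World... This is GPT-4!"))
```

The order of the steps matters in this sketch: ellipses are collapsed before punctuation is padded, so a run of dots does not turn into several separate tokens.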
[5] arXiv:2405.06656 [pdf, other]: Title: Exploring Social Media Posts for Depression Identification: A Study on Reddit Dataset

Authors: Nandigramam Sai Harshit, Nilesh Kumar Sahu, Haroon R. Lone

Comments: Accepted as a poster in IndiaHCI 2023

Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Depression is one of the most common mental disorders affecting an individual's personal and professional life. In this work, we investigated the possibility of utilizing social media posts to identify depression in individuals. To achieve this goal, we conducted a preliminary study where we extracted and analyzed the top Reddit posts made in 2022 from depression-related forums. The collected data were labeled as depressive and non-depressive using UMLS Metathesaurus. Further, the pre-processed data were fed to classical machine learning models, where we achieved an accuracy of 92.28% in predicting the depressive and non-depressive posts.
[6] arXiv:2405.06664 [pdf, other]: Title: A categorical account of composition methods in logic (extended version)

Authors: Tomáš Jakl, Dan Marsden, Nihil Shah

Comments: This is an extended version of arXiv:2304.10196 which, apart from providing full proofs of all statements, takes a more categorical point of view to tell the whole story. In particular, we highlight and explain the underlying categorical constructions in detail

Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)

We present a categorical theory of the composition methods in finite model theory -- a key technique enabling modular reasoning about complex structures by building them out of simpler components. The crucial results required by the composition methods are Feferman--Vaught--Mostowski (FVM) type theorems, which characterize how logical equivalence behaves under composition and transformation of models.
Our results are developed by extending the recently introduced game comonad semantics for model comparison games. This level of abstraction allows us to give conditions yielding FVM type results in a uniform way. Our theorems are parametric in the classes of models, logics and operations involved. Furthermore, they naturally account for the existential and positive existential fragments, and extensions with counting quantifiers of these logics. We also reveal surprising connections between FVM type theorems and classical concepts in the theory of monads.
We illustrate our methods by recovering many classical theorems of practical interest, including a refinement of a previous result by Dawar, Severini, and Zapata concerning the 3-variable counting logic and cospectrality. To highlight the importance of our techniques being parametric in the logic of interest, we prove a family of FVM theorems for products of structures, uniformly in the logic in question, which cannot be done using specific game arguments.
This is an extended version of the LICS 2023 conference paper of the same name.
[7] arXiv:2405.06665 [pdf, other]: Title: Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech

Authors: Menglin Li, Kwan Hui Lim

Comments: Accepted to ICLR 2024 Tiny Paper Track

Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

The Financial Relation Extraction (FinRE) task involves identifying the entities and their relation, given a piece of financial statement/text. To solve this FinRE problem, we propose a simple but effective strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS), as well as different approaches to combine this information. Experiments on a financial relations dataset show promising results and highlight the benefits of incorporating NER and POS in existing models. Our dataset and codes are available at https://github.com/kwanhui/FinRelExtract.
[8] arXiv:2405.06667 [pdf, other]: Title: Sentiment Polarity Analysis of Bangla Food Reviews Using Machine and Deep Learning Algorithms

Authors: Al Amin, Anik Sarkar, Md Mahamodul Islam, Asif Ahammad Miazee, Md Robiul Islam, Md Mahmudul Hoque

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The Internet has become an essential tool for people in the modern world. Humans, like all living organisms, have essential requirements for survival. These include access to atmospheric oxygen, potable water, protective shelter, and sustenance. The constant flux of the world is making our existence less complicated. A significant portion of the population utilizes online food ordering services to have meals delivered to their residences. Although there are numerous methods for ordering food, customers sometimes experience disappointment with the food they receive. Our endeavor was to establish a model that could determine if food is of good or poor quality. We compiled an extensive dataset of over 1484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, a rigorous assessment of various deep learning and machine learning techniques was performed to determine the most accurate approach for predicting food quality. Out of all the algorithms evaluated, logistic regression emerged as the most accurate, achieving an impressive 90.91% accuracy. The review offers valuable insights that will guide the user in deciding whether or not to order the food.
[9] arXiv:2405.06668 [pdf, other]: Title: Exposing and Explaining Fake News On-the-Fly

Authors: Francisco de Arriba-Pérez, Silvia García-Méndez, Fátima Leal, Benedita Malheiro, Juan Carlos Burguillo

Journal-ref: Mach Learn (2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Social media platforms enable the rapid dissemination and consumption of information. However, users instantly consume such content regardless of the reliability of the shared data. Consequently, the latter crowdsourcing model is exposed to manipulation. This work contributes an explainable and online classification method to recognize fake news in real time. The proposed method combines both unsupervised and supervised Machine Learning approaches with lexica created online. The profiling is built using creator-, content- and context-based features using Natural Language Processing techniques. The explainable classification mechanism displays in a dashboard the features selected for classification and the prediction confidence. The performance of the proposed solution has been validated with real data sets from Twitter and the results attain 80% accuracy and macro F-measure. This proposal is the first to jointly provide data stream processing, profiling, classification and explainability. Ultimately, the proposed early detection, isolation and explanation of fake news contribute to increasing the quality and trustworthiness of social media contents.
[10] arXiv:2405.06669 [pdf, other]: Title: Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts

Authors: Subhendu Khatuya, Koushiki Sinha, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Comments: Accepted in SIGIR 2024

Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Information Retrieval (cs.IR); Machine Learning (cs.LG)

While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns like scientific articles or government reports. There has not been much exploration into developing efficient methods for summarizing financial documents, which often contain complex facts and figures. Here, we study the problem of bullet point summarization of long Earnings Call Transcripts (ECTs) using the recently released ECTSum dataset. We leverage an unsupervised question-based extractive module followed by a parameter-efficient instruction-tuned abstractive module to solve this task. Our proposed model FLAN-FinBPS achieves new state-of-the-art performance, outperforming the strongest baseline with a 14.88% average ROUGE score gain, and is capable of generating factually consistent bullet point summaries that capture the important facts discussed in the ECTs.
[11] arXiv:2405.06670 [pdf, other]: Title: TLINet: Differentiable Neural Network Temporal Logic Inference

Authors: Danyang Li, Mingyu Cai, Cristian-Ioan Vasile, Roberto Tron

Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG)

There has been a growing interest in extracting formal descriptions of system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the use of off-the-shelf gradient-based tools during the learning process. In contrast to existing approaches, we introduce approximation methods for the max operator designed specifically for temporal logic-based gradient techniques, ensuring the correctness of STL satisfaction evaluation. Our framework learns not only the structure but also the parameters of STL formulas, allowing flexible combinations of operators and various logical structures. We validate TLINet against state-of-the-art baselines, demonstrating that our approach outperforms these baselines in terms of interpretability, compactness, rich expressibility, and computational efficiency.
[12] arXiv:2405.06671 [pdf, other]: Title: Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling

Authors: Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Subjects: Computation and Language (cs.CL); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)

We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter-efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performance on both datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability on zero-shot as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground truth in the majority of cases.
[13] arXiv:2405.06673 [pdf, other]: Title: Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records

Authors: Gyubok Lee, Sunjun Kweon, Seongsu Bae, Edward Choi

Comments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024; Minor Change from Camera-Ready

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Electronic Health Records (EHRs) are relational databases that store the entire medical histories of patients within hospitals. They record numerous aspects of patients' medical care, from hospital admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries requires skills in query languages like SQL. To make information retrieval more accessible, one strategy is to build a question-answering system, possibly leveraging text-to-SQL models that can automatically translate natural language questions into corresponding SQL queries and use these queries to retrieve the answers. The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs using text-to-SQL modeling, capable of reliably providing requested answers to various healthcare professionals to improve their clinical work processes and satisfy their needs. Among more than 100 participants who applied to the shared task, eight teams completed the entire shared task process and demonstrated a wide range of methods to effectively solve this task. In this paper, we describe the task of reliable text-to-SQL modeling, the dataset, and the methods and results of the participants. We hope this shared task will spur further research and insights into developing reliable question-answering systems for EHRs.
[14] arXiv:2405.06674 [pdf, other]: Title: Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Authors: Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present Open-SQL, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as Variable-length Open DB Schema, Target Column Truncation, and Example Column Truncation, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54% to 41.04% and Code Llama-7B from 14.54% to 48.24% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35%) on the BIRD-Dev dataset.
[15] arXiv:2405.06676 [pdf, other]: Title: EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD

Authors: Bing-Yue Wu, Utsav Sharma, Sai Rahul Dhanvi Kankipati, Ajay Yadav, Bintu Kappil George, Sai Ritish Guntupalli, Austin Rovinski, Vidya A. Chhabria

Comments: Under review at Workshop on LLM-Aided Design (LAD'24)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)

Large language models (LLMs) serve as powerful tools for design, providing capabilities for both task automation and design assistance. Recent advancements have shown tremendous potential for facilitating LLM integration into the chip design process; however, many of these works rely on data that are not publicly available and/or not permissively licensed for use in LLM training and distribution. In this paper, we present a solution aimed at bridging this gap by introducing an open-source dataset tailored for OpenROAD, a widely adopted open-source EDA toolchain. The dataset features over 1000 data points and is structured in two formats: (i) a pairwise set comprised of question prompts with prose answers, and (ii) a pairwise set comprised of code prompts and their corresponding OpenROAD scripts. By providing this dataset, we aim to facilitate LLM-focused research within the EDA domain. The dataset is available at https://github.com/OpenROAD-Assistant/EDA-Corpus.
[16] arXiv:2405.06677 [pdf, other]: Title: ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu, Zhenguo Li, Xiaodan Liang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses, given the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable to downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets (axioms, library, and problem) based on their proving depth. We conduct extensive experiments to investigate whether current LMs can generate theorems in the library and benefit the proving of the problem theorems. The results demonstrate that high-quality ATG data improves models' performance on downstream automated theorem proving (ATP). However, there is still room for current LMs to develop better ATG and generate more advanced and human-like theorems. We hope the new ATG challenge can shed some light on advanced complex theorem proving.
[17] arXiv:2405.06680 [pdf, other]: Title: Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

Authors: Jun Zhao, Jingqi Tong, Yurong Mou, Ming Zhang, Qi Zhang, Xuanjing Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MathTrap by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8k. Since problems with logical flaws are quite rare in the real world, these represent "unseen" cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not spontaneously combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. We find that LLMs' performance can be passively improved through the above external intervention. Overall, systematic compositionality remains an open challenge for large language models.
[18] arXiv:2405.06681 [pdf, other]: Title: Leveraging Lecture Content for Improved Feedback: Explorations with GPT-4 and Retrieval Augmented Generation

Authors: Sven Jacobs, Steffen Jaschke

Comments: accepted at CSEE&T 2024: 36th International Conference on Software Engineering Education and Training, Würzburg, Germany

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

This paper presents the use of Retrieval Augmented Generation (RAG) to improve the feedback generated by Large Language Models for programming tasks. For this purpose, corresponding lecture recordings were transcribed and made available to the Large Language Model GPT-4 as an external knowledge source, together with timestamps as meta-information, by using RAG. The purpose of this is to prevent hallucinations and to enforce the use of the technical terms and phrases from the lecture. In an exercise platform developed to solve programming problems for an introductory programming lecture, students can request feedback on their solutions generated by GPT-4. For this task, GPT-4 receives the students' code solution, the compiler output, the result of unit tests and, as additional context, the relevant passages from the lecture notes retrieved through RAG. The feedback generated by GPT-4 should guide students to solve problems independently and link to the lecture content, using the timestamps of the transcript as meta-information. In this way, the corresponding lecture videos can be viewed immediately at the corresponding positions. For the evaluation, students worked with the tool in a workshop and decided for each feedback whether it should be extended by RAG or not. First results based on a questionnaire and the collected usage data show that the use of RAG can improve feedback generation and is preferred by students in some situations. Due to the slower speed of feedback generation, the benefits are situation dependent.
[19] arXiv:2405.06682 [pdf, other]: Title: Self-Reflection in LLM Agents: Effects on Problem-Solving Performance

Authors: Matthew Renze, Erhan Guven

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this study, we investigated the effects of self-reflection in large language models (LLMs) on problem-solving performance. We instructed nine popular LLMs to answer a series of multiple-choice questions to provide a performance baseline. For each incorrectly answered question, we instructed eight types of self-reflecting LLM agents to reflect on their mistakes and provide themselves with guidance to improve problem-solving. Then, using this guidance, each self-reflecting agent attempted to re-answer the same questions. Our results indicate that LLM agents are able to significantly improve their problem-solving performance through self-reflection ($p < 0.001$). In addition, we compared the various types of self-reflection to determine their individual contribution to performance. All code and data are available on GitHub at https://github.com/matthewrenze/self-reflection
[20] arXiv:2405.06683 [pdf, other]: Title: ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization

Authors: Yunxiao Shi, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, Min Xu

Comments: Draft Paper

Subjects: Computation and Language (cs.CL)

Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges persist, such as poor retrieval quality for complex questions that require the search of multifaceted semantic information, inefficiencies in knowledge re-retrieval during long-term serving, and a lack of personalized responses. To overcome these limitations, we introduce ERAGent, a cutting-edge framework that embodies an advancement in the RAG area. Our contribution is the introduction of two synergistically operating modules, Enhanced Question Rewriter and Knowledge Filter, for better retrieval quality. Retrieval Trigger is incorporated to curtail extraneous external knowledge retrieval without sacrificing response quality. ERAGent also personalizes responses by incorporating a learned user profile. The efficiency and personalization characteristics of ERAGent are supported by the Experiential Learner module, which makes the AI assistant capable of expanding its knowledge and modeling the user profile incrementally. Rigorous evaluations across six datasets and three question-answering tasks prove ERAGent's superior accuracy, efficiency, and personalization, emphasizing its potential to advance the RAG field and its applicability in practical systems.
[21] arXiv:2405.06684 [pdf, ps, other]: Title: QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment

Authors: Jin Han, Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen, Jia-Rui Lin

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Social media aids disaster response but suffers from noise, hindering accurate impact assessment and decision making for resilient cities, which few studies have considered. To address the problem, this study proposes the first domain-specific LLM and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs according to their relationship to the physical and social impacts of earthquakes, and a dataset comprising 7282 earthquake-related microblogs from twenty earthquakes in different locations is developed. Then, with a systematic analysis of various influential factors, QuakeBERT, a domain-specific large language model (LLM), is developed and fine-tuned for accurate classification and filtering of microblogs. Meanwhile, an integrated method combining public opinion trend analysis, sentiment analysis, and keyword-based physical impact quantification is introduced to assess both the physical and social impacts of earthquakes based on social media texts. Experiments show that data diversity and data volume dominate the performance of QuakeBERT and increase the macro average F1 score by 27%, while the best classification model, QuakeBERT, outperforms the CNN- or RNN-based models by improving the macro average F1 score from 60.87% to 84.33%. Finally, the proposed approach is applied to assess two earthquakes with the same magnitude and focal depth. Results show that the proposed approach can effectively enhance the impact assessment process by accurate detection of noisy microblogs, which enables effective post-disaster emergency responses to create more resilient cities.
[22] arXiv:2405.06685 [pdf, other]: Title: Multigenre AI-powered Story Composition

Authors: Edirlei Soares de Lima, Margot M. E. Neggers, Antonio L. Furtado

Subjects: Computation and Language (cs.CL)

This paper shows how to construct genre patterns, whose purpose is to guide interactive story composition in a way that enforces thematic consistency. To start the discussion we argue, based on previous seminal works, for the existence of five fundamental genres, namely comedy, romance - in the sense of epic plots, flourishing since the twelfth century -, tragedy, satire, and mystery. To construct the patterns, a simple two-phase process is employed: first retrieving examples that match our genre characterizations, and then applying a form of most specific generalization to the groups of examples in order to find their commonalities. In both phases, AI agents are instrumental, with our PatternTeller prototype being called to operate the story composition process, offering the opportunity to generate stories from a given premise of the user, to be developed under the guidance of the chosen pattern and trying to accommodate the user's suggestions along the composition stages.
[23] arXiv:2405.06686 [pdf, other]: Title: Word2World: Generating Stories and Worlds through Large Language Models

Authors: Muhammad U. Nasir, Steven James, Julian Togelius

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have proven their worth across a diverse spectrum of disciplines. LLMs have shown great potential in Procedural Content Generation (PCG) as well, but directly generating a level through a pre-trained LLM is still challenging. This work introduces Word2World, a system that enables LLMs to procedurally design playable games through stories, without any task-specific fine-tuning. Word2World leverages the abilities of LLMs to create diverse content and extract information. Combining these abilities, LLMs can create a story for the game, design narrative, and place tiles in appropriate places to create coherent worlds and playable games. We test Word2World with different LLMs and perform a thorough ablation study to validate each step. We open-source the code at https://github.com/umair-nasir14/Word2World.
[24] arXiv:2405.06687 [pdf, other]: Title: Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes

Authors: Damin Zhang, Yi Zhang, Geetanjali Bihani, Julia Rayz

Comments: Under review

Subjects: Computation and Language (cs.CL)

With the impressive performance in various downstream tasks, large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems. A known issue of models trained on natural language data is the presence of human biases, which can impact the fairness of the system. This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making. Our framework is designed to investigate and quantify the presence of gender stereotypes in LLMs' behavior via multi-round question answering. Inspired by prior works, we construct a dataset by leveraging a standard occupation classification knowledge base released by authoritative agencies. We tested three LLMs (RoBERTa-large, GPT-3.5-turbo, and Llama2-70b-chat) and found that all models exhibit gender stereotypes analogous to human biases, but with different preferences. The distinct preferences of GPT-3.5-turbo and Llama2-70b-chat may imply the current alignment methods are insufficient for debiasing and could introduce new biases contradicting the traditional gender stereotypes.
[25] arXiv:2405.06689 [pdf, ps, other]: Title: Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games

Authors: Mikoto Kudo, Yohei Akimoto

Comments: 21 pages

Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC)

In general-sum stochastic games, a stationary Stackelberg equilibrium (SSE), in which the leader maximizes the leader's return for all initial states when the follower takes the best response against the leader's policy, does not always exist. Existing methods of determining the SSEs require strong assumptions to guarantee the convergence and the coincidence of the limit with the SSE. Moreover, our analysis suggests that the performance at the fixed points of these methods is not reasonable when they are not SSEs. Herein, we introduce the concept of Pareto-optimality as a reasonable alternative to SSEs. We derive the policy improvement theorem for stochastic games with the best-response follower and propose an iterative algorithm to determine the Pareto-optimal policies based on it. Monotone improvement and convergence of the proposed approach are proved, and its convergence to SSEs is proved in a special case.
[26] arXiv:2405.06691 [pdf, other]: Title: Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

Authors: Akhil Arora, Lars Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West

Comments: 11 pages, 1 figure, 4 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks. In this paper, we introduce Fleet of Agents (FoA), a novel framework utilizing LLMs as agents to navigate through dynamic tree searches, employing a genetic-type particle filtering approach. FoA spawns a multitude of agents, each exploring autonomously, followed by a selection phase where resampling based on a heuristic value function optimizes the balance between exploration and exploitation. This mechanism enables dynamic branching, adapting the exploration strategy based on discovered solutions. We experimentally validate FoA using two benchmark tasks, "Game of 24" and "Mini-Crosswords". FoA outperforms the previously proposed Tree-of-Thoughts method in terms of efficacy and efficiency: it significantly decreases computational costs (by calling the value function less frequently) while preserving comparable or even superior accuracy.
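The selection phase described in this abstract can be illustrated with a generic particle-filter resampling step. This is a minimal sketch under stated assumptions: the abstract does not specify the resampling scheme, so plain multinomial resampling is assumed here, and `resample_agents` and `value_fn` are hypothetical names rather than the authors' API.

```python
import random


def resample_agents(states, value_fn, rng):
    """Draw len(states) states with replacement, weighted by a heuristic value.

    Assumption: multinomial resampling; negative values are clipped to zero.
    """
    weights = [max(value_fn(state), 0.0) for state in states]
    if sum(weights) == 0:
        # No state looks promising, so keep the population unchanged.
        return list(states)
    return rng.choices(states, weights=weights, k=len(states))
```

High-value states tend to be duplicated and low-value states dropped, so the population size stays fixed while the search effort shifts toward promising branches.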
[27] arXiv:2405.06692 [pdf, ps, other]: Title: Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis Models

Authors: Ethan Parker Wong, Faten M'hiri

Comments: Undergraduate Research Project

Subjects: Computation and Language (cs.CL)

Inspired by the 'Bias Considerations in Bilingual Natural Language Processing' report by Statistics Canada, this study delves into potential biases in multilingual sentiment analysis between English and French. Given a 50-50 dataset of French and English, we aim to determine if there exists a language bias and explore how the incorporation of more diverse datasets in the future might affect the equity of multilingual Natural Language Processing (NLP) systems. By employing Support Vector Machine (SVM) and Naive Bayes models on three balanced datasets, we reveal potential biases in multilingual sentiment classification. Utilizing Fairlearn, a tool for assessing bias in machine learning models, our findings indicate nuanced outcomes. French data outperforms English across accuracy, recall, and F1 score metrics in both models, hinting at a language bias favoring French. However, Fairlearn's metrics suggest that the SVM approaches equitable levels, with demographic parity ratios of 0.963, 0.989, and 0.985 for the three separate datasets, indicating near-equitable treatment across languages. In contrast, Naive Bayes demonstrates greater disparities, evidenced by demographic parity ratios of 0.813, 0.908, and 0.961. These findings reveal the importance of developing equitable multilingual NLP systems, particularly as we anticipate the inclusion of more datasets in various languages in the future.
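As a reading aid for the figures in this abstract, the demographic parity ratio is commonly defined as the smallest group selection rate divided by the largest, so that 1.0 means every group receives positive predictions at the same rate. The sketch below implements that common definition in plain Python; it does not call Fairlearn, and the group labels in the usage note are hypothetical.

```python
from collections import defaultdict


def demographic_parity_ratio(y_pred, groups, positive=1):
    """Ratio of the lowest to the highest per-group rate of positive predictions."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == positive)
        counts[group][1] += 1
    rates = [pos / total for pos, total in counts.values()]
    highest = max(rates)
    # Undefined when no group receives any positive prediction.
    return min(rates) / highest if highest > 0 else float("nan")
```

For example, if 3 of 4 English texts and 2 of 4 French texts are predicted positive, the selection rates are 0.75 and 0.5 and the ratio is 0.5 / 0.75.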
[28] arXiv:2405.06694 [pdf, other]: Title: SUTRA: Scalable Multilingual Language Model Architecture

Authors: Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we introduce SUTRA, a multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA's design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework both in language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is demonstrated to surpass existing models like GPT-3.5 and Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we explore the broader implications of its architecture for the future of multilingual AI, highlighting its potential to democratize access to AI technology globally and to improve the equity and utility of AI in regions with predominantly non-English languages. Our findings suggest that SUTRA not only fills pivotal gaps in multilingual model capabilities but also establishes a new benchmark for operational efficiency and scalability in AI applications.
[29] arXiv:2405.06695 [pdf, ps, other]: Title: Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks

Authors: Chancellor R. Woolsey, Prakash Bisht, Joshua Rothman, Gondy Leroy

Comments: Published in 2024 American Medical Informatics Association (AMIA) Summit March 18-21

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could resolve this by aiding in diagnosing patients. However, creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorders (ASD), we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observations to augment existing medical data. Our goal is to label behaviors corresponding to autism criteria and improve model accuracy with synthetic training data. We used a BERT classifier pre-trained on biomedical literature to assess differences in performance between models. A random sample (N=140) from the LLM-generated data was evaluated by a clinician and found to contain 83% correct example-label pairs. Augmenting data increased recall by 13% but decreased precision by 16%, correlating with higher quality and lower accuracy across pairs. Future work will analyze how different synthetic data traits affect ML outcomes.
[30] arXiv:2405.06696 [pdf, other]: Title: Multi-level Shared Knowledge Guided Learning for Knowledge Graph Completion

Authors: Yongxue Shan, Jie Zhou, Jie Peng, Xin Zhou, Jiaqian Yin, Xiaodong Wang

Comments: Accepted for publication at TACL; the arXiv version is a pre-MIT Press publication version

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the task of Knowledge Graph Completion (KGC), the existing datasets and their inherent subtasks carry a wealth of shared knowledge that can be utilized to enhance the representation of knowledge triplets and overall performance. However, no current studies specifically address the shared knowledge within KGC. To bridge this gap, we introduce a multi-level Shared Knowledge Guided learning method (SKG) that operates at both the dataset and task levels. On the dataset level, SKG-KGC broadens the original dataset by identifying shared features within entity sets via text summarization. On the task level, for the three typical KGC subtasks - head entity prediction, relation prediction, and tail entity prediction - we present an innovative multi-task learning architecture with dynamically adjusted loss weights. This approach allows the model to focus on more challenging and underperforming tasks, effectively mitigating the imbalance of knowledge sharing among subtasks. Experimental results demonstrate that SKG-KGC outperforms existing text-based methods significantly on three well-known datasets, with the most notable improvement on WN18RR.
[31] arXiv:2405.06697 [pdf, other]: Title: Automated Conversion of Static to Dynamic Scheduler via Natural Language

Authors: Paul Mingzheng Tang, Kenji Kah Hoe Leong, Nowshad Shaik, Hoong Chuin Lau

Comments: 7 pages (excluding appendix), 10 figures, 3 tables

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we explore the potential application of Large Language Models (LLMs) that will automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models may easily become obsolete, as the underlying constraints may need to be fine-tuned in order to reflect changes in the scheduling rules. Furthermore, it may be necessary to turn a static model into a dynamic one in order to cope with disturbances in the environment. In this paper, we propose a Retrieval-Augmented Generation (RAG) based LLM to automate the process of implementing constraints for Dynamic Scheduling (RAGDyS), without seeking help from an optimization modeling expert. Our framework aims to minimize technical complexities related to mathematical modelling and computational workload for end-users, thereby allowing end-users to quickly obtain a new schedule close to the original schedule with changes reflected by natural language constraint descriptions.
[32] arXiv:2405.06699 [pdf, ps, other]: Title: ChatSOS: Vector Database Augmented Generative Question Answering Assistant in Safety Engineering

Authors: Haiyang Tang, Dongping Chen, Qingzhao Chu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

With the rapid advancement of natural language processing technologies, generative artificial intelligence techniques, represented by large language models (LLMs), are gaining increasing prominence and demonstrating significant potential for applications in safety engineering. However, fundamental LLMs face constraints such as limited training data coverage and unreliable responses. This study develops a vector database from 117 explosion accident reports in China spanning 2013 to 2023, employing techniques such as corpus segmenting and vector embedding. By utilizing the vector database, which outperforms the relational database in information retrieval quality, we provide LLMs with richer, more relevant knowledge. Comparative analysis of LLMs demonstrates that ChatSOS significantly enhances reliability, accuracy, and comprehensiveness, and improves the adaptability and clarification of responses. These results illustrate the effectiveness of supplementing LLMs with an external database, highlighting their potential to handle professional queries in safety engineering and laying a foundation for broader applications.
[33] arXiv:2405.06701 [pdf, other]: Title: Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Authors: Yanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun Lee

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient compared to existing approaches in terms of the number of trainable parameters. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve extraction accuracy, but existing datasets do not cover these documents. To facilitate future research into these types of documents, we release a new ID document dataset that covers diverse templates and languages. We also release enhanced annotations for an existing dataset.
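The idea of restricting each entity's attention to its K nearest neighbours can be sketched as a boolean attention mask. This is an illustrative sketch only: it assumes entities are represented by 2-D centre points, uses Euclidean distance, and always lets an entity attend to itself, none of which is stated in the abstract. The function name `knn_attention_mask` is hypothetical.

```python
import math


def knn_attention_mask(centers, k):
    """mask[i][j] is True when entity i may attend to entity j.

    Assumption: each entity attends to itself and its k nearest neighbours,
    measured by Euclidean distance between centre points.
    """
    n = len(centers)
    mask = [[False] * n for _ in range(n)]
    for i, center in enumerate(centers):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(center, centers[j]),
        )
        mask[i][i] = True
        for j in others[:k]:
            mask[i][j] = True
    return mask
```

In an attention layer, positions where the mask is False would typically have their scores set to negative infinity before the softmax. Note that the mask is not necessarily symmetric, because the nearest-neighbour relation is not.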
[34] arXiv:2405.06702 [pdf, other]: Title: Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques

Authors: Abhinand K., Abhiram B. Nair, Dhananjay C., Hanan Hamza, Mohammed Fawaz J., Rahma Fahim K., Anoop V. S

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Technological advancements and innovations are improving our daily lives in many ways, but a large section of society is deprived of these benefits because of physical disabilities. To make the real benefits accessible to society, these talented and gifted people should also be able to use such innovations without any hurdles. Many applications developed these days address these challenges, but localized communities and other constrained linguistic groups may find it difficult to use them. Malayalam, a Dravidian language spoken in the Indian state of Kerala, is one of the twenty-two scheduled languages in India. Recent years have witnessed a surge in the development of systems and tools in Malayalam, addressing the needs of Kerala, but many of them are not empathetically designed to cater to the needs of hearing-impaired people. One of the major challenges is the limited or no availability of sign language data for the Malayalam language, and sufficient efforts have not been made in this direction. In this connection, this paper proposes an approach for sign language identification for the Malayalam language using advanced deep learning and computer vision techniques. We start by developing a labeled dataset for Malayalam letters, and for identification we use advanced deep learning techniques such as YOLOv8 and computer vision. Experimental results show that the identification accuracy is comparable to that of other sign language identification systems, and other researchers in sign language identification can use the model as a baseline to develop advanced models.
[35] arXiv:2405.06703 [pdf, other]: Title: Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance

Authors: Goran Muric, Ben Delay, Steven Minton

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with Large Language Models (LLMs) to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE-T addresses these limitations by using a series of generated prompts that allow an LLM to approach the problem from multiple directions. The responses from the LLM are then converted into numerical feature vectors and processed by a traditional classifier. This method not only maintains high interpretability but also allows for smaller, less capable models to achieve or exceed the performance of larger, more advanced models under zero-shot conditions. We demonstrate the effectiveness of ICE-T across a diverse set of data sources, including medical records and legal documents, consistently surpassing the zero-shot baseline in terms of classification metrics such as F1 scores. Our results indicate that ICE-T can be used for improving both the performance and transparency of AI applications in complex decision-making environments.
[36] arXiv:2405.06704 [pdf, other]: Title: Enhanced Review Detection and Recognition: A Platform-Agnostic Approach with Application to Online Commerce

Authors: Priyabrata Karmakar, John Hawkins

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Online commerce relies heavily on user-generated reviews to provide shoppers with unbiased information about products that they have not physically seen. The importance of reviews has attracted multiple exploitative online behaviours and requires methods for monitoring and detecting reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises for use across websites that were not contained in the training data. This method promises to drive applications for automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews: Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments; Multi-language support, enabling the extraction and translation of reviews from various languages without relying on HTML scraping; and Fake review detection, achieved by integrating a trained NLP model to identify and distinguish between genuine and fake reviews.
[37] arXiv:2405.06705 [pdf, other]: Title: LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

Authors: Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li

Comments: To appear at IJCAI 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a unique prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines. The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading. The results underscore the significance of educational theory, serving as domain knowledge, in guiding prompting strategy design for addressing challenging tasks with LLMs effectively.
[38] arXiv:2405.06706 [pdf, other]: Title: Exploring the Capabilities of Large Multimodal Models on Dense Text

Authors: Shuo Zhang, Biao Yang, Zhang Li, Zhiyin Ma, Yuliang Liu, Xiang Bai

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While large multi-modal models (LMM) have shown notable progress in multi-modal tasks, their capabilities in tasks involving dense textual content remain to be fully explored. Dense text, which carries important information, is often found in documents, tables, and product descriptions. Understanding dense text enables us to obtain more accurate information, assisting in making better decisions. To further explore the capabilities of LMM in complex text tasks, we propose the DT-VQA dataset, with 170k question-answer pairs. In this paper, we conduct a comprehensive evaluation of GPT4V, Gemini, and various open-source LMMs on our dataset, revealing their strengths and weaknesses. Furthermore, we evaluate the effectiveness of two strategies for LMM: prompt engineering and downstream fine-tuning. We find that even with automatically labeled training datasets, significant improvements in model performance can be achieved. We hope that this research will promote the study of LMM in dense text tasks. Code will be released at https://github.com/Yuliang-Liu/MultimodalOCR.
[39] arXiv:2405.06707 [pdf, other]: Title: Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models

Authors: Yitian Li, Jidong Tian, Hao He, Yaohui Jin

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g. Chain-of-Thought prompting). However, along with testing on more complex reasoning, these methods also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop Hypothesis Testing Prompting, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. Hypothesis Testing Prompting involves multiple assumptions and reverses validation of conclusions leading to its unique correct answer. Experiments on two challenging deductive reasoning datasets ProofWriter and RuleTaker show that hypothesis testing prompting not only significantly improves the effect, but also generates a more reasonable and standardized reasoning process.
[40] arXiv:2405.06709 [pdf, other]: Title: Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study

Authors: Dimitris Asimopoulos, Ilias Siniosoglou, Vasileios Argyriou, Sotirios K. Goudos, Konstantinos E. Psannis, Nikoleta Karditsioti, Theocharis Saoulidis, Panagiotis Sarigiannidis

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the digital era, with escalating privacy concerns, it's imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformers architecture. Each model presents unique strengths: LSTM models long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, when compared alongside the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
[41] arXiv:2405.06710 [pdf, other]: Title: Mobile Sequencers

Authors: Cem Bozsahin

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

The article is an attempt to contribute to explorations of a common origin for language and planned-collaborative action. It gives 'semantics of change' the central stage in the synthesis, from its history and recordkeeping to its development, its syntax, delivery and reception, including substratal aspects.
It is suggested that to arrive at a common core, linguistic semantics must be understood as studying through syntax mobile agent's representing, tracking and coping with change and no change. Semantics of actions can be conceived the same way, but through plans instead of syntax. The key point is the following: Sequencing itself, of words and action sequences, brings in more structural interpretation to the sequence than which is immediately evident from the sequents themselves. Mobile sequencers can be understood as subjects structuring reporting, understanding and keeping track of change and no change. The idea invites rethinking of the notion of category, both in language and in planning.
Understanding understanding change by mobile agents is suggested to be about human extended practice, not extended-human practice. That's why linguistics is as important as computer science in the synthesis. It must rely on representational history of acts, thoughts and expressions, personal and public, crosscutting overtness and covertness of these phenomena. It has implications for anthropology in the extended practice, which are covered briefly.
[42] arXiv:2405.06712 [pdf, other]: Title: Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses

Authors: Gaurav Kumar Gupta, Aditi Singh, Sijo Valayakkad Manikandan, Abul Ehtesham

Comments: 14 pages, 4 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The recent swift development of LLMs like GPT-4, Gemini, and GPT-3.5 offers a transformative opportunity in medicine and healthcare, especially in digital diagnostics. This study evaluates each model's diagnostic abilities by interpreting a user's symptoms and determining diagnoses that fit well with common illnesses, and it demonstrates how each of these models could significantly increase diagnostic accuracy and efficiency. Through a series of diagnostic prompts based on symptoms from medical databases, GPT-4 demonstrates higher diagnostic accuracy from its deep and complete history of training on medical data. Meanwhile, Gemini performs with high precision as a critical tool in disease triage, demonstrating its potential to be a reliable model when physicians are trying to make high-risk diagnoses. GPT-3.5, though slightly less advanced, is a good tool for medical diagnostics. This study highlights the need to study LLMs for healthcare and clinical practices with more care and attention, ensuring that any system utilizing LLMs promotes patient privacy and complies with health information privacy laws such as HIPAA, as well as the social consequences that affect the varied individuals in complex healthcare contexts. This study marks the start of a larger future effort to study the various ways in which assigning ethical concerns to LLMs' task of learning from human biases could unearth new ways to apply AI in complex medical settings.
[43] arXiv:2405.06713 [pdf, ps, other]: Title: Unveiling the Competitive Dynamics: A Comparative Evaluation of American and Chinese LLMs

Authors: Zhenhui Jiang, Jiaxin Li, Yang Liu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The strategic significance of Large Language Models (LLMs) in economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of American and Chinese LLMs in both English and Chinese contexts. We proposed a comprehensive evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and systematically assessed 16 prominent models from the US and China under various operational tasks and scenarios. Our key findings show that GPT 4-Turbo is at the forefront in English contexts, whereas Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, stressing the necessity for linguistically and culturally nuanced model development. The complementary strengths of American and Chinese LLMs point to the value of Sino-US collaboration in advancing LLM technology. The research presents the current LLM competition landscape and offers valuable insights for policymakers and businesses regarding strategic LLM investments and development. Future work will expand on this framework to include emerging LLM multimodal capabilities and business application assessments.
[44] arXiv:2405.06714 [pdf, other]: Title: Towards a path dependent account of category fluency

Authors: David Heineman, Reba Koenen, Sashank Varma

Comments: To appear at CogSci 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Category fluency is a widely studied cognitive phenomenon, yet two conflicting accounts have been proposed as the underlying retrieval mechanism -- an optimal foraging process deliberately searching through memory (Hills et al., 2012) and a random walk sampling from a semantic network (Abbott et al., 2015). Evidence for both accounts has centered around predicting human patch switches, where both existing models of category fluency produce paradoxically identical results. We begin by peeling back the assumptions made by existing models, namely that each named example only depends on the previous example, by (i) adding an additional bias to model the category transition probability directly and (ii) relying on a large language model to predict based on the entire existing sequence. Then, we present evidence towards resolving the disagreement between the two accounts of foraging by reformulating models as sequence generators. To evaluate, we compare generated category fluency runs to a bank of human-written sequences by proposing a metric based on n-gram overlap. We find that category switch predictors do not necessarily produce human-like sequences; in fact, the additional biases used by the Hills et al. (2012) model are required to improve generation quality, which is later improved further by our category modification. Even generating exclusively with an LLM requires an additional global cue to trigger the patch switching behavior during production. Further tests on only the search process on top of the semantic network highlight the importance of deterministic search to replicate human behavior.
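The abstract does not spell out its n-gram overlap metric, so the following is only a minimal sketch of one plausible form: the fraction of n-grams in a generated run that also occur somewhere in a bank of human runs. The function names, the choice of bigrams, and the toy animal sequences are all assumptions made for illustration, not the authors' definition.

```python
def ngrams(sequence, n):
    """Return the list of consecutive n-grams (as tuples) in a sequence."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_overlap(generated, human_bank, n=2):
    """Fraction of n-grams in `generated` that occur in any human run.

    Hypothetical metric for illustration; the paper's metric may differ.
    """
    bank = set()
    for run in human_bank:
        bank.update(ngrams(run, n))
    grams = ngrams(generated, n)
    if not grams:
        return 0.0
    return sum(g in bank for g in grams) / len(grams)


# Toy data (assumed): two human fluency runs and one generated run.
human_bank = [
    ["dog", "cat", "lion", "tiger"],
    ["cow", "pig", "horse", "dog", "cat"],
]
generated = ["dog", "cat", "horse", "lion", "tiger"]
score = ngram_overlap(generated, human_bank, n=2)
```

In this toy case two of the four generated bigrams, (dog, cat) and (lion, tiger), appear in the human bank.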
[45] arXiv:2405.06715 [pdf, other]: Title: Enhancing Creativity in Large Language Models through Associative Thinking Strategies

Authors: Pronita Mehrotra, Aishni Parab, Sumit Gulwani

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This paper explores the enhancement of creativity in Large Language Models (LLMs) like GPT-4 through associative thinking, a cognitive process where creative ideas emerge from linking seemingly unrelated concepts. Associative thinking strategies have been found to effectively help humans boost creativity. However, whether the same strategies can help LLMs become more creative remains under-explored. In this work, we investigate whether prompting LLMs to connect disparate concepts can augment their creative outputs. Focusing on three domains -- Product Design, Storytelling, and Marketing -- we introduce creativity tasks designed to assess GPT-4's ability to generate original and useful content. By challenging the models to form novel associations, we evaluate the potential of associative thinking to enhance the creative capabilities of LLMs. Our findings show that leveraging associative thinking techniques can significantly improve the originality of GPT-4's responses.
[46] arXiv:2405.06719 [pdf, ps, other]: Title: Enhancing Traffic Prediction with Textual Data Using Large Language Models

Authors: Xiannan Huang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Traffic prediction is pivotal for rational transportation supply scheduling and allocation. Existing research on short-term traffic prediction, however, faces challenges in adequately addressing exceptional circumstances and integrating non-numerical contextual information like weather into models. Large language models offer a promising solution due to their inherent world knowledge. However, directly using them for traffic prediction presents drawbacks such as high cost, lack of determinism, and limited mathematical capability. To mitigate these issues, this study proposes a novel approach. Instead of directly employing large models for prediction, it utilizes them to process textual information and obtain embeddings. These embeddings are then combined with historical traffic data and fed into traditional spatiotemporal forecasting models. The study investigates two types of special scenarios: regional-level and node-level. For regional-level scenarios, textual information is represented as a node connected to the entire network. For node-level scenarios, embeddings from the large model represent additional nodes connected only to corresponding nodes. This approach shows a significant improvement in prediction accuracy in our experiment on the New York Bike dataset.
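The two graph constructions described in the abstract can be sketched as adjacency-matrix augmentations. This is a minimal illustration under assumptions: the graph is an unweighted, symmetric adjacency matrix stored as nested lists, and the helper names `add_regional_node` and `add_node_level_nodes` are hypothetical. How the text embeddings become features of the new nodes is not shown.

```python
def add_regional_node(adj):
    """Append one text node connected to every existing node (regional-level)."""
    n = len(adj)
    augmented = [row[:] + [1] for row in adj]   # link each node to the new one
    augmented.append([1] * n + [0])             # new node links to all, no self-loop
    return augmented


def add_node_level_nodes(adj, affected):
    """Append one text node per affected node, linked only to that node."""
    n = len(adj)
    size = n + len(affected)
    augmented = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            augmented[i][j] = adj[i][j]
    for k, node in enumerate(affected):
        extra = n + k
        augmented[extra][node] = 1
        augmented[node][extra] = 1
    return augmented


# Toy 3-station path graph (assumed): 0 - 1 - 2
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
regional = add_regional_node(adj)                  # text node 3 touches 0, 1, 2
node_level = add_node_level_nodes(adj, [2])        # text node 3 touches only 2
```

The augmented matrix can then be passed to whatever spatiotemporal model consumes the original graph, with the text embedding used as the feature of the added node.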
[47] arXiv:2405.06721 [pdf, other]: Title: Kolmogorov-Arnold Networks are Radial Basis Function Networks

Authors: Ziyao Li

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian radial basis functions. Doing so leads to FastKAN, a much faster implementation of KAN which is also a radial basis function (RBF) network.
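As a rough numerical illustration of the claim, the sketch below compares the uniform cubic B-spline basis function with a Gaussian of matching variance. The moment-matched variance of 1/3 and the evaluation grid are assumptions chosen for this example; FastKAN's actual parameterization may differ.

```python
import math


def cubic_bspline(x):
    """Uniform cubic (degree-3) B-spline basis centered at 0, support [-2, 2]."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax ** 2 + 0.5 * ax ** 3
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0


def gaussian_rbf(x, variance=1.0 / 3.0):
    """Normalized Gaussian; variance 1/3 matches the B-spline's variance
    (four convolved unit boxes, each with variance 1/12)."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


# Largest pointwise gap on a grid over [-3, 3] (grid choice is an assumption).
grid = [i / 100.0 for i in range(-300, 301)]
max_abs_error = max(abs(cubic_bspline(x) - gaussian_rbf(x)) for x in grid)
```

On this grid the largest gap stays below 0.03 against a peak value of 2/3, which is consistent with the abstract's statement that the spline is well approximated by a Gaussian.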
[48] arXiv:2405.06726 [pdf, other]: Title: Region of Attraction Estimation for Free-Floating Systems under Time-Varying LQR Control

Authors: Lasse Shala, Shubham Vyas, Mohamed Khalil Ben-Larbi, Shivesh Kumar, Enrico Stoll

Subjects: Systems and Control (eess.SY)

Future Active Debris Removal (ADR) and On Orbit Servicing (OOS) missions demand elaborate closed loop controllers. Feasible control architectures should take into consideration the inherent coupling of the free floating dynamics and the kinematics of the system. Recently, Time-Varying Linear Quadratic Regulators (TVLQR) have been used to stabilize underactuated systems that exhibit a similar kinodynamic coupling. Furthermore, this control approach integrates synergistically with Lyapunov based region of attraction (ROA) estimation, which, in the context of ADR and OOS, allows for reasoning about composability of different sub-maneuvers. In this paper, TVLQR was used to stabilize an ADR detumbling maneuver in simulation. Moreover, the ROA of the closed loop dynamics was estimated using a probabilistic method. In order to demonstrate the real-world applicability for free floating robots, further experiments were conducted onboard a free floating testbed.
[49] arXiv:2405.06728 [pdf, other]: Title: THERADIA WoZ: An Ecological Corpus for Appraisal-based Affect Research in Healthcare

Authors: Hippolyte Fournier, Sina Alisamir, Safaa Azzakhnini, Hanna Chainay, Olivier Koenig, Isabella Zsoldos, Eléonore Trân, Gérard Bailly, Frédéric Elisei, Béatrice Bouchot, Brice Varini, Patrick Constant, Joan Fruitet, Franck Tarpin-Bernard, Solange Rossato, François Portet, Fabien Ringeval

Subjects: Human-Computer Interaction (cs.HC)

We present THERADIA WoZ, an ecological corpus designed for audiovisual research on affect in healthcare. Two groups of senior individuals, consisting of 52 healthy participants and 9 individuals with Mild Cognitive Impairment (MCI), performed Computerised Cognitive Training (CCT) exercises while receiving support from a virtual assistant, tele-operated by a human in the role of a Wizard-of-Oz (WoZ). The audiovisual expressions produced by the participants were fully transcribed, and partially annotated based on dimensions derived from recent models of the appraisal theories, including novelty, intrinsic pleasantness, goal conduciveness, and coping. Additionally, the annotations included 23 affective labels drawn from the literature on achievement affects. We present the protocols used for the data collection, transcription, and annotation, along with a detailed analysis of the annotated dimensions and labels. Baseline methods and results for their automatic prediction are also presented. The corpus aims to serve as a valuable resource for researchers in affective computing, and is made available to both industry and academia.
[50] arXiv:2405.06730 [pdf, other]: Title: Ocean-DC: An analysis ready data cube framework for environmental and climate change monitoring over the port areas

Authors: Ioannis Kavouras, Ioannis Rallis, Nikolaos Doulamis, Anastasios Doulamis

Subjects: Databases (cs.DB); Atmospheric and Oceanic Physics (physics.ao-ph)

Environmental hazards and climate change effects cause serious problems in land and coastal areas. One solution is periodic monitoring of critical areas, such as coastal regions with heavy industrial activity (e.g., shipbuilding) or areas where a disaster (e.g., an oil spill) has occurred. Today, Earth Observation and non-Earth Observation data are available from several data providers. These data are huge in size, and data from multiple sources (i.e., data with format differences) usually need to be combined for a more effective evaluation. To address these issues, this work proposes the Ocean-DC framework as a solution for data harmonization and homogenization. A strong advantage of this Data Cube implementation is the generation of a single NetCDF product that contains Earth Observation data of several data types (e.g., Landsat-8 and Sentinel-2). To evaluate the effectiveness and efficiency of the Ocean-DC implementation, a case study of an oil spill in the Saronic Gulf in September 2017 is examined. The generated 4D Data Cube considers both Landsat-8,9 and Sentinel-2 products for a time-series analysis before, during, and after the oil-spill event. The Ocean-DC framework successfully generated a NetCDF product containing all the necessary remote sensing products for monitoring the oil-spill disaster in the Saronic Gulf.
[51] arXiv:2405.06747 [pdf, other]: Title: Music Emotion Prediction Using Recurrent Neural Networks

Authors: Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran

Comments: 15 pages, 13 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This study explores the application of recurrent neural networks to recognize emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to fit listeners' emotional states. We utilize Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach involves extracting a comprehensive set of audio features using Librosa and applying various recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments are conducted using a dataset of 900 audio clips, labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures may perform comparably to or even better than more complex models, particularly on smaller datasets. We also ran experiments on two larger datasets: one augmented from our original dataset, and the other drawn from other sources. This research not only enhances our understanding of the emotional impact of music but also demonstrates the potential of neural networks in creating more personalized and emotionally resonant music recommendation and therapy systems.
[52] arXiv:2405.06749 [pdf, other]: Title: Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

Authors: Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.
[53] arXiv:2405.06754 [pdf, other]: Title: Wall-Street: Smart Surface-Enabled 5G mmWave for Roadside Networking

Authors: Kun Woo Cho, Prasanthi Maddala, Ivan Seskar, Kyle Jamieson

Comments: 15 pages, 22 figures, under submission

Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

5G mmWave roadside networks promise high-speed wireless connectivity, but face significant challenges in maintaining reliable connections for users moving at high speed. Frequent handovers, complex beam alignment, and signal attenuation due to obstacles like car bodies lead to service interruptions and degraded performance. We present Wall-Street, a smart surface installed on vehicles to enhance 5G mmWave connectivity for users inside. Wall-Street improves mobility management by (1) steering outdoor mmWave signals into the vehicle, ensuring coverage for all users; (2) enabling simultaneous serving cell data transfer and candidate handover cell measurement, allowing seamless handovers without service interruption; and (3) combining beams from source and target cells during a handover to increase reliability. Through its flexible and diverse signal manipulation capabilities, Wall-Street provides uninterrupted high-speed connectivity for latency-sensitive applications in challenging mobile environments. We have implemented and integrated Wall-Street in the COSMOS testbed and evaluated its real-time performance with four gNBs and a mobile client inside a surface-enabled vehicle, driving on a nearby road. Wall-Street achieves a 2.5-3.4x TCP throughput improvement and a 0.4-0.8x reduction in delay over a baseline 5G Standalone handover protocol.
[54] arXiv:2405.06758 [pdf, other]: Title: Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

Authors: Yao Lai, Jinxin Liu, David Z. Pan, Ping Luo

Subjects: Machine Learning (cs.LG)

Across a wide range of hardware scenarios, the computational efficiency and physical size of the arithmetic units significantly influence the speed and footprint of the overall hardware system. Nevertheless, the effectiveness of prior arithmetic design techniques proves inadequate, as it does not sufficiently optimize speed and area, resulting in a reduced processing rate and larger module size. To boost the arithmetic performance, in this work, we focus on the two most common and fundamental arithmetic modules: adders and multipliers. We cast the design tasks as single-player tree generation games, leveraging reinforcement learning techniques to optimize their arithmetic tree structures. Such a tree generation formulation allows us to efficiently navigate the vast search space and discover superior arithmetic designs that improve computational efficiency and hardware size within just a few hours. For adders, our approach discovers designs of 128-bit adders that achieve Pareto optimality in theoretical metrics. Compared with the state-of-the-art PrefixRL, our method decreases computational delay and hardware size by up to 26% and 30%, respectively. For multipliers, when compared to RL-MUL, our approach increases speed and reduces size by as much as 49% and 45%. Moreover, the inherent flexibility and scalability of our method enable us to deploy our designs into cutting-edge technologies, as we show that they can be seamlessly integrated into 7nm technology. We believe our work will offer valuable insights into hardware design, further accelerating speed and reducing size through the refined search space and our tree generation methodologies. See our introduction video at https://bit.ly/ArithmeticTree. Codes are released at https://github.com/laiyao1/ArithmeticTree.
[55] arXiv:2405.06760 [pdf, ps, other]: Title: Opportunities for Persian Digital Humanities Research with Artificial Intelligence Language Models; Case Study: Forough Farrokhzad

Authors: Arash Rasti Meymandi, Zahra Hosseini, Sina Davari, Abolfazl Moshiri, Shabnam Rahimi-Golkhandan, Khashayar Namdar, Nikta Feizi, Mohamad Tavakoli-Targhi, Farzad Khalvati

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based language models for clustering of the poems in an unsupervised framework. This research underscores the potential of AI in enhancing our understanding of Persian literary heritage, with Forough Farrokhzad's work providing a comprehensive case study. This approach not only contributes to the field of Persian Digital Humanities but also sets a precedent for future research in Persian literary studies using computational techniques.
[56] arXiv:2405.06761 [pdf, other]: Title: Tree Proof-of-Position Algorithms

Authors: Aida Manzano Kharman, Pietro Ferraro, Homayoun Hamedmoghadam, Robert Shorten

Subjects: Data Structures and Algorithms (cs.DS)

We present a novel class of proof-of-position algorithms: Tree-Proof-of-Position (T-PoP). This algorithm is decentralised, collaborative and can be computed in a privacy preserving manner, such that agents do not need to reveal their position publicly. We make no assumptions of honest behaviour in the system, and consider varying ways in which agents may misbehave. Our algorithm is therefore resilient to highly adversarial scenarios. This makes it suitable for a wide class of applications, namely those in which trust in a centralised infrastructure may not be assumed, or high security risk scenarios. Our algorithm has a worst case quadratic runtime, making it suitable for hardware constrained IoT applications. We also provide a mathematical model that summarises T-PoP's performance for varying operating conditions. We then simulate T-PoP's behaviour with a large number of agent-based simulations, which are in complete agreement with our mathematical model, thus demonstrating its validity. T-PoP can achieve high levels of reliability and security by tuning its operating conditions, both in high and low density environments. Finally, we also present a mathematical model to probabilistically detect platooning attacks.
[57] arXiv:2405.06762 [pdf, other]: Title: LIVE: LaTex Interactive Visual Editing

Authors: Jinwei Lin

Comments: 8 pages, double column, ieee

Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)

LaTeX coding is one of the main methods of writing an academic paper. When writing a paper, well-chosen visual or graphic components can convey more information than text alone. However, most LaTeX graphic items are implemented as static items, which limits their ability to present more informative figures or tables with an interactive reading experience. To address this problem, we propose LIVE, a novel design method for creating interactive LaTeX graphic items. To illustrate the main idea of LIVE, we designed several novel representative implementations that are interactive, with sufficient explanation of the basic principles. With LIVE, authors can design more graphic items, which we call Gitems, and easily and automatically obtain the relationships of mutual application among a specific range of papers, which will add more vitality and performance factors to the writing of traditional papers, especially review papers. To demonstrate the functions of LIVE, we use papers on NeRF as the example reference papers. The code of the implementation project is open source.
[58] arXiv:2405.06765 [pdf, other]: Title: Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection

Authors: Anastasios Arsenos, Vasileios Karampinis, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed $7$ types of common corruptions for camera inputs taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset we constructed the first robustness benchmark dataset named AOT-C for air-to-air aerial object detection. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is to present an extensive experimental evaluation involving $8$ diverse object detectors to explore the degradation in the performance under escalating levels of corruptions (domain shifts). Based on the evaluation results, the key observations that emerge are the following: 1) One-stage detectors of the YOLO family demonstrate better robustness, 2) Transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions, 3) Robustness against corruptions is related to the generalization ability of models. The third main contribution is to show that finetuning on our augmented synthetic data improves the generalisation ability of the object detector in real-world flight experiments.
[59] arXiv:2405.06766 [pdf, ps, other]: Title: Dynamic Optimization of Proton Exchange Membrane Water Electrolyzers Considering Usage-Based Degradation

Authors: Landon Schofield, Benjamin Paren, Ruaridh Macdonald, Yang Shao-Horn, Dharik Mallapragada

Comments: 61 pages, 19 figures, includes SI

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

We present a techno-economic optimization model for evaluating the design and operation of proton exchange membrane (PEM) electrolyzers, crucial for hydrogen production powered by variable renewable electricity. This model integrates a 0-D physics representation of the electrolyzer stack, complete mass and energy balances, operational constraints, and empirical data on use-dependent degradation. Utilizing a decomposition approach, the model predicts optimal electrolyzer size, operation, and necessary hydrogen storage to satisfy baseload demands across various technology and electricity price scenarios. Analysis for 2022 shows that including degradation effects raises the levelized cost of hydrogen from \$4.56/kg to \$6.60/kg and decreases stack life to two years. However, projections for 2030 anticipate a significant reduction in costs to approximately \$2.50/kg due to lower capital expenses, leading to larger stacks, extended lifetimes, and less hydrogen storage. This approach is adaptable to other electrochemical systems relevant for decarbonization.
[60] arXiv:2405.06767 [pdf, other]: Title: Color: A Framework for Applying Graph Coloring to Subgraph Cardinality Estimation

Authors: Kyle Deeds, Diandre Sabale, Moe Kayali, Dan Suciu

Subjects: Databases (cs.DB)

Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods which generally see catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a framework for subgraph cardinality estimation which applies insights from graph compression theory to produce a compact summary that captures the global topology of the data graph. Further, we identify several key optimizations that enable tractable estimation over this summary even for large query graphs. We then evaluate several designs within this framework and find that they improve accuracy by up to 10$^3$x over all competing methods while maintaining fast inference, a small memory footprint, efficient construction, and graceful degradation under updates.
[61] arXiv:2405.06770 [pdf, other]: Title: Demonstrating Reinforcement Learning and Run Time Assurance for Spacecraft Inspection Using Unmanned Aerial Vehicles

Authors: Kyle Dunlap, Nathaniel Hamilton, Zachary Lippay, Matthew Shubert, Sean Phillips, Kerianne L. Hobbs

Subjects: Systems and Control (eess.SY)

On-orbit spacecraft inspection is an important capability for enabling servicing and manufacturing missions and extending the life of spacecraft. However, as space operations become increasingly common and complex, autonomous control methods are needed to reduce the burden on operators to individually monitor each mission. In order for autonomous control methods to be used in space, they must exhibit safe behavior that demonstrates robustness to real world disturbances and uncertainty. In this paper, neural network controllers (NNCs) trained with reinforcement learning are used to solve an inspection task, which is a foundational capability for servicing missions. Run time assurance (RTA) is used to assure safety of the NNC in real time, enforcing several different constraints on position and velocity. The NNC and RTA are tested in the real world using unmanned aerial vehicles designed to emulate spacecraft dynamics. The results show this emulation is a useful demonstration of the capability of the NNC and RTA, and the algorithms demonstrate robustness to real world disturbances.
[62] arXiv:2405.06771 [pdf, other]: Title: Space Processor Computation Time Analysis for Reinforcement Learning and Run Time Assurance Control Policies

Authors: Kyle Dunlap, Nathaniel Hamilton, Francisco Viramontes, Derrek Landauer, Evan Kain, Kerianne L. Hobbs

Subjects: Systems and Control (eess.SY)

As the number of spacecraft on orbit continues to grow, it is challenging for human operators to constantly monitor and plan for all missions. Autonomous control methods such as reinforcement learning (RL) have the power to solve complex tasks while reducing the need for constant operator intervention. By combining RL solutions with run time assurance (RTA), safety of these systems can be assured in real time. However, in order to use these algorithms on board a spacecraft, they must be able to run in real time on space grade processors, which are typically outdated and less capable than state-of-the-art equipment. In this paper, multiple RL-trained neural network controllers (NNCs) and RTA algorithms were tested on commercial-off-the-shelf (COTS) and radiation tolerant processors. The results show that all NNCs and most RTA algorithms can compute optimal and safe actions in well under 1 second with room for further optimization before deploying in the real world.
[63] arXiv:2405.06772 [pdf, other]: Title: CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM

Authors: Urjitkumar Patel, Fang-Chun Yeh, Chinmay Gondhalekar

Comments: Published in 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Conference Date: 07-09 February 2024

Journal-ref: 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Houston, TX, USA, 2024, pp. 1-12

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

In today's digital landscape, where cyber attacks have become the norm, the detection of cyber attacks and threats is critically imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling, adept at parsing and categorizing cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL - Cyber Activity News Alerting Language Model, tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero to few-shot learning in cyber news classification. CANAL demonstrates superior performance by outperforming all other LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a strategic component designed to efficiently detect emerging cyber signals from news articles. Collectively, CANAL and Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.
[64] arXiv:2405.06773 [pdf, ps, other]: Title: A Monotone Circuit Construction for Individually-Secure Multi-Secret Sharing

Authors: Cailyn Bass, Alejandro Cohen, Rafael G. L. D'Oliveira, Muriel Médard

Subjects: Information Theory (cs.IT)

In this work, we introduce a new technique for taking a single-secret sharing scheme with a general access structure and transforming it into an individually secure multi-secret sharing scheme where every secret has the same general access structure. To increase the information rate, we consider Individual Security which guarantees zero mutual information with each secret individually, for any unauthorized subsets. Our approach involves identifying which shares of the single-secret sharing scheme can be replaced by linear combinations of messages. When $m-1$ shares are replaced, our scheme obtains an information rate of $m/|S|$, where $S$ is the set of shares. This provides an improvement over the information rate of $1/|S|$ in the original single-secret sharing scheme.
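As a quick arithmetic illustration of the stated rates, take $|S| = 5$ shares and $m = 3$ secrets, so that $m - 1 = 2$ shares are replaced by linear combinations of messages. These numbers are assumed for the example and do not come from the paper.

```latex
\[
R_{\text{single}} = \frac{1}{|S|} = \frac{1}{5},
\qquad
R_{\text{multi}} = \frac{m}{|S|} = \frac{3}{5},
\qquad
\frac{R_{\text{multi}}}{R_{\text{single}}} = m = 3 .
\]
```

Under these assumed values the information rate improves by a factor of $m$ over the single-secret scheme.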
[65] arXiv:2405.06778 [pdf, other]: Title: Shape Conditioned Human Motion Generation with Diffusion Model

Authors: Kebing Xue, Hyewon Seo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Human motion synthesis is an important task in computer graphics and computer vision. While focusing on various conditioning signals such as text, action class, or audio to guide the generation process, most existing methods utilize skeleton-based pose representation, requiring additional skinning to produce renderable meshes. Given that human motion is a complex interplay of bones, joints, and muscles, considering solely the skeleton for generation may neglect their inherent interdependency, which can limit the variability and precision of the generated results. To address this issue, we propose a Shape-conditioned Motion Diffusion model (SMD), which enables the generation of motion sequences directly in mesh format, conditioned on a specified target mesh. In SMD, the input meshes are transformed into spectral coefficients using graph Laplacian, to efficiently represent meshes. Subsequently, we propose a Spectral-Temporal Autoencoder (STAE) to leverage cross-temporal dependencies within the spectral domain. Extensive experimental evaluations show that SMD not only produces vivid and realistic motions but also achieves competitive performance in text-to-motion and action-to-motion tasks when compared to state-of-the-art methods.
[66] arXiv:2405.06780 [pdf, other]: Title: Deep MMD Gradient Flow without adversarial training

Authors: Alexandre Galashov, Valentin de Bortoli, Arthur Gretton

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x 64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
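For readers unfamiliar with the baseline that this abstract generalizes, the sketch below shows a plain MMD gradient flow with a fixed Gaussian kernel: particles move along the negative gradient of the MMD witness function. It does not implement the noise-adaptive, learned MMD of DMMD; the kernel choice, bandwidth, step size, and function names are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and y (m, d)."""
    sq = (x**2).sum(axis=1)[:, None] + (y**2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())

def _grad_mean_kernel(z, points, bandwidth):
    """Gradient with respect to each row of z of the mean kernel value against points."""
    k = gaussian_kernel(z, points, bandwidth)       # (n, m)
    diff = z[:, None, :] - points[None, :, :]       # (n, m, d)
    return -(k[:, :, None] * diff).mean(axis=1) / bandwidth**2

def witness_gradient(particles, target, bandwidth=1.0):
    """Gradient of the witness f(z) = E_particles[k(z, .)] - E_target[k(z, .)]."""
    return (_grad_mean_kernel(particles, particles, bandwidth)
            - _grad_mean_kernel(particles, target, bandwidth))

def mmd_flow(particles, target, steps=100, step_size=0.2, bandwidth=1.0):
    """Explicit Euler discretization of the flow: z <- z - step_size * grad f(z)."""
    z = particles.copy()
    for _ in range(steps):
        z = z - step_size * witness_gradient(z, target, bandwidth)
    return z
```

Calling `mmd_flow(source, target)` on two point clouds returns moved particles whose `mmd_squared` against `target` is lower than that of `source`, provided the step size is small relative to the squared bandwidth.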
[67] arXiv:2405.06782 [pdf, other]: Title: GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs

Authors: Mingyu Liu, Ekim Yurtsever, Marc Brede, Jun Meng, Walter Zimmer, Xingcheng Zhou, Bare Luka Zagar, Yuning Cui, Alois Knoll

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between the neighbor proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), to learn the spatial information from certain patterns to improve 3D object detection. Specifically, we create an inter-object relationship graph based on proposals in a frame via the graph generator to connect each proposal with its neighbor proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines proposal features with the captured edge features. Ultimately, we leverage the refined features as input to the detection head to obtain detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% under the moderate and hard levels BEV AP on the test server.
[68] arXiv:2405.06783 [pdf, other]: Title: BLIP: Facilitating the Exploration of Undesirable Consequences of Digital Technologies

Authors: Rock Yuren Pang, Sebastin Santy, René Just, Katharina Reinecke

Comments: To appear in the Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Digital technologies have positively transformed society, but they have also led to undesirable consequences not anticipated at the time of design or development. We posit that insights into past undesirable consequences can help researchers and practitioners gain awareness and anticipate potential adverse effects. To test this assumption, we introduce BLIP, a system that extracts real-world undesirable consequences of technology from online articles, summarizes and categorizes them, and presents them in an interactive, web-based interface. In two user studies with 15 researchers in various computer science disciplines, we found that BLIP substantially increased the number and diversity of undesirable consequences they could list in comparison to relying on prior knowledge or searching online. Moreover, BLIP helped them identify undesirable consequences relevant to their ongoing projects, made them aware of undesirable consequences they "had never considered," and inspired them to reflect on their own experiences with technology.
[69] arXiv:2405.06784 [pdf, other]: Title: Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare

Authors: Xingyu Li, Lu Peng, Yuping Wang, Weihua Zhang

Comments: 42 pages

Subjects: Machine Learning (cs.LG)

This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for advancing biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions.
The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for groundbreaking healthcare innovations.
[70] arXiv:2405.06794 [pdf, other]: Title: Site-dependent Solutions of Wave Energy Converter Farms with Surrogate Models, Control Co-design, and Layout Optimization

Authors: Saeed Azad, Daniel R. Herber, Suraj Khanal, Gaofeng Jia

Comments: 9 pages, 9 figures

Subjects: Systems and Control (eess.SY)

Design of wave energy converter farms entails multiple domains that are coupled, and thus, their concurrent representation and consideration in early-stage design optimization has the potential to offer new insights and promising solutions with improved performance. Concurrent optimization of physical attributes (e.g., plant) and the control system design is often known as control co-design or CCD. To further improve performance, the layout of the farm must be carefully optimized in order to ensure that constructive effects from hydrodynamic interactions are leveraged, while destructive effects are avoided. The variations in the joint probability distribution of waves, stemming from distinct site locations, affect the farm's performance and can potentially influence decisions regarding optimal plant selection, control strategies, and layout configurations. Therefore, this paper undertakes a concurrent exploration of control co-design and layout optimization for a farm comprising five devices, modeled as heaving cylinders in the frequency domain, situated across four distinct site locations: Alaskan Coasts, East Coast, Pacific Islands, and West Coast. The challenge of efficiently and accurately estimating hydrodynamic coefficients within the optimization loop was mitigated through the application of surrogate modeling and many-body expansion principles. Results indicate the optimized solutions exhibit variations in plant, control, and layout for each candidate site, signifying the importance of system-level design with environmental considerations from the early stages of the design process.
[71] arXiv:2405.06797 [pdf, ps, other]: Title: Exponential Lower Bounds on the Double Oracle Algorithm in Zero-Sum Games

Authors: Brian Hu Zhang, Tuomas Sandholm

Subjects: Computer Science and Game Theory (cs.GT)

The double oracle algorithm is a popular method of solving games, because it is able to reduce computing equilibria to computing a series of best responses. However, its theoretical properties are not well understood. In this paper, we provide exponential lower bounds on the performance of the double oracle algorithm in both partially-observable stochastic games (POSGs) and extensive-form games (EFGs). Our results depend on what is assumed about the tiebreaking scheme -- that is, which meta-Nash equilibrium or best response is chosen, in the event that there are multiple to pick from. In particular, for EFGs, our lower bounds require adversarial tiebreaking, whereas for POSGs, our lower bounds apply regardless of how ties are broken.
[72] arXiv:2405.06800 [pdf, other]: Title: LLM-Generated Black-box Explanations Can Be Adversarially Helpful

Authors: Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, Zining Zhu

Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these problems, i.e., in a "black-box" approach. However, our research uncovers a hidden risk tied to this approach, which we call *adversarial helpfulness*. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions. In this paper, we show that this issue affects not just humans, but also LLM evaluators. Digging deeper, we identify and examine key persuasive strategies employed by LLMs. Our findings reveal that these models employ strategies such as reframing the questions, expressing an elevated level of confidence, and cherry-picking evidence to paint misleading answers in a credible light. To examine if LLMs are able to navigate complex-structured knowledge when generating adversarially helpful explanations, we create a special task based on navigating through graphs. Some LLMs are not able to find alternative paths along simple graphs, indicating that their misleading explanations aren't produced by only logical deductions using complex knowledge. These findings shed light on the limitations of the black-box explanation setting. We provide some advice on how to use LLMs as explainers safely.
[73] arXiv:2405.06801 [pdf, other]: Title: LEO Satellite Network Access in the Wild: Potentials, Experiences, and Challenges

Authors: Sami Ma (1), Yi Ching Chou (1), Miao Zhang (1), Hao Fang (1), Haoyuan Zhao (1), Jiangchuan Liu (1), William I. Atlas (2) ((1) Simon Fraser University, (2) Pacific Salmon Foundation)

Comments: 8 pages, 6 figures

Subjects: Networking and Internet Architecture (cs.NI)

In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starlink's service boundary at the far north. We assess the portability and mobility of the satellite dishes and the quality of existing network access in underdeveloped countries that Starlink expects to cover. Our experiences suggest that network access based on LEO satellite constellations holds promise but faces hurdles such as energy supply constraints and environmental factors like temperature, precipitation, and solar storms. The presence of wildlife and respecting local residents' culture and heritage pose further complications. We envision several technical solutions addressing the challenges and believe that further regulations will be necessary.
[74] arXiv:2405.06802 [pdf, ps, other]: Title: Summarizing Radiology Reports Findings into Impressions

Authors: Raul Salles de Padua, Imran Qureshi

Comments: 10 pages, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Patient hand-off and triage are two fundamental problems in health care. Often doctors must painstakingly summarize complex findings to efficiently communicate with specialists and quickly make decisions on which patients have the most urgent cases. To address these challenges, we present (1) a model with state-of-the-art radiology report summarization performance using (2) a novel method for augmenting medical data, and (3) an analysis of the model limitations and radiology knowledge gain. We also provide a data processing pipeline for future models developed on the MIMIC CXR dataset. Our best performing model was a fine-tuned BERT-to-BERT encoder-decoder with 58.75/100 ROUGE-L F1, which outperformed specialized checkpoints with more sophisticated attention mechanisms. We investigate these aspects in this work.
[75] arXiv:2405.06804 [pdf, other]: Title: Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming

Authors: Chin-Yun Yu, Johan Pauwels, György Fazekas

Comments: Accepted to be presented at Audio Engineering Society 156th Convention, 2024 June, Madrid, Spain

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

In binaural audio synthesis, aligning head-related impulse responses (HRIRs) in time has been an important pre-processing step, enabling accurate spatial interpolation and efficient data compression. The maximum correlation time delay between spatially nearby HRIRs has previously been used to get accurate and smooth alignment by solving a matrix equation in which the solution has the minimum Euclidean distance to the time delay. However, the Euclidean criterion could lead to an over-smoothing solution in practice. In this paper, we solve the smoothing issue by formulating the task as solving an integer linear programming problem equivalent to minimising an $L^1$-norm. Moreover, we incorporate 1) the cross-correlation of inter-aural HRIRs, and 2) HRIRs with their minimum-phase responses to have more reference measurements for optimisation. We show the proposed method can get more accurate alignments than the Euclidean-based method by comparing the spectral reconstruction loss of time-aligned HRIRs using spherical harmonics representation on seven HRIRs consisting of human and dummy heads. The extra correlation features and the $L^1$-norm are also beneficial in extremely noisy conditions. In addition, this method can be applied to phase unwrapping of head-related transfer functions, where the unwrapped phase could be a compact feature for downstream tasks.
[76] arXiv:2405.06805 [pdf, ps, other]: Title: A (Weakly) Polynomial Algorithm for AIVF Coding

Authors: Reza Hosseini Dolatabadi, Mordecai J. Golin, Arian Zamani

Comments: Expanded version of paper appearing at ISIT 2024

Subjects: Data Structures and Algorithms (cs.DS)

It is possible to improve upon Tunstall coding using a collection of multiple parse trees. The best such results so far are Iwata and Yamamoto's maximum cost AIVF codes. The most efficient algorithm for designing such codes is an iterative one that could run in exponential time. In this paper, we show that this problem fits into the framework of a newly developed technique that uses linear programming with the Ellipsoid method to solve the minimum cost Markov chain problem. This permits constructing maximum cost AIVF codes in (weakly) polynomial time.
[77] arXiv:2405.06806 [pdf, other]: Title: An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and Classification

Authors: Mohammad Sadegh Sheikhaei, Yuan Tian, Shaowei Wang, Bowen Xu

Comments: This is the preprint version of a paper that has been submitted to Empirical Software Engineering

Subjects: Software Engineering (cs.SE)

Self-Admitted Technical Debt (SATD), a concept highlighting sub-optimal choices in software development documented in code comments or other project resources, poses challenges in the maintainability and evolution of software systems. Large language models (LLMs) have demonstrated significant effectiveness across a broad range of software tasks, especially in software text generation tasks. Nonetheless, their effectiveness in tasks related to SATD is still under-researched. In this paper, we investigate the efficacy of LLMs in both identification and classification of SATD. For both tasks, we investigate the performance gain from using more recent LLMs, specifically the Flan-T5 family, across different common usage settings. Our results demonstrate that for SATD identification, all fine-tuned LLMs outperform the best existing non-LLM baseline, i.e., the CNN model, with a 4.4% to 7.2% improvement in F1 score. In the SATD classification task, while our largest fine-tuned model, Flan-T5-XL, still led in performance, the CNN model exhibited competitive results, even surpassing four of six LLMs. We also found that the largest Flan-T5 model, i.e., Flan-T5-XXL, when used with a zero-shot in-context learning (ICL) approach for SATD identification, provides competitive results with traditional approaches but performs 6.4% to 9.2% worse than fine-tuned LLMs. For SATD classification, few-shot ICL approach, incorporating examples and category descriptions in prompts, outperforms the zero-shot approach and even surpasses the fine-tuned smaller Flan-T5 models. Moreover, our experiments demonstrate that incorporating contextual information, such as surrounding code, into the SATD classification task enables larger fine-tuned LLMs to improve their performance.
[78] arXiv:2405.06807 [pdf, other]: Title: Tackling Execution-Based Evaluation for NL2Bash

Authors: Ngoc Phuoc An Vo, Brent Paulovicks, Vadim Sheinin

Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)

Given recent advances in Large Language Models (LLMs), the task of translating from natural language prompts to different programming languages (code generation) attracts immense attention for wide application in different domains. In particular, code generation for Bash (NL2Bash) is widely used to generate Bash scripts for automating different tasks, such as performance monitoring, compilation, system administration, system diagnostics, etc. Besides code generation, validating synthetic code is critical before using it for any application. Different methods for code validation are proposed, both direct (execution evaluation) and indirect validations (i.e. exact/partial match, BLEU score). Among these, Execution-based Evaluation (EE) can validate the predicted code by comparing the execution output of model prediction and expected output in system. However, designing and implementing such an execution-based evaluation system for NL2Bash is not a trivial task. In this paper, we present a machinery for execution-based evaluation for NL2Bash. We create a set of 50 prompts to evaluate some popular LLMs for NL2Bash. We also analyze several advantages and challenges of EE such as syntactically different yet semantically equivalent Bash scripts generated by different LLMs, or syntactically correct but semantically incorrect Bash scripts, and how we capture and process them correctly.
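The idea of execution-based evaluation described in this abstract can be sketched as follows: run the predicted and the reference script and compare what they produce, instead of comparing their text. This is a hedged, minimal sketch and not the authors' system; the helper names, the choice to compare exit code plus standard output, and the fallback to `sh` when `bash` is not installed are all assumptions.

```python
import shutil
import subprocess

# Assumption: use bash when available, otherwise fall back to a POSIX shell.
SHELL = shutil.which("bash") or "sh"

def run_script(script, timeout=5):
    """Run a shell script and return (exit code, stdout); (None, '') on timeout."""
    try:
        result = subprocess.run(
            [SHELL, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return result.returncode, result.stdout

def execution_match(predicted, reference):
    """True when both scripts exit with the same code and print the same output."""
    return run_script(predicted) == run_script(reference)
```

For example, `echo $((2 + 3))` and `expr 2 + 3` differ as strings but match under `execution_match`, while `echo $((2 * 3))` is syntactically valid yet fails against the same reference. Model-generated scripts should only be executed inside an isolated sandbox, since this sketch runs them directly on the host.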
[79] arXiv:2405.06811 [pdf, other]: Title: Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications

Authors: Bennett Cooper, Thomas R. W. Scogland, Rong Ge

Comments: To be published in ICS '24

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Discrete GPU accelerators, while providing massive computing power for supercomputers and data centers, have their separate memory domain. Explicit memory management across device and host domains in programming is tedious and error-prone. To improve programming portability and productivity, Unified Memory (UM) integrates GPU memory into the host virtual memory systems, and provides transparent data migration between them and GPU memory oversubscription. Nevertheless, current UM technologies cause significant performance loss for applications. With AMD GPUs increasingly being integrated into the world's leading supercomputers, it is necessary to understand their Shared Virtual Memory (SVM) and mitigate the performance impacts. In this work, we delve into the SVM design, examine its interactions with applications' data accesses at fine granularity, and quantitatively analyze its performance effects on various applications and identify the performance bottlenecks. Our research reveals that SVM employs an aggressive prefetching strategy for demand paging. This prefetching is efficient when GPU memory is not oversubscribed. However, in tandem with the eviction policy, it causes excessive thrashing and performance degradation for certain applications under oversubscription. We discuss SVM-aware algorithms and SVM design changes to mitigate the performance impacts. To the best of our knowledge, this work is the first in-depth and comprehensive study for SVM technologies.
[80] arXiv:2405.06814 [pdf, other]: Title: Dual-Task Vision Transformer for Rapid and Accurate Intracerebral Hemorrhage Classification on CT Images

Authors: Jialiang Fan, Guoyu Lu, Xinhui Fan

Comments: 9 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Intracerebral hemorrhage (ICH) is a severe and sudden medical condition caused by the rupture of blood vessels in the brain, leading to permanent damage to brain tissue and often resulting in functional disabilities or death in patients. Diagnosis and analysis of ICH typically rely on brain CT imaging. Given the urgency of ICH conditions, early treatment is crucial, necessitating rapid analysis of CT images to formulate tailored treatment plans. However, the complexity of ICH CT images and the frequent scarcity of specialist radiologists pose significant challenges. Therefore, we built a dataset for ICH and normal classification and three types of ICH image classification based on the hemorrhage location, i.e., Deep, Subcortical, and Lobar. In addition, we propose a dual-task vision transformer (DTViT) for the automated classification and diagnosis of ICH images. This neural network utilizes the encoder from ViT, employing attention mechanisms for feature extraction from CT images. We incorporated two multilayer perceptron (MLP)-based decoders within the network to simultaneously identify the presence of ICH and classify three types of hemorrhage locations. Experimental results demonstrate that our proposed multi-classification network performs well on the built real-world test dataset. The code and dataset for this study will be made publicly available upon paper acceptance at: https://github.com/Jialiangfan/ICH-classification.
[81] arXiv:2405.06816 [pdf, other]: Title: Non-stationary Domain Generalization: Theory and Algorithm

Authors: Thai-Hoang Pham, Xueru Zhang, Ping Zhang

Comments: Accepted by UAI 2024

Subjects: Machine Learning (cs.LG)

Although machine learning has shown success in learning from independent and identically distributed (IID) data, it remains vulnerable to out-of-distribution (OOD) data in an open world. Domain generalization (DG) deals with such an issue and it aims to learn a model from multiple source domains that can be generalized to unseen target domains. Existing studies on DG have largely focused on stationary settings with homogeneous source domains. However, in many applications, domains may evolve along a specific direction (e.g., time, space). Without accounting for such non-stationary patterns, models trained with existing methods may fail to generalize on OOD data. In this paper, we study domain generalization in non-stationary environments. We first examine the impact of environmental non-stationarity on model performance and establish the theoretical upper bounds for the model error at target domains. Then, we propose a novel algorithm based on adaptive invariant representation learning, which leverages the non-stationary pattern to train a model that attains good performance on target domains. Experiments on both synthetic and real data validate the proposed algorithm.
[82] arXiv:2405.06817 [pdf, other]: Title: Coherent Design of Wind Turbine Controllers Considering Transitions between Operating Regions using Fuzzy Membership Functions

Authors: Horst Schulte

Subjects: Systems and Control (eess.SY)

This paper presents a coherent design of wind turbine controllers with explicit consideration of transitions between operating regions by fuzzy membership functions. In improving the design process of wind turbines, the transitions between partial-load operation by torque control and full-load operation by pitch control need to be systematically considered. At first glance, fuzzy methods for blending separately designed control laws are an obvious choice. However, valid design rules must be developed to ensure stability and performance during the transition. A model-based control design procedure in the Takagi-Sugeno fuzzy framework using the sector nonlinearity method is proposed to achieve the above control design objectives. In addition to a detailed mathematical analysis of the design, the method's applicability is verified by simulation studies using a high-fidelity reference wind turbine model.
[83] arXiv:2405.06818 [pdf, ps, other]: Title: The Ghanaian NLP Landscape: A First Look

Authors: Sheriff Issaka, Zhaoyi Zhang, Mihir Heda, Keyi Wang, Yinka Ajibola, Ryan DeMar, Xuefeng Du

Subjects: Computation and Language (cs.CL)

Despite comprising one-third of global languages, African languages are critically underrepresented in Artificial Intelligence (AI), threatening linguistic diversity and cultural heritage. Ghanaian languages, in particular, face an alarming decline, with documented extinction and several at risk. This study pioneers a comprehensive survey of Natural Language Processing (NLP) research focused on Ghanaian languages, identifying methodologies, datasets, and techniques employed. Additionally, we create a detailed roadmap outlining challenges, best practices, and future directions, aiming to improve accessibility for researchers. This work serves as a foundational resource for Ghanaian NLP research and underscores the critical need for integrating global linguistic diversity into AI development.
[84] arXiv:2405.06821 [pdf, other]: Title: Synchronized Object Detection for Autonomous Sorting, Mapping, and Quantification of Medical Materials

Authors: Federico Zocco, Daniel Lake, Shahin Rahimifard

Comments: To be submitted

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The circular economy paradigm is gaining interest as a solution to reduce both material supply uncertainties and waste generation. One of the main challenges is monitoring materials, since in general, something that is not measured cannot be effectively managed. In this paper, we propose real-time synchronized object detection to enable, at the same time, autonomous sorting, mapping, and quantification of end-of-life medical materials. Dataset, code, and demo videos are publicly available.
[85] arXiv:2405.06822 [pdf, other]: Title: MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis

Authors: Luyuan Xie, Manqing Lin, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

Comments: This paper is accepted by ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures (system heterogeneity) across clients pose significant challenges in effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods using knowledge distillation require public datasets, raising privacy and data collection issues. Additionally, these datasets require additional local computing and storage resources, which is a burden for medical institutions with limited hardware conditions. In this paper, we introduce a novel federated learning paradigm, named Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework leverages a lightweight messenger model that carries concentrated information to collect the information from each client. We also develop a set of receiver and transmitter modules to receive and send information from the messenger model, so that the information could be injected and distilled with efficiency.
[86] arXiv:2405.06823 [pdf, other]: Title: PLeak: Prompt Leaking Attacks against Large Language Model Applications

Authors: Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to protect its intellectual property. As a result, a natural attack, called prompt leaking, is to steal the system prompt from an LLM application, which compromises the developer's intellectual property. Existing prompt leaking attacks primarily rely on manually crafted queries, and thus achieve limited effectiveness.
In this paper, we design a novel, closed-box prompt leaking attack framework, called PLeak, to optimize an adversarial query such that when the attacker sends it to a target LLM application, its response reveals its own system prompt. We formulate finding such an adversarial query as an optimization problem and solve it approximately with a gradient-based method. Our key idea is to break down the optimization goal by optimizing adversarial queries for system prompts incrementally, i.e., starting from the first few tokens of each system prompt step by step until the entire length of the system prompt.
We evaluate PLeak in both offline settings and for real-world LLM applications, e.g., those on Poe, a popular platform hosting such applications. Our results show that PLeak can effectively leak system prompts and significantly outperforms not only baselines that manually curate queries but also baselines with optimized queries that are modified and adapted from existing jailbreaking attacks. We responsibly reported the issues to Poe and are still waiting for their response. Our implementation is available at this repository: https://github.com/BHui97/PLeak.
[87] arXiv:2405.06826 [pdf, other]: Title: A Nominal Approach to Probabilistic Separation Logic

Authors: John M. Li, Jon Aytac, Philip Johnson-Freyd, Amal Ahmed, Steven Holtzen

Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)

Currently, there is a gap between the tools used by probability theorists and those used in formal reasoning about probabilistic programs. On the one hand, a probability theorist decomposes probabilistic state along the simple and natural product of probability spaces. On the other hand, recently developed probabilistic separation logics decompose state via relatively unfamiliar measure-theoretic constructions for computing unions of sigma-algebras and probability measures. We bridge the gap between these two perspectives by showing that these two methods of decomposition are equivalent up to a suitable equivalence of categories. Our main result is a probabilistic analog of the classic equivalence between the category of nominal sets and the Schanuel topos. Through this equivalence, we validate design decisions in prior work on probabilistic separation logic and create new connections to nominal-set-like models of probability.
[88] arXiv:2405.06827 [pdf, other]: Title: Acceleration of Power System Dynamic Simulations using a Deep Equilibrium Layer and Neural ODE Surrogate

Authors: Matthew Bossart, Jose Daniel Lara, Ciaran Roberts, Rodrigo Henriquez-Auba, Duncan Callaway, Bri-Mathias Hodge

Comments: This work has been submitted to the IEEE Transactions on Energy Conversion for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Systems and Control (eess.SY)

The dominant paradigm for power system dynamic simulation is to build system-level simulations by combining physics-based models of individual components. The sheer size of the system along with the rapid integration of inverter-based resources exacerbates the computational burden of running time domain simulations. In this paper, we propose a data-driven surrogate model based on implicit machine learning -- specifically deep equilibrium layers and neural ordinary differential equations -- to learn a reduced order model of a portion of the full underlying system. The data-driven surrogate achieves similar accuracy and reduction in simulation time compared to a physics-based surrogate, without the constraint of requiring detailed knowledge of the underlying dynamic models. This work also establishes key requirements needed to integrate the surrogate into existing simulation workflows; the proposed surrogate is initialized to a steady state operating point that matches the power flow solution by design.
[89] arXiv:2405.06828 [pdf, other]: Title: G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping

Authors: Junfeng Cheng, Tania Stathaki

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a novel task named "3D part grouping". Suppose there is a mixed set containing scattered parts from various shapes. This task requires algorithms to find every possible combination among all the parts. To address this challenge, we propose the so-called Gradient-Field-based Auto-Regressive Sampling framework (G-FARS) tailored specifically for the 3D part grouping task. In our framework, we design a gradient-field-based selection graph neural network (GNN) to learn the gradients of a log conditional probability density in terms of part selection, where the condition is the given mixed part set. This innovative approach, implemented through the gradient-field-based selection GNN, effectively captures complex relationships among all the parts in the input. Upon completion of the training process, our framework becomes capable of autonomously grouping 3D parts by iteratively selecting them from the mixed part set, leveraging the knowledge acquired by the trained gradient-field-based selection GNN. Our code is available at: https://github.com/J-F-Cheng/G-FARS-3DPartGrouping.
[90] arXiv:2405.06829 [pdf, ps, other]: Title: Model Reference Control for Wind Turbine Systems in Full Load Region based on Takagi-Sugeno Fuzzy Systems

Authors: Johannes Brunner, Jens Fortmann, Horst Schulte

Comments: 9 pages, 6 figures

Subjects: Systems and Control (eess.SY)

This paper presents a novel Model Reference Control (MRC) approach for wind turbine (WT) systems in the full load region employing a fuzzy Parallel Distribution Compensation Controller (PDC-C) derived using a Takagi-Sugeno (TS) fuzzy System approach. Through first-order Taylor series expansion, local linear submodels are generated and combined via triangular membership functions to develop a TS descriptor model. From here, the MRC PDC-C is synthesized by a constrained LMI optimization procedure, including damping characteristics of the elastic drive train, to track the desired rotor speed and generator torque based on the reference model dynamics. The controller is tested on the nonlinear WT model in simulation studies under various wind conditions, such as turbulent wind, wind gusts, and a Fault Ride Through (FRT) scenario where the generator torque is set to 0 p.u. for 150 ms.
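The blending step described above (local linear submodels combined via triangular membership functions) can be illustrated with a minimal scalar sketch. It is not the paper's wind turbine model or its descriptor form: the function names, the scalar scheduling variable, and the example of linearizing f(x) = x^2 are all assumptions made for illustration.

```python
def triangular_memberships(x, centers):
    """Return triangular membership weights of x over sorted centers (weights sum to 1)."""
    n = len(centers)
    weights = [0.0] * n
    if x <= centers[0]:
        weights[0] = 1.0          # saturate below the first operating point
    elif x >= centers[-1]:
        weights[-1] = 1.0         # saturate above the last operating point
    else:
        for i in range(n - 1):
            lo, hi = centers[i], centers[i + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                weights[i] = 1.0 - t
                weights[i + 1] = t
                break
    return weights


def blend_submodels(x, centers, slopes, offsets):
    """Blend local affine submodels y_i = a_i * x + b_i with the membership weights."""
    w = triangular_memberships(x, centers)
    return sum(wi * (a * x + b) for wi, a, b in zip(w, slopes, offsets))


# Hypothetical example: first-order Taylor expansions of f(x) = x**2
# around the operating points c = 0, 1, 2 give slope 2c and offset -c**2.
centers = [0.0, 1.0, 2.0]
slopes = [2.0 * c for c in centers]
offsets = [-(c ** 2) for c in centers]
approx_at_center = blend_submodels(1.0, centers, slopes, offsets)
```

At an operating point only one submodel is active, so the blend reproduces the linearized function exactly there; between operating points the output is a convex combination of the neighboring submodels.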
[91] arXiv:2405.06830 [pdf, other]: Title: Towards Browser Controls to Protect Cookies from Malicious Extensions

Authors: Liam Tyler, Ivan De Oliveira Nunes

Subjects: Cryptography and Security (cs.CR)

Cookies provide a state management mechanism for the web and are often used for authentication, storing a user's session ID, and replacing their credentials in subsequent requests. These "session cookies" are valuable targets of attacks such as Session Hijacking and Fixation that attempt to steal them and gain unauthorized access to user accounts. Multiple controls such as the Secure and HttpOnly cookie attributes restrict cookie accessibility, effectively mitigating attacks from the network or malicious websites, but often ignoring untrusted extensions within the user's browser. Extensions are third-party HTML/JavaScript add-ons with access to several privileged APIs and can run on multiple websites at once. Unfortunately, this can provide malicious/compromised extensions with unrestricted access to session cookies.
In this work, we first conduct a study assessing the prevalence of extensions with these "risky" APIs (i.e., those enabling cookie modification and theft) and find that they are currently used by hundreds of millions of users. Motivated by this, we propose browser controls based on two new cookie attributes that protect cookies from malicious extensions: BrowserOnly and Tracked. The BrowserOnly attribute prevents accessing cookies from extensions altogether. While effective, not all cookies can be inaccessible. Cookies with the Tracked attribute remain accessible, are tied to a single browser, and record any modifications made by extensions. Thus, stolen Tracked cookies become unusable outside their original browser and servers can verify any modifications. To demonstrate these features' practicality, we implement CREAM (Cookie Restrictions for Extension Abuse Mitigation): a modified version of Chromium realizing these controls. Our evaluation indicates that CREAM controls effectively protect cookies from malicious extensions while incurring small run-time overheads.
[92] arXiv:2405.06831 [pdf, other]: Title: Better Algorithms for Constructing Minimum Cost Markov Chains and AIFV Codes

Authors: Reza Hosseini Dolatabadi, Mordedcai J. Golin, Arian Zamani

Comments: Expanded version of paper appearing in ISIT 2024

Subjects: Data Structures and Algorithms (cs.DS)

The problem of constructing optimal AIFV codes is a special case of that of constructing minimum cost Markov Chains. This paper provides the first complete proof of correctness for the previously known iterative algorithm for constructing such Markov chains.
A recent work describes how to efficiently solve the Markov Chain problem by first constructing a Markov Chain Polytope and then running the Ellipsoid algorithm for linear programming on it. This paper's second result is that, in the AIFV case, a special property of the polytope instead permits solving the corresponding linear program using simple binary search.
[93] arXiv:2405.06832 [pdf, other]: Title: Concolic Testing of JavaScript using Sparkplug

Authors: Zhe Li, Fei Xie

Subjects: Software Engineering (cs.SE)

JavaScript is prevalent in web and server apps, handling sensitive data. JS testing methods lag behind those for other languages. In-situ concolic testing for JS is effective but slow and complex. Our method enhances tracing with the V8 Sparkplug baseline compiler and the remill libraries for assembly to LLVM IR conversion. Evaluation on 160 Node.js libraries reveals comparable coverage and bug detection in significantly less time than the in-situ method.
[94] arXiv:2405.06835 [pdf, other]: Title: Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Authors: Harsh Patel, Buvaneswari A. Ramanan, Manzoor A. Khan, Thomas Williams, Brian Friedman, Lawrence Drabeck

Comments: The work was completed during 2Q, 3Q of Year 2023, when WizardCoder was the top performing Open source LLM for coding. Newer and better models have emerged since then. The processes and methodologies utilized for this benchmarking can still be utilized for evaluating the current SoTA models

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)

This paper explores the possibilities of the current generation of Large Language Models for incorporating Machine Learning Operations (MLOps) functionalities into ML training code bases. We evaluate the performance of OpenAI (gpt-3.5-turbo) and WizardCoder (open-source, 15B parameters) models on the automated accomplishment of various MLOps functionalities in different settings. We perform a benchmarking study that assesses the ability of these models to: (1) adapt existing code samples (Inlining) with component-specific MLOps functionality such as MLflow and Weights & Biases for experiment tracking, Optuna for hyperparameter optimization etc., and (2) perform the task of Translation from one component of an MLOps functionality to another, e.g., translating existing GitPython library based version control code to Data Version Control library based. We also propose three different approaches that involve teaching LLMs to comprehend the API documentation of the components as a reference while accomplishing the Translation tasks. In our evaluations, the gpt-3.5-turbo model significantly outperforms WizardCoder by achieving impressive Pass@3 accuracy in model optimization (55% compared to 0% by WizardCoder), experiment tracking (100%, compared to 62.5% by WizardCoder), model registration (92% compared to 42% by WizardCoder) and hyperparameter optimization (83% compared to 58% by WizardCoder) on average, in their best possible settings, showcasing its superior code adaptability performance in complex MLOps tasks.
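The abstract reports Pass@3 accuracy without defining it. One common reading is that a task counts as solved if at least one of the first three generated attempts passes its check. The sketch below computes the metric under that assumption; the function name and the boolean-list input format are hypothetical and not taken from the paper.

```python
def pass_at_k(attempts_per_task, k=3):
    """Fraction of tasks where at least one of the first k attempts passed.

    attempts_per_task: list of lists of booleans, one inner list per task,
    ordered by attempt (assumed format, for illustration only).
    """
    if not attempts_per_task:
        raise ValueError("need at least one task")
    solved = sum(1 for attempts in attempts_per_task if any(attempts[:k]))
    return solved / len(attempts_per_task)


# Hypothetical outcomes for four tasks with three attempts each.
outcomes = [
    [False, True, False],
    [False, False, False],
    [True, True, True],
    [False, False, True],
]
score = pass_at_k(outcomes, k=3)
```

Under this reading the score can only grow with k, which is why a Pass@3 figure is not directly comparable to a Pass@1 figure.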
[95] arXiv:2405.06840 [pdf, other]: Title: MEIC: Re-thinking RTL Debug Automation using LLMs

Authors: Ke Xu, Jialin Sun, Yuchen Hu, Xinwei Fang, Weiwei Shan, Xi Wang, Zhe Jiang

Subjects: Hardware Architecture (cs.AR); Software Engineering (cs.SE)

The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that heavily rely on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome the limitation of LLMs in RTL code debugging, which is suitable for identifying and correcting both syntax and function errors, while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves a fix rate of 93% for syntax errors and 78% for function errors, with up to 48x speedup in debugging processes when compared with experienced engineers. The dataset and code are available at: https://anonymous.4open.science/r/Verilog-Auto-Debug-6E7F/.
[96] arXiv:2405.06841 [pdf, other]: Title: Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis

Authors: Guanyu Hu, Eleni Papadopoulou, Dimitrios Kollias, Paraskevi Tzouveli, Jie Wei, Xinyu Yang

Comments: accepted at IEEE FG 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: https://github.com/dkollias/Fair-Consistent-Affect-Analysis
[97] arXiv:2405.06842 [pdf, other]: Title: BitVMX: A CPU for Universal Computation on Bitcoin

Authors: Sergio Demian Lerner, Ramon Amela, Shreemoy Mishra, Martin Jonas, Javier Álvarez Cid-Fuentes

Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

BitVMX is a new design for a virtual CPU to optimistically execute arbitrary programs on Bitcoin based on a challenge-response game introduced in BitVM. Similar to BitVM1, we create a general-purpose CPU to be verified in Bitcoin script. Our design supports common architectures, such as RISC-V or MIPS. Our main contribution to the state of the art is a design that uses hash chains of program traces, memory mapped registers, and a new challenge-response protocol. We present a new message linking protocol as a means to allow authenticated communication between the participants. This protocol emulates stateful smart contracts by sharing state between transactions. This provides a basis for our verification game, which uses a graph of pre-signed transactions to support challenge-response interactions. In case of a dispute, the hash chain of the program trace is used with selective pre-signed transactions to locate (via $n$-ary search) and then recover the precise nature of errors in the computation. Unlike BitVM1, our approach does not require the creation of Merkle trees for CPU instructions or memory words. Additionally, it does not rely on signature equivocations. These differences help avoid complexities associated with BitVM1 and make BitVMX a compelling alternative to BitVM2. Our approach is quite flexible: BitVMX can be instantiated to balance transaction cost vs round complexity, prover cost vs verifier cost, and precomputations vs round complexity.
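The dispute step mentioned above, locating an error in a hash-chained program trace via $n$-ary search, can be sketched off-chain as follows. This is only an illustration of the search idea, not the BitVMX protocol: Bitcoin script, pre-signed transactions, and message linking are omitted, and the use of SHA-256, the `seed` value, and both function names are assumptions.

```python
import hashlib


def hash_chain(trace, seed=b"genesis"):
    """Return chained hashes h_i = SHA256(h_{i-1} || step_i) for a list of byte strings."""
    hashes = []
    prev = hashlib.sha256(seed).digest()
    for step in trace:
        prev = hashlib.sha256(prev + step).digest()
        hashes.append(prev)
    return hashes


def first_divergence(prover, verifier, n=4):
    """Find the first index where two hash chains differ, using n-ary search.

    Assumes equal-length chains whose final entries differ. Because each hash
    commits to all earlier steps, the chains agree before the faulty step and
    disagree from it onward, so the search interval can be narrowed safely.
    Returns (index, rounds), where each round probes up to n - 1 positions.
    """
    lo, hi = -1, len(prover) - 1  # invariant: agree at lo (or lo == -1), differ at hi
    rounds = 0
    while hi - lo > 1:
        rounds += 1
        width = hi - lo
        probes = sorted({lo + (width * k) // n for k in range(1, n)} - {lo})
        for p in probes:
            if prover[p] != verifier[p]:
                hi = p      # divergence is at or before p
                break
            lo = p          # chains still agree at p
    return hi, rounds


# Hypothetical 100-step trace in which the prover's step 37 is wrong.
honest = [f"step-{i}".encode() for i in range(100)]
faulty = list(honest)
faulty[37] = b"tampered"
index, rounds = first_divergence(hash_chain(faulty), hash_chain(honest), n=4)
```

A larger `n` means fewer interactive rounds at the cost of more probes per round, which mirrors the trade-off between transaction cost and round complexity that the abstract mentions.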
[98] arXiv:2405.06845 [pdf, other]: Title: CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras

Authors: James Tang, Shashwat Suri, Daniel Ajisafe, Bastian Wandt, Helge Rhodin

Comments: Accepted to the 18th IEEE International Conference on Automatic Face and Gesture Recognition

Subjects: Computer Vision and Pattern Recognition (cs.CV)

It is now possible to estimate 3D human pose from monocular images with off-the-shelf 3D pose estimators. However, many practical applications require fine-grained absolute pose information for which multi-view cues and camera calibration are necessary. Such multi-view recordings are laborious because they require manual calibration, and are expensive when using dedicated hardware. Our goal is full automation, which includes temporal synchronization, as well as intrinsic and extrinsic camera calibration. This is done by using persons in the scene as the calibration objects. Existing methods either address only synchronization or calibration, assume one of the former as input, or have significant limitations. A common limitation is that they only consider single persons, which eases correspondence finding. We attain this generality by partitioning the high-dimensional time and calibration space into a cascade of subspaces and introduce tailored algorithms to optimize each efficiently and robustly. The outcome is an easy-to-use, flexible, and robust motion capture toolbox that we release to enable scientific applications, which we demonstrate on diverse multi-view benchmarks. Project website: https://github.com/jamestang1998/CasCalib.
[99] arXiv:2405.06846 [pdf, other]: Title: Dominion: A New Frontier for AI Research

Authors: Danny Halawi, Aron Sarmasi, Siena Saltzen, Joshua McCoy

Subjects: Artificial Intelligence (cs.AI)

In recent years, machine learning approaches have made dramatic advances, reaching superhuman performance in Go, Atari, and poker variants. These games, and others before them, have served not only as a testbed but have also helped to push the boundaries of AI research. Continuing this tradition, we examine the tabletop game Dominion and discuss the properties that make it well-suited to serve as a benchmark for the next generation of reinforcement learning (RL) algorithms. We also present the Dominion Online Dataset, a collection of over 2,000,000 games of Dominion played by experienced players on the Dominion Online webserver. Finally, we introduce an RL baseline bot that uses existing techniques to beat common heuristic-based bots, and shows competitive performance against the previously strongest bot, Provincial.
[100] arXiv:2405.06848 [pdf, other]: Title: ISR: Invertible Symbolic Regression

Authors: Tony Tohme, Mohammad Javad Khojasteh, Mohsen Sadr, Florian Meyer, Kamal Youcef-Toumi

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

We introduce an Invertible Symbolic Regression (ISR) method. It is a machine learning technique that generates analytical relationships between inputs and outputs of a given dataset via invertible maps (or architectures). The proposed ISR method naturally combines the principles of Invertible Neural Networks (INNs) and Equation Learner (EQL), a neural network-based symbolic architecture for function learning. In particular, we transform the affine coupling blocks of INNs into a symbolic framework, resulting in an end-to-end differentiable symbolic invertible architecture that allows for efficient gradient-based learning. The proposed ISR framework also relies on sparsity promoting regularization, allowing the discovery of concise and interpretable invertible expressions. We show that ISR can serve as a (symbolic) normalizing flow for density estimation tasks. Furthermore, we highlight its practical applicability in solving inverse problems, including a benchmark inverse kinematics problem, and notably, a geoacoustic inversion problem in oceanography aimed at inferring posterior distributions of underlying seabed parameters from acoustic signals.
[101] arXiv:2405.06849 [pdf, other]: Title: GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs

Authors: Mustafa Munir, William Avery, Md Mostafijur Rahman, Radu Marculescu

Comments: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vision graph neural networks (ViG) offer a new avenue for exploration in computer vision. A major bottleneck in ViGs is the inefficient k-nearest neighbor (KNN) operation used for graph construction. To solve this issue, we propose a new method for designing ViGs, Dynamic Axial Graph Construction (DAGC), which is more efficient than KNN as it limits the number of considered graph connections made within an image. Additionally, we propose a novel CNN-GNN architecture, GreedyViG, which uses DAGC. Extensive experiments show that GreedyViG beats existing ViG, CNN, and ViT architectures in terms of accuracy, GMACs, and parameters on image classification, object detection, instance segmentation, and semantic segmentation tasks. Our smallest model, GreedyViG-S, achieves 81.1% top-1 accuracy on ImageNet-1K, 2.9% higher than Vision GNN and 2.2% higher than Vision HyperGraph Neural Network (ViHGNN), with fewer GMACs and a similar number of parameters. Our largest model, GreedyViG-B, obtains 83.9% top-1 accuracy, 0.2% higher than Vision GNN, with a 66.6% decrease in parameters and a 69% decrease in GMACs. GreedyViG-B also obtains the same accuracy as ViHGNN with a 67.3% decrease in parameters and a 71.3% decrease in GMACs. Our work shows that hybrid CNN-GNN architectures not only provide a new avenue for designing efficient models, but that they can also exceed the performance of current state-of-the-art models.
[102] arXiv:2405.06855 [pdf, other]: Title: Linear Explanations for Individual Neurons

Authors: Tuomas Oikarinen, Tsui-Wei Weng

Comments: Published in ICML 2024

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

In recent years, many methods have been developed to understand the internal workings of neural networks, often by describing the function of individual neurons in the model. However, these methods typically only focus on explaining the very highest activations of a neuron. In this paper we show this is not sufficient, and that the highest activation range is only responsible for a very small percentage of the neuron's causal effect. In addition, inputs causing lower activations are often very different and can't be reliably predicted by only looking at high activations. We propose that neurons should instead be understood as a linear combination of concepts, and develop an efficient method for producing these linear explanations. In addition, we show how to automatically evaluate description quality using simulation, i.e., predicting neuron activations on unseen inputs in the vision setting.
[103] arXiv:2405.06856 [pdf, other]: Title: Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving

Authors: Chengyi Nie, Rodrigo Fonseca, Zhenhua Liu

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The demand for large language model (LLM) inference is gradually dominating the artificial intelligence workloads. Therefore, there is an urgent need for cost-efficient inference serving. Existing work focuses on single-worker optimization and lacks consideration of cluster-level management for both inference queries and computing resources. However, placing requests and managing resources without considering the query features easily causes SLO violations or resource underutilization. Providers are forced to allocate extra computing resources to guarantee user experience, leading to additional serving costs. In this paper we introduce Aladdin, a scheduler that co-adaptively places queries and scales computing resources with SLO awareness. For a stream of inference queries, Aladdin first predicts minimal computing resources and the corresponding serving workers' configuration required to fulfill the SLOs for all queries. Then, it places the queries to each serving worker according to the prefill and decode latency models of batched LLM inference to maximize each worker's utilization. Results show that Aladdin reduces the serving cost of a single model by up to 71% for the same SLO level compared with the baselines, which can be millions of dollars per year.
[104] arXiv:2405.06859 [pdf, other]: Title: Reimplementation of Learning to Reweight Examples for Robust Deep Learning

Authors: Parth Patil, Ben Boardley, Jack Gardner, Emily Loiselle, Deerajkumar Parthipan

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) have been used to create models for many complex analysis problems like image recognition and medical diagnosis. DNNs are a popular tool within machine learning due to their ability to model complex patterns and distributions. However, the performance of these networks is highly dependent on the quality of the data used to train the models. Two characteristics of these sets, noisy labels and training set biases, are known to frequently cause poor generalization performance as a result of overfitting to the training set. This paper aims to solve this problem using the approach proposed by Ren et al. (2018), which uses meta-training and online weight approximation. We first implement a toy problem to crudely verify the claims made by Ren et al. (2018) and then apply the approach to a real-world problem of skin cancer detection using an imbalanced image dataset.
[105] arXiv:2405.06865 [pdf, other]: Title: Disrupting Style Mimicry Attacks on Video Imagery

Authors: Josephine Passananti, Stanley Wu, Shawn Shan, Haitao Zheng, Ben Y. Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)

Generative AI models are often used to perform mimicry attacks, where a pretrained model is fine-tuned on a small sample of images to learn to mimic a specific artist of interest. While researchers have introduced multiple anti-mimicry protection tools (Mist, Glaze, Anti-Dreambooth), recent evidence points to a growing trend of mimicry models using videos as sources of training data. This paper presents our experiences exploring techniques to disrupt style mimicry on video imagery. We first validate that mimicry attacks can succeed by training on individual frames extracted from videos. We show that while anti-mimicry tools can offer protection when applied to individual frames, this approach is vulnerable to an adaptive countermeasure that removes protection by exploiting randomness in optimization results of consecutive (nearly-identical) frames. We develop a new, tool-agnostic framework that segments videos into short scenes based on frame-level similarity, and use a per-scene optimization baseline to remove inter-frame randomization while reducing computational cost. We show via both image-level metrics and an end-to-end user study that the resulting framework restores protection against mimicry (including the countermeasure). Finally, we develop another adaptive countermeasure and find that it falls short against our framework.
[106] arXiv:2405.06869 [pdf, other]: Title: Sharpness-Aware Minimization for Evolutionary Feature Construction in Regression

Authors: Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.
[107] arXiv:2405.06870 [pdf, other]: Title: Noise-Tolerant Codebooks for Semi-Quantitative Group Testing: Application to Spatial Genomics

Authors: Kok Hao Chen, Duc Tu Dao, Han Mao Kiah, Van Long Phuoc Pham, Eitan Yaakobi

Comments: To appear in ISIT 2024 Proceedings

Subjects: Information Theory (cs.IT)

Motivated by applications in spatial genomics, we revisit group testing (Dorfman 1943) and propose the class of $\lambda$-ADD-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper bounds for such codes are also studied.
[108] arXiv:2405.06871 [pdf, other]: Title: Statistical Error of Numerical Integrators for Underdamped Langevin Dynamics with Deterministic And Stochastic Gradients

Authors: Xuda Ye, Zhennan Zhou

Subjects: Numerical Analysis (math.NA); Probability (math.PR)

We propose a novel discrete Poisson equation approach to estimate the statistical error of a broad class of numerical integrators for the underdamped Langevin dynamics. The statistical error refers to the mean square error of the estimator to the exact ensemble average with a finite number of iterations. With the proposed error analysis framework, we show that when the potential function $U(x)$ is strongly convex in $\mathbb R^d$ and the numerical integrator has strong order $p$, the statistical error is $O(h^{2p}+\frac1{Nh})$, where $h$ is the time step and $N$ is the number of iterations. Besides, this approach can be adopted to analyze integrators with stochastic gradients, and quantitative estimates can be derived as well. Our approach only requires the geometric ergodicity of the continuous-time underdamped Langevin dynamics, and relaxes the constraint on the time step.
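As a worked illustration of the bound $O(h^{2p}+\frac1{Nh})$ stated above, one can ask which time step balances the two terms for a fixed number of iterations $N$. The constants $C_1$ and $C_2$ below are hypothetical placeholders for the hidden constants in the bound, and the calculation ignores any upper limit on $h$ that the analysis may impose.

```latex
% Model the bound as E(h) = C_1 h^{2p} + C_2 / (N h), with C_1, C_2 > 0 assumed.
\begin{align*}
E(h) &= C_1 h^{2p} + \frac{C_2}{N h}, \\
E'(h) &= 2p\, C_1 h^{2p-1} - \frac{C_2}{N h^{2}} = 0
  \;\Longrightarrow\; h_*^{\,2p+1} = \frac{C_2}{2p\, C_1 N}, \\
h_* &\propto N^{-\frac{1}{2p+1}}, \qquad
E(h_*) = O\!\left(N^{-\frac{2p}{2p+1}}\right).
\end{align*}
```

For example, a first-order integrator ($p=1$) would give $h_* \propto N^{-1/3}$ and an error of order $N^{-2/3}$, while $p=2$ would give $h_* \propto N^{-1/5}$ and an error of order $N^{-4/5}$.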
[109] arXiv:2405.06872 [pdf, other]: Title: eCAR: edge-assisted Collaborative Augmented Reality Framework

Authors: Jinwoo Jeon, Woontack Woo

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

We propose a novel edge-assisted multi-user collaborative augmented reality framework in a large indoor environment. In Collaborative Augmented Reality, data communication that synchronizes virtual objects has large network traffic and high network latency. Due to drift, CAR applications without continuous data communication for coordinate system alignment have virtual object inconsistency. In addition, synchronization messages for online virtual object updates have high latency as the number of collaborative devices increases. To solve this problem, we implement the CAR framework, called eCAR, which utilizes edge computing to continuously match the device's coordinate system with less network traffic. Furthermore, we extend the co-visibility graph of the edge server to maintain virtual object spatial-temporal consistency in neighboring devices by synchronizing a local graph. We evaluate the system quantitatively and qualitatively in the public dataset and a physical indoor environment. eCAR communicates data for coordinate system alignment between the edge server and devices with less network traffic and latency. In addition, collaborative augmented reality synchronization algorithms quickly and accurately host and resolve virtual objects. The proposed system continuously aligns coordinate systems to multiple devices in a large indoor environment and shares augmented reality content. Through our system, users interact with virtual objects and share augmented reality experiences with neighboring users.
[110] arXiv:2405.06874 [pdf, ps, other]: Title: An Interior Penalty Discontinuous Galerkin Method for an Interface Model of Flow in Fractured Porous Media

Authors: Yong Liu, Ziyao Xu

Subjects: Numerical Analysis (math.NA)

Discrete fracture models with reduced-dimensional treatment of conductive and blocking fractures are widely used to simulate fluid flow in fractured porous media. Among these, numerical methods based on interface models are intensively studied, where the fractures are treated as co-dimension one manifolds in a bulk matrix with low-dimensional governing equations. In this paper, we propose a simple yet effective treatment for modeling the fractures on fitted grids in the interior penalty discontinuous Galerkin (IPDG) methods without introducing any additional degrees of freedom or equations on the interfaces. We conduct stability and hp-analysis for the proposed IPDG method, deriving optimal a priori error bounds concerning mesh size ($h$) and sub-optimal bounds for polynomial degree ($k$) in both the energy norm and the $L^2$ norm. Numerical experiments involving published benchmarks validate our theoretical analysis and demonstrate the method's robust performance. Furthermore, we extend our method to two-phase flows and use numerical tests to confirm the algorithm's validity.
[111] arXiv:2405.06875 [pdf, other]: Title: LogicAL: Towards logical anomaly synthesis for unsupervised anomaly localization

Authors: Ying Zhao

Comments: Accepted to Visual Anomaly and Novelty Detection (VAND) 2.0 Workshop at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Anomaly localization is a practical technology for improving industrial production line efficiency. Because anomalies are manifold and hard to collect, existing unsupervised approaches are usually equipped with anomaly synthesis methods. However, most of them are biased towards synthesizing structural defects while ignoring the underlying logical constraints. To fill the gap and boost anomaly localization performance, we propose an edge manipulation based anomaly synthesis framework, named LogicAL, that produces both logical and structural photo-realistic anomalies. We introduce a logical anomaly generation strategy that is adept at breaking logical constraints and a structural anomaly generation strategy that complements structural defect synthesis. We further improve the anomaly localization performance by introducing edge reconstruction into the network structure. Extensive experiments on the challenging MVTecLOCO, MVTecAD, VisA and MADsim datasets verify the advantage of the proposed LogicAL on both logical and structural anomaly localization.
[112] arXiv:2405.06884 [pdf, other]: Title: Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks

Authors: Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

Comments: Accepted at ICML 2024

Subjects: Machine Learning (cs.LG)

Networked dynamical systems are widely used as formal models of real-world cascading phenomena, such as the spread of diseases and information. Prior research has addressed the problem of learning the behavior of an unknown dynamical system when the underlying network has a single layer. In this work, we study the learnability of dynamical systems over multilayer networks, which are more realistic and challenging. First, we present an efficient PAC learning algorithm with provable guarantees to show that the learner only requires a small number of training examples to infer an unknown system. We further provide a tight analysis of the Natarajan dimension, which measures the model complexity. Asymptotically, our bound on the Natarajan dimension is tight for almost all multilayer graphs. The techniques and insights from our work provide the theoretical foundations for future investigations of learning problems for multilayer dynamical systems.
[113] arXiv:2405.06886 [pdf, other]: Title: Event GDR: Event-Centric Generative Document Retrieval

Authors: Yong Guan, Dingxiao Liu, Jinchen Ma, Hao Peng, Xiaozhi Wang, Lei Hou, Ru Li

Comments: Accepted to WWW 2024

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Generative document retrieval, an emerging paradigm in information retrieval, learns to build connections between documents and identifiers within a single model, garnering significant attention. However, there are still two challenges: (1) neglecting inner-content correlation during document representation; (2) lacking explicit semantic structure during identifier construction. Events, in contrast, have rich relations and a well-defined taxonomy, which could help address these two challenges. Inspired by this, we propose Event GDR, an event-centric generative document retrieval model that integrates event knowledge into this task. Specifically, we utilize an exchange-then-reflection method based on multi-agents for event knowledge extraction. For document representation, we employ events and relations to model the document to guarantee comprehensiveness and inner-content correlation. For identifier construction, we map the events to a well-defined event taxonomy to construct identifiers with explicit semantic structure. Our method achieves significant improvement over the baselines on two datasets, and we hope it provides insights for future research.
[114] arXiv:2405.06887 [pdf, other]: Title: FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Authors: Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they suffer severely from low credibility and interpretability, and are thus insufficient for stringent applications, such as Olympic diving events. We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique. Based on this insight, we propose a new fine-grained spatial-temporal action parser named FineParser. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in time and space to minimize the impact of invalid backgrounds during the assessment. In addition, we construct fine-grained annotations of human-centric foreground action masks for the FineDiving dataset, called FineDiving-HM. With refined annotations on diverse target action procedures, FineDiving-HM can promote the development of real-world AQA systems. Through extensive experiments, we demonstrate the effectiveness of FineParser, which outperforms state-of-the-art methods while supporting more tasks of fine-grained action understanding. Data and code are available at https://github.com/PKU-ICST-MIPL/FineParser_CVPR2024.
[115] arXiv:2405.06890 [pdf, other]: Title: TacoERE: Cluster-aware Compression for Event Relation Extraction

Authors: Yong Guan, Xiaozhi Wang, Lei Hou, Juanzi Li, Jeff Pan, Jiaoyan Chen, Freddy Lecue

Comments: Accepted to LREC-COLING 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Event relation extraction (ERE) is a critical and fundamental challenge for natural language processing. Existing work mainly focuses on directly modeling the entire document, which cannot effectively handle long-range dependencies and information redundancy. To address these issues, we propose a cluster-aware compression method for improving event relation extraction (TacoERE), which explores a compression-then-extraction paradigm. Specifically, we first introduce document clustering for modeling event dependencies. It splits the document into intra- and inter-clusters, where intra-clusters aim to enhance the relations within the same cluster, while inter-clusters attempt to model the related events at arbitrary distances. Secondly, we utilize cluster summarization to simplify and highlight important text content of clusters for mitigating information redundancy and event distance. We have conducted extensive experiments on both pre-trained language models, such as RoBERTa, and large language models, such as ChatGPT and GPT-4, on three ERE datasets, i.e., MAVEN-ERE, EventStoryLine and HiEve. Experimental results demonstrate that TacoERE is an effective method for ERE.
[116] arXiv:2405.06893 [pdf, other]: Title: ADLDA: A Method to Reduce the Harm of Data Distribution Shift in Data Augmentation

Authors: Haonan Wang

Comments: 8 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This study introduces a novel data augmentation technique, ADLDA, aimed at mitigating the negative impact of data distribution shifts caused by the data augmentation process in computer vision tasks. ADLDA partitions augmented data into distinct subdomains and incorporates domain labels, combined with domain adaptation techniques, to optimize data representation in the model's feature space. Experimental results demonstrate that ADLDA significantly enhances model performance across multiple datasets, particularly in neural network architectures with complex feature extraction layers. Furthermore, ADLDA improves the model's ability to locate and recognize key features, showcasing potential in object recognition and image segmentation tasks. This paper contributes an effective data augmentation regularization method for computer vision that helps improve the robustness and accuracy of deep learning models.
[117] arXiv:2405.06895 [pdf, other]: Title: Unveiling the Era of Spatial Computing

Authors: Hanzhong Cao

Subjects: Human-Computer Interaction (cs.HC)

The evolution of User Interfaces marks a significant transition from traditional command-line interfaces to more intuitive graphical and touch-based interfaces, largely driven by the emergence of personal computing devices. The advent of spatial computing and Extended Reality technologies further pushes the boundaries, promising a fusion of physical and digital realms through interactive environments. This paper delves into the progression from All Realities technologies encompassing Augmented Reality, Virtual Reality, and Mediated Reality to spatial computing, highlighting their conceptual differences and applications. We explore enabling technologies such as Artificial Intelligence, the Internet of Things, 5G, cloud and edge computing, and blockchain that underpin the development of spatial computing. We further scrutinize the initial forays into commercial spatial computing devices, with a focus on Apple's Vision Pro, evaluating its technological advancements alongside the challenges it faces. Through this examination, we aim to provide insights into the potential of spatial computing to revolutionize our interaction with digital information and the physical world.
[118] arXiv:2405.06899 [pdf, other]: Title: Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms

Authors: Guanlin Mo, Shihong Song, Hu Ding

Subjects: Data Structures and Algorithms (cs.DS)

DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space (e.g., clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (which is a realistic assumption for many high-dimensional applications), where the outliers can be located anywhere in the space without any assumption. First, we propose a $k$-center clustering based algorithm that can reduce the time-consuming labeling and merging tasks of DBSCAN to be linear. Further, we propose a linear time approximate DBSCAN algorithm, where the key idea is building a novel small-size summary for the core points. Also, our algorithm can be efficiently implemented for streaming data and the required memory is independent of the input size. Finally, we conduct our experiments and compare our algorithms with several popular DBSCAN algorithms. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice.
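For orientation, the quadratic baseline that this abstract contrasts against can be sketched as follows. This is a minimal textbook DBSCAN over an arbitrary metric, using edit distance on strings as in the abstract's example; it is not the paper's $k$-center based or approximate algorithm, and the function names, the word list, and the parameter values are illustrative assumptions.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def metric_dbscan(points, dist, eps, min_pts):
    """Textbook DBSCAN over an arbitrary metric; returns one label per point.

    Labels are 0, 1, 2, ... for clusters and -1 for outliers. Every pairwise
    distance is evaluated, which is the quadratic cost the abstract mentions.
    """
    n = len(points)
    # Quadratic step: eps-neighborhood of every point (a point counts itself).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        # Breadth-first expansion from an unlabeled core point.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:  # only core points keep expanding
                        queue.append(q)
        cluster += 1
    return labels


# Illustrative data only: two groups of similar words plus one outlier.
words = ["cat", "bat", "hat", "dog", "dot", "cog", "zebra"]
labels = metric_dbscan(words, edit_distance, eps=1, min_pts=3)
print(dict(zip(words, labels)))
```

The neighborhood computation dominates the cost here, since it calls the metric on all pairs; this is the step that becomes expensive when the metric itself (such as edit distance) is costly.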
[119] arXiv:2405.06902 [pdf, ps, other]: Title: Causal Inference from Slowly Varying Nonstationary Processes

Authors: Kang Du, Yu Xiang

Comments: Accepted to the IEEE Transactions on Signal and Information Processing over Networks. arXiv admin note: substantial text overlap with arXiv:2012.13025

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.
[120] arXiv:2405.06903 [pdf, other]: Title: UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

Authors: Ruihai Wu, Haoran Lu, Yiyan Wang, Yubo Wang, Hao Dong

Comments: CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, yet highly challenging due to the diversity of garment configurations, geometries and deformations. Although able to manipulate similarly shaped garments in a certain task, previous works mostly have to design different policies for different tasks, cannot generalize to garments with diverse geometries, and often rely heavily on human-annotated data. In this paper, we leverage the property that garments in a certain category have similar structures, and then learn the topological dense (point-level) visual correspondence among garments at the category level with different deformations in a self-supervised manner. The topological correspondence can be easily adapted to the functional correspondence to guide the manipulation policies for various downstream tasks, with only one-shot or few-shot demonstrations. Experiments over garments in 3 different categories on 3 representative tasks in diverse scenarios, using one or two arms, taking one or more steps, inputting flat or messy garments, demonstrate the effectiveness of our proposed method. Project page: https://warshallrho.github.io/unigarmentmanip.
[121] arXiv:2405.06904 [pdf, other]: Title: Generation of Granular-Balls for Clustering Based on the Principle of Justifiable Granularity

Authors: Zhen Zhang, Zihang Jia, Witold Pedrycz

Subjects: Machine Learning (cs.LG)

Efficient and robust data clustering remains a challenging task in the field of data analysis. Recent efforts have explored the integration of granular-ball (GB) computing with clustering algorithms to address this challenge, yielding promising results. However, existing methods for generating GBs often rely on single indicators to measure GB quality and employ threshold-based or greedy strategies, potentially leading to GBs that do not accurately capture the underlying data distribution. To address these limitations, this article introduces a novel GB generation method. The originality of this method lies in leveraging the principle of justifiable granularity to measure the quality of a GB for clustering tasks. To be precise, we define the coverage and specificity of a GB and introduce a comprehensive measure for assessing GB quality. Utilizing this quality measure, the method incorporates a binary tree pruning-based strategy and an anomaly detection method to determine the best combination of sub-GBs for each GB and identify abnormal GBs, respectively. Compared to previous GB generation methods, the new method maximizes the overall quality of generated GBs while ensuring alignment with the data distribution, thereby enhancing the rationality of the generated GBs. Experimental results obtained from both synthetic and publicly available datasets underscore the effectiveness of the proposed GB generation method, showcasing improvements in clustering accuracy and normalized mutual information.
[122] arXiv:2405.06906 [pdf, other]: Title: Finding structure in logographic writing with library learning

Authors: Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lionel Wong, Joshua B. Tenenbaum, Roger P. Levy

Comments: Accepted at CogSci 2024 (Talk)

Subjects: Computation and Language (cs.CL)

One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure in a writing system. Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency. We demonstrate how a library learning approach, utilizing learned abstractions and compression, may help reveal the fundamental computational principles that underlie the creation of combinatorial structures in human cognition, and offer broader insights into the evolution of efficient communication systems.
[123] arXiv:2405.06907 [pdf, other]: Title: CoRE: LLM as Interpreter for Natural Language Programming, Pseudo-Code Programming, and Flow Programming of AI Agents

Authors: Shuyuan Xu, Zelong Li, Kai Mei, Yongfeng Zhang

Comments: 12 pages, 6 figures, comments and suggestions are welcome

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL)

Since their inception, programming languages have trended towards greater readability and lower barriers for programmers. Following this trend, natural language can be a promising type of programming language that provides great flexibility and usability and helps towards the democratization of programming. However, the inherent vagueness, ambiguity, and verbosity of natural language pose significant challenges in developing an interpreter that can accurately understand the programming logic and execute instructions written in natural language. Fortunately, recent advancements in Large Language Models (LLMs) have demonstrated remarkable proficiency in interpreting complex natural language. Inspired by this, we develop a novel system for Code Representation and Execution (CoRE), which employs an LLM as the interpreter to interpret and execute natural language instructions. The proposed system unifies natural language programming, pseudo-code programming, and flow programming under the same representation for constructing language agents, while the LLM serves as the interpreter to interpret and execute the agent programs. In this paper, we begin by defining the programming syntax that structures natural language instructions logically. During the execution, we incorporate external memory to minimize redundancy. Furthermore, we equip the designed interpreter with the capability to invoke external tools, compensating for the limitations of LLMs in specialized domains or when accessing real-time information. This work is open-source at https://github.com/agiresearch/CoRE.
[124] arXiv:2405.06908 [pdf, other]: Title: To Ask or Not To Ask: Human-in-the-loop Contextual Bandits with Applications in Robot-Assisted Feeding

Authors: Rohan Banerjee, Rajat Kumar Jenamani, Sidharth Vasudev, Amal Nanavati, Sarah Dean, Tapomayukh Bhattacharjee

Comments: Under submission to IROS 2024. The second and third authors contributed equally. The last two authors advised equally

Subjects: Robotics (cs.RO)

Robot-assisted bite acquisition involves picking up food items that vary in their shape, compliance, size, and texture. A fully autonomous strategy for bite acquisition is unlikely to efficiently generalize to this wide variety of food items. We propose to leverage the presence of the care recipient to provide feedback when the system encounters novel food items. However, repeatedly asking for help imposes cognitive workload on the user. In this work, we formulate human-in-the-loop bite acquisition within a contextual bandit framework and propose a novel method, LinUCB-QG, that selectively asks for help. This method leverages a predictive model of cognitive workload in response to different types and timings of queries, learned using data from 89 participants collected in an online user study. We demonstrate that this method enhances the balance between task performance and cognitive workload compared to autonomous and querying baselines, through experiments in a food dataset-based simulator and a user study with 18 participants without mobility limitations.
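As background for the contextual bandit framing, a minimal sketch of standard disjoint LinUCB is given below. It covers only the well-known base algorithm; the querying decision and the cognitive workload model that distinguish LinUCB-QG are not described in enough detail in the abstract to sketch and are omitted. The class name, the `alpha` value, and the two-dimensional toy context are assumptions made for illustration.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, storing A^{-1} and b directly."""

    def __init__(self, dim: int):
        # A starts as the identity, so its inverse is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha: float) -> float:
        """Estimated reward plus an exploration bonus for context x."""
        a_inv_x = self._a_inv_times(x)
        # theta = A^{-1} b; A^{-1} is symmetric, so theta.x = b.(A^{-1} x).
        mean = sum(bi * v for bi, v in zip(self.b, a_inv_x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward: float) -> None:
        """Rank-one update A += x x^T, b += reward * x."""
        # Sherman-Morrison keeps A^{-1} current without re-inverting A.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x, alpha: float = 1.0) -> int:
    """Index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


# Toy run: arm 0 always succeeds and arm 1 always fails for this context.
context = [1.0, 0.0]
arms = [LinUCBArm(2), LinUCBArm(2)]
for _ in range(10):
    arms[0].update(context, 1.0)
    arms[1].update(context, 0.0)
print("chosen arm:", choose_arm(arms, context))
```

In a human-in-the-loop setting, each arm would correspond to a candidate acquisition action and the context to features of the food item; how and when to ask the user is the part the paper adds on top of this base.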
[125] arXiv:2405.06909 [pdf, ps, other]: Title: Fairness in Reinforcement Learning: A Survey

Authors: Anka Reuel, Devin Ma

Comments: 10 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
[126] arXiv:2405.06910 [pdf, other]: Title: Generative flow induced neural architecture search: Towards discovering optimal architecture in wavelet neural operator

Authors: Hartej Soin, Tapas Tripura, Souvik Chakraborty

Subjects: Machine Learning (cs.LG)

We propose a generative flow-induced neural architecture search algorithm. The proposed approach devises simple feed-forward neural networks to learn stochastic policies to generate sequences of architecture hyperparameters such that the generated states are in proportion to the reward from the terminal state. We demonstrate the efficacy of the proposed search algorithm on the wavelet neural operator (WNO), where we learn a policy to generate a sequence of hyperparameters like wavelet basis and activation operators for wavelet integral blocks. While the trajectory of the generated wavelet basis and activation sequence is cast as flow, the policy is learned by minimizing the flow violation between each state in the trajectory and maximizing the reward from the terminal state. In the terminal state, we train WNO simultaneously to guide the search. We propose to use the exponent of the negative of the WNO loss on the validation dataset as the reward function. While the grid search-based neural architecture generation algorithms foresee every combination, the proposed framework generates the most probable sequence based on the positive reward from the terminal state, thereby reducing exploration time. Compared to reinforcement learning schemes, where complete episodic training is required to get the reward, the proposed algorithm generates the hyperparameter trajectory sequentially. Through four fluid mechanics-oriented problems, we illustrate that the learned policies can sample the best-performing architecture of the neural operator, thereby improving the performance of the vanilla wavelet neural operator.
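The reward described in this abstract, the exponent of the negative validation loss, and the idea of sampling terminal states in proportion to reward can be illustrated with a small sketch. The architecture names and loss values below are hypothetical, and the code shows only the target distribution implied by the reward, not the paper's policy network or training procedure.

```python
import math


def reward(validation_loss: float) -> float:
    """Reward named in the abstract: exponent of the negative validation loss."""
    return math.exp(-validation_loss)


def target_sampling_distribution(losses: dict) -> dict:
    """Distribution an ideally trained flow-based sampler would match:
    each terminal state is drawn with probability proportional to its reward."""
    rewards = {name: reward(loss) for name, loss in losses.items()}
    total = sum(rewards.values())
    return {name: r / total for name, r in rewards.items()}


# Hypothetical validation losses for three made-up architecture choices.
example_losses = {"arch_a": 0.10, "arch_b": 0.50, "arch_c": 2.00}
probs = target_sampling_distribution(example_losses)
for name, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{name}: loss={example_losses[name]:.2f} -> target probability {p:.3f}")
```

Because the reward is always positive and decreases monotonically with the loss, lower-loss architectures receive more probability mass while higher-loss ones are still sampled occasionally.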
[127] arXiv:2405.06911 [pdf, other]: Title: Replication Study and Benchmarking of Real-Time Object Detection Models

Authors: Pierre-Luc Asselin, Vincent Coulombe, William Guimont-Martin, William Larrivée-Hardy

Comments: Authors are presented in alphabetical order, each having equal contribution to the work. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Computer Vision and Pattern Recognition (cs.CV)

This work examines the reproducibility and benchmarking of state-of-the-art real-time object detection models. As object detection models are often used in real-world contexts, such as robotics, where inference time is paramount, simply measuring models' accuracy is not enough to compare them. We thus compare a large variety of object detection models' accuracy and inference speed on multiple graphics cards. In addition to this large benchmarking attempt, we also reproduce the following models from scratch using PyTorch on the MS COCO 2017 dataset: DETR, RTMDet, ViTDet and YOLOv7. More importantly, we propose a unified training and evaluation pipeline, based on MMDetection's features, to better compare models. Our implementations of DETR and ViTDet could not achieve accuracy or speed comparable to what is declared in the original papers. On the other hand, the reproduced RTMDet and YOLOv7 could match the reported performance. The studied papers are also found to be generally lacking for reproducibility purposes. As for MMDetection pretrained models, speed is severely reduced with limited computing resources (larger, more accurate models even more so). Moreover, results exhibit a strong trade-off between accuracy and speed, in which anchor-free models prevail - notably RTMDet or YOLOx models. The code used in this paper and all the experiments are available in the repository at https://github.com/Don767/segdet_mlcr2024.
[128] arXiv:2405.06914 [pdf, other]: Title: Non-confusing Generation of Customized Concepts in Diffusion Models

Authors: Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs). It becomes even more pronounced in the generation of customized concepts, due to the scarcity of user-provided concept visual examples. By revisiting the two major stages leading to the success of TGDMs -- 1) contrastive image-language pre-training (CLIP) for text encoder that encodes visual semantics, and 2) training TGDM that decodes the textual embeddings into pixels -- we point out that existing customized generation methods only focus on fine-tuning the second stage while overlooking the first one. To this end, we propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning. Specifically, given a few samples of customized concepts, we obtain non-confusing textual embeddings of a concept by fine-tuning CLIP via contrasting a concept and the over-segmented visual regions of other concepts. Experimental results demonstrate the effectiveness of CLIF in preventing the confusion of multi-customized concept generation.
[129] arXiv:2405.06915 [pdf, ps, other]: Title: Automating Creativity

Authors: Ming-Hui Huang, Roland T. Rust

Comments: 46 pages, 2 tables, 4 figures

Subjects: Artificial Intelligence (cs.AI)

Generative AI (GenAI) has spurred the expectation of being creative, due to its ability to generate content, yet so far, its creativity has somewhat disappointed, because it is trained using existing data following human intentions to generate outputs. The purpose of this paper is to explore what is required to evolve AI from generative to creative. Based on a reinforcement learning approach and building upon various research streams of computational creativity, we develop a triple prompt-response-reward engineering framework to develop the creative capability of GenAI. This framework consists of three components: 1) a prompt model for expected creativity by developing discriminative prompts that are objectively, individually, or socially novel, 2) a response model for observed creativity by generating surprising outputs that are incrementally, disruptively, or radically innovative, and 3) a reward model for improving creativity over time by incorporating feedback from the AI, the creator/manager, and/or the customers. This framework enables the application of GenAI for various levels of creativity strategically.
[130] arXiv:2405.06916 [pdf, other]: Title: High-order Neighborhoods Know More: HyperGraph Learning Meets Source-free Unsupervised Domain Adaptation

Authors: Jinkun Jiang, Qingxuan Lv, Yuezun Li, Yong Du, Sheng Chen, Hui Yu, Junyu Dong

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Source-free Unsupervised Domain Adaptation (SFDA) aims to classify target samples by only accessing a pre-trained source model and unlabelled target samples. Since no source data is available, transferring the knowledge from the source domain to the target domain is challenging. Existing methods normally exploit the pair-wise relation among target samples and attempt to discover their correlations by clustering these samples based on semantic features. The drawbacks of these methods include: 1) the pair-wise relation is limited in exposing the underlying correlations among more than two samples, hindering the exploration of the structural information embedded in the target domain; 2) the clustering process only relies on the semantic feature, while overlooking the critical effect of domain shift, i.e., the distribution differences between the source and target domains. To address these issues, we propose a new SFDA method that exploits the high-order neighborhood relation and explicitly takes the domain shift effect into account. Specifically, we formulate SFDA as a hypergraph learning problem and construct hyperedges to explore the local group and context information among multiple samples. Moreover, we integrate a self-loop strategy into the constructed hypergraph to elegantly introduce the domain uncertainty of each sample. By clustering these samples based on hyperedges, both the semantic feature and domain shift effects are considered. We then describe an adaptive relation-based objective to tune the model with soft attention levels for all samples. Extensive experiments are conducted on Office-31, Office-Home, VisDA, and PointDA-10 datasets. The results demonstrate the superiority of our method over state-of-the-art counterparts.
[131] arXiv:2405.06917 [pdf, ps, other]: Title: Design Requirements for Human-Centered Graph Neural Network Explanations

Authors: Pantea Habibi, Peyman Baghershahi, Sourav Medya, Debaleena Chattopadhyay

Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)

Graph neural networks (GNNs) are powerful graph-based machine-learning models that are popular in various domains, e.g., social media, transportation, and drug discovery. However, owing to complex data representations, GNNs do not easily allow for human-intelligible explanations of their predictions, which can decrease trust in them as well as deter collaboration opportunities between AI experts and non-technical domain experts. Here, we first discuss two papers that aim to provide GNN explanations to domain experts in an accessible manner and then establish a set of design requirements for human-centered GNN explanations. Finally, we offer two example prototypes to demonstrate some of those proposed requirements.
[132] arXiv:2405.06918 [pdf, other]: Title: Super-Resolving Blurry Images with Events

Authors: Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propose a multi-scale center-surround event representation to fully capture motion and texture information inherent in events. Additionally, we design a symmetric cross-modal attention module to fully exploit the complementarity between blurry images and events. Furthermore, we introduce an intermodal residual group composed of several residual dense Swin Transformer blocks, each incorporating multiple Swin Transformer layers and a residual connection, to extract global context and facilitate inter-block feature aggregation. Extensive experiments show that our method compares favorably against state-of-the-art approaches and achieves remarkable performance.
[133] arXiv:2405.06919 [pdf, other]: Title: Automating Thematic Analysis: How LLMs Analyse Controversial Topics

Authors: Awais Hameed Khan, Hiruni Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam Magee

Comments: 18 pages, 6 figures

Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

Large Language Models (LLMs) are promising analytical tools. They can augment human epistemic, cognitive and reasoning abilities, and support 'sensemaking', making sense of a complex environment or subject by analysing large volumes of data with a sensitivity to context and nuance absent in earlier text processing systems. This paper presents a pilot experiment that explores how LLMs can support thematic analysis of controversial topics. We compare how human researchers and two LLMs, GPT-4 and Llama 2, categorise excerpts from media coverage of the controversial Australian Robodebt scandal. Our findings highlight intriguing overlaps and variances in thematic categorisation between human and machine agents, and suggest where LLMs can be effective in supporting forms of discourse and thematic analysis. We argue LLMs should be used to augment, and not replace, human interpretation, and we add further methodological insights and reflections to existing research on the application of automation to qualitative research methods. We also introduce a novel card-based design toolkit for both researchers and practitioners to further interrogate LLMs as analytical tools.
[134] arXiv:2405.06922 [pdf, other]: Title: EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection

Authors: Nishat Raihan, Dhiman Goswami, Antara Mahmud, Antonios Anastasopoulos, Marcos Zampieri

Comments: arXiv admin note: substantial text overlap with arXiv:2310.18387, arXiv:2310.18023

Subjects: Computation and Language (cs.CL)

Code-mixing is a well-studied linguistic phenomenon that occurs when two or more languages are mixed in text or speech. Several studies have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce EmoMix-3L, a novel multi-label emotion detection dataset containing code-mixed data from three different languages. We experiment with several models on EmoMix-3L and we report that MuRIL outperforms other models on this dataset.
[135] arXiv:2405.06925 [pdf, other]: Title: Semi-supervised Anomaly Detection via Adaptive Reinforcement Learning-Enabled Method with Causal Inference

Authors: Xiangwei Chen, Ruliang Xiaoa, Zhixia Zeng, Zhipeng Qiu, Shi Zhang, Xin Du

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Semi-supervised anomaly detection for guaranteeing the reliability of intelligent systems has received increasing attention. However, existing methods rely too much on data correlation and neglect causality, which can be misleading due to confounding factors and affect system reliability. Additionally, the current reinforcement learning anomaly detection methods can effectively identify known and unknown anomalies in environments with limited labeled samples. Despite their effectiveness, these methods still face several challenges, such as under-utilization of prior knowledge, lack of model flexibility, and insufficient reward feedback when interacting with the environment. To address the above problems, this paper innovatively constructs a counterfactual causal reinforcement learning model, termed Triple-Assisted Causal Reinforcement Learning Anomaly Detector (Tri-CRLAD). The model utilizes the causal inference mechanism to radically improve the performance of semi-supervised models and enhance the model's ability to uncover anomaly data in the face of unknown or rare data. In addition, Tri-CRLAD features a triple decision support mechanism, namely, a sampling strategy based on historical similarity, an adaptive threshold smoothing adjustment strategy, and an adaptive decision reward mechanism. These mechanisms further enhance the flexibility and generalization ability of the model, enabling it to effectively respond to various complex and dynamically changing environments. Finally, Tri-CRLAD matches or exceeds the performance of 9 baseline methods across 7 diverse intelligent system datasets, including satellite systems, medical systems, and health systems. Moreover, anomaly detection stability was significantly improved by up to 23% with an extremely small number of known anomaly samples. Our code is available at https://github.com/Aoudsung/Tri-CRLAD/
[136] arXiv:2405.06926 [pdf, other]: Title: TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

Authors: Xiangyu Wu, Qing-Yuan Jiang, Yang Yang, Yi-Feng Wu, Qing-Guo Chen, Jianfeng Lu

Comments: Accepted for publication at IJCAI 2024; 13 pages; 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The recent introduction of prompt tuning based on pre-trained vision-language models has dramatically improved the performance of multi-label image classification. However, some existing strategies that have been explored still have drawbacks, i.e., either exploiting massive labeled visual data at a high cost or using text data only for text prompt tuning and thus failing to learn the diversity of visual knowledge. Hence, the application scenarios of these methods are limited. In this paper, we propose a pseudo-visual prompt (PVP) module for implicit visual prompt tuning to address this problem. Specifically, we first learn the pseudo-visual prompt for each category, mining diverse visual knowledge by the well-aligned space of pre-trained vision-language models. Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities. Experimental results on VOC2007, MS-COCO, and NUSWIDE datasets demonstrate that our method can surpass state-of-the-art (SOTA) methods across various settings for multi-label image classification tasks. The code is available at https://github.com/njustkmg/PVP.
[137] arXiv:2405.06927 [pdf, ps, other]: Title: Multimodal Pretraining and Generation for Recommendation: A Tutorial

Authors: Jieming Zhu, Chuhan Wu, Rui Zhang, Zhenhua Dong

Comments: Published in WWW 2024 Tutorial. Find the tutorial materials at this https URL

Subjects: Information Retrieval (cs.IR)

Personalized recommendation stands as a ubiquitous channel for users to explore information or items aligned with their interests. Nevertheless, prevailing recommendation models predominantly rely on unique IDs and categorical features for user-item matching. While this ID-centric approach has witnessed considerable success, it falls short in comprehensively grasping the essence of raw item contents across diverse modalities, such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, particularly in the realm of multimedia services like news, music, and short-video platforms. The recent surge in pretraining and generation techniques presents both opportunities and challenges in the development of multimodal recommender systems. This tutorial seeks to provide a thorough exploration of the latest advancements and future trajectories in multimodal pretraining and generation techniques within the realm of recommender systems. The tutorial comprises three parts: multimodal pretraining, multimodal generation, and industrial applications and open challenges in the field of recommendation. Our target audience encompasses scholars, practitioners, and other parties interested in this domain. By providing a succinct overview of the field, we aspire to facilitate a swift understanding of multimodal recommendation and foster meaningful discussions on the future development of this evolving landscape.
[138] arXiv:2405.06928 [pdf, other]: Title: Systematic Review of Extended Reality for Smart Built Environments Lighting Design Simulations

Authors: Elham Mohammadrezaei, Shiva Ghasemi, Poorvesh Dongre, Denis Gracanin, Hongbo Zhang

Journal-ref: IEEE Access 2024

Subjects: Human-Computer Interaction (cs.HC)

This systematic literature review paper explores the use of extended reality (XR) technology for smart built environments and particularly for smart lighting systems design. Smart lighting is a novel concept that emerged over a decade ago and is being used and tested in commercial and industrial built environments. We used PRISMA methodology to review 270 research papers published from 1968 to 2023. Following a discussion of historical advances and key modeling techniques, a description of lighting simulation in the context of extended reality and smart built environment is given, followed by a discussion of the current trends and challenges.
[139] arXiv:2405.06929 [pdf, other]: Title: PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition

Authors: Shenglin He, Xiaoyang Qu, Jiguang Wan, Guokuan Li, Changsheng Xie, Jianzong Wang

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recognizing human actions from point cloud sequence has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition typically require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to high latency, rendering them impractical for real-world applications. To address this problem, we propose a Plane-Fit Redundancy Encoding point cloud sequence network named PRENet. The primary concept of our approach involves the utilization of plane fitting to mitigate spatial redundancy within the sequence, concurrently encoding the temporal redundancy of the entire sequence to minimize redundant computations. Specifically, our network comprises two principal modules: a Plane-Fit Embedding module and a Spatio-Temporal Consistency Encoding module. The Plane-Fit Embedding module capitalizes on the observation that successive point cloud frames exhibit unique geometric features in physical space, allowing for the reuse of spatially encoded data for temporal stream encoding. The Spatio-Temporal Consistency Encoding module amalgamates the temporal structure of the temporally redundant part with its corresponding spatial arrangement, thereby enhancing recognition accuracy. We have done numerous experiments to verify the effectiveness of our network. The experimental results demonstrate that our method achieves almost identical recognition accuracy while being nearly four times faster than other state-of-the-art methods.
[140] arXiv:2405.06930 [pdf, other]: Title: Extended Reality for Smart Built Environments Design: Smart Lighting Design Testbed

Authors: Elham Mohammadrezaei, Denis Gracanin

Journal-ref: HCI International 2022

Subjects: Human-Computer Interaction (cs.HC)

Smart Built Environment is an eco-system of 'connected' and 'smart' Internet of Things (IoT) devices that are embedded in a built environment. Smart lighting is an important category of smart IoT devices that has recently attracted research interest, particularly for residential areas. In this paper, we present an extended reality based smart lighting design testbed that can generate design prototypes based on the functionality of the physical environment. The emphasis is on designing a smart lighting system in a controlled residential environment, with some evaluation of well-being and comfort.
[141] arXiv:2405.06931 [pdf, other]: Title: Identifying Key Terms in Prompts for Relevance Evaluation with GPT Models

Authors: Jaekeol Choi

Comments: 19 pages, 2 figures

Journal-ref: International Journal of Natural Language Computing, April 2024, Volume 13, Number 2

Subjects: Information Retrieval (cs.IR)

Relevance evaluation of a query and a passage is essential in Information Retrieval (IR). Recently, numerous studies have been conducted on tasks related to relevance judgment using Large Language Models (LLMs) such as GPT-4, demonstrating significant improvements. However, the efficacy of LLMs is considerably influenced by the design of the prompt. The purpose of this paper is to identify which specific terms in prompts positively or negatively impact relevance evaluation with LLMs. We employed two types of prompts: those used in previous research and those generated automatically by LLMs. By comparing the performance of these prompts in both few-shot and zero-shot settings, we analyze the influence of specific terms in the prompts. We have observed two main findings from our study. First, we discovered that prompts using the term 'answer' lead to more effective relevance evaluations than those using 'relevant'. This indicates that a more direct approach, focusing on answering the query, tends to enhance performance. Second, we noted the importance of appropriately balancing the scope of relevance. While the term 'relevant' can extend the scope too broadly, resulting in less precise evaluations, an optimal balance in defining relevance is crucial for accurate assessments. The inclusion of few-shot examples helps in more precisely defining this balance. By providing clearer contexts for the term 'relevance', few-shot examples contribute to refining relevance criteria. In conclusion, our study highlights the significance of carefully selecting terms in prompts for relevance evaluation with LLMs.
[142] arXiv:2405.06932 [pdf, ps, other]: Title: Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Authors: Junqin Huang, Zhongjie Hu, Zihao Jing, Mengya Gao, Yichao Wu

Comments: tech report

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this report, we introduce Piccolo2, an embedding model that surpasses other models in the comprehensive evaluation over 6 tasks on CMTEB benchmark, setting a new state-of-the-art. Piccolo2 primarily leverages an efficient multi-task hybrid loss training approach, effectively harnessing textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension and uses MRL training to support more flexible vector dimensions. The latest information of piccolo models can be accessed via: https://huggingface.co/sensenova/
[143] arXiv:2405.06933 [pdf, other]: Title: Syndrome-based Fusion Rules in Heterogeneous Distributed Quickest Change Detection

Authors: Wen-Hsuan Li, Yu-Chih Huang

Subjects: Information Theory (cs.IT)

In this paper, the heterogeneous distributed quickest change detection (HetDQCD) with 1-bit non-anonymous feedback is studied. The concept of syndromes is introduced and the family of syndrome-based fusion rules is proposed, which encompasses all deterministic fusion rules as special cases. Through the Hasse diagram of syndromes, upper and lower bounds on the second-order performance of expected detection delay as a function of average run length to false alarm are provided. An interesting instance, the weighted voting rule previously proposed in our prior work, is then revisited, for which an efficient pruning method for breadth-first search in the Hasse diagram is proposed to analyze the performance. This in turn assists in the design of the weight threshold in the weighted voting rule. Simulation results corroborate that our analysis is instrumental in identifying a proper design for the weighted voting rule, demonstrating consistent superiority over both the anonymous voting rule and the group selection rule in HetDQCD.
[144] arXiv:2405.06937 [pdf, other]: Title: High-Order Synchrosqueezed Chirplet Transforms for Multicomponent Signal Analysis

Authors: Yi-Ju Yen, De-Yan Lu, Sing-Yuan Yeh, Jian-Jiun Ding, Chun-Yen Shen

Subjects: Numerical Analysis (math.NA); Signal Processing (eess.SP)

This study focuses on the analysis of signals containing multiple components with crossover instantaneous frequencies (IF). This problem was initially solved with the chirplet transform (CT). Also, it can be sharpened by adding the synchrosqueezing step, which is called the synchrosqueezed chirplet transform (SCT). However, we found that the SCT goes wrong with the high chirp modulation signal due to the wrong estimation of the IF. In this paper, we present the improvement of the post-transformation of the CT. The main goal of this paper is to amend the estimation introduced in the SCT and carry out the high-order synchrosqueezed chirplet transform. The proposed method reduces the wrong estimation when facing a stronger variety of chirp-modulated multi-component signals. The theoretical analysis of the new reassignment ingredient is provided. Numerical experiments on some synthetic signals are presented to verify the effectiveness of the proposed high-order SCT.
[145] arXiv:2405.06944 [pdf, other]: Title: Learning Monocular Depth from Focus with Event Focal Stack

Authors: Chenxu Jiang, Mingyuan Lin, Chi Zhang, Zhenghai Wang, Lei Yu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e. the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time in extremely low latency, which provides more temporal information for focus time acquisition. In this study, we propose the EDFF Network to estimate sparse depth from the Event Focal Stack. Specifically, we utilize the event voxel grid to encode intensity change information and project event time surface into the depth domain to preserve per-pixel focal distance information. A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse the information mentioned above. Additionally, we propose a Multi-level Depth Fusion Block designed to integrate results from each level of a UNet-like architecture and produce the final output. Extensive experiments validate that our method outperforms existing state-of-the-art approaches.
[146] arXiv:2405.06945 [pdf, other]: Title: Direct Learning of Mesh and Appearance via 3D Gaussian Splatting

Authors: Ancheng Lin, Jun Li

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Accurately reconstructing a 3D scene including explicit geometry information is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. Our model learns the mesh and appearance in an end-to-end manner, where we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. The model creates an effective information pathway to supervise the learning of the scene, including the mesh. Experimental results demonstrate that the learned scene model not only achieves state-of-the-art rendering quality but also supports manipulation using the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.
[147] arXiv:2405.06948 [pdf, other]: Title: Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

Authors: Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

Comments: 26 pages, 13 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these limitations, we propose a subject-driven generation framework and introduce training-free guidance to intervene in the generative process during inference time. This approach strengthens the attention map, allowing for precise attribute binding and feature injection for each subject. Notably, our method exhibits exceptional zero-shot generation ability, especially in the challenging task of compositional generation. Furthermore, we propose a novel metric GroundingScore to evaluate subject alignment thoroughly. The obtained quantitative results serve as compelling evidence showcasing the effectiveness of our proposed method. The code will be released soon.
[148] arXiv:2405.06951 [pdf, ps, other]: Title: Intelligent Reflecting Surface-Aided Radar Spoofing

Authors: Haozhe Wang, Beixiong Zheng, Xiaodan Shao, Rui Zhang

Comments: 5 pages, 4 figures

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Electronic countermeasure (ECM) technology plays a critical role in modern electronic warfare, which can interfere with enemy radar detection systems by noise or deceptive signals. However, the conventional active jamming strategy incurs additional hardware and power costs and has the potential threat of exposing the target itself. To tackle the above challenges, we propose a new intelligent reflecting surface (IRS)-aided radar spoofing strategy in this letter, where IRS is deployed on the surface of a target to help eliminate the signals reflected towards the hostile radar to shield the target, while simultaneously redirecting its reflected signal towards a surrounding clutter to generate deceptive angle-of-arrival (AoA) sensing information for the radar. We optimize the IRS's reflection to maximize the received signal power at the radar from the direction of the selected clutter subject to the constraint that its received power from the direction of the target is lower than a given detection threshold. We first solve this non-convex optimization problem using the semidefinite relaxation (SDR) method and further propose a lower-complexity solution for real-time implementation. Simulation results validate the efficacy of our proposed IRS-aided spoofing system as compared to various benchmark schemes.
[149] arXiv:2405.06954 [pdf, ps, other]: Title: A Convergence Theorem for the Parareal Algorithm Revisited

Authors: Ernest Scheiber

Comments: 14 pages

Subjects: Numerical Analysis (math.NA)

The subject of the paper is to verify the convergence conditions for the parareal algorithm using Gander and Hairer's theorem. The analysis is conducted in the case where the coarse integrator is the Euler method and the high-accuracy integrator is an explicit Runge-Kutta type method.
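For readers unfamiliar with the setting, the following is a minimal sketch of a generic parareal iteration with an explicit Euler coarse integrator and a classical RK4 fine integrator. It is not taken from the paper: the function names, the scalar test setup, and the choice of RK4 with 10 substeps per slice are all assumptions made for illustration.

```python
def euler_step(f, t, y, dt):
    """Coarse integrator G: one explicit Euler step over a whole time slice."""
    return y + dt * f(t, y)


def rk4_step(f, t, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def fine_step(f, t, y, dt, substeps=10):
    """Fine integrator F: several RK4 substeps over one time slice (assumed choice)."""
    h = dt / substeps
    for i in range(substeps):
        y = rk4_step(f, t + i * h, y, h)
    return y


def serial_fine(f, y0, t0, t1, n_slices):
    """Reference solution: the fine integrator applied slice after slice."""
    dt = (t1 - t0) / n_slices
    u = [y0]
    for n in range(n_slices):
        u.append(fine_step(f, t0 + n * dt, u[n], dt))
    return u


def parareal(f, y0, t0, t1, n_slices, n_iters):
    """Parareal update: U[n+1] <- G(U_new[n]) + F(U_old[n]) - G(U_old[n])."""
    dt = (t1 - t0) / n_slices
    ts = [t0 + n * dt for n in range(n_slices + 1)]
    # Initial guess from a sequential coarse sweep.
    u = [y0]
    for n in range(n_slices):
        u.append(euler_step(f, ts[n], u[n], dt))
    for _ in range(n_iters):
        # These fine solves are independent and could run in parallel.
        fine = [fine_step(f, ts[n], u[n], dt) for n in range(n_slices)]
        coarse_old = [euler_step(f, ts[n], u[n], dt) for n in range(n_slices)]
        new = [y0]
        for n in range(n_slices):
            coarse_new = euler_step(f, ts[n], new[n], dt)
            new.append(coarse_new + fine[n] - coarse_old[n])
        u = new
    return ts, u
```

With as many iterations as time slices, the parareal iterate reproduces the serial fine solution up to rounding, for example `parareal(lambda t, y: -y, 1.0, 0.0, 1.0, n_slices=5, n_iters=5)`; the practical interest lies in how few iterations are needed before that point.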
[150] arXiv:2405.06957 [pdf, other]: Title: On the Role of Intelligence and Business Wargaming in Developing Foresight

Authors: Aline Werro, Christian Nitzl, Uwe M. Borghoff

Comments: 18 pages, 3 figures. The authors declare no potential conflicts of interest

Subjects: Computers and Society (cs.CY)

Business wargaming is a central tool for developing sustaining strategies. It transfers the benefits of traditional wargaming to the business environment. However, building wargames that support the process of decision-making for strategy requires respective intelligence. This paper investigates the role of intelligence in the process of developing strategic foresight. The focus is on how intelligence is developed and how it relates to business wargaming. The so-called intelligence cycle is the basis and reference of our investigation.
The conceptual part of the paper combines the theoretical background from military, business as well as serious gaming. To elaborate on some of the lessons learned, we examine specific business wargames both drawn from the literature and conducted by us at the Center for Intelligence and Security Studies (CISS). It is shown that business wargaming can make a significant contribution to the transformation of data to intelligence by supporting the intelligence cycle in two crucial phases. Furthermore, it brings together business intelligence (BI) and competitive intelligence (CI) and it bridges the gap to a company's strategy by either testing or developing a new strategy. We were also able to confirm this finding based on the business wargame we conducted at a major semiconductor manufacturer.
[151] arXiv:2405.06959 [pdf, other]: Title: AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation

Authors: Xingxu Li, Nan Ma, Yiheng Han, Shun Yang, Siyi Zheng

Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA), 7 pages, 3 figures

Subjects: Robotics (cs.RO)

To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel bot named AHPPEBot, which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are accomplished through a multi-task YOLOv5 model coupled with a detection-based adaptive DBSCAN clustering algorithm. In pose estimation, we employ a deep learning model to predict seven semantic keypoints on the pedicel. These keypoints assist in the robot's path planning, minimize target contact, and facilitate the use of our specialized end effector for harvesting. In autonomous tomato harvesting experiments conducted in commercial greenhouses, our proposed robot achieved a harvesting success rate of 86.67%, with an average successful harvest time of 32.46 s, showcasing its continuous and robust harvesting capabilities. The result underscores the potential of harvesting robots to bridge the labor gap in agriculture.
[152] arXiv:2405.06964 [pdf, other]: Title: ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to develop a foundation model for general robotic manipulation that formalizes a manipulation task as contact synthesis. Specifically, our model takes as input object and robot manipulator point clouds, object physical attributes, target motions, and manipulation region masks. It outputs contact points on the object and associated contact forces or post-contact motions for robots to achieve the desired manipulation task. We perform extensive experiments both in the simulation and real-world settings, manipulating articulated rigid objects, rigid objects, and deformable objects that vary in dimensionality, ranging from one-dimensional objects like ropes to two-dimensional objects like cloth and extending to three-dimensional objects such as plasticine. Our model achieves average success rates of around 90%. Supplementary materials and videos are available on our project website at https://manifoundationmodel.github.io/.
[153] arXiv:2405.06965 [pdf, other]: Title: A De-singularity Subgradient Approach for the Extended Weber Location Problem

Authors: Zhao-Rong Lai, Xiaotian Wu, Liangda Fang, Ziliang Chen

Comments: IJCAI 2024

Subjects: Machine Learning (cs.LG)

The extended Weber location problem is a classical optimization problem that has inspired some new works in several machine learning scenarios recently. However, most existing algorithms may get stuck due to the singularity at the data points when the power of the cost function $1\leqslant q<2$, such as the widely-used iterative Weiszfeld approach. In this paper, we establish a de-singularity subgradient approach for this problem. We also provide a complete proof of convergence which has fixed some incomplete statements of the proofs for some previous Weiszfeld algorithms. Moreover, we deduce a new theoretical result of superlinear convergence for the iteration sequence in a special case where the minimum point is a singular point. We conduct extensive experiments in a real-world machine learning scenario to show that the proposed approach solves the singularity problem, produces the same results as in the non-singularity cases, and shows a reasonable rate of linear convergence. The results also indicate that the $q$-th power case ($1<q<2$) is more advantageous than the $1$-st power case and the $2$-nd power case in some situations. Hence the de-singularity subgradient approach is beneficial to advancing both theory and practice for the extended Weber location problem.
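To make the singularity issue concrete, here is a minimal sketch of the classical Weiszfeld-type fixed-point iteration for minimizing the sum of weighted q-th powers of Euclidean distances to the data points. It is the textbook iteration, not the de-singularity subgradient approach proposed in the paper; the function name, defaults, and the decision to raise an error at a data point are assumptions made for illustration.

```python
import math


def weiszfeld(points, weights=None, q=1.0, start=None, iters=200):
    """Classical Weiszfeld-type iteration for min_x sum_i w_i * ||x - a_i||**q.

    Each step re-weights the data points by w_i * ||x - a_i||**(q - 2).
    For 1 <= q < 2 that weight is undefined when the iterate coincides with
    a data point, which is the singularity the abstract refers to. This
    sketch simply raises an error there instead of handling it.
    """
    if weights is None:
        weights = [1.0] * len(points)
    dim = len(points[0])
    if start is None:
        # Assumed default: start from the centroid of the data points.
        x = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    else:
        x = list(start)
    for _ in range(iters):
        numerator = [0.0] * dim
        denominator = 0.0
        for p, w in zip(points, weights):
            dist = math.dist(x, p)
            if dist == 0.0:
                raise ValueError("iterate hit a data point: singular weight")
            coef = w * dist ** (q - 2.0)
            denominator += coef
            for d in range(dim):
                numerator[d] += coef * p[d]
        x = [v / denominator for v in numerator]
    return x
```

For the four corners of a square, `weiszfeld([(0, 0), (2, 0), (2, 2), (0, 2)], start=(0.5, 0.3))` moves toward the center of the square, while starting exactly at a corner triggers the error.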
[154] arXiv:2405.06967 [pdf, other]: Title: Optimal Configuration of Reconfigurable Intelligent Surfaces With Non-uniform Phase Quantization

Authors: Jialong Lu, Rujing Xiong, Tiebin Mi, Ke Yin, Robert Caiming Qiu

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The existing methods for Reconfigurable Intelligent Surface (RIS) beamforming in wireless communication are typically limited to uniform phase quantization. However, in real world applications, the phase and bit resolution of RIS units are often non-uniform due to practical requirements and engineering challenges. To fill this research gap, we formulate an optimization problem for discrete non-uniform phase configuration in RIS assisted multiple-input single-output (MISO) communications. Subsequently, a partition-and-traversal (PAT) algorithm is proposed to solve that, achieving the global optimal solution. The efficacy and superiority of the PAT algorithm are validated through numerical simulations, and the impact of non-uniform phase quantization on system performance is analyzed.
[155] arXiv:2405.06971 [pdf, other]: Title: Controlling network-coupled neural dynamics with nonlinear network control theory

Authors: Zhongye Xia, Weibin Li, Zhichao Liang, Kexin Lou, Quanying Liu

Subjects: Systems and Control (eess.SY)

This paper addresses the problem of controlling the temporal dynamics of complex nonlinear network-coupled dynamical systems, specifically in terms of neurodynamics. Based on the Lyapunov direct method, we derive a control strategy with theoretical guarantees of controllability. To verify the performance of the derived control strategy, we perform numerical experiments on two nonlinear network-coupled dynamical systems that emulate phase synchronization and neural population dynamics. The results demonstrate the feasibility and effectiveness of our control strategy.
[156] arXiv:2405.06972 [pdf, other]: Title: A Machine Learning-based Approach for Solving Recurrence Relations and its use in Cost Analysis of Logic Programs

Authors: Louis Rustenholz, Maximiliano Klemen, Miguel Ángel Carreira-Perpiñán, Pedro López-García

Comments: arXiv admin note: text overlap with arXiv:2309.07259

Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI)

Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data, and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and many for other languages), such as CiaoPP, are based on setting up recurrence relations representing (bounds on) the computational cost of predicates, and solving them to find closed-form functions. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with state-of-the-art solvers, including Computer Algebra Systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address this challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations that uses machine-learning (sparse-linear and symbolic) regression techniques to guess a candidate closed-form function, and a combination of an SMT solver and a CAS to check whether it is actually a solution of the recurrence. Our prototype implementation and its experimental evaluation within the context of the CiaoPP system show quite promising results. Overall, for the considered benchmarks, our approach outperforms state-of-the-art cost analyzers and recurrence solvers, and solves recurrences that cannot be solved by them.
[157] arXiv:2405.06973 [pdf, other]: Title: A Primer for Preferential Non-Monotonic Propositional Team Logics

Authors: Kai Sauerwald, Juha Kontinen

Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)

This paper considers KLM-style preferential non-monotonic reasoning in the setting of propositional team semantics. We show that team-based propositional logics naturally give rise to cumulative non-monotonic entailment relations. Motivated by the non-classical interpretation of disjunction in team semantics, we give a precise characterization of preferential models for propositional dependence logic satisfying all System P postulates. Furthermore, we show how classical entailment and dependence logic entailment can be expressed in terms of non-trivial preferential models.
[158] arXiv:2405.06975 [pdf, other]: Title: Input Snapshots Fusion for Scalable Discrete Dynamic Graph Neural Networks

Authors: QingGuo Qi, Hongyang Chen, Minhao Cheng, Han Liu

Subjects: Machine Learning (cs.LG)

Dynamic graphs are ubiquitous in the real world, yet there is a lack of suitable theoretical frameworks to effectively extend existing static graph models into the temporal domain. Additionally, for link prediction tasks on discrete dynamic graphs, the requirement of substantial GPU memory to store embeddings of all nodes hinders the scalability of existing models. In this paper, we introduce an Input Snapshots Fusion based Dynamic Graph Neural Network (SFDyG). By eliminating the partitioning of snapshots within the input window, we obtain a multi-graph (more than one edge between two nodes). Subsequently, by introducing a graph denoising problem with the assumption of temporal decayed smoothing, we integrate Hawkes process theory into Graph Neural Networks to model the generated multi-graph. Furthermore, based on the multi-graph, we propose a scalable three-step mini-batch training method and demonstrate its equivalence to its full-batch training counterpart. Our experiments, conducted on eight distinct dynamic graph datasets for future link prediction tasks, revealed that SFDyG generally surpasses related methods.
[159] arXiv:2405.06977 [pdf, ps, other]: Title: The Sample Complexity of Stackelberg Games

Authors: Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Subjects: Computer Science and Game Theory (cs.GT)

Stackelberg games (SGs) constitute the most fundamental and acclaimed models of strategic interactions involving some form of commitment. Moreover, they form the basis of more elaborate models of this kind, such as Bayesian persuasion and principal-agent problems. Addressing learning tasks in SGs and related models is crucial to operationalize them in practice, where model parameters are usually unknown. In this paper, we revise the sample complexity of learning an optimal strategy to commit to in SGs. We provide a novel algorithm that (i) does not require any of the limiting assumptions made by state-of-the-art approaches and (ii) deals with a trade-off between sample complexity and termination probability that arises when the representation of the leader's strategies has finite precision. Such a trade-off has been completely neglected by existing algorithms and, if not properly managed, may result in them using exponentially many samples. Our algorithm requires novel techniques, which also pave the way to addressing learning problems in other models with commitment that are ubiquitous in the real world.
[160] arXiv:2405.06978 [pdf, other]: Title: On User Association in Large-Scale Heterogeneous LEO Satellite Network

Authors: Yuan Guo, Christodoulos Skouroumounis, Symeon Chatzinotas, Ioannis Krikidis

Subjects: Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)

In this paper, we investigate the performance of large-scale heterogeneous low Earth orbit (LEO) satellite networks in the context of three association schemes. In contrast to existing studies, where single-tier LEO satellite-based network deployments are considered, the developed framework captures the heterogeneous nature of real-world satellite network deployments. More specifically, we propose an analytical framework to evaluate the performance of multi-tier LEO satellite-based networks, where the locations of LEO satellites are approximated as points of independent Poisson point processes, with different density, transmit power, and altitude. We propose three association schemes for the considered network topology based on: 1) the Euclidean distance, 2) the average received power, and 3) a random selection. By using stochastic geometry tools, analytical expressions for the association probability, the downlink coverage probability, and the spectral efficiency are derived for each association scheme, with interference taken into account. Moreover, we assess the achieved network performance under several different fading environments, including low, typical, and severe fading conditions, namely non-fading, shadowed-Rician and Rayleigh fading channels, respectively. Our results reveal the impact of fading channels on the coverage probability, and illustrate that the average power-based association scheme outperforms the other two association policies in terms of achieved coverage and spectral efficiency. Furthermore, we highlight the impact of the proposed association schemes and the network topology on the optimal number of LEO satellites, providing guidance for the planning of multi-tier LEO satellite-based networks in order to enhance network performance.
[161] arXiv:2405.06979 [pdf, other]: Title: Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

Authors: Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

Subjects: Machine Learning (cs.LG)

Open-set Semi-supervised Learning (OSSL) assumes the realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.
[162] arXiv:2405.06980 [pdf, other]: Title: Fractals as Pre-training Datasets for Anomaly Detection and Localization

Authors: C. I. Ugwu, S. Casarin, O. Lanz

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Anomaly detection is crucial in large-scale industrial manufacturing as it helps detect and localise defective parts. Pre-training feature extractors on large-scale datasets is a popular approach for this task. Stringent data security and privacy regulations and high costs and acquisition time hinder the availability and creation of such large datasets. While recent work in anomaly detection primarily focuses on the development of new methods built on such extractors, the importance of the data used for pre-training has not been studied. Therefore, we evaluated the performance of eight state-of-the-art methods pre-trained using dynamically generated fractal images on the famous benchmark datasets MVTec and VisA. In contrast to existing literature, which predominantly examines the transfer-learning capabilities of fractals, in this study, we compare models pre-trained with fractal images against those pre-trained with ImageNet, without subsequent fine-tuning. Although pre-training with ImageNet remains a clear winner, the results of fractals are promising considering that the anomaly detection task required features capable of discerning even minor visual variations. This opens up the possibility for a new research direction where feature extractors could be trained on synthetically generated abstract datasets reconciling the ever-increasing demand for data in machine learning while circumventing privacy and security concerns.
[163] arXiv:2405.06981 [pdf, other]: Title: AraSpell: A Deep Learning Approach for Arabic Spelling Correction

Authors: Mahmoud Salhab, Faisal Abu-Khzam

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction using different seq2seq model architectures such as Recurrent Neural Network (RNN) and Transformer with artificial data generation for error injection, trained on more than 6.9 Million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved 4.8% and 1.11% word error rate (WER) and character error rate (CER), respectively, in comparison with labeled data of 29.72% WER and 5.03% CER. Our approach achieved 2.9% CER and 10.65% WER in comparison with labeled data of 10.02% CER and 50.94% WER. Both of these results are obtained on a test set of 100K sentences.
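As a reading aid for the metrics quoted above, the following sketch computes word error rate (WER) and character error rate (CER) in the usual way, as edit distance divided by reference length. The helper names `edit_distance`, `wer`, and `cer` are illustrative, and the exact evaluation protocol of the paper (text normalization, whitespace and diacritic handling) is not given in the abstract, so this should not be read as the authors' scoring code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("the cat sat", "the cat sit"))  # one substituted word out of three
print(cer("kitten", "sitting"))           # three edits over six characters
```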
[164] arXiv:2405.06983 [pdf, other]: Title: ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging Vehicles

Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Adeel Ahmed

Comments: Accepted for publication in the Special Issue Q1'2024, "Integrating Sensing and Communication for Ubiquitous Internet of Things," IEEE Internet of Things Magazine

Subjects: Networking and Internet Architecture (cs.NI)

As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact that probabilistic techniques can help enhance the effectiveness of charging scheduling for WRSNs, this article addresses the aforementioned issue and proposes a novel ISAC-assisted WRSN protocol. In particular, our proposed protocol considers several factors to balance the charging load on each mobile charging vehicle (MCV), uses an efficient charging factor strategy to partially charge network devices, and employs the ISAC concept to reduce the traveling cost of each MCV and prevent charging conflicts. Simulation results demonstrate that this protocol outperforms other classic, cutting-edge protocols in multiple areas.
[165] arXiv:2405.06985 [pdf, other]: Title: RoTHP: Rotary Position Embedding-based Transformer Hawkes Process

Authors: Anningzhe Gao, Shan Dai

Subjects: Machine Learning (cs.LG)

Temporal Point Processes (TPPs), especially the Hawkes process, are commonly used for modeling asynchronous event sequence data such as financial transactions and user behaviors in social networks. Due to the strong fitting ability of neural networks, various neural Temporal Point Processes have been proposed, among which the Neural Hawkes Processes based on self-attention, such as the Transformer Hawkes Process (THP), achieve distinct performance improvement. Although the THP has been increasingly studied, it still suffers from the sequence prediction issue, i.e., training on history sequences and inferring the future, which is a prevalent paradigm in realistic sequence analysis tasks. Moreover, conventional THP and its variants simply adopt the initial sinusoidal embedding of transformers, which our empirical study shows to be sensitive to temporal change or noise in sequence data. To deal with these problems, we propose a new Rotary Position Embedding-based THP (RoTHP) architecture in this paper. Notably, we show theoretically the translation invariance property and sequence prediction flexibility of our RoTHP, induced by the relative time embeddings when coupled with the Hawkes process. Furthermore, we demonstrate empirically that our RoTHP generalizes better in sequence data scenarios with timestamp translations and in sequence prediction tasks.
[166] arXiv:2405.06986 [pdf, ps, other]: Title: Revisiting the Efficacy of Signal Decomposition in AI-based Time Series Prediction

Authors: Kexin Jiang, Chuhan Wu, Yaoran Chen

Subjects: Machine Learning (cs.LG)

Time series prediction is a fundamental problem in scientific exploration, and artificial intelligence (AI) technologies have substantially bolstered its efficiency and accuracy. A well-established paradigm in AI-driven time series prediction is injecting physical knowledge into neural networks through signal decomposition methods, and sustained progress in numerous scenarios has been reported. However, we uncover non-negligible evidence that challenges the effectiveness of signal decomposition in AI-based time series prediction. We confirm that improper dataset processing with subtle future label leakage is unfortunately widely adopted, possibly yielding abnormally superior but misleading results. By processing data in a strictly causal way without any future information, the effectiveness of additional decomposed signals diminishes. Our work probably identifies an ingrained and universal error in time series modeling, and the de facto progress in relevant areas is expected to be revisited and calibrated to prevent future scientific detours and minimize practical losses.
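The kind of future-information leakage described above can be illustrated with a deliberately simple stand-in for a decomposition step: a moving-average trend. The abstract does not name specific decomposition methods, so the two functions below (`centered_trend`, `causal_trend`) and the window parameter `k` are assumptions chosen only to show how a non-causal window lets a later sample influence an earlier feature.

```python
def centered_trend(x, k=1):
    """Moving average over [t-k, t+k]: each value depends on future samples."""
    n = len(x)
    out = []
    for t in range(n):
        window = x[max(0, t - k): min(n, t + k + 1)]
        out.append(sum(window) / len(window))
    return out

def causal_trend(x, k=1):
    """Moving average over [t-2k, t]: each value depends only on the past."""
    out = []
    for t in range(len(x)):
        window = x[max(0, t - 2 * k): t + 1]
        out.append(sum(window) / len(window))
    return out

series = [1, 2, 3, 4, 5]
changed = [1, 2, 3, 4, 50]  # only the final (future) sample differs

# The centered trend at index 3 changes when the later sample changes;
# the causal trend at indices 0..3 does not.
print(centered_trend(series)[3], centered_trend(changed)[3])
print(causal_trend(series)[:4] == causal_trend(changed)[:4])
```

If a trend computed with the centered window over the full series is fed to a model as an input feature, the value at time t already encodes part of the target at time t+1, which is one plausible reading of the leakage the abstract refers to.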
[167] arXiv:2405.06989 [pdf, other]: Title: Stabilizing Circular Motion Within Nonconcentric Circular Boundary: A Mobius Transformation-Based Approach

Authors: Shubham Singh, Anoop Jain

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Nonuniform motion constraints are ubiquitous in robotic applications. Geofencing control is one such paradigm where the motion of a robot must be constrained within a predefined boundary. This paper addresses the problem of stabilizing a unicycle robot around a desired circular orbit while confining its motion within a nonconcentric external circular boundary. Our solution approach relies on the concept of the so-called Mobius transformation that, under certain practical conditions, maps two nonconcentric circles to a pair of concentric circles, and hence, results in uniform spatial motion constraints. The choice of such a Mobius transformation is governed by the roots of a quadratic equation in the post-design analysis that decides how the regions enclosed by the two circles are mapped onto the two planes. We show that the problem can be formulated either as a trajectory-constraining problem or an obstacle-avoidance problem in the transformed plane, depending on these roots. Exploiting the idea of the barrier Lyapunov function, we propose a unique control law that solves both these contrasting problems in the transformed plane and renders a solution to the original problem in the actual plane. By relating parameters of two planes under Mobius transformation and its inverse map, we further establish a connection between the control laws in two planes and determine the control law to be applied in the actual plane. Simulation and experimental results are provided to illustrate the key theoretical developments.
[168] arXiv:2405.06991 [pdf, other]: Title: PIPE: Process Informed Parameter Estimation, a learning based approach to task generalized system identification

Authors: Constantin Schempp, Christian Friedrich

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO)

We address the problem of robot-guided assembly tasks by using a learning-based approach to identify contact model parameters for known and novel parts. First, a Variational Autoencoder (VAE) is used to extract geometric features of assembly parts. Then, we combine the extracted features with physical knowledge to derive the parameters of a contact model using our newly proposed neural network structure. The measured force from real experiments is used to supervise the predicted forces, thus avoiding the need for ground truth model parameters. Although the network was trained only on a small set of assembly parts, good contact model estimation for unknown objects was achieved. Our main contribution is the network structure that allows us to estimate contact models of assembly tasks depending on the geometry of the part to be joined. Where current system identification processes have to record new data for a new assembly process, our method only requires the 3D model of the assembly part. We evaluate our method by estimating contact models for robot-guided assembly tasks of pin connectors as well as electronic plugs and compare the results with real experiments.
[169] arXiv:2405.06992 [pdf, ps, other]: Title: ResSurv: Cancer Survival Analysis Prediction Model Based on Residual Networks

Authors: Wankang Zhai

Comments: 7 pages, 7 figures

Subjects: Machine Learning (cs.LG); Applications (stat.AP)

Survival prediction is an important branch of cancer prognosis analysis. A model that predicts survival risk from TCGA genomics data can discover genes related to cancer and provide diagnosis and treatment recommendations based on patient characteristics. We found that deep learning models based on Cox proportional hazards often suffer from overfitting when dealing with high-throughput data. Moreover, we found that as the number of network layers increases, the experimental results do not improve and network degradation occurs. To address this problem, we propose a new framework based on Deep Residual Learning that combines the ideas of Cox proportional hazards and residual learning, and we name it ResSurv. First, ResSurv is a feed-forward deep learning network built by stacking multiple basic ResNet Blocks. In each ResNet Block, we add a Normalization Layer to prevent vanishing and exploding gradients. Secondly, for the loss function of the neural network, we inherited the Cox proportional hazards method, applied the semi-parametric form of the CPH model to the neural network, combined it with the partial likelihood model, established the loss function, and performed backpropagation and gradient updates. Finally, we compared ResSurv networks of different depths and found that we can effectively extract high-dimensional features. Ablation experiments and comparative experiments prove that our model has reached the state of the art (SOTA) in the field of deep learning, and our network can effectively extract deep information.
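The loss described above builds on the Cox partial likelihood. A minimal sketch of the standard negative log partial likelihood is given below for orientation; it is not the ResSurv implementation. The argument names `risk` (the network's scalar output per patient), `time`, and `event` are hypothetical, and the risk set is taken to be every patient whose time is at least the event time, which is a Breslow-style handling of ties assumed here for simplicity.

```python
import math

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log Cox partial likelihood for predicted risk scores.

    risk[i]  : predicted log-risk score for patient i (assumed model output)
    time[i]  : observed survival or censoring time
    event[i] : True if the event was observed, False if censored
    """
    loss = 0.0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time[i].
        denom = sum(math.exp(risk[j])
                    for j in range(len(risk)) if time[j] >= time[i])
        loss -= risk[i] - math.log(denom)
    return loss

# With equal scores and all events observed, the risk sets have sizes 3, 2, 1,
# so the loss is log(3) + log(2) + log(1) = log(6).
print(cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [True, True, True]))
```

In a deep learning setting this quantity would be written with differentiable tensor operations so that it can be backpropagated; the plain-Python loop only makes the structure of the sum explicit.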
[170] arXiv:2405.06993 [pdf, other]: Title: Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations

Authors: Yumeng Shao, Jun Li, Long Shi, Kang Wei, Ming Ding, Qianmu Li, Zengxiang Li, Wen Chen, Shi Jin

Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation incurs a new problem of inconsistency between local updates and global updates. Motivated by the issues of conventional SFL and AFL, we first propose a time-driven SFL (T-SFL) framework for heterogeneous systems. The core idea of T-SFL is that the server aggregates the models from different clients, each with varying numbers of iterations, at regular time intervals. To evaluate the learning performance of T-SFL, we provide an upper bound on the global loss function. Further, we optimize the aggregation weights to minimize the developed upper bound. Then, we develop a discriminative model selection (DMS) algorithm that removes local models from clients whose number of iterations falls below a predetermined threshold. In particular, this algorithm ensures that each client's aggregation weight accurately reflects its true contribution to the global model update, thereby improving the efficiency and robustness of the system. To validate the effectiveness of T-SFL with the DMS algorithm, we conduct extensive experiments using several popular datasets including MNIST, Cifar-10, Fashion-MNIST, and SVHN. The experimental results demonstrate that T-SFL with the DMS algorithm can reduce the latency of conventional SFL by 50%, while achieving an average 3% improvement in learning accuracy over state-of-the-art AFL algorithms.
[171] arXiv:2405.06994 [pdf, other]: Title: GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts

Authors: Sofia Casarin, Oswald Lanz, Sergio Escalera

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Neural Architecture Search (NAS) methods have been shown to output networks that largely outperform human-designed networks. However, conventional NAS methods have mostly tackled the single-dataset scenario, incurring a large computational cost as the procedure has to be run from scratch for every new dataset. In this work, we focus on predictor-based algorithms and propose a simple and efficient way of improving their prediction performance when dealing with data distribution shifts. We exploit the Kronecker product on the randomly wired search space and create a small NAS benchmark composed of networks trained over four different datasets. To improve the generalization abilities, we propose GRASP-GCN, a ranking Graph Convolutional Network that takes as additional input the shape of the layers of the neural networks. GRASP-GCN is trained with the not-at-convergence accuracies, improves the state of the art by 3.3% for Cifar-10, and increases the generalization abilities under data distribution shift.
[172] arXiv:2405.06995 [pdf, other]: Title: Benchmarking Cross-Domain Audio-Visual Deception Detection

Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

Comments: 10 pages

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize for use in real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further explore the impact of using data from multiple source domains for training, we investigate three types of domain sampling strategies, including domain-simultaneous, domain-alternating, and domain-by-domain, for multi-to-single domain generalization evaluation. Furthermore, we propose the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection. Protocols and source code are available at https://github.com/Redaimao/cross_domain_DD.
[173] arXiv:2405.06996 [pdf, other]: Title: Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPT

Authors: Shucheng Zhu, Weikang Wang, Ying Liu

Comments: Accepted by LREC-COLING 2024

Subjects: Computation and Language (cs.CL)

While nationality is a pivotal demographic element that enhances the performance of language models, it has received far less scrutiny regarding inherent biases. This study investigates nationality bias in ChatGPT (GPT-3.5), a large language model (LLM) designed for text generation. The research covers 195 countries, 4 temperature settings, and 3 distinct prompt types, generating 4,680 discourses about nationality descriptions in Chinese and English. Automated metrics were used to analyze the nationality bias, and expert annotators alongside ChatGPT itself evaluated the perceived bias. The results show that ChatGPT's generated discourses are predominantly positive, especially compared to its predecessor, GPT-2. However, when prompted with negative inclinations, it occasionally produces negative content. Despite ChatGPT considering its generated text as neutral, it shows consistent self-awareness about nationality bias when subjected to the same pair-wise comparison annotation framework used by human annotators. In conclusion, while ChatGPT's generated texts seem friendly and positive, they reflect the inherent nationality biases in the real world. This bias may vary across different language versions of ChatGPT, indicating diverse cultural perspectives. The study highlights the subtle and pervasive nature of biases within LLMs, emphasizing the need for further scrutiny.
[174] arXiv:2405.06997 [pdf, other]: Title: Path Guiding for Wavefront Path Tracing: A Memory Efficient Approach for GPU Path Tracers

Authors: Bora Yalçıner (1), Ahmet Oğuz Akyüz (1) ((1) Middle East Technical University, Computer Engineering Department, Ankara, Turkey)

Comments: 14 pages, 11 figures

Subjects: Graphics (cs.GR)

We propose a path-guiding algorithm to be incorporated into the wavefront style of path tracers (WFPTs). As WFPTs are primarily implemented on graphics processing units (GPUs), the proposed method aims to leverage the capabilities of the GPUs and reduce the hierarchical data structure and memory usage typically required for such techniques. To achieve this, our algorithm only stores the radiant exitance on a single global sparse voxel octree (SVO) data structure. Probability density functions required to guide the rays are generated on-the-fly using this data structure. The proposed approach reduces the scene-related persistent memory requirements compared to other path-guiding techniques while producing similar or better results depending on scene characteristics. To our knowledge, our algorithm is the first one that incorporates path guiding into a WFPT.
[175] arXiv:2405.06999 [pdf, other]: Title: Large Language Model-aided Edge Learning in Distribution System State Estimation

Authors: Renyou Xie, Xin Yin, Chaojie Li, Nian Liu, Bo Zhao, Zhaoyang Dong

Subjects: Systems and Control (eess.SY)

Distribution system state estimation (DSSE) plays a crucial role in the real-time monitoring, control, and operation of distribution networks. Besides intensive computational requirements, conventional DSSE methods need high-quality measurements to obtain accurate states, whereas missing values often occur due to sensor failures or communication delays. To address these challenging issues, a forecast-then-estimate framework of edge learning is proposed for DSSE, leveraging large language models (LLMs) to forecast missing measurements and provide pseudo-measurements. Firstly, natural language-based prompts and measurement sequences are integrated by the proposed LLM to learn patterns from historical data and provide accurate forecasting results. Secondly, a convolutional layer-based neural network model is introduced to improve the robustness of state estimation under missing measurements. Thirdly, to alleviate the overfitting of the deep learning-based DSSE, it is reformulated as a multi-task learning framework containing shared and task-specific layers. The uncertainty weighting algorithm is applied to find the optimal weights to balance different tasks. The numerical simulation on the Simbench case is used to demonstrate the effectiveness of the proposed forecast-then-estimate framework.
[176] arXiv:2405.07001 [pdf, other]: Title: Evaluating Task-based Effectiveness of MLLMs on Charts

Authors: Yifan Wu, Lutao Yan, Yuyu Luo, Yunhai Wang, Nan Tang

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we explore a forward-thinking question: Is GPT-4V effective at low-level data analysis tasks on charts? To this end, we first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types. Firstly, we conduct systematic evaluations to understand the capabilities and limitations of 18 advanced MLLMs, which include 12 open-source models and 6 closed-source models. Starting with a standard textual prompt approach, the average accuracy rate across the 18 MLLMs is 36.17%. Among all the models, GPT-4V achieves the highest accuracy, reaching 56.13%. To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of the capabilities of GPT-4V. We further investigate how visual modifications to charts, such as altering visual elements (e.g. changing color schemes) and introducing perturbations (e.g. adding image noise), affect the performance of GPT-4V. Secondly, we present 12 experimental findings. These findings suggest the potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and the capabilities of GPT-4V. Thirdly, we propose a novel textual prompt strategy, named Chain-of-Charts, tailored for low-level analysis tasks, which boosts model performance by 24.36 percentage points, resulting in an accuracy of 80.49%. Furthermore, by incorporating a visual prompt strategy that directs the attention of GPT-4V to question-relevant visual elements, we further improve accuracy to 83.83%. Our study not only sheds light on the capabilities and limitations of GPT-4V in low-level data analysis tasks but also offers valuable insights for future research.
[177] arXiv:2405.07004 [pdf, other]: Title: Stealthy Imitation: Reward-guided Environment-free Policy Stealing

Authors: Zhixiong Zhuang, Maria-Irina Nicolae, Mario Fritz

Comments: Accepted at ICML 2024. Project page: this https URL

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Deep reinforcement learning policies, which are integral to modern control systems, represent valuable intellectual property. The development of these policies demands considerable resources, such as domain expertise, simulation fidelity, and real-world validation. These policies are potentially vulnerable to model stealing attacks, which aim to replicate their functionality using only black-box access. In this paper, we propose Stealthy Imitation, the first attack designed to steal policies without access to the environment or knowledge of the input range. This setup has not been considered by previous model stealing methods. Lacking access to the victim's input state distribution, Stealthy Imitation fits a reward model that allows it to approximate this distribution. We show that the victim policy is harder to imitate when the distribution of the attack queries matches that of the victim. We evaluate our approach across diverse, high-dimensional control tasks and consistently outperform prior data-free approaches adapted for policy stealing. Lastly, we propose a countermeasure that significantly diminishes the effectiveness of the attack.
[178] arXiv:2405.07006 [pdf, other]: Title: Word-specific tonal realizations in Mandarin

Authors: Yu-Ying Chuang, Melanie J. Bell, Yu-Hsiang Tseng, R. Harald Baayen

Subjects: Computation and Language (cs.CL)

The pitch contours of Mandarin two-character words are generally understood as being shaped by the underlying tones of the constituent single-character words, in interaction with articulatory constraints imposed by factors such as speech rate, co-articulation with adjacent tones, segmental make-up, and predictability. This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a Taiwan corpus of spontaneous conversations, using the generalized additive regression model, and focusing on the rise-fall tone pattern, that after controlling for effects of speaker and context, word type is a stronger predictor of pitch realization than all the previously established word-form related predictors combined. Importantly, the addition of information about meaning in context improves prediction accuracy even further. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data, and that context-sensitive, token-specific embeddings can predict the shape of pitch contours with 30% accuracy. These accuracies, which are an order of magnitude above chance level, suggest that the relation between words' pitch contours and their meanings is sufficiently strong to be functional for language users. The theoretical implications of these empirical findings are discussed.
[179] arXiv:2405.07007 [pdf, ps, other]: Title: A New Algorithm for Computing Branch Number of Non-Singular Matrices over Finite Fields

Authors: P.R. Mishra, Yogesh Kumar, Susanta Samanta, Atul Gaur

Subjects: Cryptography and Security (cs.CR)

The notion of branch numbers of a linear transformation is crucial for both linear and differential cryptanalysis. The number of non-zero elements in a state difference or linear mask directly correlates with the active S-Boxes. The differential or linear branch number indicates the minimum number of active S-Boxes in two consecutive rounds of an SPN cipher, specifically for differential or linear cryptanalysis, respectively. This paper presents a new algorithm for computing the branch number of non-singular matrices over finite fields. The algorithm is based on the existing classical method but demonstrates improved computational complexity compared to its predecessor. We conduct a comparative study of the proposed algorithm and the classical approach, providing an analytical estimation of the algorithm's complexity. Our analysis reveals that the computational complexity of our algorithm is the square root of that of the classical approach.
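For orientation, the differential branch number of a matrix M is the minimum of wt(x) + wt(Mx) over all non-zero input vectors x, where wt counts non-zero entries. The sketch below is the exhaustive baseline implied by that definition, not the paper's improved algorithm. It assumes a prime field GF(p) so that arithmetic reduces to integers mod p, and the function names are chosen for this example.

```python
from itertools import product


def hamming_weight(vec):
    """Number of non-zero entries in a vector."""
    return sum(1 for v in vec if v != 0)


def mat_vec(matrix, vec, p):
    """Matrix-vector product over the prime field GF(p)."""
    return [sum(a * b for a, b in zip(row, vec)) % p for row in matrix]


def differential_branch_number(matrix, p):
    """Exhaustive search: min over non-zero x of wt(x) + wt(M x).

    Visits all p**n - 1 non-zero vectors, so it is only practical
    for small n and p.
    """
    n = len(matrix)
    best = None
    for vec in product(range(p), repeat=n):
        if not any(vec):
            continue  # skip the zero vector
        weight = hamming_weight(vec) + hamming_weight(mat_vec(matrix, vec, p))
        if best is None or weight < best:
            best = weight
    return best
```

For an n x n matrix the branch number is at most n + 1, so the search could stop early once that bound or a known lower bound is met.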
[180] arXiv:2405.07010 [pdf, other]: Title: Deciphering public attention to geoengineering and climate issues using machine learning and dynamic analysis

Authors: Ramit Debnath, Pengyu Zhang, Tianzhu Qin, R. Michael Alvarez, Shaun D. Fitzgerald

Comments: 46 pages, 6 main figures and SI

Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)

As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English-language news articles from the BBC and the New York Times, combined with Google Trends data spanning 2018 to 2022, to explore how public interest in geoengineering fluctuates in response to news coverage of broader climate issues. Using BERT-based topic modeling, sentiment analysis, and time-series regression models, we found that positive sentiment in energy-related news serves as a good predictor of heightened public interest in geoengineering, a trend that persists over time. Our findings suggest that public engagement with geoengineering and climate action is not uniform, with some topics being more potent in shaping interest over time, such as climate news related to energy, disasters, and politics. Understanding these patterns is crucial for scientists, policymakers, and educators aiming to craft effective strategies for engaging with the public and fostering dialogue around emerging climate technologies.
[181] arXiv:2405.07011 [pdf, other]: Title: Fair Graph Representation Learning via Sensitive Attribute Disentanglement

Authors: Yuchang Zhu, Jintang Li, Zibin Zheng, Liang Chen

Comments: Accepted by WWW 2024

Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Group fairness for Graph Neural Networks (GNNs), which emphasizes algorithmic decisions neither favoring nor harming certain groups defined by sensitive attributes (e.g., race and gender), has gained considerable attention. In particular, the objective of group fairness is to ensure that the decisions made by GNNs are independent of the sensitive attribute. To achieve this objective, most existing approaches involve eliminating sensitive attribute information in node representations or algorithmic decisions. However, such ways may also eliminate task-related information due to its inherent correlation with the sensitive attribute, leading to a sacrifice in utility. In this work, we focus on improving the fairness of GNNs while preserving task-related information and propose a fair GNN framework named FairSAD. Instead of eliminating sensitive attribute information, FairSAD enhances the fairness of GNNs via Sensitive Attribute Disentanglement (SAD), which separates the sensitive attribute-related information into an independent component to mitigate its impact. Additionally, FairSAD utilizes a channel masking mechanism to adaptively identify the sensitive attribute-related component and subsequently decorrelates it. Overall, FairSAD minimizes the impact of the sensitive attribute on GNN outcomes rather than eliminating sensitive attributes, thereby preserving task-related information associated with the sensitive attribute. Furthermore, experiments conducted on several real-world datasets demonstrate that FairSAD outperforms other state-of-the-art methods by a significant margin in terms of both fairness and utility performance. Our source code is available at https://github.com/ZzoomD/FairSAD.
[182] arXiv:2405.07012 [pdf, other]: Title: Incorporating Degradation Estimation in Light Field Spatial Super-Resolution

Authors: Zeyu Xiao, Zhiwei Xiong

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in light field super-resolution (SR) have yielded impressive results. In practice, however, many existing methods are limited by assuming fixed degradation models, such as bicubic downsampling, which hinders their robustness in real-world scenarios with complex degradations. To address this limitation, we present LF-DEST, an effective blind Light Field SR method that incorporates explicit Degradation Estimation to handle various degradation types. LF-DEST consists of two primary components: degradation estimation and light field restoration. The former concurrently estimates blur kernels and noise maps from low-resolution degraded light fields, while the latter generates super-resolved light fields based on the estimated degradations. Notably, we introduce a modulated and selective fusion module that intelligently combines degradation representations with image information, allowing for effective handling of diverse degradation types. We conduct extensive experiments on benchmark datasets, demonstrating that LF-DEST achieves superior performance across a variety of degradation scenarios in light field SR.
[183] arXiv:2405.07017 [pdf, other]: Title: Robot Agnostic Visual Servoing considering kinematic constraints enabled by a decoupled network trajectory planner structure

Authors: Constantin Schempp, Christian Friedrich

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Robotics (cs.RO)

We propose a visual servoing method consisting of a detection network and a velocity trajectory planner. First, the detection network estimates the object's position and orientation in the image space. These estimates are then normalized and filtered. The direction and orientation are then the input to the trajectory planner, which considers the kinematic constraints of the robotic system used. This allows safe and stable control, since the kinematic boundary values are taken into account in planning. Also, by having direction estimation and velocity planner separated, the learning part of the method does not directly influence the control value. This also enables the transfer of the method to different robotic systems without retraining, making it robot agnostic. We evaluate our method on different visual servoing tasks with and without clutter on two different robotic systems. Our method achieved mean absolute position errors of <0.5 mm and orientation errors of <1°. Additionally, we transferred the method to a new system which differs in robot and camera, emphasizing the robot-agnostic capability of our method.
[184] arXiv:2405.07018 [pdf, other]: Title: Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought

Authors: Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng Hu

Comments: This paper has been accepted by IJCAI-24

Subjects: Cryptography and Security (cs.CR)

Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users' membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and the model architecture of the target recommender system. To better understand the privacy risks of recommender systems, we propose shadow-free MIAs that directly leverage a user's recommendations for membership inference. Without shadow training, the proposed attack can conduct MIAs efficiently and effectively under a practical scenario where the attacker is given only black-box access to the target recommender system. The proposed attack leverages the intuition that the recommender system personalizes a user's recommendations if the user's historical interactions are used by it. Thus, an attacker can infer membership privacy by determining whether the recommendations are more similar to the interactions or to the generally popular items. We conduct extensive experiments on benchmark datasets across various recommender systems. Remarkably, our attack achieves far better attack accuracy with low false positive rates than baselines, at a much lower computational cost.
[185] arXiv:2405.07020 [pdf, other]: Title: Adaptive Online Bayesian Estimation of Frequency Distributions with Local Differential Privacy

Authors: Soner Aydin, Sinan Yildirim

Comments: Code for experiments available at this https URL

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We propose a novel Bayesian approach for the adaptive and online estimation of the frequency distribution of a finite number of categories under the local differential privacy (LDP) framework. The proposed algorithm performs Bayesian parameter estimation via posterior sampling and adapts the randomization mechanism for LDP based on the obtained posterior samples. We propose a randomized mechanism for LDP which uses a subset of categories as an input and whose performance depends on the selected subset and the true frequency distribution. By using the posterior sample as an estimate of the frequency distribution, the algorithm performs a computationally tractable subset selection step to maximize the utility of the privatized response of the next user. We propose several utility functions related to well-known information metrics, such as (but not limited to) Fisher information matrix, total variation distance, and information entropy. We compare each of these utility metrics in terms of their computational complexity. We employ stochastic gradient Langevin dynamics for posterior sampling, a computationally efficient approximate Markov chain Monte Carlo method. We provide a theoretical analysis showing that (i) the posterior distribution targeted by the algorithm converges to the true parameter even for approximate posterior sampling, and (ii) the algorithm selects the optimal subset with high probability if posterior sampling is performed exactly. We also provide numerical results that empirically demonstrate the estimation accuracy of our algorithm where we compare it with nonadaptive and semi-adaptive approaches under experimental settings with various combinations of privacy parameters and population distribution parameters.
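To make the LDP setting concrete, the sketch below shows standard k-ary randomized response together with its unbiased frequency estimator. This is the kind of fixed, nonadaptive mechanism the paper compares against, not the adaptive subset-based mechanism it proposes. The function names and the explicit `rng` argument are assumptions made for this example.

```python
import math
import random


def k_rr_privatize(true_category, k, epsilon, rng):
    """Report the true category with probability e^eps / (e^eps + k - 1);
    otherwise report one of the other k - 1 categories uniformly."""
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return true_category
    other = rng.randrange(k - 1)
    # Map the draw onto {0..k-1} minus the true category.
    return other if other < true_category else other + 1


def k_rr_estimate(reports, k, epsilon):
    """Unbiased frequency estimate obtained by inverting the response channel."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)  # probability of reporting the truth
    q = 1 / (e + k - 1)  # probability of reporting any one other category
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]
```

The estimates always sum to one but can fall slightly outside [0, 1] for small samples, which is one motivation for estimating the distribution in a Bayesian way instead.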
[186] arXiv:2405.07022 [pdf, other]: Title: DTMamba: Dual Twin Mamba for Time Series Forecasting

Authors: Zexue Wu, Yifeng Gong, Aoqian Zhang

Subjects: Machine Learning (cs.LG); Databases (cs.DB)

We utilized the Mamba model for time series data prediction tasks, and the experimental results indicate that our model performs well.
[187] arXiv:2405.07024 [pdf, other]: Title: Demystifying the Hypercomplex: Inductive Biases in Hypercomplex Deep Learning

Authors: Danilo Comminiello, Eleonora Grassucci, Danilo P. Mandic, Aurelio Uncini

Comments: Accepted for Publication in IEEE Signal Processing Magazine

Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Hypercomplex algebras have recently been gaining prominence in the field of deep learning owing to the advantages of their division algebras over real vector spaces and their superior results when dealing with multidimensional signals in real-world 3D and 4D paradigms. This paper provides a foundational framework that serves as a roadmap for understanding why hypercomplex deep learning methods are so successful and how their potential can be exploited. Such a theoretical framework is described in terms of inductive bias, i.e., a collection of assumptions, properties, and constraints that are built into training algorithms to guide their learning process toward more efficient and accurate solutions. We show that it is possible to derive specific inductive biases in the hypercomplex domains, which extend complex numbers to encompass diverse numbers and data structures. These biases prove effective in managing the distinctive properties of these domains, as well as the complex structures of multidimensional and multimodal signals. This novel perspective for hypercomplex deep learning promises to both demystify this class of methods and clarify their potential, under a unifying framework, and in this way promotes hypercomplex models as viable alternatives to traditional real-valued deep learning for multidimensional signal processing.
[188] arXiv:2405.07027 [pdf, other]: Title: TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses - by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves the depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. The experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
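As a rough illustration of depth-guided ray sampling, the sketch below draws sample depths along one ray from a normal distribution centered on a monocular depth estimate and truncated to the ray's near and far bounds. The function name, the rejection-sampling approach, and the parameter choices are assumptions made for this example; the abstract does not specify how TD-NeRF implements its sampler.

```python
import random


def sample_truncated_normal(mean, std, low, high, count, rng):
    """Draw `count` depths from N(mean, std^2) restricted to [low, high].

    Uses simple rejection sampling, which is adequate when the depth prior
    lies inside or near the [low, high] interval. Returns sorted depths.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    if std <= 0:
        raise ValueError("std must be positive")
    samples = []
    while len(samples) < count:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            samples.append(x)
    return sorted(samples)
```

Shrinking `std` over the course of training would concentrate samples around the prior, which is one way to read the coarse-to-fine strategy described above.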
[189] arXiv:2405.07029 [pdf, ps, other]: Title: A framework of text-dependent speaker verification for chinese numerical string corpus

Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

Comments: arXiv admin note: text overlap with arXiv:2312.01645

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, which can be negatively impacted by reading rhythms and pauses. To address this problem, we propose an end-to-end speaker verification system that enhances TD-SV by decoupling speaker and text information. Our system consists of a text embedding extractor, a speaker embedding extractor and a fusion module. In the text embedding extractor, we employ an enhanced Transformer and introduce a triple loss including text classification loss, connectionist temporal classification (CTC) loss and decoder loss; while in the speaker embedding extractor, we create a multi-scale pooling method by combining sliding window attentive statistics pooling (SWASP) with attentive statistics pooling (ASP). To mitigate the scarcity of data, we have recorded a publicly available Chinese numerical corpus named SHALCAS22A (hereinafter called SHAL), which can be accessed on Open-SLR. Moreover, we employ data augmentation techniques using Tacotron2 and HiFi-GAN. Our method achieves an equal error rate (EER) performance improvement of 49.2% on Hi-Mia and 75.0% on SHAL, respectively.
[190] arXiv:2405.07030 [pdf, ps, other]: Title: Lasso Ridge based XGBoost and Deep_LSTM Help Tennis Players Perform better

Authors: Wankang Zhai, Yuhan Wang

Comments: 22 pages, 11 figures

Subjects: Machine Learning (cs.LG)

Understanding the dynamics of momentum and game fluctuation in tennis matches is crucial for predicting match outcomes and enhancing player performance. In this study, we present a comprehensive analysis of these factors using a dataset from the 2023 Wimbledon final. Initially, we develop a sliding-window-based scoring model to assess player performance, accounting for the influence of serving dominance through a serve decay factor. Additionally, we introduce a novel approach, Lasso-Ridge-based XGBoost, to quantify momentum effects, leveraging the predictive power of XGBoost while mitigating overfitting through regularization. Through experimentation, we achieve an accuracy of 94% in predicting match outcomes, identifying key factors influencing winning rates. Subsequently, we propose a Derivative of the winning rate algorithm to quantify game fluctuation, employing an LSTM_Deep model to predict fluctuation scores. Our model effectively captures temporal correlations in momentum features, yielding mean squared errors ranging from 0.036 to 0.064. Furthermore, we explore meta-learning using MAML to transfer our model to predict outcomes in ping-pong matches, though results indicate a comparative performance decline. Our findings provide valuable insights into momentum dynamics and game fluctuation, offering implications for sports analytics and player training strategies.
[191] arXiv:2405.07031 [pdf, other]: Title: Global Motion Understanding in Large-Scale Video Object Segmentation

Authors: Volodymyr Fedynyak, Yaroslav Romanus, Oles Dobosevych, Igor Babin, Roman Riazantsev

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we show that transferring knowledge from other domains of video understanding combined with large-scale learning can improve the robustness of Video Object Segmentation (VOS) under complex circumstances. Namely, we focus on integrating scene global motion knowledge to improve large-scale semi-supervised Video Object Segmentation. Prior works on VOS mostly rely on direct comparison of semantic and contextual features to perform dense matching between current and past frames, passing over the actual motion structure. On the other hand, the Optical Flow Estimation task aims to approximate the scene motion field, exposing global motion patterns which are typically undiscoverable during all-pairs similarity search. We present WarpFormer, an architecture for semi-supervised Video Object Segmentation that exploits existing knowledge in motion understanding to conduct smoother propagation and more accurate matching. Our framework employs a generic pretrained Optical Flow Estimation network whose prediction is used to warp both past frames and instance segmentation masks to the current frame domain. Subsequently, the warped segmentation masks are refined and fused together, aiming to inpaint occluded regions and eliminate artifacts caused by flow field imperfections. Additionally, we employ the novel large-scale MOSE 2023 dataset to train the model on various complex scenarios. Our method demonstrates strong performance on DAVIS 2016/2017 validation (93.0% and 85.9%), DAVIS 2017 test-dev (80.6%) and YouTube-VOS 2019 validation (83.8%) that is competitive with alternative state-of-the-art methods while using a much simpler memory mechanism and instance understanding logic.
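The warping step can be pictured with a minimal backward-warping sketch: for each pixel of the current frame, a flow vector says where to look in the past frame, and the mask value found there is copied over. The list-of-lists representation, the nearest-neighbor lookup, and the function name are assumptions made for this example; WarpFormer's actual warping and refinement are not described at this level in the abstract.

```python
def warp_mask_backward(past_mask, flow):
    """Warp a past-frame mask into the current frame.

    past_mask[y][x] is the label at a past-frame pixel.
    flow[y][x] = (dx, dy) points from the current-frame pixel (x, y)
    to its source location in the past frame (assumed convention).
    Pixels whose source falls outside the image are left as 0, which is
    where a later refinement stage would need to inpaint.
    """
    height, width = len(past_mask), len(past_mask[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x, src_y = round(x + dx), round(y + dy)  # nearest neighbor
            if 0 <= src_x < width and 0 <= src_y < height:
                warped[y][x] = past_mask[src_y][src_x]
    return warped
```

A production version would typically use bilinear sampling on tensors, but the indexing logic is the same.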
[192] arXiv:2405.07033 [pdf, ps, other]: Title: A Performance Analysis Modeling Framework for Extended Reality Applications in Edge-Assisted Wireless Networks

Authors: Anik Mallik, Jiang Xie, Zhu Han

Comments: 12 pages, 4 figures; To appear in Proceedings of IEEE International Conference on Distributed Computing Systems (ICDCS), 2024

Subjects: Networking and Internet Architecture (cs.NI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Image and Video Processing (eess.IV)

Extended reality (XR) is at the center of attraction in the research community due to the emergence of augmented, mixed, and virtual reality applications. The performance of such applications needs to be uptight to maintain the requirements of latency, energy consumption, and freshness of data. Therefore, a comprehensive performance analysis model is required to assess the effectiveness of an XR application but is challenging to design due to the dependence of the performance metrics on several difficult-to-model parameters, such as computing resources and hardware utilization of XR and edge devices, which are controlled by both their operating systems and the application itself. Moreover, the heterogeneity in devices and wireless access networks brings additional challenges in modeling. In this paper, we propose a novel modeling framework for performance analysis of XR applications considering edge-assisted wireless networks and validate the model with experimental data collected from testbeds designed specifically for XR applications. In addition, we present the challenges associated with performance analysis modeling and present methods to overcome them in detail. Finally, the performance evaluation shows that the proposed analytical model can analyze XR applications' performance with high accuracy compared to the state-of-the-art analytical models.
[193] arXiv:2405.07034 [pdf, ps, other]: Title: Towards an Accessible and Rapidly Trainable Rhythm Sequencer Using a Generative Stacked Autoencoder

Authors: Alex Wastnidge

Comments: 7 pages, 7 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Neural networks and deep learning are often deployed for the sake of the most comprehensive music generation with as little involvement as possible from the human musician. Implementations in aid of, or being a tool for, music practitioners are sparse. This paper proposes the integration of generative stacked autoencoder structures for rhythm generation, within a conventional melodic step-sequencer. It further aims to work towards its implementation being accessible to the average electronic music practitioner. Several model architectures have been trained and tested for their creative potential. While the current implementations do display limitations, they do represent viable creative solutions for music practitioners.
[194] arXiv:2405.07035 [pdf, other]: Title: A Turkish Educational Crossword Puzzle

Authors: Kamyar Zeinalipour, Yusuf Gökberk Keptiğ, Marco Maggini, Leonardo Rigutini, Marco Gori

Comments: This paper has been accepted for presentation at AIED2024 LBR

Subjects: Computation and Language (cs.CL)

This paper introduces the first Turkish crossword puzzle generator designed to leverage the capabilities of large language models (LLMs) for educational purposes. In this work, we introduced two specially created datasets: one with over 180,000 unique answer-clue pairs for generating relevant clues from the given answer, and another with over 35,000 samples containing text, answer, category, and clue data, aimed at producing clues for specific texts and keywords within certain categories. Beyond entertainment, this generator emerges as an interactive educational tool that enhances memory, vocabulary, and problem-solving skills. It's a notable step in AI-enhanced education, merging game-like engagement with learning for Turkish and setting new standards for interactive, intelligent learning tools in Turkish.
[195] arXiv:2405.07037 [pdf, other]: Title: Robust Online Convex Optimization for Disturbance Rejection

Authors: Joyce Lai, Peter Seiler

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

Online convex optimization (OCO) is a powerful tool for learning sequential data, making it ideal for high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the performance of OCO-based controllers for linear time-invariant (LTI) systems subject to disturbance and model uncertainty. The model uncertainty can cause the closed-loop to become unstable. We provide a sufficient condition for robust stability based on the small gain theorem. This condition is easily incorporated as an on-line constraint in the OCO controller. Finally, we verify via numerical simulations that imposing the robust stability condition on the OCO controller ensures closed-loop stability.
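The small gain theorem states that a feedback interconnection of two stable systems is stable if the product of their induced gains is below one. The sketch below estimates the peak gain of a discrete-time SISO transfer function on a frequency grid and then applies that test. The function names, the SISO restriction, and the gridding approach are assumptions made for this example; gridding only gives a lower bound on the true peak, so this is a sanity check rather than a certificate, and it is not the paper's online constraint.

```python
import cmath
import math


def peak_gain(num, den, points=2048):
    """Estimate sup_w |H(e^{jw})| for H(z) = num(z) / den(z).

    Coefficients are in descending powers of z. Assumes H is stable,
    so den has no roots on the unit circle.
    """
    def poly(coeffs, z):
        acc = 0
        for c in coeffs:
            acc = acc * z + c  # Horner evaluation
        return acc

    peak = 0.0
    for k in range(points + 1):
        z = cmath.exp(1j * math.pi * k / points)  # w from 0 to pi
        peak = max(peak, abs(poly(num, z) / poly(den, z)))
    return peak


def small_gain_ok(nominal_gain, uncertainty_bound):
    """Sufficient condition for robust stability: gain product below 1."""
    return nominal_gain * uncertainty_bound < 1.0
```

For example, `H(z) = 0.5 / (z - 0.5)` has its peak gain at zero frequency, so an uncertainty with gain bound below the reciprocal of that peak would pass the test.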
[196] arXiv:2405.07038 [pdf, other]: Title: Conformal Online Auction Design

Authors: Jiale Han, Xiaowu Dai

Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper proposes the conformal online auction design (COAD), a novel mechanism for maximizing revenue in online auctions by quantifying the uncertainty in bidders' values without relying on assumptions about value distributions. COAD incorporates both the bidder and item features and leverages historical data to provide an incentive-compatible mechanism for online auctions. Unlike traditional methods for online auctions, COAD employs a distribution-free, prediction interval-based approach using conformal prediction techniques. This novel approach ensures that the expected revenue from our mechanism can achieve at least a constant fraction of the revenue generated by the optimal mechanism. Additionally, COAD admits the use of a broad array of modern machine-learning methods, including random forests, kernel methods, and deep neural nets, for predicting bidders' values. It ensures revenue performance under any finite sample of historical data. Moreover, COAD introduces bidder-specific reserve prices based on the lower confidence bounds of bidders' valuations, which is different from the uniform reserve prices commonly used in the literature. We validate our theoretical predictions through extensive simulations and a real-data application. All code for using COAD and reproducing results is made available on GitHub.
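One generic way to obtain a distribution-free lower bound on a bidder's value is split conformal prediction: compute how much a predictor overestimated values on held-out calibration data, take a suitable quantile of those errors, and subtract it from the new prediction. The sketch below illustrates that idea only. The function name and interface are assumptions made for this example, and COAD's actual reserve-price construction may differ.

```python
import math


def conformal_lower_bound(cal_preds, cal_values, new_pred, alpha):
    """One-sided split-conformal lower bound with coverage >= 1 - alpha.

    cal_preds / cal_values: predictions and realized values on a
    calibration set that was not used to fit the predictor.
    Returns new_pred minus the ceil((n + 1) * (1 - alpha))-th smallest
    overestimation error, or -inf if the calibration set is too small.
    """
    scores = sorted(p - v for p, v in zip(cal_preds, cal_values))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("-inf")  # not enough calibration data for this alpha
    return new_pred - scores[rank - 1]
```

The guarantee relies only on exchangeability of the calibration and test points, which is why any predictor (random forest, kernel method, neural net) can be plugged in.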
[197] arXiv:2405.07040 [pdf, other]: Title: Low-Complexity OTFS-Based Over-the-Air Computation Design for Time-Varying Channels

Authors: Xinyu Huang, Henrik Hellström, Carlo Fischione

Comments: 14 pages, 11 figures, submitted to IEEE for possible publication. arXiv admin note: text overlap with arXiv:2403.11272

Subjects: Information Theory (cs.IT)

This paper investigates over-the-air computation (AirComp) over multiple-access time-varying channels, where devices with high mobility transmit their sensing data to a fusion center (FC) for averaging. To combat the Doppler shift induced by time-varying channels, each device adopts orthogonal time frequency space (OTFS) modulation. Our objective is to minimize the mean squared error (MSE) of the target function estimation. Due to the multipath time-varying channels, OTFS-based AirComp suffers not only from noise but also from interference. Specifically, we propose three schemes, namely S1, S2, and S3, for the target function estimation. S1 directly estimates the target function under the impacts of noise and interference. S2 mitigates the interference by introducing a zero padding-assisted OTFS. In S3, we propose an iterative algorithm to estimate the function in a matrix form. In the numerical results, we evaluate the performance of S1, S2, and S3 from the perspectives of MSE and computational complexity, and compare them with benchmarks. Specifically, S3 achieves a significantly lower MSE than the benchmarks but incurs a higher computational complexity. In contrast, S2 demonstrates a reduction in both MSE and computational complexity. Lastly, S1 shows superior error performance at small SNR and reduced computational complexity.
[198] arXiv:2405.07041 [pdf, other]: Title: Multi-agent Traffic Prediction via Denoised Endpoint Distribution

Authors: Yao Liu, Ruoyu Wang, Yuanjiang Cao, Quan Z. Sheng, Lina Yao

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic intent and uncertainty, thereby limiting their effectiveness. We present the Denoised Endpoint Distribution model for trajectory prediction, which distinctively models agents' spatio-temporal features alongside their intrinsic intentions and uncertainties. By employing Diffusion and Transformer models to focus on agent endpoints rather than entire trajectories, our approach significantly reduces model complexity and enhances performance through endpoint information. Our experiments on open datasets, coupled with comparison and ablation studies, demonstrate our model's efficacy and the importance of its components. This approach advances trajectory prediction in high-speed scenarios and lays groundwork for future developments.
[199] arXiv:2405.07043 [pdf, other]: Title: Optimal Multilayered Motion Planning for Multiple Differential Drive Mobile Robots with Hierarchical Prioritization (OM-MP)

Authors: Zong Chen, Songyuan Fa, Yiqun Li

Subjects: Robotics (cs.RO)

We present a novel framework for addressing the challenges of multi-agent planning and formation control within intricate and dynamic environments. This framework transforms the Multi-Agent Path Finding (MAPF) problem into a Multi-Agent Trajectory Planning (MATP) problem. Unlike traditional MAPF solutions, our multilayer optimization scheme consists of a global planner optimization solver, which is dedicated to determining concise global paths for each individual robot, and a local planner with an embedded optimization solver aimed at ensuring the feasibility of local robot trajectories. By implementing a hierarchical prioritization strategy, we enhance robots' efficiency and approximate the global optimal solution. Specifically, within the global planner, we employ the Augmented Graph Search (AGS) algorithm, which significantly improves the speed of solutions. Meanwhile, within the local planner optimization solver, we utilize Control Barrier Functions (CBFs), introduce an oblique cylindrical obstacle bounding box based on the time axis for obstacle avoidance, and construct a single-robot locally aware-communication circle to ensure the simplicity, speed, and accuracy of locally optimized solutions. Additionally, we integrate the weight and priority of path traces to prevent deadlocks in limiting scenarios. Compared to other state-of-the-art methods, including CBS, ECBS and other derivative algorithms, our proposed method demonstrates superior performance in terms of capacity, flexible scalability and overall task optimality in theory, as validated through simulations and experiments.
[200] arXiv:2405.07044 [pdf, other]: Title: Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior

Authors: Ce Wang, Wanjie Sun

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Remote sensing images captured by different platforms exhibit significant disparities in spatial resolution. Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit. However, existing methods confront challenges in recovering SR images with clear textures and correct ground objects. We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution. The framework exploits a pre-trained generative model as a prior to generate perceptually plausible SR images. We further enhance the reconstruction by incorporating vector maps, which carry structural and semantic cues. Moreover, pixel-level inconsistencies in paired remote sensing images, stemming from sensor-specific imaging characteristics, may hinder the convergence of the model and the diversity of the generated results. To address this problem, we propose to extract the sensor-specific imaging characteristics and model their distribution, allowing diverse SR image generation based on imaging characteristics provided by reference images or sampled from the imaging characteristic probability distributions. To validate and evaluate our approach, we create the Cross-Modal Super-Resolution Dataset (CMSRD). Qualitative and quantitative experiments on CMSRD showcase the superiority and broad applicability of our method. Experimental results on downstream vision tasks also demonstrate the utility of the generated SR images. The dataset and code will be publicly available at https://github.com/wwangcece/SGDM
[201] arXiv:2405.07045 [pdf, other]: Title: Predictive Modeling in the Reservoir Kernel Motif Space

Authors: Peter Tino, Robert Simon Fong, Roberto Fabio Leonarduzzi

Comments: 8 pages

Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

This work proposes a time series prediction method based on the kernel view of linear reservoirs. In particular, the time series motifs of the reservoir kernel are used as a representational basis on which general readouts are constructed. We provide a geometric interpretation of our approach, shedding light on how it is related to the core reservoir models and in what way the two approaches differ. Empirical experiments then compare predictive performances of our suggested model with those of recent state-of-the-art transformer-based models, as well as the established recurrent network model - LSTM. The experiments are performed on both univariate and multivariate time series and with a variety of prediction horizons. Rather surprisingly, we show that even when a linear readout is employed, our method has the capacity to outperform transformer models on univariate time series and attain competitive results on multivariate benchmark datasets. We conclude that simple models with easily controllable capacity but capturing enough memory and subsequence structure can outperform potentially over-complicated deep learning models. This does not mean that reservoir motif-based models are preferable to other more complex alternatives - rather, when introducing a new complex time series model one should employ as a sanity check simple, but potentially powerful alternatives/baselines such as reservoir models or the models introduced here.
[202] arXiv:2405.07046 [pdf, other]: Title: Retrieval Enhanced Zero-Shot Video Captioning

Authors: Yunchuan Ma, Laiyun Qing, Guorong Li, Yuankai Qi, Quan Z. Sheng, Qingming Huang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose to take advantage of existing pre-trained large-scale vision and language models to directly generate captions with test time adaptation. Specifically, we bridge video and text using three key models: a general video understanding model XCLIP, a general image understanding model CLIP, and a text generation model GPT-2, due to their source-code availability. The main challenge is how to enable the text generation model to be sufficiently aware of the content in a given video so as to generate corresponding captions. To address this problem, we propose using learnable tokens as a communication medium between frozen GPT-2 and frozen XCLIP as well as frozen CLIP. Differing from the conventional way to train these tokens with training data, we update these tokens with pseudo-targets of the inference data under several carefully crafted loss functions which enable the tokens to absorb video information catered for GPT-2. This procedure can be done in just a few iterations (we use 16 iterations in the experiments) and does not require ground truth data. Extensive experimental results on three widely used datasets, MSR-VTT, MSVD, and VATEX, show 4% to 20% improvements in terms of the main metric CIDEr compared to the existing state-of-the-art methods.
[203] arXiv:2405.07047 [pdf, other]: Title: Unsupervised Density Neural Representation for CT Metal Artifact Reduction

Authors: Qing Wu, Xu Guo, Lixuan Chen, Dongming He, Hongjiang Wei, Xudong Wang, S. Kevin Zhou, Yifeng Zhang, Jingyi Yu, Yuyao Zhang

Comments: 13 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Emerging unsupervised reconstruction techniques based on implicit neural representation (INR), such as NeRP, CoIL, and SCOPE, have shown unique capabilities in CT linear inverse imaging. In this work, we propose a novel unsupervised density neural representation (Diner) to tackle the challenging problem of CT metal artifacts when scanned objects contain metals. The drastic variation of linear attenuation coefficients (LACs) of metals over X-ray spectra leads to a nonlinear beam hardening effect (BHE) in CT measurements. Recovering CT images from metal-affected measurements therefore poses a complicated nonlinear inverse problem. Existing metal artifact reduction (MAR) techniques mostly formulate the MAR as an image inpainting task, which ignores the energy-induced BHE and produces suboptimal performance. Instead, our Diner introduces an energy-dependent polychromatic CT forward model to the INR framework, addressing the nonlinear nature of the MAR problem. Specifically, we decompose the energy-dependent LACs into energy-independent densities and energy-dependent mass attenuation coefficients (MACs) by fully considering the physical model of X-ray absorption. Using the densities as pivot variables and the MACs as known prior knowledge, the LACs can be accurately reconstructed from the raw measurements. Technically, we represent the unknown density map as an implicit function of coordinates. Combined with a novel differentiable forward model simulating the physical acquisition from the densities to the measurements, our Diner optimizes a multi-layer perceptron network to approximate the implicit function by minimizing prediction errors between the estimated and real measurements. Experimental results on simulated and real datasets confirm the superiority of our unsupervised Diner over popular supervised techniques in MAR performance and robustness.
[204] arXiv:2405.07052 [pdf, other]: Title: Length-Aware Multi-Kernel Transformer for Long Document Classification

Authors: Guangzeng Han, Jack Tsao, Xiaolei Huang

Comments: Accepted to SEM 2024

Subjects: Computation and Language (cs.CL)

Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods introduce new challenges of context fragmentation and generalizability due to sentence boundaries and varying text lengths. For example, our empirical analysis has shown that SOTA models consistently overfit one set of lengthy documents (e.g., 2000 tokens) while performing worse on texts with other lengths (e.g., 1000 or 4000). In this study, we propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address these new challenges for long document classification. LAMKIT encodes lengthy documents by diverse transformer-based kernels for bridging context boundaries and vectorizes text length by the kernels to promote model robustness over varying document lengths. Experiments on five standard benchmarks from health and law domains show that LAMKIT outperforms SOTA models by up to 10.9% in absolute terms. We conduct extensive ablation analyses to examine model robustness and effectiveness over varying document lengths.
[205] arXiv:2405.07054 [pdf, ps, other]: Title: LUCID: A Framework for Reducing False Positives and Inconsistencies Among Container Scanning Tools

Authors: Md Sadun Haq, Ali Saman Tosun, Turgay Korkmaz

Comments: 13 pages, 15 figures, 8 tables

Subjects: Cryptography and Security (cs.CR)

Containerization has emerged as a revolutionary technology in the software development and deployment industry. Containers offer a portable and lightweight solution that allows for packaging applications and their dependencies systematically and efficiently. In addition, containers offer faster deployment and near-native performance, but with isolation and security drawbacks compared to virtual machines. To address the security issues, scanning tools that scan containers for preexisting vulnerabilities have been developed, but they suffer from false positives. Moreover, using different scanning tools to scan the same container provides different results, which leads to inconsistencies and confusion. Limited work has been done to address these issues. This paper provides a fully functional and extensible framework named LUCID that can reduce false positives and inconsistencies provided by multiple scanning tools. We use a database-centric approach and perform query-based analysis to pinpoint the causes of inconsistencies. Our results show that our framework can reduce inconsistencies by 70%. The framework has been tested on both Intel64/AMD64 and ARM architectures. We also create a Dynamic Classification component that can successfully classify and predict the different severity levels with an accuracy of 84%. We believe this paper will raise awareness regarding security in container technologies and enable container scanning companies to improve their tools to provide better and more consistent results.
[206] arXiv:2405.07056 [pdf, other]: Title: Graph $p$-Laplacian eigenpairs as saddle points of a family of spectral energy functions

Authors: Piero Deidda, Nicola Segala, Mario Putti

Subjects: Numerical Analysis (math.NA)

We address the problem of computing the graph $p$-Laplacian eigenpairs for $p\in (2,\infty)$. We propose a reformulation of the graph $p$-Laplacian eigenvalue problem in terms of a constrained weighted Laplacian eigenvalue problem and discuss theoretical and computational advantages. We provide a correspondence between $p$-Laplacian eigenpairs and linear eigenpairs of a constrained generalized weighted Laplacian eigenvalue problem. As a result, we can assign an index to any $p$-Laplacian eigenpair that matches the Morse index of the $p$-Rayleigh quotient evaluated at the eigenfunction. In the second part of the paper, we introduce a class of spectral energy functions that depend on edge and node weights. We prove that differentiable saddle points of the $k$-th energy function correspond to $p$-Laplacian eigenpairs having index equal to $k$. Moreover, the first energy function is proved to possess a unique saddle point which corresponds to the unique first $p$-Laplacian eigenpair. Finally, we develop novel gradient-based numerical methods suited to compute $p$-Laplacian eigenpairs for any $p\in(2,\infty)$ and present some experiments.
[207] arXiv:2405.07059 [pdf, other]: Title: Numerical Analysis of Finite Dimensional Approximations in Finite Temperature DFT

Authors: Ge Xu, Huajie Chen, Xingyu Gao

Comments: 20 pages, 6 figures

Subjects: Numerical Analysis (math.NA)

In this paper, we study numerical approximations of the ground states in finite temperature density functional theory. We formulate the problem with respect to the density matrices and justify the convergence of the finite dimensional approximations. Moreover, we provide an optimal a priori error estimate under some mild assumptions and present some numerical experiments to support the theory.
[208] arXiv:2405.07060 [pdf, other]: Title: Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

Authors: Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima

Subjects: Robotics (cs.RO)

Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contain stutters, errors, and omissions of details, as opposed to those obtained by thinking out loud, such as in the Room-to-Room dataset. However, there is currently no benchmark that simulates instructions obtained from human memory in environments where blind people navigate. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. To collect natural language instructions, we conducted two studies, one with sighted passersby onsite and one with annotators online. Our analysis demonstrates that instruction data collected onsite were lengthier and contained more varied wording. Alongside our benchmark, we propose a VLN model better equipped to handle the scenario. Our proposed VLN model uses Large Language Models (LLMs) to parse instructions and generate Python code for robot control. We further show that the existing state-of-the-art model performed suboptimally on our benchmark. In contrast, our proposed method outperformed the state-of-the-art model by a fair margin. We found that future research should exercise caution when considering VLN technology for practical applications, as real-world scenarios have different characteristics than ones collected in traditional settings.
[209] arXiv:2405.07061 [pdf, other]: Title: LLMs and the Future of Chip Design: Unveiling Security Risks and Building Trust

Authors: Zeng Wang, Lilas Alrahis, Likhitha Mankali, Johann Knechtel, Ozgur Sinanoglu

Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR)

Chip design is about to be revolutionized by the integration of large language, multimodal, and circuit models (collectively LxMs). While exploring this exciting frontier with tremendous potential, the community must also carefully consider the related security risks and the need for building trust into using LxMs for chip design. First, we review the recent surge of using LxMs for chip design in general. We cover state-of-the-art works for the automation of hardware description language code generation and for scripting and guidance of essential but cumbersome tasks for electronic design automation tools, e.g., design-space exploration, tuning, or designer training. Second, we raise and provide initial answers to novel research questions on critical issues for security and trustworthiness of LxM-powered chip design from both the attack and defense perspectives.
[210] arXiv:2405.07065 [pdf, other]: Title: LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Authors: Vivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia Chilton

Subjects: Human-Computer Interaction (cs.HC)

Animated logos are a compelling and ubiquitous way individuals and brands represent themselves online. Manually authoring these logos can require significant artistic skill and effort. To help novice designers animate logos, design tools currently offer templates and animation presets. However, these solutions can be limited in their expressive range. Large language models have the potential to help novice designers create animated logos by generating animation code that is tailored to their content. In this paper, we introduce LogoMotion, an LLM-based system that takes in a layered document and generates animated logos through visually-grounded program synthesis. We introduce techniques to create an HTML representation of a canvas, identify primary and secondary elements, synthesize animation code, and visually debug animation errors. When compared with an industry standard tool, we find that LogoMotion produces animations that are more content-aware and are on par in terms of quality. We conclude with a discussion of the implications of LLM-generated animation for motion design.
[211] arXiv:2405.07067 [pdf, other]: Title: Learning Flame Evolution Operator under Hybrid Darrieus Landau and Diffusive Thermal Instability

Authors: Rixin Yu, Erdzan Hodzic, Karl-Johan Nogenmyr

Comments: 25 pages, 10 figures

Subjects: Machine Learning (cs.LG)

Recent advancements in the integration of artificial intelligence (AI) and machine learning (ML) with physical sciences have led to significant progress in addressing complex phenomena governed by nonlinear partial differential equations (PDEs). This paper explores the application of novel operator learning methodologies to unravel the intricate dynamics of flame instability, particularly focusing on hybrid instabilities arising from the coexistence of Darrieus-Landau (DL) and Diffusive-Thermal (DT) mechanisms. Training datasets encompass a wide range of parameter configurations, enabling the learning of parametric solution advancement operators using techniques such as parametric Fourier Neural Operator (pFNO) and parametric convolutional neural networks (pCNN). Results demonstrate the efficacy of these methods in accurately predicting short-term and long-term flame evolution across diverse parameter regimes, capturing the characteristic behaviors of pure and blended instabilities. Comparative analyses reveal pFNO as the most accurate model for learning short-term solutions, while all models exhibit robust performance in capturing the nuanced dynamics of flame evolution. This research contributes to the development of robust modeling frameworks for understanding and controlling complex physical processes governed by nonlinear PDEs.
[212] arXiv:2405.07070 [pdf, other]: Title: Decoding Cognitive Health Using Machine Learning: A Comprehensive Evaluation for Diagnosis of Significant Memory Concern

Authors: M. Sajid, Rahul Sharma, Iman Beheshti, M. Tanveer

Journal-ref: WIREs Data Mining and Knowledge Discovery, 2024

Subjects: Machine Learning (cs.LG)

The timely identification of significant memory concern (SMC) is crucial for proactive cognitive health management, especially in an aging population. Detecting SMC early enables timely intervention and personalized care, potentially slowing cognitive disorder progression. This study presents a state-of-the-art review followed by a comprehensive evaluation of machine learning models within the randomized neural networks (RNNs) and hyperplane-based classifiers (HbCs) family to investigate SMC diagnosis thoroughly. Utilizing the Alzheimer's Disease Neuroimaging Initiative 2 (ADNI2) dataset, 111 individuals with SMC and 111 healthy older adults are analyzed based on T1W magnetic resonance imaging (MRI) scans. This analysis is based on baseline structural MRI (sMRI) scans, extracting rich features from gray matter (GM), white matter (WM), Jacobian determinant (JD), and cortical thickness (CT) measurements. In RNNs, deep random vector functional link (dRVFL) and ensemble dRVFL (edRVFL) emerge as the best classifiers in terms of performance metrics in the identification of SMC. In HbCs, Kernelized pinball general twin support vector machine (Pin-GTSVM-K) excels in CT and WM features, whereas Linear Pin-GTSVM (Pin-GTSVM-L) and Linear intuitionistic fuzzy TSVM (IFTSVM-L) perform well in the JD and GM feature sets, respectively. This comprehensive evaluation emphasizes the critical role of feature selection and model choice in attaining an effective classifier for SMC diagnosis. The inclusion of statistical analyses further reinforces the credibility of the results, affirming the rigor of this analysis. The performance measures exhibit the suitability of this framework in aiding researchers with the automated and accurate assessment of SMC. The source codes of the algorithms and datasets used in this study are available at https://github.com/mtanveer1/SMC.
[213] arXiv:2405.07072 [pdf, other]: Title: Selecting focused digital cohorts from social media using the metric backbone of biomedical knowledge graphs

Authors: Ziqi Guo, Jack Felag, Jordan C. Rozum, Rion Brattig Correia, Luis M. Rocha

Subjects: Social and Information Networks (cs.SI)

The abundance of social media data allows researchers to construct large digital cohorts to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide-ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse. To home in on relevant users anywhere, we have developed a general framework and applied it to epilepsy discourse in social media as a test case. We analyzed the text from posts by users who mention epilepsy drugs in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We curated a medical terms dictionary and used it to generate a knowledge graph (KG) for each online community. For each KG, we computed the metric backbone, i.e., the smallest subgraph that preserves all shortest paths in the network. By comparing the subset of users who contribute to the backbone to the subset who do not, we found that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrated that users who do not contribute to the backbone are more than twice as likely to use dictionary terms in a manner inconsistent with their biomedical meaning. For biomedical research applications, our backbone-based approach thus has several benefits over simple engagement-based approaches: it can retain low-engagement users who nonetheless contribute meaningful biomedical insights, and it can filter out very vocal users who contribute no relevant content.
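To make the notion of a metric backbone concrete, here is a minimal sketch for a small undirected graph with distance weights. It keeps an edge only when no indirect path between its endpoints is strictly shorter. The function name `metric_backbone`, the toy graph, and the use of Floyd-Warshall are assumptions for this example and say nothing about how the authors computed backbones on their knowledge graphs.

```python
def metric_backbone(n, edges):
    """Return the edges whose weight equals the shortest-path distance
    between their endpoints (hypothetical helper).

    n: number of nodes, labelled 0..n-1
    edges: dict mapping (u, v) pairs to positive distances
    """
    inf = float("inf")
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Keep an edge only if no indirect path is strictly shorter.
    return {e: w for e, w in edges.items() if w <= dist[e[0]][e[1]]}


# Toy graph: the direct edge (0, 2) is longer than the path 0-1-2.
toy_edges = {(0, 1): 1, (1, 2): 1, (0, 2): 3, (2, 3): 2}
backbone = metric_backbone(4, toy_edges)
```

In this toy graph the edge `(0, 2)` is dropped because the path through node 1 is shorter, while every shortest-path distance is unchanged. The cubic-time search is only suitable for small graphs, and integer weights are used so that the equality comparison is exact.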
[214] arXiv:2405.07076 [pdf, other]: Title: Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models

Authors: Edward Y. Chang

Comments: 26 pages, 7 tables, 6 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

This research develops advanced methodologies for Large Language Models (LLMs) to better manage linguistic behaviors related to emotions and ethics. We introduce DIKE, an adversarial framework that enhances the LLMs' ability to internalize and reflect global human values, adapting to varied cultural contexts to promote transparency and trust among users. The methodology involves detailed modeling of emotions, classification of linguistic behaviors, and implementation of ethical guardrails. Our innovative approaches include mapping emotions and behaviors using self-supervised learning techniques, refining these guardrails through adversarial reviews, and systematically adjusting outputs to ensure ethical alignment. This framework establishes a robust foundation for AI systems to operate with ethical integrity and cultural sensitivity, paving the way for more responsible and context-aware AI interactions.
[215] arXiv:2405.07079 [pdf, other]: Title: Host-Based Allocators for Device Memory

Authors: Oren Bell, Ashwin Kumar, Chris Gill

Comments: 9 pages, 4 figures

Subjects: Software Engineering (cs.SE)

Memory allocation is a fairly mature field of computer science. However, we challenge a prevailing assumption in the literature over the last 50 years which, if reconsidered, necessitates a fundamental reevaluation of many classical memory management algorithms. We pose a model where the allocation algorithm runs on host memory but allocates device memory and so incurs the following constraint: the allocator can't read the memory it is allocating. This means we are unable to use boundary tags, a concept that has been ubiquitous in nearly every allocation algorithm. In this paper, we propose alternate algorithms to work around this constraint, and discuss in general the implications of this system model.
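The constraint described above can be illustrated with a minimal sketch of a first-fit allocator that keeps all of its bookkeeping in host-side data structures instead of in headers stored inside the allocated blocks. This is not one of the algorithms proposed in the paper; the class `HostSideAllocator`, its method names, and the modelling of device memory as a plain offset range are assumptions made for illustration.

```python
class HostSideAllocator:
    """First-fit allocator whose metadata lives entirely on the host.

    Device memory is modelled only as the offset range [0, size). The
    allocator never reads or writes it, so no boundary tags are stored
    inside the blocks themselves.
    """

    def __init__(self, size):
        self.free_ranges = [(0, size)]  # sorted list of (offset, length)
        self.used = {}                  # offset -> length, host-side only

    def alloc(self, length):
        """Return the offset of a free block, or None if none fits."""
        for i, (off, free_len) in enumerate(self.free_ranges):
            if free_len >= length:
                if free_len == length:
                    del self.free_ranges[i]
                else:
                    # Shrink the free range from the front.
                    self.free_ranges[i] = (off + length, free_len - length)
                self.used[off] = length
                return off
        return None

    def free_block(self, off):
        """Release a block and coalesce neighbours via host metadata."""
        length = self.used.pop(off)
        self.free_ranges.append((off, length))
        self.free_ranges.sort()
        merged = [self.free_ranges[0]]
        for o, n in self.free_ranges[1:]:
            prev_off, prev_len = merged[-1]
            if prev_off + prev_len == o:
                merged[-1] = (prev_off, prev_len + n)
            else:
                merged.append((o, n))
        self.free_ranges = merged


allocator = HostSideAllocator(100)
a = allocator.alloc(30)
b = allocator.alloc(50)
```

Because block sizes are looked up in the host-side `used` table, freeing and coalescing never touch the device range, which is the property a boundary-tag design cannot offer under this model.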
[216] arXiv:2405.07081 [pdf, other]: Title: T-curator: a trust based curation tool for LOD logs

Authors: Dihia Lanasri

Subjects: Databases (cs.DB); Computation and Language (cs.CL)

Nowadays, companies are racing towards Linked Open Data (LOD) to improve their added value, but they are ignoring their SPARQL query logs. If well curated, these logs can be an asset for decision makers. A naive and straightforward use of these logs is too risky because their provenance and quality are highly questionable. To use these logs in a trusted way, users need in-depth knowledge of the whole LOD environment and tools to curate the logs. In this paper, we propose an interactive and intuitive trust-based tool that can be used to curate these LOD logs before exploiting them. This tool supports the approach proposed in our previous work, Lanasri et al. [2020].
[217] arXiv:2405.07083 [pdf, other]: Title: Data-Efficient and Robust Task Selection for Meta-Learning

Authors: Donglin Zhan, James Anderson

Comments: Accepted by CVPR 2024 Workshop

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different training stages and in whether they contain noisy labeled data or not, making a uniform approach suboptimal. To address these issues, we propose the Data-Efficient and Robust Task Selection (DERTS) algorithm, which can be incorporated into both gradient and metric-based meta-learning algorithms. DERTS selects weighted subsets of tasks from task pools by minimizing the approximation error of the full gradient of task pools in the meta-training stage. The selected tasks are efficient for rapid training and robust towards noisy label scenarios. Unlike existing algorithms, DERTS does not require any architecture modification for training and can handle noisy label data in both the support and query sets. Analysis of DERTS shows that the algorithm follows similar training dynamics as learning on the full task pools. Experiments show that DERTS outperforms existing sampling strategies for meta-learning on both gradient-based and metric-based meta-learning algorithms in limited data budget and noisy task settings.
[218] arXiv:2405.07086 [pdf, ps, other]: Title: An enhanced basis for producing Bezier-like curves

Authors: Bahareh Nouri, Jamshid Saeidian

Subjects: Numerical Analysis (math.NA)

This study proposes a new structure for constructing Bernstein-like bases. The structure uses an auxiliary function and a shape parameter to construct a new family of bases from any family of blending functions. The new family of bases inherits almost all algebraic and geometric properties of the initial blending functions. The corresponding curves have the freedom to travel from the curve constructed from the initial blending functions to the line segment joining the first and last control points. The new bases have the monotonicity preservation property, and the shape of the curve can be adjusted by changing the parameter.
[219] arXiv:2405.07087 [pdf, other]: Title: Auditing an Automatic Grading Model with deep Reinforcement Learning

Authors: Aubrey Condor, Zachary Pardos

Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG)

We explore the use of deep reinforcement learning to audit an automatic short answer grading (ASAG) model. Automatic grading may decrease the time burden of rating open-ended items for educators, but a lack of robust evaluation methods for these models can result in uncertainty about their quality. Current state-of-the-art ASAG models are configured to match human ratings from a training set, and researchers typically assess their quality with accuracy metrics that signify agreement between model and human scores. In this paper, we show that a high level of agreement with human ratings does not give sufficient evidence that an ASAG model is infallible. We train a reinforcement learning agent to revise student responses with the objective of achieving a high rating from an automatic grading model in the least number of revisions. By analyzing the agent's revised responses that achieve a high grade from the ASAG model but would not be considered high-scoring responses according to a scoring rubric, we discover ways in which the automated grader can be exploited, exposing shortcomings in the grading model.
[220] arXiv:2405.07088 [pdf, other]: Title: Towards Context-Aware Modeling of Situation Awareness in Conditionally Automated Driving

Authors: Lilit Avetisyan, X. Jessie Yang, Feng Zhou

Comments: 37 Pages, 8 figures

Subjects: Human-Computer Interaction (cs.HC)

Maintaining adequate situation awareness (SA) is crucial for the safe operation of conditionally automated vehicles (AVs), which requires drivers to regain control during takeover (TOR) events. This study developed a predictive model for real-time assessment of driver SA using multimodal data (e.g., galvanic skin response, heart rate and eye tracking data, and driver characteristics) collected in a simulated driving environment. Sixty-seven participants experienced automated driving scenarios with TORs, with conditions varying in risk perception and the presence of automation errors. A LightGBM (Light Gradient Boosting Machine) model trained on the top 12 predictors identified by SHAP (SHapley Additive exPlanations) achieved promising performance with RMSE=0.89, MAE=0.71, and Corr=0.78. These findings have implications towards context-aware modeling of SA in conditionally automated driving, paving the way for safer and more seamless driver-AV interactions.
[221] arXiv:2405.07089 [pdf, other]: Title: SonifyAR: Context-Aware Sound Generation in Augmented Reality

Authors: Xia Su, Jon E. Froehlich, Eunyee Koh, Chang Xiao

Comments: 12 pages, 12 figures

Subjects: Human-Computer Interaction (cs.HC)

Sound plays a crucial role in enhancing user experience and immersiveness in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware sound effects for AR experiences. SonifyAR expands the current design space of AR sound and implements a Programming by Demonstration (PbD) pipeline to automatically collect contextual information of AR events, including virtual content semantics and real world context. This context information is then processed by a large language model to acquire sound effects with Recommendation, Retrieval, Generation, and Transfer methods. To evaluate the usability and performance of our system, we conducted a user study with eight participants and created five example applications, including an AR-based science experiment, an improving case for AR headset safety, and an assisting example for low vision AR users.
[222] arXiv:2405.07090 [pdf, other]: Title: MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Authors: Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

Subjects: Human-Computer Interaction (cs.HC)

The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, collected years ago, and are frequently noisy with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ the best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset MUD of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.
[223] arXiv:2405.07094 [pdf, ps, other]: Title: The Road to Compliance: Executive Federal Agencies and the NIST Risk Management Framework

Authors: Michael Stoltz

Comments: This research paper was showcased at the University of West Florida Student Scholars Symposium and Faculty Research Showcase on April 18, 2024. It is supported by the National Science Foundation (NSF) under Grant No. 1946442. The views, findings, and conclusions presented are solely those of the author(s) and do not necessarily represent the views of the NSF

Subjects: Cryptography and Security (cs.CR)

This informative report provides a comprehensive analysis of how executive federal agencies implement the National Institute of Standards and Technology's (NIST) Risk Management Framework (RMF) to achieve cybersecurity compliance. By exploring the concept and evolution of the RMF, the report delves into the framework's importance for enhancing cybersecurity measures within federal agencies, addressing the challenges these agencies face in the digital landscape. Through a methodical literature review, the report examines theoretical foundations, implementation strategies, and the critical role of continuous monitoring and automation in RMF processes, drawing from key sources like Ross (2014), Lubell (2020), Barrett et al. (2021), and Pillitteri et al. (2021, 2022), among others. Employing a detailed methodology for data collection and analysis, the report presents findings on the successes and challenges of RMF implementation, highlighting the impact of automation and continuous monitoring in bolstering cybersecurity postures. Case studies offer in-depth insights into the experiences of specific agencies, providing lessons learned and best practices. The report concludes with strategic recommendations for overcoming implementation challenges and suggests future directions for enhancing RMF research and practice. This investigation underscores the RMF's critical role in establishing robust cybersecurity compliance across executive federal agencies, offering valuable recommendations for policymakers, cybersecurity professionals, and governmental bodies.
[224] arXiv:2405.07096 [pdf, other]: Title: Multi-Relational Structural Entropy

Authors: Yuwei Cao, Hao Peng, Angsheng Li, Chenyu You, Zhifeng Hao, Philip S Yu

Comments: Accepted to UAI 2024

Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT)

Structural Entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps to reveal or obscure the intrinsic structural patterns underlying graphs in an interpretable manner, finding applications in various tasks driven by networked data. However, SE ignores the heterogeneity inherent in the graph relations, which is ubiquitous in modern networks. In this work, we extend SE to consider heterogeneous relations and propose the first metric for multi-relational graph structural information, namely, Multi-relational Structural Entropy (MrSE). To this end, we first cast SE through the novel lens of the stationary distribution from random surfing, which readily extends to multi-relational networks by considering the choices of both nodes and relation types simultaneously at each step. The resulting MrSE is then optimized by a new greedy algorithm to reveal the essential structures within a multi-relational network. Experimental results highlight that the proposed MrSE offers a more insightful interpretation of the structure of multi-relational graphs compared to SE. Additionally, it enhances the performance of two tasks that involve real-world multi-relational graphs, including node clustering and social event detection.
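As a rough illustration of the starting point described above, the sketch below computes only the classical one-dimensional structural entropy of an undirected, unweighted graph: the Shannon entropy of the random-walk stationary distribution, in which node i has probability d_i / 2m. It is not MrSE and not the authors' code; the function name and the edge-list input format are assumptions made for this example.

```python
import math
from collections import defaultdict

def one_dim_structural_entropy(edges):
    """Entropy (in bits) of the degree-proportional stationary distribution.

    `edges` is a list of undirected (u, v) pairs without self-loops
    (a hypothetical input format chosen for this sketch).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())  # equals 2 * number of edges
    return -sum((d / total) * math.log2(d / total) for d in degree.values())

# 4-cycle: every node has degree 2, so the distribution is uniform on 4 nodes.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(one_dim_structural_entropy(cycle))  # 2.0
```

A multi-relational variant would, per the abstract, let the random surfer also choose a relation type at each step; that extension is not attempted here.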
[225] arXiv:2405.07097 [pdf, other]: Title: Diffusion models as probabilistic neural operators for recovering unobserved states of dynamical systems

Authors: Katsiaryna Haitsiukevich, Onur Poyraz, Pekka Marttinen, Alexander Ilin

Comments: Preprint submitted to IEEE MLSP 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper explores the efficacy of diffusion-based generative models as neural operators for partial differential equations (PDEs). Neural operators are neural networks that learn a mapping from the parameter space to the solution space of PDEs from data, and they can also solve the inverse problem of estimating the parameter from the solution. Diffusion models excel in many domains, but their potential as neural operators has not been thoroughly explored. In this work, we show that diffusion-based generative models exhibit many properties favourable for neural operators, and they can effectively generate the solution of a PDE conditionally on the parameter or recover the unobserved parts of the system. We propose to train a single model adaptable to multiple tasks, by alternating between the tasks during training. In our experiments with multiple realistic dynamical systems, diffusion models outperform other neural operators. Furthermore, we demonstrate how the probabilistic diffusion model can elegantly deal with systems which are only partially identifiable, by producing samples corresponding to the different possible solutions.
[226] arXiv:2405.07098 [pdf, other]: Title: Interpretable global minima of deep ReLU neural networks on sequentially separable data

Authors: Thomas Chen, Patricia Muñoz Ewald

Comments: AMS Latex, 22 pages, 3 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Mathematical Physics (math-ph); Optimization and Control (math.OC); Machine Learning (stat.ML)

We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.
[227] arXiv:2405.07099 [pdf, other]: Title: Do Pretrained Contextual Language Models Distinguish between Hebrew Homograph Analyses?

Authors: Avi Shmidman, Cheyn Shmuel Shmidman, Dan Bareket, Moshe Koppel, Reut Tsarfaty

Journal-ref: In Proceedings of EACL 2023, 849-864 (2023)

Subjects: Computation and Language (cs.CL)

Semitic morphologically-rich languages (MRLs) are characterized by extreme word ambiguity. Because most vowels are omitted in standard texts, many of the words are homographs with multiple possible analyses, each with a different pronunciation and different morphosyntactic properties. This ambiguity goes beyond word-sense disambiguation (WSD), and may include token segmentation into multiple word units. Previous research on MRLs claimed that standardly trained pre-trained language models (PLMs) based on word-pieces may not sufficiently capture the internal structure of such tokens in order to distinguish between these analyses. Taking Hebrew as a case study, we investigate the extent to which Hebrew homographs can be disambiguated and analyzed using PLMs. We evaluate all existing models for contextualized Hebrew embeddings on novel Hebrew homograph challenge sets that we deliver. Our empirical results demonstrate that contemporary Hebrew contextualized embeddings outperform non-contextualized embeddings, and that they are most effective for disambiguating segmentation and morphosyntactic features, less so regarding pure word-sense disambiguation. We show that these embeddings are more effective when the number of word-piece splits is limited, and they are more effective for 2-way and 3-way ambiguities than for 4-way ambiguity. We show that the embeddings are equally effective for homographs of both balanced and skewed distributions, whether calculated as masked or unmasked tokens. Finally, we show that these embeddings are as effective for homograph disambiguation with extensive supervised training as with a few-shot setup.
[228] arXiv:2405.07101 [pdf, ps, other]: Title: Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA

Authors: Marco Polignano, Pierpaolo Basile, Giovanni Semeraro

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In the pursuit of advancing natural language processing for the Italian language, we introduce a state-of-the-art Large Language Model (LLM) based on the novel Meta LLaMA-3 model: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA. We fine-tuned the original 8B-parameter instruction-tuned model using the Supervised Fine-tuning (SFT) technique on the English and Italian language datasets in order to improve the original performance. Subsequently, a Direct Preference Optimization (DPO) process has been used to align preferences, avoid dangerous and inappropriate answers, and limit biases and prejudices. Our model leverages the efficiency of QLoRA to fine-tune the model on a smaller portion of the original model weights and then adapt the model specifically for the Italian linguistic structure, achieving significant improvements in both performance and computational efficiency. Concurrently, DPO is employed to refine the model's output, ensuring that generated content aligns with quality answers. The synergy between SFT, QLoRA's parameter efficiency and DPO's user-centric optimization results in a robust LLM that excels in a variety of tasks, including but not limited to text completion, zero-shot classification, and contextual understanding. The model has been extensively evaluated over standard benchmarks for the Italian and English languages, showing outstanding results. The model is freely available on the HuggingFace hub, and examples of use can be found in our GitHub repository. https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
[229] arXiv:2405.07104 [pdf, other]: Title: Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments using Fiber Bragg Grating Sensors

Authors: Alexander Schwarz, Arian Mehrfard, Golchehr Amirkhani, Henry Phalen, Justin H. Ma, Robert B. Grupp, Alejandro Martin-Gomez, Mehran Armand

Subjects: Robotics (cs.RO)

Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and subsequently reconstructing the shape using optimization algorithms. This optimization, however, is under-constrained and may be ill-posed for complex shapes, falling into local minima. In this work, we introduce a novel method capable of directly estimating a CDM's shape from FBG sensor wavelengths using a deep neural network. In addition, we propose the integration of uncertainty estimation to address the critical issue of uncertainty in neural network predictions. Neural network predictions are unreliable when the input sample is outside the training distribution or corrupted by noise. Recognizing such deviations is crucial when integrating neural networks within surgical robotics, as inaccurate estimations can pose serious risks to the patient. We present a robust method that not only improves upon the precision of existing techniques for FBG-based shape estimation but also incorporates a mechanism to quantify the model's confidence through uncertainty estimation. We validate the uncertainty estimation through extensive experiments, demonstrating its effectiveness and reliability on out-of-distribution (OOD) data, adding a layer of safety and precision to minimally invasive surgical robotics.
[230] arXiv:2405.07106 [pdf, other]: Title: Real-Time Simulation of a Resilient Control Center for Inverter-Based Microgrids

Authors: Milad Beikbabaei, Ali Mehrizi-Sani

Comments: Accepted for publication at IECON 2024 - 50th Annual Conference of the IEEE Industrial Electronics Society

Subjects: Systems and Control (eess.SY)

The number of installed remote terminal units (RTU) is on the rise, increasing the observability and control of the power system. RTUs enable sending data to and receiving data from a control center in the power system. A distribution grid control center runs distribution management system (DMS) algorithms, where the DMS takes control actions during transients and outages, such as tripping a circuit breaker and disconnecting a controllable load to increase the resiliency of the grid. Relying on communication-based devices makes the control center vulnerable to cyberattacks, and attackers can send falsified data to the control center to cause disturbances or power outages. Previous work has conducted research on developing ways to detect a cyberattack and ways to mitigate the adverse effects of the attack. This work studies false data injection (FDI) attacks on the DMS algorithm of a fully inverter-based microgrid in real time. The fully inverter-based microgrid is simulated using an RTDS, an amplifier, an electronic load, a server, a network switch, and a router. The DMS is integrated into the server codes and exchanges data with RTDS through TCP/IP protocols. Moreover, a recurrent neural network (RNN) algorithm is used to detect and mitigate the cyberattack. The effectiveness of the detection and mitigation algorithm is tested under various scenarios using the real-time testbed.
[231] arXiv:2405.07107 [pdf, other]: Title: A Pair of Bayesian Network Structures has Undecidable Conditional Independencies

Authors: Cheuk Ting Li

Comments: 13 pages, 2 figures

Subjects: Computational Complexity (cs.CC); Information Theory (cs.IT); Probability (math.PR)

Given a Bayesian network structure (directed acyclic graph), the celebrated d-separation algorithm efficiently determines whether the network structure implies a given conditional independence relation. We show that this changes drastically when we consider two Bayesian network structures instead. It is undecidable to determine whether two given network structures imply a given conditional independency, that is, whether every collection of random variables satisfying both network structures must also satisfy the conditional independency. Although the approximate combination of two Bayesian networks is a well-studied topic, our result shows that it is fundamentally impossible to accurately combine the knowledge of two Bayesian network structures, in the sense that no algorithm can tell what conditional independencies are implied by the two network structures. We can also explicitly construct two Bayesian network structures, such that whether they imply a certain conditional independency is unprovable in the ZFC set theory, assuming ZFC is consistent.
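For context on the single-structure case that the abstract calls efficient, here is a minimal sketch of one textbook way to test d-separation: restrict the DAG to the ancestors of the queried nodes, moralize it, delete the conditioning set, and check reachability. The `parents` dictionary format and the function name are assumptions for this example. The sketch says nothing about the two-structure problem, which the paper shows to be undecidable.

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs.

    `parents` maps a node to the set of its parents (hypothetical format);
    nodes without parents may be omitted from the mapping.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. Keep only the queried nodes and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: connect each child to its parents and marry co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. Remove zs, then look for any undirected path from xs to ys.
    stack, seen = [x for x in xs if x not in zs], set()
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in adj[node] if n not in zs and n not in seen)
    return True

# Collider A -> C <- B: A and B are independent until C is observed.
collider = {"C": {"A", "B"}}
print(d_separated(collider, {"A"}, {"B"}, set()))  # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False
```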
[232] arXiv:2405.07108 [pdf, other]: Title: Memory-Based Set Point Modulation for Improved Transient Response of Distributed Energy Resources

Authors: Milad Beikbabaei, Brady Alexander, Ashwin Venkataramanan, Ali Mehrizi-Sani

Comments: Accepted for publication at IECON 2024 - 50th Annual Conference of the IEEE Industrial Electronics Society

Subjects: Systems and Control (eess.SY)

As the composition of the power grid evolves to integrate more renewable generation, its reliance on distributed energy resources (DER) is increasing. Existing DERs are often controlled with proportional integral (PI) controllers that, if not properly tuned or if system parameters change, exhibit sluggish performance or large overshoot. The use of set point automatic adjustment with correction-enabled (SPAACE) with a linear predictor improves the transient response of these DERs without the need to access the PI controller parameters. The limitation of the existing SPAACE method is the high sampling rate needed for improved performance, which is not always practical. This paper proposes the addition of a memory term to the SPAACE with a linear predictor. This memory term is the integral of the errors of previous samples, which adds another layer to the prediction to improve the response at lower sampling rates and further reduces the overshoot and settling time compared to the existing SPAACE method. Time-domain simulation studies are performed in PSCAD/EMTDC to show the effectiveness of the proposed controller.
[233] arXiv:2405.07111 [pdf, other]: Title: Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

Authors: Boyd Branch, Piotr Mirowski, Kory Mathewson, Sophia Ppali, Alexandra Covaci

Comments: 13 pages, 7 figures, accepted for publication at the International Conference on Computational Creativity 2024

Subjects: Computation and Language (cs.CL)

Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore the technical capabilities and constraints of on-the-spot multi-party dialogue, providing comprehensive insights from both audience and performer experiences with AI on stage. Our human-in-the-loop methodology underlines the challenges of these LLMs in generating context-relevant responses, stressing the user interface's crucial role. Audience feedback indicates an evolving interest in AI-driven live entertainment and direct human-AI interaction, and a diverse range of expectations about AI's conversational competence and utility as a creativity support tool. Human performers express immense enthusiasm and varied satisfaction, and the evolving public opinion highlights mixed emotions about AI's role in the arts.
[234] arXiv:2405.07116 [pdf, other]: Title: CoViews: Adaptive Augmentation Using Cooperative Views for Enhanced Contrastive Learning

Authors: Nazim Bendib

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Data augmentation plays a critical role in generating high-quality positive and negative pairs necessary for effective contrastive learning. However, common practices involve using a single augmentation policy repeatedly to generate multiple views, potentially leading to inefficient training pairs due to a lack of cooperation between views. Furthermore, to find the optimal set of augmentations, many existing methods require extensive supervised evaluation, overlooking the evolving nature of the model that may require different augmentations throughout the training. Other approaches train differentiable augmentation generators, thus limiting the use of non-differentiable transformation functions from the literature. In this paper, we address these challenges by proposing a framework for learning efficient adaptive data augmentation policies for contrastive learning with minimal computational overhead. Our approach continuously generates new data augmentation policies during training and produces effective positives/negatives without any supervision. Within this framework, we present two methods: IndepViews, which generates augmentation policies used across all views, and CoViews, which generates dependent augmentation policies for each view. This enables us to learn dependencies between the transformations applied to each view and ensures that the augmentation strategies applied to different views complement each other, leading to more meaningful and discriminative representations. Through extensive experimentation on multiple datasets and contrastive learning frameworks, we demonstrate that our method consistently outperforms baseline solutions and that training with a view-dependent augmentation policy outperforms training with an independent policy shared across views, showcasing its effectiveness in enhancing contrastive learning performance.
[235] arXiv:2405.07117 [pdf, other]: Title: Context Neural Networks: A Scalable Multivariate Model for Time Series Forecasting

Authors: Abishek Sriramulu, Christoph Bergmeir, Slawek Smyl

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Real-world time series often exhibit complex interdependencies that cannot be captured in isolation. Global models that model past data from multiple related time series globally while producing series-specific forecasts locally are now common. However, their forecasts for each individual series remain isolated, failing to account for the current state of its neighbouring series. Multivariate models like multivariate attention and graph neural networks can explicitly incorporate inter-series information, thus addressing the shortcomings of global models. However, these techniques exhibit quadratic complexity per timestep, limiting scalability. This paper introduces the Context Neural Network, an efficient linear complexity approach for augmenting time series models with relevant contextual insights from neighbouring time series without significant computational overhead. The proposed method enriches predictive models by providing the target series with real-time information from its neighbours, addressing the limitations of global models, yet remaining computationally tractable for large datasets.
[236] arXiv:2405.07119 [pdf, ps, other]: Title: Best-response Algorithms for Integer Convex Quadratic Simultaneous Games

Authors: Sriram Sankaranarayanan

Subjects: Computer Science and Game Theory (cs.GT)

We evaluate the best-response algorithm in the context of pure-integer convex quadratic games. We provide a sufficient condition that if certain interaction matrices (the product of the inverse of the positive definite matrix defining the convex quadratic terms and the matrix that connects one player's problem to another's) have all their singular values less than 1, then finite termination of the best-response algorithm is guaranteed regardless of the initial point. Termination is triggered through cycling among a finite number of strategies for each player. Our findings indicate that if cycling happens, a relaxed version of the Nash equilibrium can be calculated by identifying a Nash equilibrium of a smaller finite game. Conversely, we prove that if every singular value of the interaction matrices is greater than 1, the algorithm will diverge from a large family of initial points. In addition, we provide an infinite family of examples in which some of the singular values of the interaction matrices are greater than 1, cycling occurs, but any mixed-strategy with support in the strategies where cycling occurs has arbitrarily better deviations. Then, we perform computational tests of our algorithm and compare it with standard algorithms to solve such problems. We notice that our algorithm finds a Nash equilibrium correctly in every instance. Moreover, compared to a state-of-the-art algorithm, our method shows similar performance in two-player games and significantly higher speed when involving three or more players.
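To make the termination behaviour concrete, the following toy sketch runs alternating best responses in a two-player game where each player picks a single integer. Player i is assumed to minimize 0.5*q_i*x_i^2 + (c_i*x_j + b_i)*x_i with q_i > 0, so the scalar ratio c_i/q_i plays the role of the interaction matrix. The cost form, parameter names, and tie-breaking rule are assumptions for this example, not the paper's general setting or implementation.

```python
import math

def best_response(q, c, b, other):
    """Integer minimizer of 0.5*q*x**2 + (c*other + b)*x, assuming q > 0."""
    target = -(c * other + b) / q    # continuous minimizer
    return math.floor(target + 0.5)  # nearest integer, ties rounded up

def run_best_response(q, c, b, start, max_iters=1000):
    """Alternate best responses until a strategy profile repeats.

    Returns (history, kind), with kind in {"equilibrium", "cycle", "unfinished"}.
    """
    x = list(start)
    seen = {tuple(x): 0}
    history = [tuple(x)]
    for step in range(1, max_iters + 1):
        x[0] = best_response(q[0], c[0], b[0], x[1])
        x[1] = best_response(q[1], c[1], b[1], x[0])
        state = tuple(x)
        history.append(state)
        if state in seen:
            # A profile repeated immediately is a fixed point of both responses.
            kind = "equilibrium" if seen[state] == step - 1 else "cycle"
            return history, kind
        seen[state] = step
    return history, "unfinished"

# Interaction ratios c/q = 0.5 < 1: the iteration settles quickly.
history, kind = run_best_response(q=(2, 2), c=(1, 1), b=(-3, -4), start=(10, -10))
print(history[-1], kind)  # (1, 2) equilibrium
```

With ratios above 1, for example `q=(1, 1), c=(2, 2), b=(0, 0), start=(1, 1)`, the iterates grow without repeating and the function reports "unfinished" once `max_iters` is reached, in line with the divergence result stated in the abstract.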
[237] arXiv:2405.07121 [pdf, other]: Title: In The Wild Ellipse Parameter Estimation for Circular Dining Plates and Bowls

Authors: Akil Pathiranage, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Ellipse estimation is an important topic in food image processing because it can be leveraged to parameterize plates and bowls, which in turn can be used to estimate camera view angles and food portion sizes. Automatically detecting the elliptical rim of plates and bowls and estimating their ellipse parameters for data "in-the-wild" is challenging: images may be captured from diverse camera angles, plate shapes vary, backgrounds can be noisy, and multiple non-uniform plates and bowls may be present in the same image. Recent advancements in foundational models offer promising capabilities for zero-shot semantic understanding and object segmentation. However, the output mask boundaries for plates and bowls generated by these models often lack consistency and precision compared to traditional ellipse fitting methods. In this paper, we combine ellipse fitting with semantic information extracted by zero-shot foundational models and propose WildEllipseFit, a method to detect and estimate the elliptical rims of plates and bowls. Evaluation on the proposed Yummly-ellipse dataset demonstrates its efficacy and zero-shot capability in real-world scenarios.
[238] arXiv:2405.07122 [pdf, other]: Title: PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity

Authors: Atsuki Sato, Yusuke Matsui

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is experimentally faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose PCF Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $O(n \log \log n)$ under mild assumptions on the data distribution. We also confirm experimentally that PCF Learned Sort has a computational complexity of $O(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the experimental success of Learned Sort, and provides evidence for why Learned Sort is fast.
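The general idea behind learned sorting can be sketched as follows: estimate the data's CDF from a sample, use the estimate to drop each element into a bucket, then sort the buckets and concatenate. The version below uses a piecewise-constant CDF estimate over equal-width intervals so that each lookup is O(1). It is an illustrative toy under assumed parameter names and defaults, not the PCF Learned Sort algorithm from the paper, and it carries no complexity guarantee.

```python
import random

def learned_bucket_sort(values, num_intervals=256, sample_size=1024, seed=0):
    """Sort numbers by bucketing them with a piecewise-constant CDF estimate."""
    n = len(values)
    if n < 2:
        return list(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)
    rng = random.Random(seed)
    sample = [values[rng.randrange(n)] for _ in range(min(sample_size, n))]
    width = (hi - lo) / num_intervals

    def interval(v):
        # Index of the equal-width interval containing v (monotone in v).
        return min(int((v - lo) / width), num_intervals - 1)

    counts = [0] * num_intervals
    for s in sample:
        counts[interval(s)] += 1
    # cdf[i] = fraction of sample points in intervals strictly before i.
    cdf, running = [], 0
    for c in counts:
        cdf.append(running / len(sample))
        running += c
    num_buckets = max(1, n // 8)
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        idx = min(int(cdf[interval(v)] * num_buckets), num_buckets - 1)
        buckets[idx].append(v)
    # The CDF estimate is non-decreasing, so buckets are already in order.
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))
    return out
```

Correctness relies only on the CDF estimate being monotone; its accuracy affects how evenly the buckets fill, and therefore the speed, but not the output.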
[239] arXiv:2405.07124 [pdf, other]: Title: Vertex Shader Domain Warping with Automatic Differentiation

Authors: Dave Pagurek van Mossel

Comments: 11 pages, 7 figures

Subjects: Graphics (cs.GR)

Domain warping is a technique commonly used in creative coding to distort graphics and add visual interest to a work. The approach has the potential to be used in 3D art as mesh vertices can be efficiently warped using a vertex shader in a WebGL pipeline. However, 3D models packaged for the web typically come with baked-in normal vectors, and these need to be updated when vertex positions change for lighting calculations to work. This is typically done via finite differences, which requires parameter tuning to achieve optimal visual fidelity. We present a method for 3D domain warping that works with automatic differentiation, allowing exact normals to be used without any tuning while still benefiting from hardware acceleration.
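One way to see why derivatives give exact normals: if a differentiable warp has Jacobian $J$ at a vertex, a surface normal $n$ transforms as $J^{-T} n$ up to scale. The sketch below is an assumption-laden illustration in plain Python rather than shader code, and it is not the paper's implementation. It computes $J$ with forward-mode automatic differentiation (dual numbers) and applies the cofactor matrix of $J$, which equals $\det(J)\,J^{-T}$. The `warp` function, the `Dual` class, and all helper names are hypothetical.

```python
import math


class Dual:
    """Number val + der*eps with eps^2 = 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def dsin(x):
    """Sine of a dual number (chain rule applied to the derivative part)."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def warp(p):
    """Hypothetical warp: offset y by a sine wave along x."""
    x, y, z = p
    return [x, y + 0.5 * dsin(2.0 * x), z]


def jacobian(f, p):
    """J[i][j] = d f_i / d p_j, using one forward pass per input coordinate."""
    cols = []
    for j in range(3):
        inp = [Dual(p[i], 1.0 if i == j else 0.0) for i in range(3)]
        cols.append([o.der if isinstance(o, Dual) else 0.0 for o in f(inp)])
    return [[cols[j][i] for j in range(3)] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def warped_normal(f, p, n):
    """Transform normal n at point p; assumes det(J) > 0 and J is invertible."""
    J = jacobian(f, p)
    # Rows of the cofactor matrix are cross products of the other two rows of J.
    C = [cross(J[1], J[2]), cross(J[2], J[0]), cross(J[0], J[1])]
    m = [sum(C[i][k] * n[k] for k in range(3)) for i in range(3)]
    length = math.sqrt(sum(c * c for c in m))
    return [c / length for c in m]


# Flat plane y = 0 with normal (0, 1, 0); after the warp the surface has slope 1 at x = 0.
print(warped_normal(warp, [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

At the origin the warped surface is $y = 0.5\sin(2x)$ with slope 1, so the result is proportional to $(-1, 1, 0)$, with no finite-difference step size to tune.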
[240] arXiv:2405.07128 [pdf, other]: Title: 5G Virtual Reality Manipulator Teleoperation using a Mobile Phone

Authors: Alexander Werner, William Melek

Subjects: Robotics (cs.RO)

This paper presents an approach to teleoperate a manipulator using a mobile phone as a leader device. Using its IMU and camera, the phone estimates its Cartesian pose which is then used to control the Cartesian pose of the robot's tool. The user receives visual feedback in the form of multi-view video - a point cloud rendered in a virtual reality environment. This enables the user to observe the scene from any position. To increase immersion, the robot's estimate of external forces is relayed using the phone's haptic actuator. Leader and follower are connected through wireless networks such as 5G or Wi-Fi. The paper describes the setup and analyzes its performance.
[241] arXiv:2405.07131 [pdf, other]: Title: MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping

Authors: Mingyue Yuan, Jieshan Chen, Aaron Quigley

Subjects: Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

In automated user interactive design, designers face key challenges, including accurate representation of user intent, crafting high-quality components, and ensuring both aesthetic and semantic consistency. Addressing these challenges, we introduce MAxPrototyper, our human-centered, multi-agent system for interactive design generation. The core of MAxPrototyper is a theme design agent. It coordinates with specialized sub-agents, each responsible for generating specific parts of the design. Through an intuitive online interface, users can control the design process by providing text descriptions and layout. Enhanced by improved language and image generation models, MAxPrototyper generates each component with careful detail and contextual understanding. Its multi-agent architecture enables a multi-round interaction capability between the system and users, facilitating precise and customized design adjustments throughout the creation process.
[242] arXiv:2405.07135 [pdf, other]: Title: Combining multiple post-training techniques to achieve most efficient quantized LLMs

Authors: Sayeh Sharify, Zifei Xu, Wanzin Yazar, Xin Wang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) have distinguished themselves with outstanding performance in complex language modeling tasks, yet they come with significant computational and storage challenges. This paper explores the potential of quantization to mitigate these challenges. We systematically study the combined application of two well-known post-training techniques, SmoothQuant and GPTQ, and provide a comprehensive analysis of their interactions and implications for advancing LLM quantization. We enhance the versatility of both techniques by enabling quantization to microscaling (MX) formats, expanding their applicability beyond their initial fixed-point format targets. We show that by applying GPTQ and SmoothQuant, and employing MX formats for quantizing models, we can achieve a significant reduction in the size of OPT models by up to 4x and LLaMA models by up to 3x with a negligible perplexity increase of 1-3%.
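As a rough illustration of block-scaled quantization in the spirit of microscaling formats, the sketch below quantizes a block of values to signed integers that share a single power-of-two scale. It is a simplified assumption, not the MX specification and not the SmoothQuant or GPTQ procedures studied in the paper. The function names, the integer element type, and the `bits` parameter are chosen for illustration.

```python
import math


def quantize_block(block, bits=8):
    """Quantize one block to signed ints sharing a single power-of-two scale.

    Returns (exp, q) where each value is approximated by q[i] * 2**exp.
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest exponent such that the largest magnitude still fits in qmax.
    exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exp
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return exp, q


def dequantize_block(exp, q):
    """Recover approximate values from the shared exponent and integers."""
    return [v * 2.0 ** exp for v in q]


exp, q = quantize_block([0.5, -1.0, 0.25, 0.0])
print(exp, q)                    # → -6 [32, -64, 16, 0]
print(dequantize_block(exp, q))  # → [0.5, -1.0, 0.25, 0.0]
```

Sharing one scale per block keeps the per-element storage small while letting the scale adapt to local magnitude; the rounding error per element is at most half of the shared scale.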
[243] arXiv:2405.07139 [pdf, ps, other]: Title: Reduced Krylov Basis Methods for Parametric Partial Differential Equations

Authors: Yuwen Li, Ludmil T. Zikatanov, Cheng Zuo

Comments: 23 pages, 6 figures

Subjects: Numerical Analysis (math.NA)

This work is on a user-friendly reduced basis method for solving a family of parametric PDEs by preconditioned Krylov subspace methods including the conjugate gradient method, generalized minimum residual method, and bi-conjugate gradient method. The proposed methods use a preconditioned Krylov subspace method for a high-fidelity discretization of one parameter instance to generate orthogonal basis vectors of the reduced basis subspace. Then large-scale discrete parameter-dependent problems are approximately solved in the low-dimensional Krylov subspace. As shown in the theory and experiments, only a small number of Krylov subspace iterations are needed to simultaneously generate approximate solutions of a family of high-fidelity and large-scale systems in the reduced basis subspace. This reduces the computational cost dramatically because (1) to construct the reduced basis vectors, we only solve one large-scale problem in the high-fidelity level; and (2) the family of large-scale problems restricted to the reduced basis subspace have much smaller sizes.
[244] arXiv:2405.07140 [pdf, other]: Title: Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

Authors: Xinyuan Zhang, Jiang Liu, Zehui Xiong, Yudong Huang, Gaochang Xie, Ran Zhang

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)

Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.
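The flavor of depth-first tree search with pruning can be shown on a toy problem. The sketch below is not DFTSP and does not model the paper's communication, computation, latency, or accuracy constraints. It only picks the largest set of requests whose hypothetical per-request costs fit within a single budget, and it abandons any branch that cannot beat the best batch found so far. The function name and the cost model are assumptions.

```python
def dfs_max_batch(costs, budget):
    """Largest number of requests whose total cost fits in budget (toy problem)."""
    order = sorted(costs)  # try cheap requests first to find good batches early
    best = 0

    def visit(i, used, count):
        nonlocal best
        best = max(best, count)
        if i == len(order):
            return
        # Prune: even accepting every remaining request cannot beat best.
        if count + (len(order) - i) <= best:
            return
        if used + order[i] <= budget:
            visit(i + 1, used + order[i], count + 1)  # branch: include request i
        visit(i + 1, used, count)                     # branch: skip request i

    visit(0, 0, 0)
    return best


print(dfs_max_batch([4, 2, 3, 1], budget=6))  # → 3
```

The pruning test uses an optimistic bound on what a branch could still achieve, which is the basic mechanism that lets such a search stay exact while visiting far fewer nodes than brute force.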
[245] arXiv:2405.07142 [pdf, other]: Title: Cross-Domain Continual Learning via CLAMP

Authors: Weiwei Weng, Mahardhika Pratama, Jie Zhang, Chen Chen, Edward Yapp Kien Yee, Ramasamy Savitha

Comments: Under Review in Elsevier Journal

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Artificial neural networks, celebrated for their human-like cognitive learning abilities, often encounter the well-known catastrophic forgetting (CF) problem, where the neural networks lose proficiency in previously acquired knowledge. Despite numerous efforts to mitigate CF, it remains a significant challenge, particularly in complex changing environments. This challenge is even more pronounced in cross-domain adaptation following the continual learning (CL) setting, which is a more challenging and realistic scenario that is under-explored. To this end, this article proposes a cross-domain CL approach that makes it possible to deploy a single model in such environments without additional labelling costs. Our approach, namely continual learning approach for many processes (CLAMP), integrates a class-aware adversarial domain adaptation strategy to align a source domain and a target domain. An assessor-guided learning process is put forward to navigate the learning process of a base model. It assigns a set of weights to every sample, controlling the influence of every sample and the interactions of each loss function so as to balance the stability and plasticity dilemma, thus preventing the CF problem. The first assessor focuses on the negative transfer problem, rejecting irrelevant samples of the source domain, while the second assessor prevents noisy pseudo labels of the target domain. Both assessors are trained in the meta-learning approach using random transformation techniques and similar samples of the source domain. Theoretical analysis and extensive numerical validations demonstrate that CLAMP significantly outperforms established baseline algorithms across all experiments by at least $10\%$ margin.
[246] arXiv:2405.07145 [pdf, other]: Title: Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Authors: Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called \emph{Stable Signature} (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by \emph{open-source} diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.
[247] arXiv:2405.07146 [pdf, other]: Title: TRAIL: Cross-Shard Validation for Cryptocurrency Byzantine Shard Protection

Authors: Mitch Jacovetty, Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

We present TRAIL: an algorithm that uses a novel consensus procedure to tolerate failed or malicious shards within a blockchain-based cryptocurrency. Our algorithm takes a new approach of selecting validator shards for each transaction from those that previously held the assets being transferred. This approach ensures the algorithm's robustness and efficiency. TRAIL is presented using PBFT for internal shard transaction processing and a modified version of PBFT for external cross-shard validation. We describe TRAIL, prove it correct, analyze its message complexity, and evaluate its performance. We propose various TRAIL optimizations: we describe how it can be adapted to other Byzantine-tolerant consensus algorithms, how a complete system may be built on the basis of it, and how TRAIL can be applied to existing and future sharded blockchains.
[248] arXiv:2405.07147 [pdf, ps, other]: Title: Randomized algorithms for computing the tensor train approximation and their applications

Authors: Maolin Che, Yimin Wei, Hong Yan

Comments: 43 pages, 9 figures and 4 tables

Subjects: Numerical Analysis (math.NA)

In this paper, we focus on the fixed TT-rank and precision problems of finding an approximation of the tensor train (TT) decomposition of a tensor. Note that the TT-SVD and TT-cross are two well-known algorithms for these two problems. Firstly, by combining the random projection technique with the power scheme, we obtain two types of randomized algorithms for the fixed TT-rank problem. Secondly, by using the non-asymptotic theory of sub-random Gaussian matrices, we derive the upper bounds of the proposed randomized algorithms. Thirdly, we deduce a new deterministic strategy to estimate the desired TT-rank with a given tolerance and another adaptive randomized algorithm that finds a low TT-rank representation satisfying a given tolerance, and is beneficial when the target TT-rank is not known in advance. We finally illustrate the accuracy of the proposed algorithms via some test tensors from synthetic and real databases. In particular, for the fixed TT-rank problem, the proposed algorithms can be several times faster than the TT-SVD, and the accuracy of the proposed algorithms and the TT-SVD are comparable for several test tensors.
[249] arXiv:2405.07151 [pdf, other]: Title: Group Complete-$\{s\}$ Pliable Index Coding

Authors: Sina Eghbal, Badri N. Vellambi, Lawrence Ong, Parastoo Sadeghi

Comments: Accepted for publication in 2024 IEEE International Symposium on Information Theory

Subjects: Information Theory (cs.IT)

This paper introduces a novel class of PICOD($t$) problems referred to as $g$-group complete-$S$ PICOD($t$) problems. It constructs a multi-stage achievability scheme to generate pliable index codes for group complete PICOD problems when $S = \{s\}$ is a singleton set. Using the maximum acyclic induced subgraph bound, lower bounds on the broadcast rate are derived for singleton $S$, which establishes the optimality of the achievability scheme for a range of values for $t$ and for any $g$ and $s$. For all other values, it is shown that the achievability scheme is optimal among the restricted class of broadcast codes.
[250] arXiv:2405.07155 [pdf, other]: Title: Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities

Authors: Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is if it is possible for trained multi-modal models to have high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer the knowledge from the modalities with higher importance weight to the modalities with lower importance weight. This cross-modal knowledge distillation produces a highly accurate model even with the absence of influential modalities. Differently from previous methods in the field, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. Particularly in BraTS2018, we achieve substantial improvements of 3.51\% for enhancing tumor, 2.19\% for tumor core, and 1.14\% for the whole tumor in terms of average segmentation Dice score.
[251] arXiv:2405.07157 [pdf, other]: Title: Semi-Self-Supervised Domain Adaptation: Developing Deep Learning Models with Limited Annotated Data for Wheat Head Segmentation

Authors: Alireza Ghanbari, Gholamhassan Shirdel, Farhad Maleki

Comments: 12

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Precision agriculture involves the application of advanced technologies to improve agricultural productivity, efficiency, and profitability while minimizing waste and environmental impact. Deep learning approaches enable automated decision-making for many visual tasks. However, in the agricultural domain, variability in growth stages and environmental conditions, such as weather and lighting, presents significant challenges to developing deep learning-based techniques that generalize across different conditions. The resource-intensive nature of creating extensive annotated datasets that capture these variabilities further hinders the widespread adoption of these approaches. To tackle these issues, we introduce a semi-self-supervised domain adaptation technique based on deep convolutional neural networks with a probabilistic diffusion process, requiring minimal manual data annotation. Using only three manually annotated images and a selection of video clips from wheat fields, we generated a large-scale computationally annotated dataset of image-mask pairs and a large dataset of unannotated images extracted from video frames. We developed a two-branch convolutional encoder-decoder model architecture that uses both synthesized image-mask pairs and unannotated images, enabling effective adaptation to real images. The proposed model achieved a Dice score of 80.7\% on an internal test dataset and a Dice score of 64.8\% on an external test set, composed of images from five countries and spanning 18 domains, indicating its potential to develop generalizable solutions that could encourage the wider adoption of advanced technologies in agriculture.
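For readers unfamiliar with the metric reported above, the Dice score of a predicted mask $A$ against a reference mask $B$ is $2|A \cap B| / (|A| + |B|)$. The following is a minimal sketch on flat binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient for two equal-length binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # → 0.5
```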
[252] arXiv:2405.07162 [pdf, other]: Title: Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

Authors: Yuwei Zeng, Yao Mu, Lin Shao

Comments: ICML 2024

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Learning reward functions remains the bottleneck to equip a robot with a broad repertoire of skills. Large Language Models (LLM) contain valuable task-related knowledge that can potentially aid in the learning of reward functions. However, the reward function an LLM proposes can be imprecise and thus ineffective, so it needs to be further grounded with environment information. We propose a method to learn rewards more efficiently in the absence of humans. Our approach consists of two components: We first use the LLM to propose features and parameterization of the reward, then update the parameters through an iterative self-alignment process. In particular, the process minimizes the ranking inconsistency between the LLM and the learnt reward functions based on the execution feedback. The method was validated on 9 tasks across 2 simulation environments. It demonstrates a consistent improvement in training efficacy and efficiency, while consuming significantly fewer GPT tokens compared to the alternative mutation-based method.
[253] arXiv:2405.07164 [pdf, other]: Title: Modeling Pedestrian Intrinsic Uncertainty for Multimodal Stochastic Trajectory Prediction via Energy Plan Denoising

Authors: Yao Liu, Quan Z. Sheng, Lina Yao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.
[254] arXiv:2405.07166 [pdf, other]: Title: Resource Efficient Perception for Vision Systems

Authors: A V Subramanyam, Niyati Singal, Vinay K Verma

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient patch based processing for high resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods which are limited by memory constraints, our method enables training of ultra high resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at https://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systems.
[255] arXiv:2405.07167 [pdf, other]: Title: 3D Hand Mesh Recovery from Monocular RGB in Camera Space

Authors: Haonan Li, Patrick P. K. Chen, Yitong Zhou

Comments: 21 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the rapid advancement of technologies such as virtual reality, augmented reality, and gesture control, users expect interactions with computer interfaces to be more natural and intuitive. Existing visual algorithms often struggle to accomplish advanced human-computer interaction tasks, necessitating accurate and reliable absolute spatial prediction methods. Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges. This study proposes a network model that performs parallel processing of root-relative grids and root recovery tasks. The model enables the recovery of 3D hand meshes in camera space from monocular RGB images. To facilitate end-to-end training, we utilize an implicit learning approach for 2D heatmaps, enhancing the compatibility of 2D cues across different subtasks. We incorporate the Inception concept into a spectral graph convolutional network to explore the root-relative mesh, and integrate it with the locally detailed and globally attentive method designed for root recovery exploration. This approach improves the model's predictive performance in complex environments and self-occluded scenes. Through evaluation on the large-scale hand dataset FreiHAND, we have demonstrated that our proposed model is comparable with state-of-the-art models. This study contributes to the advancement of techniques for accurate and reliable absolute spatial prediction in various human-computer interaction applications.
[256] arXiv:2405.07169 [pdf, other]: Title: Challenges and Opportunities for Large-Scale Exploration with Air-Ground Teams using Semantics

Authors: Fernando Cladera, Ian D. Miller, Zachary Ravichandran, Varun Murali, Jason Hughes, M. Ani Hsieh, C. J. Taylor, Vijay Kumar

Comments: 6 pages, 5 figures

Subjects: Robotics (cs.RO)

One common and desirable application of robots is exploring potentially hazardous and unstructured environments. Air-ground collaboration offers a synergistic approach to addressing such exploration challenges. In this paper, we demonstrate a system for large-scale exploration using a team of aerial and ground robots. Our system uses semantics as lingua franca, and relies on fully opportunistic communications. We highlight the unique challenges from this approach, explain our system architecture and showcase lessons learned during our experiments. All our code is open-source, encouraging researchers to use it and build upon.
[257] arXiv:2405.07171 [pdf, other]: Title: Enhanced Online Test-time Adaptation with Feature-Weight Cosine Alignment

Authors: WeiQin Chuah, Ruwan Tennakoon, Alireza Bab-Hadiashar

Comments: 22 pages, 7 figures, 8 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Online Test-Time Adaptation (OTTA) has emerged as an effective strategy to handle distributional shifts, allowing on-the-fly adaptation of pre-trained models to new target domains during inference, without the need for source data. We found that the widely studied entropy minimization (EM) method for OTTA suffers from noisy gradients due to ambiguity near decision boundaries and incorrect low-entropy predictions. To overcome these limitations, this paper introduces a novel cosine alignment optimization approach with a dual-objective loss function that refines the precision of class predictions and adaptability to novel domains. Specifically, our method optimizes the cosine similarity between feature vectors and class weight vectors, enhancing the precision of class predictions and the model's adaptability to novel domains. Our method outperforms state-of-the-art techniques and sets a new benchmark in multiple datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, Office-Home, and DomainNet datasets, demonstrating high accuracy and robustness against diverse corruptions and domain shifts.
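The quantity at the center of this approach, the cosine similarity between a feature vector and each class weight vector, can be sketched as follows. The abstract does not specify the dual-objective loss, so only the similarity scores are shown. The function names and the zero-vector convention are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for degenerate inputs
    return dot / (norm_u * norm_v)


def cosine_scores(feature, class_weights):
    """One similarity score per class; scale-invariant in both arguments."""
    return [cosine_similarity(feature, w) for w in class_weights]


print(cosine_scores([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]))
# → [1.0, 0.0, -1.0]
```

Because the score depends only on direction, rescaling a feature or a weight vector leaves it unchanged, which is one reason cosine-based objectives can be less sensitive to feature magnitude than raw logits.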
[258] arXiv:2405.07172 [pdf, other]: Title: Observability and Incident Response in Managed Serverless Environments Using Ontology-Based Log Monitoring

Authors: Lavi Ben-Shimol, Edita Grolman, Aviad Elyashar, Inbar Maimon, Dudu Mimran, Oleg Brodt, Martin Strassmann, Heiko Lehmann, Yuval Elovici, Asaf Shabtai

Subjects: Cryptography and Security (cs.CR)

In a fully managed serverless environment, the cloud service provider is responsible for securing the cloud infrastructure, thereby reducing the operational and maintenance efforts of application developers. However, this environment limits the use of existing cybersecurity frameworks and tools, which reduces observability and situational awareness capabilities (e.g., risk assessment, incident response). In addition, existing security frameworks for serverless applications do not generalize well to all application architectures and usually require adaptation, specialized expertise, etc. for use in fully managed serverless environments. In this paper, we introduce a three-layer security scheme for applications deployed in fully managed serverless environments. The first two layers involve a unique ontology based solely on serverless logs which is used to transform them into a unified application activity knowledge graph. In the third layer, we address the need for observability and situational awareness capabilities by implementing two situational awareness tools that utilize the graph-based representation: 1) An incident response dashboard that leverages the ontology to visualize and examine application activity logs in the context of cybersecurity alerts. Our user study showed that the dashboard enabled participants to respond more accurately and quickly to new security alerts than the baseline tool. 2) A criticality of asset (CoA) risk assessment framework that enables efficient expert-based prioritization in cybersecurity contexts.
[259] arXiv:2405.07174 [pdf, other]: Title: CRSFL: Cluster-based Resource-aware Split Federated Learning for Continuous Authentication

Authors: Mohamad Wazzeh, Mohamad Arafeh, Hani Sami, Hakima Ould-Slimane, Chamseddine Talhi, Azzam Mourad, Hadi Otrok

Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

In the ever-changing world of technology, continuous authentication and comprehensive access management are essential during user interactions with a device. Split Learning (SL) and Federated Learning (FL) have recently emerged as promising technologies for training a decentralized Machine Learning (ML) model. With the increasing use of smartphones and Internet of Things (IoT) devices, these distributed technologies enable users with limited resources to complete neural network model training with server assistance and collaboratively combine knowledge between different nodes. In this study, we propose combining these technologies to address the continuous authentication challenge while protecting user privacy and limiting device resource usage. However, the model's training is slowed due to SL sequential training and resource differences between IoT devices with different specifications. Therefore, we use a cluster-based approach to group devices with similar capabilities to mitigate the impact of slow devices while filtering out the devices incapable of training the model. In addition, we address the efficiency and robustness of training ML models by using SL and FL techniques to train the clients simultaneously while analyzing the overhead burden of the process. Following clustering, we select the best set of clients to participate in training through a Genetic Algorithm (GA) optimized on a carefully designed list of objectives. The performance of our proposed framework is compared to baseline methods, and the advantages are demonstrated using a real-life UMDAA-02-FD face detection dataset. The results show that CRSFL, our proposed approach, maintains high accuracy and reduces the overhead burden in continuous authentication scenarios while preserving user privacy.
[260] arXiv:2405.07175 [pdf, other]: Title: On-Demand Model and Client Deployment in Federated Learning with Deep Reinforcement Learning

Authors: Mario Chahoud, Hani Sami, Azzam Mourad, Hadi Otrok, Jamal Bentahar, Mohsen Guizani

Subjects: Machine Learning (cs.LG)

In Federated Learning (FL), the limited accessibility of data from diverse locations and user types poses a significant challenge due to restricted user participation. Expanding client access and diversifying data enhance models by incorporating diverse perspectives, thereby enhancing adaptability. However, challenges arise in dynamic and mobile environments where certain devices may become inaccessible as FL clients, impacting data availability and client selection methods. To address this, we propose an On-Demand solution, deploying new clients using Docker Containers on-the-fly. Our On-Demand solution, employing Deep Reinforcement Learning (DRL), targets client availability and selection, while considering data shifts, and container deployment complexities. It employs an autonomous end-to-end solution for handling model deployment and client selection. The DRL strategy uses a Markov Decision Process (MDP) framework, with a Master Learner and a Joiner Learner. The designed cost functions represent the complexity of the dynamic client deployment and selection. Simulated tests show that our architecture can easily adjust to changes in the environment and respond to On-Demand requests. This underscores its ability to improve client availability, capability, accuracy, and learning efficiency, surpassing heuristic and tabular reinforcement learning solutions.
[261] arXiv:2405.07176 [pdf, other]: Title: Capacity Maximization for Base Station with Hybrid Fixed and Movable Antennas

Authors: Xiaoming Shi, Xiaodan Shao, Rui Zhang

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Six-dimensional movable antenna (6DMA) is an effective solution for enhancing wireless network capacity through the adjustment of both 3D positions and 3D rotations of distributed antennas/antenna surfaces. Although freely positioning/rotating 6DMA surfaces offers the greatest flexibility and thus highest capacity improvement, its implementation may be challenging in practice due to the drastic architecture change required for existing base stations (BSs), which predominantly adopt fixed-position antenna (FPA) arrays (e.g., sector antenna arrays). Thus, we introduce in this letter a new BS architecture called hybrid fixed and movable antennas (HFMA), which consists of both conventional FPA arrays and position/rotation-adjustable 6DMA surfaces. For ease of implementation, we consider that all 6DMA surfaces can rotate along a circular track above the FPA arrays. We aim to maximize the network capacity via optimizing the rotation angles of all 6DMA surfaces based on the users' spatial distribution. Since this problem is combinatorial and its optimal solution requires prohibitively high computational complexity via exhaustive search, we propose an alternative adaptive Markov Chain Monte Carlo based method to solve it more efficiently. Finally, we present simulation results that show significant performance gains achieved by our proposed design over various benchmark schemes.
[262] arXiv:2405.07178 [pdf, other]: Title: Hologram: Realtime Holographic Overlays via LiDAR Augmented Reconstruction

Authors: Ekansh Agrawal

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Guided by the hologram technology of the infamous Star Wars franchise, I present an application that creates real-time holographic overlays using LiDAR augmented 3D reconstruction. Prior attempts involve SLAM or NeRFs, which either require highly calibrated scenes, incur steep computation costs, or fail to render dynamic scenes. I propose 3 high-fidelity reconstruction tools that can run on a portable device, such as an iPhone 14 Pro, which can allow for metrically accurate facial reconstructions. My systems enable interactive and immersive holographic experiences that can be used for a wide range of applications, including augmented reality, telepresence, and entertainment.
[263] arXiv:2405.07180 [pdf, other]: Title: Repairing Reed-Solomon Codes with Side Information

Authors: Thi Xinh Dinh, Ba Thong Le, Son Hoang Dau, Serdar Boztas, Stanislav Kruglik, Han Mao Kiah, Emanuele Viterbo, Tuvi Etzion, Yeow Meng Chee

Subjects: Information Theory (cs.IT)

We generalize the problem of recovering a lost/erased symbol in a Reed-Solomon code to the scenario in which some side information about the lost symbol is known. The side information is represented as a set $S$ of linearly independent combinations of the sub-symbols of the lost symbol. When $S = \varnothing$, this reduces to the standard problem of repairing a single codeword symbol. When $S$ is a set of sub-symbols of the erased one, this becomes the repair problem with partially lost/erased symbol. We first establish that the minimum repair bandwidth depends on $|S|$ and not the content of $S$ and construct a lower bound on the repair bandwidth of a linear repair scheme with side information $S$. We then consider the well-known subspace-polynomial repair schemes and show that their repair bandwidths can be optimized by choosing the right subspaces. Finally, we demonstrate several parameter regimes where the optimal bandwidths can be achieved for full-length Reed-Solomon codes.
[264] arXiv:2405.07188 [pdf, ps, other]: Title: Deciding regular games: a playground for exponential time algorithms

Authors: Zihui Liang, Bakh Khoussainov, Mingyu Xiao

Subjects: Computer Science and Game Theory (cs.GT)

Regular games form a well-established class of games for analysis and synthesis of reactive systems. They include coloured Muller games, McNaughton games, Muller games, Rabin games, and Streett games. These games are played on directed graphs $\mathcal G$ where Player 0 and Player 1 play by generating an infinite path $\rho$ through the graph. The winner is determined by specifications put on the set $X$ of vertices in $\rho$ that occur infinitely often. These games are determined, enabling the partitioning of $\mathcal G$ into two sets $W_0$ and $W_1$ of winning positions for Player 0 and Player 1, respectively. Numerous algorithms exist that decide specific instances of regular games, e.g., Muller games, by computing $W_0$ and $W_1$. In this paper we aim to find general principles for designing uniform algorithms that decide all regular games. For this we utilise various recursive and dynamic programming algorithms that leverage standard notions such as subgames and traps. Importantly, we show that our techniques improve or match the performances of existing algorithms for many instances of regular games.
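The abstract refers to standard notions such as subgames and traps. A minimal sketch of the textbook attractor construction that underlies both is given below; it is not the paper's algorithm, and the function name, the dictionary encoding of the graph, and the toy game are all illustrative assumptions.

```python
def attractor(edges, owner, player, target):
    """Vertices from which `player` can force the play into `target`.

    edges:  dict mapping each vertex to a list of its successors
    owner:  dict mapping each vertex to the player (0 or 1) who moves there
    """
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v, successors in edges.items():
            if v in attr:
                continue
            if owner[v] == player:
                # The player needs only one move into the attractor.
                joins = any(s in attr for s in successors)
            else:
                # The opponent must be unable to avoid the attractor.
                joins = bool(successors) and all(s in attr for s in successors)
            if joins:
                attr.add(v)
                changed = True
    return attr


# Hypothetical four-vertex game graph.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["c"], "d": ["d"]}
owner = {"a": 0, "b": 1, "c": 0, "d": 1}

reach_d = attractor(edges, owner, player=0, target={"d"})
# The complement of Player 0's attractor is a trap for Player 0.
trap = set(edges) - reach_d
```

In this toy graph the complement of the attractor is a set that Player 0 cannot leave, which is the sense in which recursive algorithms of this kind typically split a game into subgames.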
[265] arXiv:2405.07189 [pdf, ps, other]: Title: A hybrid meta-heuristic approach for channel estimation in OFDM MIMO

Authors: Shahriar Hassan, Umme Farhana, Md Karam Newaz

Journal-ref: Journal of Gono Bishwabidyalay, Vol. 4, Issue. 1, PP. 224-236, 2023

Subjects: Networking and Internet Architecture (cs.NI)

In wireless communication, Multiple Input Multiple Output (MIMO) technology has brought significant improvements in service by adopting Orthogonal Frequency Division Multiplexing (OFDM), a digital modulation technique. To achieve high performance with MIMO, efficiently gathering channel state information (CSI) plays a vital role. Among the different channel estimation techniques, data-aided channel estimation is more reliable. The existing methods of data-aided channel estimation are the Least Square (LS) and Minimum Mean Square Error (MMSE) methods, which do not achieve high performance. Moreover, MMSE is somewhat complex and has a higher computational cost. For this reason, many previous attempts have been made to optimize these methods with the help of meta-heuristics and other means. In this paper, we optimize LS estimation with a combined algorithm of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The proposed algorithm outperforms LS and MMSE. It gives results similar to those obtained by optimizing LS with standard PSO, but in fewer iterations.
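As a minimal sketch of the LS baseline named above, assume a single-antenna, per-subcarrier model in which a known pilot X_k is sent and Y_k = H_k X_k plus noise is received; the least-squares estimate is then Y_k / X_k. The function name and the noise-free toy values are illustrative assumptions, not taken from the paper, and the GA/PSO refinement is not shown.

```python
def ls_channel_estimate(received, pilots):
    """Per-subcarrier least-squares estimate H_k = Y_k / X_k."""
    if len(received) != len(pilots):
        raise ValueError("received and pilots must have the same length")
    return [y / x for y, x in zip(received, pilots)]


# Hypothetical noise-free example with three pilot subcarriers.
true_channel = [0.8 + 0.6j, -0.5 + 0.2j, 1.0 - 1.0j]
pilots = [1 + 0j, -1 + 0j, 1j]
received = [h * x for h, x in zip(true_channel, pilots)]

estimate = ls_channel_estimate(received, pilots)
```

Without noise the estimate recovers the channel exactly; with noise, the per-subcarrier error scales with the noise divided by the pilot amplitude, which is the weakness that optimized variants aim to reduce.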
[266] arXiv:2405.07193 [pdf, ps, other]: Title: Learning Design Preferences through Design Feature Extraction and Weighted Ensemble

Authors: Dongju Shin, Sunghee Lee, Namwoo Kang

Subjects: Human-Computer Interaction (cs.HC)

Design is a factor that plays an important role in consumer purchase decisions. As the need for understanding and predicting various preferences for each customer increases along with the importance of mass customization, predicting individual design preferences has become a critical factor in product development. However, current methods for predicting design preferences have some limitations. Product design involves a vast amount of high-dimensional information, and personal design preference is a complex and heterogeneous area of emotion unique to each individual. To address these challenges, we propose an approach that uses a dimensionality reduction model to transform design samples into low-dimensional feature vectors, enabling us to extract the key representational features of each design. For preference prediction models using feature vectors, by referring to the design preference tendencies of others, we can predict the individual-level design preferences more accurately. Our proposed framework overcomes the limitations of traditional methods to determine design preferences, allowing us to accurately identify design features and predict individual preferences for specific products. Through this framework, we can improve the effectiveness of product development and create personalized product recommendations that cater to the unique needs of each consumer.
[267] arXiv:2405.07194 [pdf, other]: Title: Differentiable Model Scaling using Differentiable Topk

Authors: Kai Liu, Ruohui Wang, Jianfei Gao, Kai Chen

Comments: Accepted by ICML 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Over the past few years, as large language models have ushered in an era of intelligence emergence, there has been an intensified focus on scaling networks. Currently, many network architectures are designed manually, often resulting in sub-optimal configurations. Although Neural Architecture Search (NAS) methods have been proposed to automate this process, they suffer from low search efficiency. This study introduces Differentiable Model Scaling (DMS), increasing the efficiency for searching optimal width and depth in networks. DMS can model both width and depth in a direct and fully differentiable way, making it easy to optimize. We have evaluated our DMS across diverse tasks, ranging from vision tasks to NLP tasks and various network architectures, including CNNs and Transformers. Results consistently indicate that our DMS can find improved structures and outperforms state-of-the-art NAS methods. Specifically, for image classification on ImageNet, our DMS improves the top-1 accuracy of EfficientNet-B0 and Deit-Tiny by 1.4% and 0.6%, respectively, and outperforms the state-of-the-art zero-shot NAS method, ZiCo, by 1.3% while requiring only 0.4 GPU days for searching. For object detection on COCO, DMS improves the mAP of Yolo-v8-n by 2.0%. For language modeling, our pruned Llama-7B outperforms the prior method with lower perplexity and higher zero-shot classification accuracy. We will release our code in the future.
[268] arXiv:2405.07195 [pdf, other]: Title: InsightNet: Structured Insight Mining from Customer Feedback

Authors: Sandeep Sricharan Mukku, Manan Soni, Jitenkumar Rana, Chetan Aggarwal, Promod Yenigalla, Rashmi Patange, Shyam Mohan

Comments: EMNLP 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, uses a semantic similarity heuristic approach to generate labelled data, and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatim for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, which is an improvement of 11% F1-score over the previous best results. Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.
[269] arXiv:2405.07196 [pdf, other]: Title: Permissioned Blockchain-based Framework for Ranking Synthetic Data Generators

Authors: Narasimha Raghavan Veeraragavan, Mohammad Hossein Tabatabaei, Severin Elvatun, Vibeke Binz Vallevik, Siri Larønningen, Jan F Nygård

Subjects: Databases (cs.DB); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Synthetic data generation is increasingly recognized as a crucial solution to address data related challenges such as scarcity, bias, and privacy concerns. As synthetic data proliferates, the need for a robust evaluation framework to select a synthetic data generator becomes more pressing given the variety of options available. In this research study, we investigate two primary questions: 1) How can we select the most suitable synthetic data generator from a set of options for a specific purpose? 2) How can we make the selection process more transparent, accountable, and auditable? To address these questions, we introduce a novel approach in which the proposed ranking algorithm is implemented as a smart contract within a permissioned blockchain framework called Sawtooth. Through comprehensive experiments and comparisons with state-of-the-art baseline ranking solutions, our framework demonstrates its effectiveness in providing nuanced rankings that consider both desirable and undesirable properties. Furthermore, our framework serves as a valuable tool for selecting the optimal synthetic data generators for specific needs while ensuring compliance with data protection principles.
[270] arXiv:2405.07200 [pdf, other]: Title: Chebyshev Polynomial-Based Kolmogorov-Arnold Networks: An Efficient Architecture for Nonlinear Function Approximation

Authors: Sidharth SS

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate approximation of complex nonlinear functions is a fundamental challenge across many scientific and engineering domains. Traditional neural network architectures often struggle to capture intricate patterns and irregularities present in high-dimensional functions. This paper introduces the Chebyshev Kolmogorov-Arnold Network (Chebyshev KAN), a novel approach that combines the theoretical foundations of the Kolmogorov-Arnold Theorem with the powerful approximation capabilities of Chebyshev polynomials.
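For orientation, Chebyshev polynomials of the first kind satisfy T_0(x) = 1, T_1(x) = x and T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x). The sketch below evaluates that basis and a weighted sum of it, which is one plausible way a learnable univariate function could be parameterized; the abstract as listed does not give the layer's construction, so the function names and the coefficient layout are assumptions.

```python
def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via T_{n+1} = 2x*T_n - T_{n-1}."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values


def chebyshev_series(x, coefficients):
    """Evaluate sum_n coefficients[n] * T_n(x) for a non-empty coefficient list."""
    basis = chebyshev_basis(x, len(coefficients) - 1)
    return sum(c * t for c, t in zip(coefficients, basis))


basis_at_half = chebyshev_basis(0.5, 3)
series_at_half = chebyshev_series(0.5, [1.0, 2.0, 3.0])
```

Inputs are usually mapped into [-1, 1] before evaluation, since that is the interval on which the polynomials are orthogonal and bounded by 1 in absolute value.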
[271] arXiv:2405.07201 [pdf, other]: Title: Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

Authors: Haoming Chen, Zhizhong Zhang, Yanyun Qu, Ruixin Zhang, Xin Tan, Yuan Xie

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

An effective pre-training framework with universal 3D representations is extremely desired in perceiving large-scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path of reaching a universal pre-training framework: (1) The cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; (2) Lacking a globally unified bond that pushes the cross-scene semantic consistency into 3D representation learning. To address the above challenges, we propose a CSC framework that puts a scene-level semantic consistency at the heart, bridging the connection of the similar semantic segments across various scenes. To achieve this goal, we combine the coherent semantic cues provided by the vision foundation model and the knowledge-rich cross-scene prototypes derived from the complementary multi-modality information. These allow us to train a universal 3D pre-training model that facilitates various downstream tasks with less fine-tuning effort. Empirically, we achieve consistent improvements over SOTA pre-training approaches in semantic segmentation (+1.4% mIoU), object detection (+1.0% mAP), and panoptic segmentation (+3.0% PQ) using their task-specific 3D network on nuScenes. Code is released at https://github.com/chenhaomingbob/CSC, hoping to inspire future research.
[272] arXiv:2405.07202 [pdf, other]: Title: Unified Video-Language Pre-training with Synchronized Audio

Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two modalities. In this work, we propose an enhanced framework for Video-Language pre-training with Synchronized Audio, termed VLSA, that can learn tri-modal representations in a unified self-supervised transformer. Specifically, our VLSA jointly aggregates embeddings of local patches and global tokens for video, text, and audio. Furthermore, we utilize local-patch masked modeling to learn modality-aware features, and leverage global audio matching to capture audio-guided features for video and text. We conduct extensive experiments on retrieval across text, video, and audio. Our simple model pre-trained on only 0.9M data achieves improved results over state-of-the-art baselines. In addition, qualitative visualizations vividly showcase the superiority of our VLSA in learning discriminative visual-textual representations.
[273] arXiv:2405.07204 [pdf, other]: Title: Transforming C++11 Code to C++03 to Support Legacy Compilation Environments

Authors: Gábor Antal, Dávid Havas, István Siket, Árpád Beszédes, Rudolf Ferenc, József Mihalicza

Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

Newer technologies - programming languages, environments, libraries - change very rapidly. However, various internal and external constraints often prevent projects from quickly adapting to these changes. Customers may require specific platform compatibility from a software vendor, for example. In this work, we deal with such an issue in the context of the C++ programming language. Our industrial partner is required to use SDKs that support only older C++ language editions. They, however, would like to allow their developers to use the newest language constructs in their code. To address this problem, we created a source code transformation framework to automatically backport source code written according to the C++11 standard to its functionally equivalent C++03 variant. With our framework, developers are free to exploit the latest language features, while production code is still built by using a restricted set of available language constructs. This paper reports on the technical details of the transformation engine, and our experiences in applying it on two large industrial code bases and four open-source systems. Our solution is freely available and open-source.
[274] arXiv:2405.07206 [pdf, other]: Title: Static JavaScript Call Graphs: A Comparative Study

Authors: Gábor Antal, Péter Hegedűs, Zoltán Tóth, Rudolf Ferenc, Tibor Gyimóthy

Subjects: Software Engineering (cs.SE)

The popularity and wide adoption of JavaScript both at the client and server side makes its code analysis more important than ever before. Most of the algorithms for vulnerability analysis, coding issue detection, or type inference rely on the call graph representation of the underlying program. Despite some obvious advantages of dynamic analysis, static algorithms should also be considered for call graph construction as they do not require extensive test beds for programs and their costly execution and tracing. In this paper, we systematically compare five widely adopted static algorithms - implemented by the npm call graph, IBM WALA, Google Closure Compiler, Approximate Call Graph, and Type Analyzer for JavaScript tools - for building JavaScript call graphs on 26 WebKit SunSpider benchmark programs and 6 real-world Node.js modules. We provide a performance analysis as well as a quantitative and qualitative evaluation of the results. We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100% precise. However, most of the tools found edges that were missed by all others. ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges. As for the combination of tools, ACG and TAJS together covered 99% of the found true edges by all algorithms, while maintaining a precision as high as 98%. Only two of the tools were able to analyze up-to-date multi-file Node.js modules due to incomplete language features support. They agreed on almost 60% of the call edges, but each of them found valid edges that the other missed.
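The comparison described above can be pictured as set arithmetic over call edges. The sketch below treats each tool's output as a set of (caller, callee) pairs and measures the precision of an intersection and of a union against a manually validated set; the tool names, edges, and helper functions are hypothetical and do not reproduce the study's data.

```python
def precision(found, true_edges):
    """Fraction of reported edges that are real call edges."""
    if not found:
        return 0.0
    return len(found & true_edges) / len(found)


def coverage(found, true_edges):
    """Fraction of the real call edges that were reported."""
    if not true_edges:
        return 0.0
    return len(found & true_edges) / len(true_edges)


# Hypothetical outputs of two tools and a validated reference set.
tool_a = {("main", "f"), ("f", "g"), ("g", "h")}
tool_b = {("main", "f"), ("f", "g"), ("main", "x")}
validated = {("main", "f"), ("f", "g"), ("g", "h")}

common = tool_a & tool_b      # edges both tools agree on
combined = tool_a | tool_b    # edges found by either tool

common_precision = precision(common, validated)
combined_precision = precision(combined, validated)
combined_coverage = coverage(combined, validated)
```

In this toy case the agreed edges are all correct while the union trades some precision for full coverage, the same kind of trade-off the abstract reports for combining tools.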
[275] arXiv:2405.07210 [pdf, other]: Title: A complete pair of solvents of a quadratic matrix pencil

Authors: V. G. Kurbatov, I. V. Kurbatova

Comments: 24 pages, 16 figures

Subjects: Numerical Analysis (math.NA); Dynamical Systems (math.DS); Functional Analysis (math.FA); Spectral Theory (math.SP)

Let $B$ and $C$ be square complex matrices. The differential equation \begin{equation*} x''(t)+Bx'(t)+Cx(t)=f(t) \end{equation*} is considered. A solvent is a matrix solution $X$ of the equation $X^2+BX+C=\mathbf0$. A pair of solvents $X$ and $Z$ is called complete if the matrix $X-Z$ is invertible. Knowing a complete pair of solvents $X$ and $Z$ allows us to reduce the solution of the initial value problem to the calculation of two matrix exponentials $e^{Xt}$ and $e^{Zt}$. The problem of finding a complete pair $X$ and $Z$, which leads to small rounding errors in solving the differential equation, is discussed.
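The reduction to two matrix exponentials can be made explicit in the homogeneous case $f=0$. The following is a standard calculation under the assumed notation $x(0)=x_0$, $x'(0)=x_1$, not a formula quoted from the paper.

```latex
% Each solvent commutes with its own exponential, so for constant vectors c_1, c_2
\begin{equation*}
x(t)=e^{Xt}c_1+e^{Zt}c_2
\quad\Longrightarrow\quad
x''+Bx'+Cx=(X^2+BX+C)e^{Xt}c_1+(Z^2+BZ+C)e^{Zt}c_2=\mathbf0 .
\end{equation*}
% The initial conditions give a linear system for c_1 and c_2
\begin{equation*}
c_1+c_2=x_0,\qquad Xc_1+Zc_2=x_1 .
\end{equation*}
% Substituting c_2 = x_0 - c_1 and using the invertibility of X - Z
\begin{equation*}
c_1=(X-Z)^{-1}(x_1-Zx_0),\qquad c_2=x_0-c_1 .
\end{equation*}
```

Completeness of the pair is exactly what makes this system solvable for every choice of $x_0$ and $x_1$; a nearly singular $X-Z$ would amplify rounding errors in $c_1$ and $c_2$.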
[276] arXiv:2405.07212 [pdf, other]: Title: Enhancing Decision-Making in Optimization through LLM-Assisted Inference: A Neural Networks Perspective

Authors: Gaurav Singh, Kavitesh Kumar Bali

Comments: Accepted IJCNN

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI)

This paper explores the seamless integration of Generative AI (GenAI) and Evolutionary Algorithms (EAs) within the domain of large-scale multi-objective optimization. Focusing on the transformative role of Large Language Models (LLMs), our study investigates the potential of LLM-Assisted Inference to automate and enhance decision-making processes. Specifically, we highlight its effectiveness in illuminating key decision variables in evolutionarily optimized solutions while articulating contextual trade-offs. Tailored to address the challenges inherent in inferring complex multi-objective optimization solutions at scale, our approach emphasizes the adaptive nature of LLMs, allowing them to provide nuanced explanations and align their language with diverse stakeholder expertise levels and domain preferences. Empirical studies underscore the practical applicability and impact of LLM-Assisted Inference in real-world decision-making scenarios.
[277] arXiv:2405.07213 [pdf, other]: Title: Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions

Authors: Rudolf Ferenc, Péter Hegedűs, Péter Gyimesi, Gábor Antal, Dénes Bán, Tibor Gyimóthy

Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.
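As a quick check of the reported figures, assume the F-measure here is the balanced F1 score, the harmonic mean of precision and recall. The helper name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the best KNN model.
knn_f1 = f_measure(0.91, 0.66)
```

With these rounded inputs the result is about 0.765, which agrees with the reported 0.76 up to rounding of the precision and recall values.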
[278] arXiv:2405.07216 [pdf, other]: Title: Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Mingjing Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

Comments: IEEE ICRA 2024

Subjects: Systems and Control (eess.SY)

Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (PDT) emerges as a promising prospect in this context. This study presents the development and implementation of a magnetically-guided origami robot, incorporating flexible printed circuit units for sustained and stable phototherapy of Helicobacter pylori. Each integrated unit is equipped with wireless charging capabilities, producing an optimal power output that can concurrently illuminate up to 15 LEDs at their maximum intensity. Crucially, these units can be remotely manipulated via a magnetic field, facilitating both translational and rotational movements. We propose an open-loop manual control sequence that allows the formation of a stable, compliant triangular structure through the interaction of internal magnets. This adaptable configuration is uniquely designed to withstand the dynamic squeezing environment prevalent in real-world gastric applications. The research herein represents a significant stride in leveraging technology for innovative medical solutions, particularly in the management of antibiotic-resistant Helicobacter pylori infections.
[279] arXiv:2405.07220 [pdf, other]: Title: On Discovery of Local Independence over Continuous Variables via Neural Contextual Decomposition

Authors: Inwoo Hwang, Yunhyeok Kwak, Yeon-Ji Song, Byoung-Tak Zhang, Sanghack Lee

Comments: Conference on Causal Learning and Reasoning (CLeaR), 2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Conditional independence provides a way to understand causal relationships among the variables of interest. An underlying system may exhibit more fine-grained causal relationships, especially between a variable and its parents, which will be called the local independence relationships. One of the most widely studied local relationships is Context-Specific Independence (CSI), which holds in a specific assignment of conditioned variables. However, its applicability is often limited since it does not allow continuous variables: data conditioned to the specific value of a continuous variable contains few instances, if not none, making it infeasible to test independence. In this work, we define and characterize the local independence relationship that holds in a specific set of joint assignments of parental variables, which we call context-set specific independence (CSSI). We then provide a canonical representation of CSSI and prove its fundamental properties. Based on our theoretical findings, we cast the problem of discovering multiple CSSI relationships in a system as finding a partition of the joint outcome space. Finally, we propose a novel method, coined neural contextual decomposition (NCD), which learns such a partition by imposing each set to induce CSSI via modeling a conditional distribution. We empirically demonstrate that the proposed method successfully discovers the ground truth local independence relationships in both a synthetic dataset and a complex system reflecting real-world physical dynamics.
[280] arXiv:2405.07223 [pdf, other]: Title: Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

Authors: Changhong Wang, Xudong Yu, Chenjia Bai, Qiaosheng Zhang, Zhen Wang

Comments: Accepted by Science China Information Sciences

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In Reinforcement Learning (RL), training a policy from scratch with online experiences can be inefficient because of the difficulties in exploration. Recently, offline RL has provided a promising solution by giving an initialized offline policy, which can be refined through online interactions. However, existing approaches primarily perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation. In real-world applications, it is common that we only have an offline dataset from a specific task while aiming for fast online adaptation to several tasks. To address this problem, our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning. We demonstrate that the conventional paradigm using successor features cannot effectively utilize offline data and improve the performance for the new task by online fine-tuning. To mitigate this, we introduce a novel methodology that leverages offline data to acquire an ensemble of successor representations and subsequently constructs ensemble Q functions. This approach enables robust representation learning from datasets with different coverage and facilitates fast adaptation of Q functions towards new tasks during the online fine-tuning phase. Extensive empirical evaluations provide compelling evidence showcasing the superior performance of our method in generalizing to diverse or even unseen tasks.
[281] arXiv:2405.07224 [pdf, other]: Title: A geometric decomposition of finite games: Convergence vs. recurrence under no-regret learning

Authors: Davide Legacci, Panayotis Mertikopoulos, Bary Pradelski

Comments: 50 pages, 16 figures

Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Optimization and Control (math.OC)

In view of the complexity of the dynamics of no-regret learning in games, we seek to decompose a finite game into simpler components where the day-to-day behavior of the dynamics is well understood. A natural starting point for this is Helmholtz's theorem, which resolves a vector field into a potential and an incompressible component. However, the geometry of no-regret dynamics - and, in particular, the dynamics of exponential / multiplicative weights (EW) schemes - is not compatible with the Euclidean underpinnings of Helmholtz's theorem, leading us to consider a Riemannian framework based on the Shahshahani metric. Using this geometric construction, we introduce the class of incompressible games, and we prove the following results: First, in addition to being volume-preserving, the continuous-time EW dynamics in incompressible games admit a constant of motion and are Poincar\'e recurrent - i.e., almost every trajectory of play comes arbitrarily close to its starting point infinitely often. Second, we establish a deep connection with a well-known decomposition of games into a potential and harmonic component (where the players' objectives are aligned and anti-aligned respectively): a game is incompressible if and only if it is harmonic, implying in turn that the EW dynamics lead to Poincar\'e recurrence in harmonic games.
[282] arXiv:2405.07229 [pdf, other]: Title: MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks

Authors: Xiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya Poria

Comments: Under review, the new version of MM-BigBench: arXiv:2310.09036

Subjects: Multimedia (cs.MM)

The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrate both visual and text contexts. Furthermore, tasks that demand reasoning across multiple modalities pose greater challenges and require a deep understanding of multimodal contexts. In this paper, we introduce a comprehensive assessment framework named MM-InstructEval, which integrates a diverse array of metrics to provide an extensive evaluation of the performance of various models and instructions across a broad range of multimodal reasoning tasks with vision-text contexts. MM-InstructEval enhances the research on the performance of MLLMs in complex multimodal reasoning tasks, facilitating a more thorough and holistic zero-shot evaluation of MLLMs. We first use the "Best Performance" metric to determine the upper performance limit of each model across various datasets. The "Mean Relative Gain" metric provides an analysis of the overall performance across different models and instructions, while the "Stability" metric evaluates their sensitivity to variations. Previous research has focused on evaluating models independently or solely assessing instructions, overlooking the interplay between models and instructions. To address this gap, we introduce the "Adaptability" metric, designed to quantify the degree of adaptability between models and instructions. Evaluations are conducted on 31 models (23 MLLMs) across 16 multimodal datasets, covering 6 tasks, with 10 distinct instructions. The extensive analysis enables us to derive novel insights.
[283] arXiv:2405.07232 [pdf, other]: Title: A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection

Authors: Raja Giryes, Lior Shafir, Avishai Wool

Subjects: Cryptography and Security (cs.CR)

Distributed Denial of Service (DDoS) attacks are getting increasingly harmful to the Internet, showing no signs of slowing down. Developing an accurate detection mechanism to thwart DDoS attacks is still a big challenge due to the rich variety of these attacks and the emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than the traditional fixed-size record structure containing aggregated flow statistics. Although aggregated flow records have gained popularity over the past decade, providing an effective means for flow-based intrusion detection by inspecting only a fraction of the total traffic volume, they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads, but also by their structure, which is unable to model fine-grained inter-packet relations, such as packet order and temporal relations. Additionally, inferring aggregated flow statistics must wait for the complete flow to end. Here we show that considering flow inputs as variable-length streams composed of their associated packet headers, allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds existing machine learning techniques' accuracy, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection, e.g., with CICDDoS2019 detection based on the first 2 packets, which corresponds to an average time-saving of 99.79% and uses only 4--6% of the traffic volume.
[284] arXiv:2405.07233 [pdf, other]: Title: OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

Authors: Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang

Comments: Accepted to ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)

Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep up with the dynamic variation caused by global warming and human activities. Moreover, because data collection is costly, historical observations are severely sparse, which poses a major challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep learning based model to reconstruct global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77% and demonstrating promising potential for understanding the "breathless ocean" in a data-driven manner.
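For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard definition of mean absolute percentage error (MAPE). The function names and the sample values are illustrative assumptions, not taken from the paper, and the abstract does not say whether the 38.77% figure is a relative or an absolute reduction; the helper below shows the relative variant only as one possible reading.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


def relative_reduction(baseline_mape, new_mape):
    """Relative reduction of an error metric, in percent (one possible reading)."""
    return 100.0 * (baseline_mape - new_mape) / baseline_mape


# Illustrative values only; these are not data from the paper.
observed = [100.0, 200.0]
baseline = [80.0, 240.0]
improved = [90.0, 220.0]
print(mape(observed, baseline), mape(observed, improved))
```

Note that MAPE is undefined when an observed value is zero, so any real evaluation would need to handle that case explicitly.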
[285] arXiv:2405.07236 [pdf, other]: Title: Adaptive control of recurrent neural networks using conceptors

Authors: Guillaume Pourcel, Mirko Goldmann, Ingo Fischer, Miguel C. Soriano

Subjects: Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO)

Recurrent Neural Networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a Machine Learning setting, the network's parameters are adapted during a training phase to match the requirements of a given task/problem increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters thereby render the network unadaptive to changing conditions, such as external or internal perturbation. In this manuscript, we demonstrate how keeping parts of the network adaptive even after the training enhances its functionality and robustness. Here, we utilize the conceptor framework and conceptualize an adaptive control loop analyzing the network's behavior continuously and adjusting its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.
[286] arXiv:2405.07237 [pdf, other]: Title: Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile Sensor

Authors: Jianhua Shan, Yuhao Sun, Shixin Zhang, Fuchun Sun, Zixi Chen, Zirong Shen, Cesare Stefanini, Yiyong Yang, Shan Luo, Bin Fang

Subjects: Robotics (cs.RO)

Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to the deformation properties including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method including soft contact simulation, manipulation learning, and sim-to-real transfer. We propose a novel approach utilizing Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations like relative position, squeezed area, and object contour, which are transferable to real robots. For a more realistic contact simulation, a new simulation environment including elastic, plastic, and elastoplastic deformations is created. We utilize RL strategies to train agents in the simulation, and expert demonstrations are applied for challenging tasks. Finally, we build a real experimental platform to complete the sim-to-real transfer and achieve a 90% success rate on difficult tasks such as cylinder and sphere. To test the robustness of our method, we use plasticine of different hardness and sizes to repeat the tasks including cylinder and sphere. The experimental results show superior performances of deformable object manipulation with the proposed method.
[287] arXiv:2405.07241 [pdf, other]: Title: Case Study of Novelty, Complexity, and Adaptation in a Multicellular System

Authors: Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria

Subjects: Neural and Evolutionary Computing (cs.NE)

Continuing generation of novelty, complexity, and adaptation are well-established as core aspects of open-ended evolution. However, it has yet to be firmly established to what extent these phenomena are coupled and by what means they interact. In this work, we track the co-evolution of novelty, complexity, and adaptation in a case study from the DISHTINY simulation system, which is designed to study the evolution of digital multicellularity. In this case study, we describe ten qualitatively distinct multicellular morphologies, several of which exhibit asymmetrical growth and distinct life stages. We contextualize the evolutionary history of these morphologies with measurements of complexity and adaptation. Our case study suggests a loose -- sometimes divergent -- relationship can exist among novelty, complexity, and adaptation.
[288] arXiv:2405.07244 [pdf, other]: Title: Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics

Authors: Gábor Antal, Zoltán Tóth, Péter Hegedűs, Rudolf Ferenc

Subjects: Software Engineering (cs.SE)

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of hybrid (static and dynamic) code analysis based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2-10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing the static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.
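As a rough illustration of invocation metrics, the sketch below counts outgoing and incoming calls per function from a list of call-graph edges. It assumes, without confirmation from the abstract, that the hybrid call graph is the union of statically and dynamically discovered edges and that each distinct caller/callee pair counts once; the function names and edges are hypothetical.

```python
from collections import Counter


def invocation_counts(edges):
    """Return (outgoing, incoming) call counts per function.

    `edges` is an iterable of (caller, callee) pairs; duplicates are
    collapsed, which is an assumption of this sketch.
    """
    outgoing, incoming = Counter(), Counter()
    for caller, callee in set(edges):
        outgoing[caller] += 1
        incoming[callee] += 1
    return outgoing, incoming


# Hypothetical call graphs for a tiny program.
static_edges = {("main", "parse"), ("parse", "lex")}
dynamic_edges = {("main", "parse"), ("main", "handler"), ("handler", "lex")}

# Assumed construction of the hybrid graph: union of both edge sets.
hybrid_edges = static_edges | dynamic_edges

noi, nii = invocation_counts(static_edges)    # static counterparts
hnoi, hnii = invocation_counts(hybrid_edges)  # hybrid counterparts

for fn in ("main", "parse", "handler", "lex"):
    print(fn, noi[fn], nii[fn], hnoi[fn], hnii[fn])
```

In this toy example the dynamically observed call to `handler` is invisible to the static graph, which is the kind of gap the hybrid metrics are meant to close.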
[289] arXiv:2405.07248 [pdf, other]: Title: Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Authors: Nikolay B Petrov, Gregory Serapio-García, Jason Rentfrow

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

The humanlike responses of large language models (LLMs) have prompted social scientists to investigate whether LLMs can be used to simulate human participants in experiments, opinion polls and surveys. Of central interest in this line of research has been mapping out the psychological profiles of LLMs by prompting them to respond to standardized questionnaires. The conflicting findings of this research are unsurprising given that mapping out underlying, or latent, traits from LLMs' text responses to questionnaires is no easy task. To address this, we use psychometrics, the science of psychological measurement. In this study, we prompt OpenAI's flagship models, GPT-3.5 and GPT-4, to assume different personas and respond to a range of standardized measures of personality constructs. We used two kinds of persona descriptions: either generic (four or five random person descriptions) or specific (mostly demographics of actual humans from a large-scale human dataset). We found that the responses from GPT-4, but not GPT-3.5, using generic persona descriptions show promising, albeit not perfect, psychometric properties, similar to human norms, but the data from both LLMs, when using specific demographic profiles, show poor psychometric properties. We conclude that, currently, when LLMs are asked to simulate silicon personas, their responses are poor signals of potentially underlying latent traits. Thus, our work casts doubt on LLMs' ability to simulate individual-level human behaviour across multiple-choice question answering tasks.
[290] arXiv:2405.07250 [pdf, ps, other]: Title: Towards Cloud Efficiency with Large-scale Workload Characterization

Authors: Anjaly Parayil, Jue Zhang, Xiaoting Qin, Íñigo Goiri, Lexiang Huang, Timothy Zhu, Chetan Bansal

Comments: 6 figures, 13 Tables

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud providers introduce features (e.g., Spot VMs, Harvest VMs, and Burstable VMs) and optimizations (e.g., oversubscription, auto-scaling, power harvesting, and overclocking) to improve efficiency and reliability. To effectively utilize these features, it's crucial to understand the characteristics of workloads running in the cloud. However, workload characteristics can be complex and depend on multiple signals, making manual characterization difficult and unscalable. In this study, we conduct the first large-scale examination of first-party workloads at Microsoft to understand their characteristics. Through an empirical study, we aim to answer the following questions: (1) What are the critical workload characteristics that impact efficiency and reliability on cloud platforms? (2) How do these characteristics vary across different workloads? (3) How can cloud platforms leverage these insights to efficiently characterize all workloads at scale? This study provides a deeper understanding of workload characteristics and their impact on cloud performance, which can aid in optimizing cloud services. Additionally, it identifies potential areas for future research.
[291] arXiv:2405.07252 [pdf, ps, other]: Title: Universal Batch Learning Under The Misspecification Setting

Authors: Shlomi Vituri, Meir Feder

Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

In this paper we consider the problem of universal batch learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypothesis models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameter multinomial distribution while the hypothesis class $\Theta$ is only a subset of this family of distributions.
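For context, here is a minimal sketch of the classical Arimoto-Blahut iteration for the capacity of a discrete memoryless channel. This is the textbook algorithm, not the extension developed in the paper, whose details are not given in the abstract; the function name, tolerance, and example channel are assumptions made for illustration.

```python
import math


def blahut_arimoto(W, tol=1e-12, max_iter=10_000):
    """Capacity in bits of a channel with W[x][y] = P(y | x).

    Returns (capacity, capacity-achieving input distribution).
    """
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = KL divergence (in nats) between row W[x] and q.
        d = [
            sum(W[x][y] * math.log(W[x][y] / q[y])
                for y in range(n_out) if W[x][y] > 0)
            for x in range(n_in)
        ]
        lower = sum(p[x] * d[x] for x in range(n_in))  # mutual information
        upper = max(d)                                 # upper bound on capacity
        if upper - lower < tol:
            break
        # Multiplicative update of the input distribution.
        weights = [p[x] * math.exp(d[x]) for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]
    return lower / math.log(2), p


# Binary symmetric channel with crossover probability 0.1.
capacity, prior = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
print(capacity, prior)
```

The gap between `upper` and `lower` brackets the capacity at every iteration, which gives a natural stopping rule.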
[292] arXiv:2405.07257 [pdf, other]: Title: Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

Authors: Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce the Inter-Reconstructed Feature Disentanglement (IRFD) method to decouple human facial features into three latent spaces. We then design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method can generate realistic talking head with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at the anonymous link: https://anonymous.4open.science/r/SPEAK-F56E
[293] arXiv:2405.07259 [pdf, other]: Title: CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool

Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

Comments: Available at this https URL. Published in ISPASS 2024

Subjects: Hardware Architecture (cs.AR)

Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks (DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to perform low-energy, high-density computations. These benefits have inspired research across the CiM stack, but CiM research often focuses on only one level of the stack (i.e., devices, circuits, architecture, workload, or mapping) or only one design point (e.g., one fabricated chip). There is a need for a full-stack modeling tool to evaluate design decisions in the context of full systems (e.g., see how a circuit impacts system energy) and to perform rapid early-stage exploration of the CiM co-design space.
To address this need, we propose CiMLoop: an open-source tool to model diverse CiM systems and explore decisions across the CiM stack. CiMLoop introduces (1) a flexible specification that lets users describe, model, and map workloads to both circuits and architecture, (2) an accurate energy model that captures the interaction between DNN operand values, hardware data representations, and analog/digital values propagated by circuits, and (3) a fast statistical model that can explore the design space orders-of-magnitude more quickly than other high-accuracy models.
Using CiMLoop, researchers can evaluate design choices at different levels of the CiM stack, co-design across all levels, fairly compare different implementations, and rapidly explore the design space.
[294] arXiv:2405.07260 [pdf, ps, other]: Title: A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SI-CLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers a universally applicable approach with potential benefits for diverse EEG classification tasks.
[295] arXiv:2405.07263 [pdf, ps, other]: Title: Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining

Authors: Eyal Orbach, Lev Haikin, Nelly David, Avi Faizakof

Subjects: Computation and Language (cs.CL)

Dense vector representations for sentences have made significant progress in recent years, as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges in making effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a modification to the common contrastive loss used for sentence embeddings that encourages word embeddings to have this property. To demonstrate the effect of this method, we present a dataset based on the STS-B dataset with additional generated text, that requires finding the best matching paraphrase residing in a larger context and report the degree of similarity to the origin phrase. We demonstrate on this dataset how our proposed method can achieve better results without a significant increase in compute.
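To make the idea of span-aggregatable token embeddings concrete, the sketch below mean-pools token vectors over arbitrary spans using prefix sums, so that every candidate span can be scored against a query without re-encoding the text. Mean pooling, the helper names, and the toy two-dimensional vectors are assumptions made for illustration; the abstract does not specify the aggregation function.

```python
import math


def prefix_sums(token_vectors):
    """Cumulative sums of token vectors; entry i sums tokens [0, i)."""
    dim = len(token_vectors[0])
    sums = [[0.0] * dim]
    for vec in token_vectors:
        sums.append([s + v for s, v in zip(sums[-1], vec)])
    return sums


def span_mean(sums, start, end):
    """Mean of the token vectors in the half-open span [start, end)."""
    if not 0 <= start < end < len(sums):
        raise ValueError("invalid span")
    n = end - start
    return [(b - a) / n for a, b in zip(sums[start], sums[end])]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_span(sums, query, max_len):
    """Return the (start, end) span whose mean vector is closest to `query`."""
    n_tokens = len(sums) - 1
    spans = [
        (i, j)
        for i in range(n_tokens)
        for j in range(i + 1, min(i + max_len, n_tokens) + 1)
    ]
    return max(spans, key=lambda s: cosine(span_mean(sums, *s), query))


# Toy token embeddings standing in for contextualized model outputs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
sums = prefix_sums(tokens)
print(best_span(sums, query=[0.5, 1.0], max_len=4))
```

With prefix sums, each span representation costs time proportional to the embedding dimension only, which is one way the compute overhead of per-span encoding could be avoided.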
[296] arXiv:2405.07264 [pdf, other]: Title: Information Rates Over Multi-View Channels

Authors: V. Arvind Rameshwar, Nir Weinberger

Comments: 33 pages, 1 figure, submitted to the IEEE

Subjects: Information Theory (cs.IT)

We investigate the fundamental limits of reliable communication over multi-view channels, in which the channel output is comprised of a large number of independent noisy views of a transmitted symbol. We consider first the setting of multi-view discrete memoryless channels and then extend our results to general multi-view channels (using multi-letter formulas). We argue that the channel capacity and dispersion of such multi-view channels converge exponentially fast in the number of views to the entropy and varentropy of the input distribution, respectively. We identify the exact rate of convergence as the smallest Chernoff information between two conditional distributions of the output, conditioned on unequal inputs. For the special case of the deletion channel, we compute upper bounds on this Chernoff information. Finally, we present a new channel model we term the Poisson approximation channel -- of possible independent interest -- whose capacity closely approximates the capacity of the multi-view binary symmetric channel for any fixed number of views.
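The Chernoff information mentioned above has a standard definition for two distributions on a finite alphabet: the negated minimum over λ in [0, 1] of log Σ P(y)^λ Q(y)^(1-λ). The following is a small numerical sketch of that definition and of the minimum over pairs of unequal inputs; the function names and the ternary-search approach are choices made here, not details from the paper.

```python
import math
from itertools import combinations


def chernoff_information(p, q, iters=200):
    """Chernoff information (in nats) between finite distributions p and q."""
    common = [(a, b) for a, b in zip(p, q) if a > 0 and b > 0]
    if not common:
        return math.inf  # disjoint supports

    def log_sum(lam):
        return math.log(sum(a ** lam * b ** (1.0 - lam) for a, b in common))

    # log_sum is convex in lam, so ternary search finds its minimum.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_sum(m1) < log_sum(m2):
            hi = m2
        else:
            lo = m1
    return -log_sum((lo + hi) / 2.0)


def smallest_pairwise_chernoff(W):
    """Minimum Chernoff information over pairs of distinct channel inputs.

    W[x] is the output distribution given input x.
    """
    return min(
        chernoff_information(W[a], W[b])
        for a, b in combinations(range(len(W)), 2)
    )


# Binary symmetric channel with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(smallest_pairwise_chernoff(bsc))
```

For the binary symmetric channel the minimum is attained at λ = 1/2 by symmetry, giving -log(2·sqrt(p(1-p))), which the numerical search reproduces.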
[297] arXiv:2405.07265 [pdf, other]: Title: An Approach for Decentralized Authentication in Networks of UAVs

Authors: Nicholas Jäger, Andreas Aßmuth

Comments: 5 pages

Journal-ref: Proc of the 12th International Conference on Cloud Computing, GRIDs, and Virtualization (Cloud Computing 2021), Porto Portugal, April 2021, pp. 13-17, ISSN 2308-4294

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

We propose a decentralized authentication system for networks of unmanned aerial vehicles. A blockchain-based public key infrastructure allows the usage of public key cryptography and public key based authentication protocols. The blockchain provides a common storage of the public keys and their relations and can provide the required information for the authentication process. Furthermore, the unmanned aerial vehicles store selected parts of the blockchain in order to operate independently in areas where they might not have access to the Internet. This allows unmanned aerial vehicles to authenticate entities of the network, like other unmanned aerial vehicles, cloud services, cars, and any computer.
[298] arXiv:2405.07266 [pdf, other]: Title: Architecture-Level Modeling of Photonic Deep Neural Network Accelerators

Authors: Tanner Andrulis, Gohar Irfan Chaudhry, Vinith M. Suriyakumar, Joel S. Emer, Vivienne Sze

Comments: Published at ISPASS 2024

Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR)

Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations.
To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of converting data between domains. Designers must also consider system-level energy costs such as data fetch from DRAM. Converting data and accessing DRAM can consume significant energy, so to evaluate and explore the photonic system space, there is a need for a tool that can model these full-system considerations.
In this work, we show that similarities between Compute-in-Memory (CiM) and photonics let us use CiM system modeling tools to accurately model photonics systems. Bringing modeling tools to photonics enables evaluation of photonic research in a full-system context, rapid design space exploration, co-design, and comparison between systems.
Using our open-source model, we show that cross-domain conversion and DRAM can consume a significant portion of photonic system energy. We then demonstrate optimizations that reduce conversions and DRAM accesses to improve photonic system energy efficiency by up to 3x.
[299] arXiv:2405.07267 [pdf, other]: Title: Fields, Bridges, and Foundations: How Researchers Browse Citation Network Visualizations

Authors: Kiroong Choe, Eunhye Kim, Sangwon Park, Jinwook Seo

Subjects: Human-Computer Interaction (cs.HC)

Visualizing citation relations with network structures is widely used, but the visual complexity can make it challenging for individual researchers to navigate through them. We collected data from 18 researchers using an interface that we designed using network simplification methods and analyzed how users browsed and identified important papers. Our analysis reveals six major patterns used for identifying papers of interest, which can be categorized into three key components: Fields, Bridges, and Foundations, each viewed from two distinct perspectives: layout-oriented and connection-oriented. The connection-oriented approach was found to be more effective for selecting relevant papers, but the layout-oriented method was adopted more often, even though it led to unexpected results and user frustration. Our findings emphasize the importance of integrating these components and the necessity to balance visual layouts with meaningful connections to enhance the effectiveness of citation networks in academic browsing systems.
[300] arXiv:2405.07272 [pdf, ps, other]: Title: MAML MOT: Multiple Object Tracking based on Meta-Learning

Authors: Jiayi Chen, Chunhua Deng

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the advancement of video analysis technology, the multi-object tracking (MOT) problem in complex scenes involving pedestrians is gaining increasing importance. This challenge primarily involves two key tasks: pedestrian detection and re-identification. While significant progress has been achieved in pedestrian detection tasks in recent years, enhancing the effectiveness of re-identification tasks remains a persistent challenge. This difficulty arises from the large total number of pedestrian samples in multi-object tracking datasets and the scarcity of individual instance samples. Motivated by recent rapid advancements in meta-learning techniques, we introduce MAML MOT, a meta-learning-based training approach for multi-object tracking. This approach leverages the rapid learning capability of meta-learning to tackle the issue of sample scarcity in pedestrian re-identification tasks, aiming to improve the model's generalization performance and robustness. Experimental results demonstrate that the proposed method achieves high accuracy on mainstream datasets in the MOT Challenge. This offers new perspectives and solutions for research in the field of pedestrian multi-object tracking.
[301] arXiv:2405.07274 [pdf, other]: Title: Timely Offloading in Mobile Edge Cloud Systems

Authors: Nitya Sathyavageeswaran, Roy D. Yates, Anand D. Sarwate, Narayan Mandayam

Subjects: Systems and Control (eess.SY)

Future real-time applications like smart cities will use complex Machine Learning (ML) models for a variety of tasks. Timely status information is required for these applications to be reliable. Offloading computation to a mobile edge cloud (MEC) can reduce the completion time of these tasks. However, using the MEC may come at a cost, such as one related to the use of a cloud service or to privacy. In this paper, we consider a source that generates time-stamped status updates for delivery to a monitor after processing by the mobile device or MEC. We study how a scheduler must forward these updates to achieve timely updates at the monitor while also limiting MEC usage. We measure timeliness at the monitor using the age of information (AoI) metric. We formulate this problem as an infinite horizon Markov decision process (MDP) with an average cost criterion. We prove that an optimal scheduling policy has an age-threshold structure that depends on how long an update has been in service.
[302] arXiv:2405.07275 [pdf, other]: Title: Distribution-Preserving Integrated Sensing and Communication with Secure Reconstruction

Authors: Yiqi Chen, Tobias Oechtering, Holger Boche, Mikael Skoglund, Yuan Luo

Comments: Accepted by ISIT2024

Subjects: Information Theory (cs.IT)

Distribution-preserving integrated sensing and communication with secure reconstruction is investigated in this paper. In addition to the distortion constraint, we impose another constraint on the distance between the reconstructed sequence distribution and the original state distribution to force the system to preserve the statistical property of the channel states. An inner bound of the distribution-preserving capacity-distortion region is provided with some capacity region results under special cases. A numerical example demonstrates the tradeoff between the communication rate, reconstruction distortion and distribution preservation. Furthermore, we consider the case that the reconstructed sequence should be kept secret from an eavesdropper who also observes the channel output. An inner bound of the tradeoff region and a capacity-achieving special case are presented.
[303] arXiv:2405.07277 [pdf, ps, other]: Title: Mining Influential Spreaders in Complex Networks by an Effective Combination of the Degree and K-Shell

Authors: Shima Esfandiari, Seyed Mostafa Fakhrahmad

Comments: 6 pages. In 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), pp. 1-6. IEEE, 2024

Subjects: Social and Information Networks (cs.SI)

Graph mining is an important technique that is used in many applications such as predicting and understanding behaviors and information dissemination within networks. One crucial aspect of graph mining is the identification and ranking of influential nodes, which has applications in various fields including marketing, social communications, and disease control. However, existing models and methods come with high computational complexity and may not accurately distinguish and identify influential nodes. This paper develops a method based on the k-shell index and degree centrality of nodes and their neighbors. Comparisons to previous works, such as Degree and Neighborhood information Centrality (DNC) and Neighborhood and Path Information Centrality (NPIC), are conducted. The evaluations, which include the correctness with Kendall's Tau, resolution with monotonicity index, correlation plots, and time complexity, demonstrate its superior results.
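The two ingredients mentioned in this abstract, the k-shell index and degree centrality, can be computed with a short peeling procedure. The sketch below is only illustrative: the `combined_score` function is a combination invented for this example and is not the formula proposed in the paper, and the adjacency-dictionary graph format is likewise an assumption.

```python
def k_shell(adj):
    """Return {node: k-shell index} by repeatedly peeling low-degree nodes.

    adj: dict mapping each node to a list of its neighbours (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to peel at this level, move to the next shell
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already peeled via another neighbour
            remaining.remove(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


def combined_score(adj):
    """Hypothetical score: own k-shell * degree plus the same term for neighbours."""
    shell = k_shell(adj)
    own = {v: shell[v] * len(adj[v]) for v in adj}
    return {v: own[v] + sum(own[u] for u in adj[v]) for v in adj}


if __name__ == "__main__":
    # Triangle a-b-c with a pendant node d attached to a.
    graph = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
    print(k_shell(graph))
    print(combined_score(graph))
```

In the toy graph, the pendant node falls in the 1-shell and the triangle in the 2-shell; adding neighbour information separates node `a` from `b` and `c`, which the k-shell index alone cannot do.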
[304] arXiv:2405.07278 [pdf, other]: Title: Human-interpretable clustering of short-text using large language models

Authors: Justin K. Miller, Tristram J. Alexander

Comments: Main text: 18 pages, 8 figures. Supplementary: 21 pages, 15 figures, 3 tables

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches we identify the biases inherent in each, and question the reliance on human-coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways humans describe themselves, agreeing well with prior specialist work, but with interesting differences characteristic of the medium used to express identity.
[305] arXiv:2405.07280 [pdf, other]: Title: Humor Mechanics: Advancing Humor Generation with Multistep Reasoning

Authors: Alexey Tikhonov, Pavel Shtykovskiy

Comments: ICCC 2024

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

In this paper, we explore the generation of one-liner jokes through multi-step reasoning. Our work involved reconstructing the process behind creating humorous one-liners and developing a working prototype for humor generation. We conducted comprehensive experiments with human participants to evaluate our approach, comparing it with human-created jokes, zero-shot GPT-4 generated humor, and other baselines. The evaluation focused on the quality of humor produced, using human labeling as a benchmark. Our findings demonstrate that the multi-step reasoning approach consistently improves the quality of generated humor. We present the results and share the datasets used in our experiments, offering insights into enhancing humor generation with artificial intelligence.
[306] arXiv:2405.07282 [pdf, other]: Title: Branching Narratives: Character Decision Points Detection

Authors: Alexey Tikhonov

Comments: GamesAndNLP @ LREC COLING 2024

Subjects: Computation and Language (cs.CL)

This paper presents the Character Decision Points Detection (CHADPOD) task, a task of identification of points within narratives where characters make decisions that may significantly influence the story's direction. We propose a novel dataset based on CYOA-like games graphs to be used as a benchmark for such a task. We provide a comparative analysis of different models' performance on this task, including a couple of LLMs and several MLMs as baselines, achieving up to 89% accuracy. This underscores the complexity of narrative analysis, showing the challenges associated with understanding character-driven story dynamics. Additionally, we show how such a model can be applied to the existing text to produce linear segments divided by potential branching points, demonstrating the practical application of our findings in narrative analysis.
[307] arXiv:2405.07283 [pdf, other]: Title: BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps

Authors: Mingkai Jia, Qingwen Zhang, Bowen Yang, Jin Wu, Ming Liu, Patric Jensfelt

Comments: The first two authors are co-first authors. 8 pages, accepted by RA-L

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse to fine hierarchical segmentation of the $z$-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic points removal methods. The code is open-sourced at https://github.com/MKJia/BeautyMap.
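The idea of a binary-encoded matrix with bit-wise comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the BeautyMap implementation: the cell size, the height resolution, the dictionary-based grid, and the rule "occupied in the map but not in the scan" are all choices made for this example.

```python
def encode(points, cell=1.0, z_res=0.5, z_min=0.0):
    """Encode points as {(ix, iy): bits}, where bit i marks occupancy of height bin i."""
    grid = {}
    for x, y, z in points:
        key = (int(x // cell), int(y // cell))
        bit = int((z - z_min) // z_res)
        if bit < 0:
            continue  # below the encoded height range
        grid[key] = grid.get(key, 0) | (1 << bit)
    return grid


def candidate_dynamic(map_grid, scan_grid):
    """Return height bins occupied in the map but empty in the scan, per cell.

    Cells that the scan did not observe at all are skipped, so unobserved
    static structure is not flagged (an assumption of this sketch).
    """
    result = {}
    for key, map_bits in map_grid.items():
        if key not in scan_grid:
            continue
        diff = map_bits & ~scan_grid[key]  # bit-wise comparison
        if diff:
            result[key] = diff
    return result


if __name__ == "__main__":
    map_points = [(0.2, 0.3, 0.1), (0.2, 0.3, 0.7), (0.2, 0.3, 1.2), (1.5, 0.5, 0.1)]
    scan_points = [(0.4, 0.1, 0.2)]
    print(candidate_dynamic(encode(map_points), encode(scan_points)))
```

Packing each vertical column into one integer means a whole column is compared with a single AND/NOT operation, which is where the efficiency of such an encoding would come from.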
[308] arXiv:2405.07284 [pdf, ps, other]: Title: Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)

Authors: Saaketh Koundinya Gundavarapu, Arushi Arora, Shreya Agarwal

Comments: 5 pages, 3 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) with the Contrastive Language-Image Pretraining (CLIP). By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the capabilities of the original architecture and enables more versatile and context-aware object segmentation.
[309] arXiv:2405.07288 [pdf, other]: Title: Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning

Authors: Masane Fuchi, Tomohiro Takagi

Comments: 23 pages, 28 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generating images from text has become easier because of the scaling of diffusion models and advancements in the field of vision and language. These models are trained using vast amounts of data from the Internet. Hence, they often contain undesirable content such as copyrighted material. As it is challenging to remove such data and retrain the models, methods for erasing specific concepts from pre-trained models have been investigated. We propose a novel concept-erasure method that updates the text encoder using few-shot unlearning in which a few real images are used. The discussion regarding the generated images after erasing a concept has been lacking. While there are methods for specifying the transition destination for concepts, the validity of the specified concepts is unclear. Our method implicitly achieves this by transitioning to the latent concepts inherent in the model or the images. Our method can erase a concept within 10 s, making concept erasure more accessible than ever before. Implicitly transitioning to related concepts leads to more natural concept erasure. We applied the proposed method to various concepts and confirmed that concept erasure can be achieved tens to hundreds of times faster than with current methods. By varying the parameters to be updated, we obtained results suggesting that, like previous research, knowledge is primarily accumulated in the feed-forward networks of the text encoder.
[310] arXiv:2405.07291 [pdf, other]: Title: Robust Beamforming with Gradient-based Liquid Neural Network

Authors: Xinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane Debbah

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address these issues, we propose a robust gradient-based liquid neural network (GLNN) framework that utilizes ordinary differential equation-based liquid neurons to solve the beamforming problem. Specifically, our proposed GLNN framework takes gradients of the optimization objective function as inputs to extract the high-order channel feature information, and then introduces a residual connection to mitigate the training burden. Furthermore, we use the manifold learning technique to compress the search space of the beamforming problem. These designs enable the GLNN to effectively maintain low complexity while ensuring strong robustness to noisy and highly dynamic channels. Extensive simulation results demonstrate that the GLNN can achieve 4.15% higher spectral efficiency than that of typical iterative algorithms, and reduce the time consumption to only 1.61% that of conventional methods.
[311] arXiv:2405.07293 [pdf, other]: Title: Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV Videos

Authors: Jing Xu, Wentao Shi, Sheng Ren, Pan Gao, Peng Zhou, Jie Qin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detecting wrong-way cycling ratios in CCTV videos. Specifically, we propose a sparse sampling method called WWC-Predictor to efficiently solve this problem, addressing the inefficiencies of direct tracking methods. Our approach leverages both detection-based information, which utilizes the information from bounding boxes, and orientation-based information, which provides insights into the image itself, to enhance instantaneous information capture capability. On our proposed benchmark dataset consisting of 35 minutes of video sequences and minute-level annotation, our method achieves an average error rate of a mere 1.475% while taking only 19.12% GPU time of straightforward tracking methods under the same detection model. This remarkable performance demonstrates the effectiveness of our approach in identifying and predicting instances of wrong-way cycling.
[312] arXiv:2405.07305 [pdf, other]: Title: Finding a Way Through the Social Media Labyrinth: Guiding Design Through User Expectations

Authors: Thomas Mildner, Gian-Luca Savino, Susanne Putze, Rainer Malaka

Comments: This is the author-version of this work. The paper is submitted as a draft, as it is still under review, but was published at arXiv in support of a cumulative dissertation

Subjects: Human-Computer Interaction (cs.HC)

Social networking services (SNS) have become integral to modern life to create and maintain meaningful relationships. Nevertheless, their historic growth of features has led to labyrinthine user interfaces (UIs) that often result in frustration among users - for instance, when trying to control privacy-related settings. This paper aims to mitigate labyrinthine UIs by studying users' expectations (N=21) through an online card sorting exercise based on 58 common SNS UI features, teaching us about their expectations regarding the importance of specific UI features and the frequency with which they use them. Our findings offer a valuable understanding of the relationship between the importance and frequency of UI features and provide design considerations for six identified UI feature groups. Through these findings, we inform the design and development of user-centred alternatives to current SNS interfaces that enable users to successfully navigate SNS and feel in control over their data by meeting their expectations.
[313] arXiv:2405.07306 [pdf, other]: Title: Point Resampling and Ray Transformation Aid to Editable NeRF Models

Authors: Zhenyang Li, Zilong Chen, Feifan Qu, Mingqing Wang, Yizhou Zhao, Kai Zhang, Yifan Peng

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In NeRF-aided editing tasks, object movement presents difficulties in supervision generation due to the introduction of variability in object positions. Moreover, the removal operations of certain scene objects often lead to empty regions, presenting challenges for NeRF models in inpainting them effectively. We propose an implicit ray transformation strategy, allowing for direct manipulation of the 3D object's pose by operating on the neural-point in NeRF rays. To address the challenge of inpainting potential empty regions, we present a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), which interpolates those regions in 3D space at the original ray locations within the implicit space, thereby facilitating object removal & scene inpainting tasks. Importantly, employing DNR effectively narrows the gap between ground truth and predicted implicit features, potentially increasing the mutual information (MI) of the features across rays. Then, we leverage DNR and ray transformation to construct a point-based editable NeRF pipeline PR^2T-NeRF. Results primarily evaluated on 3D object removal & inpainting tasks indicate that our pipeline achieves state-of-the-art performance. In addition, our pipeline supports high-quality rendering visualization for diverse editing operations without necessitating extra supervision.
[314] arXiv:2405.07309 [pdf, other]: Title: DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

Authors: Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward design, a labor-intensive process. In this paper, we propose DiffGen, a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model to enable automatic and efficient generation of robot demonstrations. Given a simulated robot manipulation scenario and a natural language instruction, DiffGen can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation after manipulation. The embeddings are obtained from the vision-language model, and the optimization is achieved by calculating and descending gradients through the differentiable simulation, differentiable rendering, and vision-language model components, thereby accomplishing the specified task. Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
[315] arXiv:2405.07310 [pdf, other]: Title: Machine Learning-Based Protection and Fault Identification of 100% Inverter-Based Microgrids

Authors: Milad Beikbabaei, Michael Lindemann, Mohammad Heidari Kapourchali, Ali Mehrizi-Sani

Comments: Accepted for publication at 2024 IEEE 33rd International Symposium on Industrial Electronics (ISIE)

Subjects: Systems and Control (eess.SY)

100% inverter-based renewable units are becoming more prevalent, introducing new challenges in the protection of microgrids that incorporate these resources. This is particularly due to low fault currents and bidirectional flows. Previous work has studied the protection of microgrids with high penetration of inverter-interfaced distributed generators; however, very few have studied the protection of a 100% inverter-based microgrid. This work proposes machine learning (ML)-based protection solutions using local electrical measurements that consider implementation challenges and effectively combine short-circuit fault detection and type identification. A decision tree method is used to analyze a wide range of fault scenarios. PSCAD/EMTDC simulation environment is used to create a dataset for training and testing the proposed method. The effectiveness of the proposed methods is examined under seven distinct fault types, each featuring varying fault resistance, in a 100% inverter-based microgrid consisting of four inverters.
[316] arXiv:2405.07311 [pdf, other]: Title: A Compact Delay Model for OTS Devices

Authors: M. M. Al Chawa, R. Tetzlaff, D. Bedau, J. W. Reiner, D. A. Stewart, M. K. Grobis

Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

This paper presents a novel compact delay model of Ovonic Threshold Switch (OTS) devices that works efficiently for circuit simulations. The internal state variable of the two terminal devices is estimated using a delay system that uses a few electrical components related to a suggested equivalent circuit of the device. Finally, we tested the proposed model against measured data from devices fabricated by Western Digital Research.
[317] arXiv:2405.07312 [pdf, other]: Title: Nonparametric Control-Koopman Operator Learning: Flexible and Scalable Models for Prediction and Control

Authors: Petar Bevanda, Bas Driessen, Lucian Cristian Iacob, Roland Toth, Stefan Sosnowski, Sandra Hirche

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Linearity of Koopman operators and simplicity of their estimators coupled with model-reduction capabilities has led to their great popularity in applications for learning dynamical systems. While nonparametric Koopman operator learning in infinite-dimensional reproducing kernel Hilbert spaces is well understood for autonomous systems, its control system analogues are largely unexplored. Addressing systems with control inputs in a principled manner is crucial for fully data-driven learning of controllers, especially since existing approaches commonly resort to representational heuristics or parametric models of limited expressiveness and scalability. We address the aforementioned challenge by proposing a universal framework via control-affine reproducing kernels that enables direct estimation of a single operator even for control systems. The proposed approach, called control-Koopman operator regression (cKOR), is thus completely analogous to Koopman operator regression of the autonomous case. First in the literature, we present a nonparametric framework for learning Koopman operator representations of nonlinear control-affine systems that does not suffer from the curse of control input dimensionality. This allows for reformulating the infinite-dimensional learning problem in a finite-dimensional space based solely on data without a priori loss of precision due to a restriction to a finite span of functions or inputs as in other approaches. For enabling applications to large-scale control systems, we also enhance the scalability of control-Koopman operator estimators by leveraging random projections (sketching). The efficacy of our novel cKOR approach is demonstrated on both forecasting and control tasks.
[318] arXiv:2405.07314 [pdf, other]: Title: Learnable Tokenizer for LLM-based Generative Recommendation

Authors: Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

Subjects: Information Retrieval (cs.IR)

Harnessing Large Language Models (LLMs) for generative recommendation has garnered significant attention due to LLMs' powerful capacities such as rich world knowledge and reasoning. However, a critical challenge lies in transforming recommendation data into the language space of LLMs through effective item tokenization. Existing approaches, such as ID identifiers, textual identifiers, and codebook-based identifiers, exhibit limitations in encoding semantic information, incorporating collaborative signals, or handling code assignment bias. To address these shortcomings, we propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), designed to meet the key criteria of identifiers by integrating hierarchical semantics, collaborative signals, and code assignment diversity. LETTER integrates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias. We instantiate LETTER within two generative recommender models and introduce a ranking-guided generation loss to enhance their ranking ability. Extensive experiments across three datasets demonstrate the superiority of LETTER in item tokenization, thereby advancing the state-of-the-art in the field of generative recommendation.
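Residual quantization, the mechanism behind the Residual Quantized VAE mentioned above, can be sketched in a few lines: each level picks the nearest codeword to what the previous levels left unexplained. The codebooks below are toy values chosen for this example, and the function names are hypothetical; LETTER's learned codebooks and its regularization losses are not reproduced here.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    def sq_dist(i):
        return sum((c - v) ** 2 for c, v in zip(codebook[i], vec))
    return min(range(len(codebook)), key=sq_dist)


def residual_quantize(vec, codebooks):
    """Return (codes, residual): one code index per level and the final residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next level encodes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


if __name__ == "__main__":
    item_embedding = [1.0, 2.0]
    toy_codebooks = [
        [[0.0, 0.0], [1.0, 2.5]],   # coarse level
        [[0.0, 0.0], [0.0, -0.5]],  # fine level
    ]
    print(residual_quantize(item_embedding, toy_codebooks))
```

The resulting sequence of code indices is what acts as a hierarchical identifier: earlier codes capture coarse semantics and later codes refine them.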
[319] arXiv:2405.07316 [pdf, other]: Title: VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence

Authors: Mayank Bakshi, Sara Ghasvarianjahromi, Yauhen Yakimenka, Allison Beemer, Oliver Kosut, Joerg Kliewer

Comments: This is an extended version of the paper at International Symposium on Information Theory 2024

Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data and possible adversarial infiltration. We require (a) convergence to a global empirical loss minimizer when adversaries are absent, and (b) either detection of adversarial presence or convergence to an admissible consensus irrespective of the adversarial configuration. To this end, we propose the VALID protocol which, to the best of our knowledge, is the first to achieve a validated learning guarantee. Moreover, VALID offers an O(1/T) convergence rate (under pertinent regularity assumptions), and computational and communication complexities comparable to non-adversarial distributed stochastic gradient descent. Remarkably, VALID retains optimal performance metrics in adversary-free environments, sidestepping the robustness penalties observed in prior byzantine-robust methods. A distinctive aspect of our study is a heterogeneity metric based on the norms of individual agents' gradients computed at the global empirical loss minimizer. This not only provides a natural statistic for detecting significant byzantine disruptions but also allows us to prove the optimality of VALID in wide generality. Lastly, our numerical results reveal that, in the absence of adversaries, VALID converges faster than state-of-the-art byzantine robust algorithms, while when adversaries are present, VALID terminates with each honest agent either converging to an admissible consensus or declaring adversarial presence in the network.
[320] arXiv:2405.07317 [pdf, other]: Title: Machine Unlearning in Contrastive Learning

Authors: Zixin Wang, Kongyang Chen

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Machine unlearning is a complex process that necessitates the model to diminish the influence of the training data while keeping the loss of accuracy to a minimum. Despite the numerous studies on machine unlearning in recent years, the majority of them have primarily focused on supervised learning models, leaving research on contrastive learning models relatively underexplored. With the conviction that self-supervised learning harbors a promising potential, surpassing or rivaling that of supervised learning, we set out to investigate methods for machine unlearning centered around contrastive learning models. In this study, we introduce a novel gradient constraint-based approach for training the model to effectively achieve machine unlearning. Our method only necessitates a minimal number of training epochs and the identification of the data slated for unlearning. Remarkably, our approach demonstrates proficient performance not only on contrastive learning models but also on supervised learning models, showcasing its versatility and adaptability in various learning paradigms.
[321] arXiv:2405.07319 [pdf, other]: Title: LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

Authors: Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu

Comments: SIGGRAPH 2024 conference track

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.
[322] arXiv:2405.07320 [pdf, other]: Title: L(u)PIN: LLM-based Political Ideology Nowcasting

Authors: Ken Kato, Annabelle Purnomo, Christopher Cochrane, Raeid Saqur

Subjects: Computation and Language (cs.CL)

The quantitative analysis of political ideological positions is a difficult task. In the past, various literature focused on parliamentary voting data of politicians, party manifestos and parliamentary speech to estimate political disagreement and polarization in various political systems. However, previous methods of quantitative political analysis suffered from a common challenge, which was the amount of data available for analysis. Also, previous methods frequently focused on a more general analysis of politics such as overall polarization of the parliament or party-wide political ideological positions. In this paper, we present a method to analyze ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs. The method allows us to evaluate the stance of politicians on an axis of our choice, allowing us to flexibly measure the stance of politicians with regard to a topic/controversy of our choice. We achieve this by using a fine-tuned BERT classifier to extract the opinion-based sentences from the speeches of representatives and projecting the average BERT embeddings for each representative on a pair of reference seeds. These reference seeds are either manually chosen representatives known to have opposing views on a particular topic or they are generated sentences which were created using the GPT-4 model of OpenAI. We created the sentences by prompting the GPT-4 model to generate a speech that would come from a politician defending a particular position.
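One plausible reading of "projecting the average embeddings on a pair of reference seeds" is a scalar projection onto the line through the two seed vectors. The sketch below assumes that reading; the normalisation (0 at the first seed, 1 at the second), the function names, and the toy two-dimensional vectors standing in for BERT embeddings are all assumptions of this example rather than details confirmed by the abstract.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def project_on_axis(embedding, seed_a, seed_b):
    """Position of embedding along the axis from seed_a (0.0) to seed_b (1.0)."""
    axis = [b - a for a, b in zip(seed_a, seed_b)]
    offset = [e - a for e, a in zip(embedding, seed_a)]
    norm_sq = sum(x * x for x in axis)
    if norm_sq == 0:
        raise ValueError("reference seeds must be distinct")
    return sum(o * x for o, x in zip(offset, axis)) / norm_sq


if __name__ == "__main__":
    # Toy stand-ins for sentence embeddings of one representative.
    sentence_embeddings = [[1.0, 1.0], [1.0, -1.0]]
    seed_a, seed_b = [0.0, 0.0], [2.0, 0.0]
    representative = mean_vector(sentence_embeddings)
    print(project_on_axis(representative, seed_a, seed_b))
```

Values near 0 place a representative close to the first reference position and values near 1 close to the second; components orthogonal to the axis are discarded by the projection.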
[323] arXiv:2405.07324 [pdf, other]: Title: QACM: QoS-Aware xApp Conflict Mitigation in Open RAN

Authors: Abdul Wadud, Fatemeh Golpayegani, Nima Afraz

Subjects: Networking and Internet Architecture (cs.NI)

The advent of Open Radio Access Network (RAN) has revolutionized the field of RAN by introducing elements of native support of intelligence and openness into the next generation of mobile network infrastructure. Open RAN paves the way for standardized interfaces and enables the integration of network applications from diverse vendors, thereby enhancing network management flexibility. However, control decision conflicts occur when components from different vendors are deployed together. This article provides an overview of various types of conflicts that may occur in Open RAN, with a particular focus on intra-component conflict mitigation among Extended Applications (xApps) in the Near Real Time RAN Intelligent Controller (Near-RT-RIC). A QoS-Aware Conflict Mitigation (QACM) method is proposed that finds the optimal configuration of conflicting parameters while maximizing the number of xApps that have their Quality of Service (QoS) requirements met. We compare the performance of the proposed QACM method with two benchmark methods for priority and non-priority cases. The results indicate that our proposed method is the most effective in maintaining QoS requirements for conflicting xApps.
[324] arXiv:2405.07326 [pdf, other]: Title: Power Evaluation of IOT Application Layer Protocols

Authors: Amirhossein Shahrokhi, Mahmood Ahmadi

Subjects: Networking and Internet Architecture (cs.NI)

The Internet of Things has affected all aspects of daily life, and the number of IoT devices is increasing day by day. According to forecasts, the number of Internet of Things devices will reach one trillion by 2035. The increase in the number of devices connected to the Internet will cause various concerns. One of the most important concerns is the energy and power consumption of these devices. Although Internet of Things modules are low in energy consumption, their widespread and large-scale use has made power consumption the most important challenge in this field. For this reason, it is necessary to use communication protocols that, in addition to establishing efficient communication, impose minimal power consumption on the network. In this paper, application layer protocols such as MQTT, MQTT-SN, CoAP, and HTTP are simulated using the tools available in the Contiki operating system, including COOJA and Powertrace, and they are evaluated and compared with each other in terms of power consumption. According to the simulations performed with these tools, the MQTT-SN protocol consumed the least power. CoAP ranks next, followed closely by MQTT, which consumes more than MQTT-SN. Finally, the HTTP protocol consumes the most power, which makes it unsuitable for communication in the Internet of Things.
[325] arXiv:2405.07327 [pdf, other]: Title: Liquid Ensemble Selection for Continual Learning

Authors: Carter Blair, Ben Armstrong, Kate Larson

Comments: Accepted at Canadian AI Conference 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Continual learning aims to enable machine learning models to continually learn from a shifting data distribution without forgetting what has already been learned. Such shifting distributions can be broken into disjoint subsets of related examples; by training each member of an ensemble on a different subset it is possible for the ensemble as a whole to achieve much higher accuracy with less forgetting than a naive model. We address the problem of selecting which models within an ensemble should learn on any given data, and which should predict. By drawing on work from delegative voting we develop an algorithm for using delegation to dynamically select which models in an ensemble are active. We explore a variety of delegation methods and performance metrics, ultimately finding that delegation is able to provide a significant performance boost over naive learning in the face of distribution shifts.
[326] arXiv:2405.07331 [pdf, other]: Title: Stochastic Bandits with ReLU Neural Networks

Authors: Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani

Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

We study the stochastic bandit problem with ReLU neural network structure. We show that a $\tilde{O}(\sqrt{T})$ regret guarantee is achievable by considering bandits with one-layer ReLU neural networks; to the best of our knowledge, our work is the first to achieve such a guarantee. In this specific setting, we propose an OFU-ReLU algorithm that can achieve this upper bound. The algorithm first explores randomly until it reaches a linear regime, and then implements a UCB-type linear bandit algorithm to balance exploration and exploitation. Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we learn the parameters of ReLU relatively accurately during the exploration stage. To remove dependence on model parameters, we design an OFU-ReLU+ algorithm based on a batching strategy, which can provide the same theoretical guarantee.
[327] arXiv:2405.07332 [pdf, other]: Title: PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification

Authors: Mohammad Shafiul Alam, Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Md. Rabius Sani, Khan Md Hasib

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Numerous applications have resulted from the automation of agricultural disease segmentation using deep learning techniques. However, when applied to new conditions, these applications frequently face the difficulty of overfitting, resulting in lower segmentation performance. In the context of potato farming, where diseases have a large influence on yields, it is critical for the agricultural economy to quickly and properly identify these diseases. Traditional data augmentation approaches, such as rotation, flip, and translation, have limitations and frequently fail to provide strong generalization results. To address these issues, our research employs a novel approach termed PotatoGANs. In this novel data augmentation approach, two types of Generative Adversarial Networks (GANs) are utilized to generate synthetic potato disease images from healthy potato images. This approach not only expands the dataset but also adds variety, which helps to enhance model generalization. Using the Inception score as a measure, our experiments show the better quality and realism of the images created by PotatoGANs, emphasizing their capacity to resemble real disease images closely. The CycleGAN model outperforms the Pix2Pix GAN model in terms of image quality, as evidenced by its higher Inception scores (IS): CycleGAN achieves 1.2001 and 1.0900 for black scurf and common scab, respectively. This synthetic data can significantly improve the training of large neural networks. It also reduces data collection costs while enhancing data diversity and generalization capabilities. Our work improves interpretability by combining three gradient-based Explainable AI algorithms (GradCAM, GradCAM++, and ScoreCAM) with three distinct CNN architectures (DenseNet169, Resnet152 V2, InceptionResNet V2) for potato disease classification.
[328] arXiv:2405.07335 [pdf, ps, other]: Title: Tremor Reduction for Accessible Ray Based Interaction in VR Applications

Authors: Dr Corrie Green, Dr Yang Jiang, Dr John Isaacs, Dr Michael Heron

Comments: The pre-print contains 7 pages, 5 figures and 4 tables. The attached pre-print is an extract containing some information about the completed study results; the full paper is in review at the appropriate journal. This pre-print is released to support developers implementing tremor reduction solutions for VR now, as it's been in the review process for years

Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

Compared to conventional 2D interaction methods, virtual reality (VR) demonstrates an opportunity for unique interface and interaction design decisions. Currently, this poses a challenge when developing an accessible VR experience as existing interaction techniques may not be usable by all users. It was discovered that many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism, such as the use of a laser pointer designed to behave like a traditional cursor. It is recognized that distance-independent millimetres can support designers in developing interfaces that scale in virtual worlds. Relevantly, Fitts's law states that as distance increases, user movements become slower and less accurate. In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction. A development study was conducted to understand the feasibility of implementing such a filter and explore its effects on end users' experience. It demonstrates how an algorithm can provide an opportunity for a more accurate and consequently less frustrating experience by filtering and reducing involuntary hand tremors. Further discussion on existing VR design philosophies is also conducted, analysing evidence that supports multisensory feedback and psychological models. The completed study can be downloaded from GitHub.
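The abstract does not say which low-pass filter the authors use. As a minimal sketch of the general idea, assuming a first-order exponential-smoothing filter applied to the 3D point where the ray lands, the following Python shows how high-frequency jitter in pointer input can be attenuated. The names `LowPassFilter`, `smooth`, and the `alpha` parameter are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LowPassFilter:
    """First-order low-pass (exponential smoothing) filter for 3D pointer samples.

    alpha is a hypothetical smoothing factor in (0, 1]:
    smaller values suppress more jitter but add more lag.
    """
    alpha: float
    _state: Optional[Vec3] = None

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")

    def update(self, sample: Vec3) -> Vec3:
        """Feed one raw sample and return the filtered position."""
        if self._state is None:
            # The first sample initializes the filter state unchanged.
            self._state = sample
        else:
            a = self.alpha
            # Blend the new sample with the previous filtered value, per axis.
            self._state = tuple(
                a * s + (1.0 - a) * p for s, p in zip(sample, self._state)
            )
        return self._state


def smooth(samples: Iterable[Vec3], alpha: float) -> List[Vec3]:
    """Filter a whole sequence of raw pointer positions."""
    f = LowPassFilter(alpha)
    return [f.update(s) for s in samples]


if __name__ == "__main__":
    # Synthetic jitter: the x coordinate alternates between +1 and -1.
    jitter = [((1.0 if i % 2 == 0 else -1.0), 0.0, 0.0) for i in range(20)]
    for raw, filtered in zip(jitter, smooth(jitter, alpha=0.2)):
        print(raw[0], round(filtered[0], 3))
```

The trade-off in this kind of filter is between stability and responsiveness: a small `alpha` steadies the ray but makes it trail deliberate movements, so a practical system would likely tune it per user or adapt it to movement speed.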
[329] arXiv:2405.07336 [pdf, other]: Title: Data Trading Combination Auction Mechanism based on the Exponential Mechanism

Authors: Kongyang Chen, Zeming Xu, Bing Mi

Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)

With the widespread application of machine learning technology in recent years, the demand for training data has increased significantly, leading to the emergence of research areas such as data trading. The work in this field is still in the developmental stage. Different buyers have varying degrees of demand for various types of data, and auctions play a role in such scenarios due to their authenticity and fairness. Recent related work has proposed combination auction mechanisms for different domains. However, such mechanisms have not addressed the privacy concerns of buyers. In this paper, we design a \textit{Data Trading Combination Auction Mechanism based on the exponential mechanism} (DCAE) to protect buyers' bidding privacy from being leaked. We apply the exponential mechanism to select the final settlement price for the auction and generate a probability distribution based on the relationship between the price and the revenue. In the experimental aspect, we consider the selection of different mechanisms under two scenarios, and the experimental results show that this method can ensure high auction revenue and protect buyers' privacy from being violated.
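For readers unfamiliar with the exponential mechanism, the sketch below illustrates the standard form of the technique: each candidate price is sampled with probability proportional to exp(ε·u / (2Δu)), where u is a utility score and Δu its sensitivity. The posted-price revenue function, the candidate price grid, and the sensitivity bound used here are illustrative assumptions, not the DCAE design from the paper.

```python
import math
import random
from typing import List, Sequence


def exponential_mechanism_probs(
    utilities: Sequence[float], epsilon: float, sensitivity: float
) -> List[float]:
    """Return selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    # Subtracting the maximum utility keeps exp() numerically stable
    # and does not change the normalized probabilities.
    m = max(utilities)
    weights = [math.exp(epsilon * (u - m) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def revenue(price: float, bids: Sequence[float]) -> float:
    """Hypothetical utility: posted-price revenue, where every bid >= price pays price."""
    return price * sum(1 for b in bids if b >= price)


def select_price(
    candidate_prices: Sequence[float],
    bids: Sequence[float],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Sample a settlement price with the exponential mechanism."""
    utilities = [revenue(p, bids) for p in candidate_prices]
    # Assumption: changing one bid moves the revenue by at most the largest
    # candidate price, so that value is used as the sensitivity bound.
    sensitivity = max(candidate_prices)
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(list(candidate_prices), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    bids = [3.0, 5.0, 8.0, 8.0, 10.0]
    prices = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(select_price(prices, bids, epsilon=1.0, rng=rng))
```

A larger `epsilon` concentrates the distribution on high-revenue prices at the cost of weaker privacy, while `epsilon = 0` yields a uniform choice that ignores the bids entirely.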
[330] arXiv:2405.07340 [pdf, other]: Title: Machine Consciousness as Pseudoscience: The Myth of Conscious Machines

Authors: Eduardo C. Garrido-Merchán

Comments: 19 pages

Subjects: Computers and Society (cs.CY)

The hypothesis of conscious machines has been debated since the invention of the notion of artificial intelligence, powered by the assumption that the computational intelligence achieved by a system is the cause of the emergence of phenomenal consciousness in that system as an epiphenomenon or as a consequence of the behavioral or internal complexity of the system surpassing some threshold. As a consequence, a huge amount of literature exploring the possibility of machine consciousness and how to implement it on a computer has been published. Moreover, common folk psychology and transhumanism literature has fed this hypothesis with the popularity of science fiction literature, where intelligent robots are usually anthropomorphized and hence given phenomenal consciousness. However, in this work, we argue that this literature lacks scientific rigour, it being impossible to falsify the opposite hypothesis, and present a list of arguments that show how every approach that the machine consciousness literature has published depends on philosophical assumptions that cannot be proven by the scientific method. Concretely, we also show how phenomenal consciousness is not computable, independently of the complexity of the algorithm or model, cannot be objectively measured nor quantitatively defined, and is basically a phenomenon that is subjective and internal to the observer. Given all those arguments, we end the work arguing why the idea of conscious machines is nowadays a myth of transhumanism and science fiction culture.
[331] arXiv:2405.07343 [pdf, other]: Title: Graph neural networks for power grid operational risk assessment under evolving grid topology

Authors: Yadong Zhang, Pranav M Karve, Sankaran Mahadevan

Comments: Manuscript submitted to Applied Energy

Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Methodology (stat.ME)

This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-level) or individual branch-level state under different power supply and demand conditions. The variability of the stochastic grid variables (wind/solar generation and load demand), and their statistical correlations, are rigorously considered while generating the inputs for the training data. The outputs in the training data, obtained by solving numerous mixed-integer linear programming (MILP) optimal power flow problems, correspond to system-level, zonal and transmission line-level quantities of interest (QoIs). The QoIs predicted by the GNNs are used to conduct hours-ahead, sampling-based reliability and risk assessment w.r.t. zonal and system-level (load shedding) as well as branch-level (overloading) failure events. The proposed methodology is demonstrated for three synthetic grids with sizes ranging from 118 to 2848 buses. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and can be good proxies for computationally expensive MILP algorithms. The excellent accuracy of GNN-based reliability and risk assessment suggests that GNN models can substantially improve situational awareness by quickly providing rigorous reliability and risk estimates.
[332] arXiv:2405.07344 [pdf, other]: Title: TKAN: Temporal Kolmogorov-Arnold Networks

Authors: Remi Genet, Hugo Inzirillo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture that draws on both KANs and the LSTM: Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks; they are composed of Recurring Kolmogorov-Arnold Network (RKAN) layers embedding memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.
[333] arXiv:2405.07346 [pdf, other]: Title: Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

Authors: Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, among which AI-based image generation has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the necessity to understand and evaluate human preferences for AIGIs. To this end, in this paper, we first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+, which provides human visual preference scores and detailed preference explanations from three perspectives: quality, authenticity, and correspondence. Then, based on the constructed AIGCIQA2023+ database, this paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning. Specifically, the MINT-IQA model first learns and evaluates human preferences for AI-generated Images from multiple perspectives; then, via the vision-language instruction tuning strategy, MINT-IQA attains powerful understanding and explanation ability for human visual preference on AIGIs, which can be used for feedback to further improve the assessment capabilities. Extensive experimental results demonstrate that the proposed MINT-IQA model achieves state-of-the-art performance in understanding and evaluating human visual preferences for AIGIs, and the proposed model also achieves competitive results on traditional IQA tasks compared with state-of-the-art IQA models. The AIGCIQA2023+ database and MINT-IQA model will be released to facilitate future research.
[334] arXiv:2405.07348 [pdf, other]: Title: MedConceptsQA -- Open Source Medical Concepts QA Benchmark

Authors: Ofir Ben Shoham, Nadav Rappoport

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We present MedConceptsQA, a dedicated open source benchmark for medical concepts question answering. The benchmark comprises questions on various medical concepts across different vocabularies: diagnoses, procedures, and drugs. The questions are categorized into three levels of difficulty: easy, medium, and hard. We conducted evaluations of the benchmark using various Large Language Models. Our findings show that pre-trained clinical Large Language Models achieved accuracy levels close to random guessing on this benchmark, despite being pre-trained on medical data. However, GPT-4 achieves an absolute average improvement of nearly 27%-37% (27% for zero-shot learning and 37% for few-shot learning) when compared to clinical Large Language Models. Our benchmark serves as a valuable resource for evaluating the understanding and reasoning of medical concepts by Large Language Models. Our benchmark is available at https://huggingface.co/datasets/ofir408/MedConceptsQA
[335] arXiv:2405.07349 [pdf, other]: Title: WeedScout: Real-Time Autonomous blackgrass Classification and Mapping using dedicated hardware

Authors: Matthew Gazzard, Helen Hicks, Isibor Kennedy Ihianle, Jordan J. Bird, Md Mahmudul Hasan, Pedro Machado

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive against blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces Real-Time Autonomous Black-Grass Classification and Mapping (RT-ABGCM), a cutting-edge solution tailored for real-time detection of blackgrass, for precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.
[336] arXiv:2405.07351 [pdf, ps, other]: Title: A Standard Rigid Transformation Notation Convention for Robotics Research

Authors: Philippe Nadeau

Comments: 43 pages, 8 figures, 5 tables, 1 code listing

Subjects: Robotics (cs.RO)

Notation conventions for rigid transformations are as diverse as they are fundamental to the field of robotics. A well-defined convention that is practical, consistent and unambiguous is essential for the clear communication of ideas and to foster collaboration between researchers. This work presents an analysis of conventions used in state-of-the-art robotics research, defines a new notation convention, and provides software packages to facilitate its use. To shed some light on the current state of notation conventions in robotics research, this work presents an analysis of the ICRA 2023 proceedings, focusing on the notation conventions used for rigid transformations. A total of 1655 papers were inspected to identify the convention used, and key insights about trends and usage preferences are derived. Based on this analysis, a new notation convention called RIGID is defined, which complies with the "ISO 80000 Standard on Quantities and Units". The RIGID convention is designed to be concise yet unambiguous and easy to use. Additionally, this work introduces a LaTeX package that facilitates the use of the RIGID notation in manuscript preparation through simple customizable commands that can be easily translated into variable names for software development.
[337] arXiv:2405.07353 [pdf, ps, other]: Title: Distributed Lovász Local Lemma under Bandwidth Limitations

Authors: Magnús M. Halldórsson, Yannic Maus, Saku Peltonen

Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

The constructive Lovász Local Lemma has become a central tool for designing efficient distributed algorithms. While it has been extensively studied in the classic LOCAL model that uses unlimited bandwidth, much less is known in the bandwidth-restricted CONGEST model.
In this paper, we present bandwidth- and time-efficient algorithms for various subclasses of LLL problems, including a large class of subgraph sampling problems that are naturally formulated as LLLs. Lastly, we use our LLLs to design efficient CONGEST algorithms for coloring sparse and triangle-free graphs with few colors. These coloring algorithms are exponentially faster than previous LOCAL model algorithms.
[338] arXiv:2405.07354 [pdf, other]: Title: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

Authors: Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.
[339] arXiv:2405.07358 [pdf, ps, other]: Title: A Value Driven Framework for Cybersecurity Innovation in Transportation & Infrastructure

Authors: Lampis Alevizos, Lalit Bhakuni, Stefan Jaschke

Subjects: Cryptography and Security (cs.CR)

This paper introduces a value-driven cybersecurity innovation framework for the transportation and infrastructure sectors, as opposed to the traditional market-centric approaches that have dominated the field. Recontextualizing innovation categories into sustaining, incremental, disruptive, and transformative, we aim to foster a culture of self-innovation within organizations, enabling a strategic focus on cybersecurity measures that directly contribute to business value and strategic goals. This approach primarily enhances the operational effectiveness and efficiency of cyber defences, while also aligning cybersecurity initiatives with mission-critical objectives. We detail a practical method for evaluating the business value of cybersecurity innovations and present a pragmatic approach for organizations to funnel innovative ideas in a structured and repeatable manner. The framework is designed to reinforce cybersecurity capabilities against an evolving cyber threat landscape while maintaining infrastructural integrity. Shifting the focus from general market appeal to sector-specific needs, our framework provides cybersecurity leaders with the strategic cyber-foresight necessary for prioritizing impactful initiatives, thereby making cybersecurity a core business enabler rather than a burden.
[340] arXiv:2405.07359 [pdf, other]: Title: Forecasting with an N-dimensional Langevin Equation and a Neural-Ordinary Differential Equation

Authors: Antonio Malpica-Morales, Miguel A. Duran-Olivencia, Serafim Kalliadasis

Comments: 26 pages, 7 figures

Journal-ref: Chaos, 34, 043105 (2024)

Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)

Accurate prediction of electricity day-ahead prices is essential in competitive electricity markets. Although stationary electricity-price forecasting techniques have received considerable attention, research on non-stationary methods is comparatively scarce, despite the prevalence of non-stationary features in electricity markets. Specifically, existing non-stationary techniques will often aim to address individual non-stationary features in isolation, leaving aside the exploration of concurrent multiple non-stationary effects. Our overarching objective here is the formulation of a framework to systematically model and forecast non-stationary electricity-price time series, encompassing the broader scope of non-stationary behavior. For this purpose we develop a data-driven model that combines an N-dimensional Langevin equation (LE) with a neural-ordinary differential equation (NODE). The LE captures fine-grained details of the electricity-price behavior in stationary regimes but is inadequate for non-stationary conditions. To overcome this inherent limitation, we adopt a NODE approach to learn, and at the same time predict, the difference between the actual electricity-price time series and the simulated price trajectories generated by the LE. By learning this difference, the NODE reconstructs the non-stationary components of the time series that the LE is not able to capture. We exemplify the effectiveness of our framework using the Spanish electricity day-ahead market as a prototypical case study. Our findings reveal that the NODE nicely complements the LE, providing a comprehensive strategy to tackle both stationary and non-stationary electricity-price behavior. The framework's dependability and robustness are demonstrated through different non-stationary scenarios by comparing it against a range of basic naive methods.
[341] arXiv:2405.07363 [pdf, ps, other]: Title: Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines

Authors: Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić, Tomaž Erjavec

Subjects: Computation and Language (cs.CL)

We introduce a dataset on political orientation and power position identification. The dataset is derived from ParlaMint, a set of comparable corpora of transcribed parliamentary speeches from 29 national and regional parliaments. We introduce the dataset, provide the reasoning behind some of the choices during its creation, present statistics on the dataset, and, using a simple classifier, some baseline results on predicting political orientation on the left-to-right axis, and on power position identification, i.e., distinguishing between the speeches delivered by governing coalition party members from those of opposition party members.
[342] arXiv:2405.07364 [pdf, other]: Title: BoQ: A Place is Worth a Bag of Learnable Queries

Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

Comments: Accepted at CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
[343] arXiv:2405.07368 [pdf, other]: Title: A New Algorithm for Computing $α$-Capacity

Authors: Akira Kamatsuka, Koki Kazama, Takahiro Yoshida

Subjects: Information Theory (cs.IT)

The problem of computing $\alpha$-capacity for $\alpha>1$ is equivalent to that of computing the correct decoding exponent. Various algorithms for computing them have been proposed, such as the Arimoto and Jitsumatsu--Oohama algorithms. In this study, we propose a novel alternating optimization algorithm for computing the $\alpha$-capacity for $\alpha>1$ based on a variational characterization of the Augustin--Csiszár mutual information. A comparison of the convergence performance of these algorithms is demonstrated through numerical examples.
[344] arXiv:2405.07369 [pdf, other]: Title: Incorporating Anatomical Awareness for Enhanced Generalizability and Progression Prediction in Deep Learning-Based Radiographic Sacroiliitis Detection

Authors: Felix J. Dorfner, Janis L. Vahldiek, Leonhard Donle, Andrei Zhukov, Lina Xu, Hartmut Häntze, Marcus R. Makowski, Hugo J.W.L. Aerts, Fabian Proft, Valeria Rios Rodriguez, Judith Rademacher, Mikhail Protopopov, Hildrun Haibel, Torsten Diekhoff, Murat Torgutalp, Lisa C. Adams, Denis Poddubnyy, Keno K. Bressem

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Purpose: To examine whether incorporating anatomical awareness into a deep learning model can improve generalizability and enable prediction of disease progression.
Methods: This retrospective multicenter study included conventional pelvic radiographs of 4 different patient cohorts focusing on axial spondyloarthritis (axSpA) collected at university and community hospitals. The first cohort, which consisted of 1483 radiographs, was split into training (n=1261) and validation (n=222) sets. The other cohorts comprising 436, 340, and 163 patients, respectively, were used as independent test datasets. For the second cohort, follow-up data of 311 patients was used to examine progression prediction capabilities. Two neural networks were trained, one on images cropped to the bounding box of the sacroiliac joints (anatomy-aware) and the other one on full radiographs. The performance of the models was compared using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
Results: On the three test datasets, the standard model achieved AUC scores of 0.853, 0.817, 0.947, with an accuracy of 0.770, 0.724, 0.850, whereas the anatomy-aware model achieved AUC scores of 0.899, 0.846, 0.957, with an accuracy of 0.821, 0.744, 0.906, respectively. The patients who were identified as high risk by the anatomy-aware model had an odds ratio of 2.16 (95% CI: 1.19, 3.86) for having progression of radiographic sacroiliitis within 2 years.
Conclusion: Anatomical awareness can improve the generalizability of a deep learning model in detecting radiographic sacroiliitis. The model is published as fully open source alongside this study.
[345] arXiv:2405.07371 [pdf, ps, other]: Title: Extreme Distance Distributions of Poisson Voronoi Cells

Authors: Jaume Anguera Peris, Joakim Jaldén

Subjects: Numerical Analysis (math.NA); Probability (math.PR)

Poisson point processes provide a versatile framework for modeling the distributions of random points in space. When the space is partitioned into cells, each associated with a single generating point from the Poisson process, there appears a geometric structure known as Poisson Voronoi tessellation. These tessellations find applications in various fields such as biology, material science, and communications, where the statistical properties of the Voronoi cells reveal patterns and structures that hold key insights into the underlying processes generating the observed phenomena.
In this paper, we investigate a distance measure of Poisson Voronoi tessellations that is emerging in the literature, yet whose statistical and geometrical properties have been explored only in the asymptotic case when the density of seed points approaches infinity. Our work, specifically focused on homogeneous Poisson point processes, characterizes the cumulative distribution functions governing the smallest and largest distances between the points generating the Voronoi regions and their respective vertices for an arbitrary density of points in $\mathbb{R}^2$. For that, we conduct a Monte-Carlo type simulation with $10^8$ Voronoi cells and fit the resulting empirical cumulative distribution functions to the Generalized Gamma, Gamma, Log-normal, Rayleigh, and Weibull distributions. Our analysis compares these fits in terms of root mean-squared error and maximum absolute variation, revealing the Generalized Gamma distribution as the best-fit model for characterizing these distances in homogeneous Poisson Voronoi tessellations. Furthermore, we provide maximum likelihood estimates and $95\%$ confidence intervals for the parameters of the Generalized Gamma distribution, along with the algorithm implemented to calculate the maximum and minimum distances.
[346] arXiv:2405.07373 [pdf, other]: Title: Probabilistic and Causal Satisfiability: the Impact of Marginalization

Authors: Julian Dörfler, Benito van der Zander, Markus Bläser, Maciej Liskiewicz

Subjects: Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)

The framework of Pearl's Causal Hierarchy (PCH) formalizes three types of reasoning: observational, interventional, and counterfactual, that reflect the progressive sophistication of human thought regarding causation. We investigate the computational complexity aspects of reasoning in this framework focusing mainly on satisfiability problems expressed in probabilistic and causal languages across the PCH. That is, given a system of formulas in the standard probabilistic and causal languages, does there exist a model satisfying the formulas? The resulting complexity changes depending on the level of the hierarchy as well as the operators allowed in the formulas (addition, multiplication, or marginalization).
We focus on formulas involving marginalization that are widely used in probabilistic and causal inference, but whose complexity issues are still little explored. Our main contributions are the exact computational complexity results showing that linear languages (allowing addition and marginalization) yield NP^PP-, PSPACE-, and NEXP-complete satisfiability problems, depending on the level of the PCH. Moreover, we prove that the problem for the full language (additionally allowing multiplication) is complete for the class succ$\exists$R for languages on the highest, counterfactual level. Previous work has shown that the satisfiability problem is complete for succ$\exists$R on the lower levels, leaving the counterfactual case open. Finally, we consider constrained models that are restricted to a small polynomial size. The constraint on the size reduces the complexity of the interventional and counterfactual languages to NEXP-complete.
[347] arXiv:2405.07374 [pdf, other]: Title: Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration

Authors: Shi-ang Qi, Yakun Yu, Russell Greiner

Comments: Accepted to ICML 2024

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Discrimination and calibration represent two important properties of survival analysis, with the former assessing the model's ability to accurately rank subjects and the latter evaluating the alignment of predicted outcomes with actual events. Given their distinct nature, it is hard for survival models to optimize both simultaneously, especially as many previous results found that improving calibration tends to diminish discrimination performance. This paper introduces a novel approach utilizing conformal regression that can improve a model's calibration without degrading discrimination. We provide theoretical guarantees for the above claim, and rigorously validate the efficiency of our approach across 11 real-world datasets, showcasing its practical applicability and robustness in diverse scenarios.
[348] arXiv:2405.07376 [pdf, other]: Title: Advocating Feedback Control for Human-Earth System Applications

Authors: Guido Cavraro

Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)

This paper proposes a feedback control perspective for Human-Earth Systems (HESs), which are essentially complex systems that capture the interactions between humans and nature. Recent attention in HES research has been directed towards devising strategies for climate change mitigation and adaptation, aimed at achieving environmental and societal objectives. However, existing approaches heavily rely on HES models, which inherently suffer from inaccuracies due to the complexity of the system. Moreover, overly detailed models often prove impractical for optimization tasks. We propose a framework that inherits from feedback control strategies their robustness against model errors, because inaccuracies are mitigated using measurements retrieved from the field. The framework comprises two nested control loops. The outer loop computes the optimal inputs to the HES, which are then implemented by actuators controlled in the inner loop. Potential fields of application are also identified.
[349] arXiv:2405.07378 [pdf, other]: Title: Hell is Paved with Good Intentions: The Intricate Relationship Between Cognitive Biases and Dark Patterns

Authors: Thomas Mildner, Albert Inkoom, Rainer Malaka, Jasmin Niess

Comments: This is the author-version of this work. The paper is submitted as a draft, as it is still under review, but was published at arXiv in support of a cumulative dissertation

Subjects: Human-Computer Interaction (cs.HC)

Throughout the past decade, research in HCI has identified numerous instances of dark patterns in digital interfaces. These efforts have led to a well-fostered typology describing harmful strategies users struggle to navigate. However, an in-depth understanding of the underlying mechanisms that deceive, coerce, or manipulate users is missing. We explore the interplay between cognitive biases and dark patterns to address this gap. To that end, we conducted four focus groups with experts (N=15) in psychology and dark pattern scholarship, inquiring how they conceptualise the relation between cognitive biases and dark patterns. Based on our results, we constructed the "Relationship Model of Cognitive Biases and Dark Patterns" which illustrates how cognitive bias and deceptive design patterns relate and identifies opportune moments for ethical reconsideration and user protection mechanisms. Our insights contribute to the current discourse by emphasising ethical design decisions and their implications in the field of HCI.
[350] arXiv:2405.07381 [pdf, ps, other]: Title: Networked Control with Hybrid Automatic Repeat Request Protocols

Authors: Touraj Soleymani, John S. Baras, Deniz Gündüz

Subjects: Information Theory (cs.IT); Optimization and Control (math.OC)

We study feedback control of a dynamical process over a lossy channel equipped with a hybrid automatic repeat request protocol that connects a sensor to an actuator. The dynamical process is modeled by a Gauss-Markov process, and the lossy channel by a packet-erasure channel with ideal feedback. We suppose that data is communicated in the format of packets with negligible quantization error. In such a networked control system, whenever a packet loss occurs, there exists a tradeoff between transmitting new sensory information with a lower success probability and retransmitting previously failed sensory information with a higher success probability. In essence, this is an inherent tradeoff between freshness and reliability. To address this tradeoff, we consider a linear-quadratic-regulator performance index, which penalizes state deviations and control efforts over a finite horizon, and jointly design optimal policies for an encoder and a decoder, which are collocated with the sensor and the actuator, respectively. Our emphasis here lies specifically on designing switching and control policies, rather than error-correcting codes. We derive the structural properties of the optimal encoding and decoding policies. We show that the former is a threshold switching policy and the latter is a certainty-equivalent control policy. In addition, we specify the iterative equations that the encoder and the decoder need to solve in order to implement the optimal policies.
[351] arXiv:2405.07387 [pdf, other]: Title: Semantic Loss Functions for Neuro-Symbolic Structured Prediction

Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", this https URL arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

Subjects: Machine Learning (cs.LG)

Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. One limitation of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain.
[352] arXiv:2405.07391 [pdf, other]: Title: AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch

Authors: Max Yang, Chenghua Lu, Alex Church, Yijiong Lin, Chris Ford, Haoran Li, Efi Psomopoulou, David A.W. Barton, Nathan F. Lepora

Comments: Project website can be found at this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In-hand manipulation is an integral component of human dexterity. Our hands rely on tactile feedback for stable and reactive motions to ensure objects do not slip away unintentionally during manipulation. For a robot hand, this level of dexterity requires extracting and utilizing rich contact information for precise motor control. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We construct a continuous contact feature representation to provide tactile feedback for training a policy in simulation and introduce an approach to perform zero-shot policy transfer by training an observation model to bridge the sim-to-real gap. Our experiments highlight the benefit of detailed contact information when handling objects with varying properties. In the real world, we demonstrate successful sim-to-real transfer of the dense tactile policy, generalizing to a diverse range of objects for various rotation axes and hand directions and outperforming other forms of low-dimensional touch. Interestingly, despite not having explicit slip detection, rich multi-fingered tactile sensing can implicitly detect object movement within grasp and provide a reactive behavior that improves the robustness of the policy, highlighting the importance of information-rich tactile sensing for in-hand manipulation.
[353] arXiv:2405.07392 [pdf, other]: Title: NGD-SLAM: Towards Real-Time SLAM for Dynamic Environments without GPU

Authors: Yuhao Zhang

Comments: 12 pages, 5 figures

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Accurate and robust camera tracking in dynamic environments presents a significant challenge for visual SLAM (Simultaneous Localization and Mapping). Recent progress in this field often involves the use of deep learning techniques to generate masks for dynamic objects, which usually require GPUs to operate in real-time (30 fps). Therefore, this paper proposes a novel visual SLAM system for dynamic environments that obtains real-time performance on CPU by incorporating a mask prediction mechanism, which allows the deep learning method and the camera tracking to run entirely in parallel at different frequencies such that neither waits for the result from the other. Based on this, it further introduces a dual-stage optical flow tracking approach and employs a hybrid usage of optical flow and ORB features, which significantly enhance the efficiency and robustness of the system. Compared with state-of-the-art methods, this system maintains high localization accuracy in dynamic environments while achieving a tracking frame rate of 56 fps on a single laptop CPU without any hardware acceleration, thus proving that deep learning methods are still feasible for dynamic SLAM even without GPU support. Based on the available information, this is the first SLAM system to achieve this.
[354] arXiv:2405.07393 [pdf, other]: Title: Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds

Authors: Meiyu Zhong, Ravi Tandon

Subjects: Machine Learning (cs.LG)

With the growing adoption of machine learning (ML) systems in areas like law enforcement, criminal justice, finance, hiring, and admissions, it is increasingly critical to guarantee the fairness of decisions assisted by ML. In this paper, we study the tradeoff between fairness and accuracy under the statistical notion of equalized odds. We present a new upper bound on the accuracy (that holds for any classifier), as a function of the fairness budget. In addition, our bounds also exhibit dependence on the underlying statistics of the data, labels and the sensitive group attributes. We validate our theoretical upper bounds through empirical analysis on three real-world datasets: COMPAS, Adult, and Law School. Specifically, we compare our upper bound to the tradeoffs that are achieved by various existing fair classifiers in the literature. Our results show that achieving high accuracy subject to low bias could be fundamentally limited based on the statistical disparity across the groups.
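As a rough illustration of the two quantities traded off in the abstract above, the following sketch estimates accuracy and an equalized-odds gap from binary predictions. It assumes a binary label, a binary sensitive attribute coded 0/1, and defines the gap as the larger of the true-positive-rate and false-positive-rate differences between groups; the function names and this particular gap definition are illustrative assumptions, not the paper's formulation or its bound.

```python
def conditional_rate(preds, labels, groups, group, label):
    """Estimate P(pred = 1 | Y = label, A = group) by counting."""
    selected = [p for p, y, a in zip(preds, labels, groups)
                if y == label and a == group]
    if not selected:
        raise ValueError("no samples for this (group, label) pair")
    return sum(selected) / len(selected)


def equalized_odds_gap(preds, labels, groups):
    """Larger of the TPR gap and the FPR gap between groups 0 and 1.

    This gap definition is an assumption made for illustration.
    """
    tpr_gap = abs(conditional_rate(preds, labels, groups, 0, 1)
                  - conditional_rate(preds, labels, groups, 1, 1))
    fpr_gap = abs(conditional_rate(preds, labels, groups, 0, 0)
                  - conditional_rate(preds, labels, groups, 1, 0))
    return max(tpr_gap, fpr_gap)


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (made up): group 0 is classified perfectly, group 1 misses a positive.
preds = [1, 1, 0, 0, 1, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(accuracy(preds, labels), equalized_odds_gap(preds, labels, groups))
```

A classifier constrained to keep this gap below a fairness budget may have to give up some accuracy, which is the tradeoff the abstract bounds.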
[355] arXiv:2405.07395 [pdf, other]: Title: CaFA: Global Weather Forecasting with Factorized Attention on Sphere

Authors: Zijie Li, Anthony Zhou, Saurabh Patil, Amir Barati Farimani

Comments: Preprint

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)

Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale, is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes, where the computational complexity of the kernel is only quadratic in the axial resolution instead of the overall resolution. The deterministic forecasting accuracy of the proposed model at $1.5^\circ$ resolution and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also show that the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer-based models with standard attention.
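To make the scaling argument in the abstract above concrete, the following back-of-the-envelope sketch counts the entries of a full attention matrix over all grid points against the entries of one kernel per axis. The 121 x 240 latitude-longitude grid is an assumed layout for a $1.5^\circ$ grid, and the function names are hypothetical; the count covers kernel size only, not the cost of applying the kernels, and is not taken from the paper.

```python
def full_attention_entries(n_lat, n_lon):
    """Entries in an attention matrix over all n_lat * n_lon grid points."""
    n_points = n_lat * n_lon
    return n_points * n_points


def axial_kernel_entries(n_lat, n_lon):
    """Entries in one (n_lat x n_lat) kernel plus one (n_lon x n_lon) kernel."""
    return n_lat * n_lat + n_lon * n_lon


# Assumed grid: 121 latitudes (poles included) by 240 longitudes at 1.5 degrees.
n_lat, n_lon = 121, 240
print(full_attention_entries(n_lat, n_lon))
print(axial_kernel_entries(n_lat, n_lon))
```

Doubling the resolution along both axes multiplies the first count by 16 and the second by 4, which is the sense in which the factorized kernel is quadratic in the axial resolution only.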
[356] arXiv:2405.07396 [pdf, other]: Title: An Unstructured Body-of-Revolution Electromagnetic Particle-in-Cell Algorithm with Radial Perfectly Matched Layers and Dual Polarizations

Authors: Dong-Yeop Na, Fernando L. Teixeira, Yuri A. Omelchenko

Comments: This manuscript has been accepted for publication in Computer Physics Communications COMPHY-D-23-00476R4 (May 11, 2024)

Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

A novel electromagnetic particle-in-cell algorithm has been developed for fully kinetic plasma simulations on unstructured (irregular) meshes in complex body-of-revolution geometries. The algorithm, implemented in the BORPIC++ code, utilizes a set of field scalings and a coordinate mapping, reducing the Maxwell field problem in a cylindrical system to a Cartesian finite element Maxwell solver in the meridian plane. The latter obviates the cylindrical coordinate singularity in the symmetry axis. The choice of an unstructured finite element discretization enhances the geometrical flexibility of the BORPIC++ solver compared to the more traditional finite difference solvers. Symmetries in Maxwell's equations are explored to decompose the problem into two dual polarization states with isomorphic representations that enable code reuse. The particle-in-cell scatter and gather steps preserve charge-conservation at the discrete level. Our previous algorithm (BORPIC+) discretized the E and B field components of TE-phi and TM-phi polarizations on the finite element (primal) mesh. A cylindrical perfectly matched layer is implemented as a boundary condition in the radial direction to simulate open space problems, with periodic boundary conditions in the axial direction. We investigate effects of charged particles moving next to the cylindrical perfectly matched layer. We model azimuthal currents arising from rotational motion of charged rings, which produce TM-phi polarized fields. Several numerical examples are provided to illustrate the first application of the algorithm.
[357] arXiv:2405.07399 [pdf, other]: Title: Semi-Supervised Weed Detection for Rapid Deployment and Enhanced Efficiency

Authors: Alzayat Saleh, Alex Olsen, Jake Wood, Bronson Philippa, Mostafa Rahimi Azghadi

Comments: 16 pages, 4 figures, 6 tables. Submitted to Elsevier

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weeds present a significant challenge in agriculture, causing yield loss and requiring expensive control measures. Automatic weed detection using computer vision and deep learning offers a promising solution. However, conventional deep learning methods often require large amounts of labelled training data, which can be costly and time-consuming to acquire. This paper introduces a novel method for semi-supervised weed detection, comprising two main components. Firstly, a multi-scale feature representation technique is employed to capture distinctive weed features across different scales. Secondly, we propose an adaptive pseudo-label assignment strategy, leveraging a small set of labelled images during training. This strategy dynamically assigns confidence scores to pseudo-labels generated from unlabeled data. Additionally, our approach integrates epoch-corresponding and mixed pseudo-labels to further enhance the learning process. Experimental results on the COCO dataset and five prominent weed datasets -- CottonWeedDet12, CropAndWeed, Palmer amaranth, RadishWheat, and RoboWeedMap -- illustrate that our method achieves state-of-the-art performance in weed detection, even with significantly less labelled data compared to existing techniques. This approach holds the potential to alleviate the labelling burden and enhance the feasibility and deployment speed of deep learning for weed detection in real-world agricultural scenarios.
[358] arXiv:2405.07404 [pdf, ps, other]: Title: Indoor PM2.5 forecasting and the association with outdoor air pollution: a modelling study based on sensor data in Australia

Authors: Wenhua Yu, Bahareh Nakisa, Seng W. Loke, Svetlana Stevanovic, Yuming Guo, Mohammad Naim Rastgoo

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Exposure to poor indoor air quality poses significant health risks, necessitating thorough assessment to mitigate associated dangers. This study aims to predict hourly indoor fine particulate matter (PM2.5) concentrations and investigate their correlation with outdoor PM2.5 levels across 24 distinct buildings in Australia. Indoor air quality data were gathered from 91 monitoring sensors in eight Australian cities spanning 2019 to 2022. Employing an innovative three-stage deep ensemble machine learning framework (DEML), comprising three base models (Support Vector Machine, Random Forest, and eXtreme Gradient Boosting) and two meta-models (Random Forest and Generalized Linear Model), hourly indoor PM2.5 concentrations were predicted. The model's accuracy was evaluated using a rolling windows approach, comparing its performance against three benchmark algorithms (SVM, RF, and XGBoost). Additionally, a correlation analysis assessed the relationship between indoor and outdoor PM2.5 concentrations. Results indicate that the DEML model consistently outperformed benchmark models, achieving an $R^2$ ranging from 0.63 to 0.99 and RMSE from 0.01 to 0.663 mg/m$^3$ for most sensors. Notably, outdoor PM2.5 concentrations significantly impacted indoor air quality, particularly evident during events like bushfires. This study underscores the importance of accurate indoor air quality prediction, crucial for developing location-specific early warning systems and informing effective interventions. By promoting protective behaviors, these efforts contribute to enhanced public health outcomes.
[359] arXiv:2405.07406 [pdf, other]: Title: Machine Unlearning: A Comprehensive Survey

Authors: Weiqi Wang, Zhiyi Tian, Shui Yu

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning makes a trained model remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine unlearning methods and discuss their differences, connections and open problems. We categorize current unlearning methods into four scenarios: centralized unlearning, distributed and irregular data unlearning, unlearning verification, and privacy and security issues in unlearning. Since centralized unlearning is the primary domain, we introduce it in two parts: firstly, we classify centralized unlearning into exact unlearning and approximate unlearning; secondly, we offer a detailed introduction to the techniques of these methods. Besides centralized unlearning, we note some studies on distributed and irregular data unlearning and introduce federated unlearning and graph unlearning as the two representative directions. After introducing unlearning methods, we review studies about unlearning verification. Moreover, we consider the privacy and security issues essential in machine unlearning and organize the latest related literature. Finally, we discuss the challenges of various unlearning scenarios and address the potential research directions.
[360] arXiv:2405.07407 [pdf, other]: Title: PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics

Authors: Jerrin Bright, Bavesh Balaji, Yuhao Chen, David A Clausi, John S Zelek

Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'24)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, hindering their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and low resolution. To address these challenges, we introduce PitcherNet, an end-to-end automated system that analyzes pitcher kinematics directly from live broadcast video, thereby extracting valuable pitch statistics including velocity, release point, pitch position, and release extension. This system leverages three key components: (1) Player tracking and identification by decoupling actions from player kinematics; (2) Distribution and depth-aware 3D human modeling; and (3) Kinematic-driven pitch statistics. Experimental validation demonstrates that PitcherNet achieves robust analysis results with 96.82% accuracy in pitcher tracklet identification, reduced joint position error by 1.8mm and superior analytics compared to baseline methods. By enabling performance-critical kinematic analysis from broadcast video, PitcherNet paves the way for the future of baseball analytics by optimizing pitching strategies, preventing injuries, and unlocking a deeper understanding of pitcher mechanics, forever transforming the game.
[361] arXiv:2405.07409 [pdf, other]: Title: ZBanner: Fast Stateless Scanning Capable of Obtaining Responses over TCP

Authors: Chiyu Chen, Yuliang Lu, Guozheng Yang, Yi Xie, Shasha Guo

Comments: The paper has been submitted and the code will be published later

Subjects: Networking and Internet Architecture (cs.NI)

Fast large-scale network scanning is an important way to understand internet service configurations and security in real time, and stateless scanning is a representative approach. Existing stateless scanners can perform single-packet scans for internet-wide network measurements but are limited to host discovery or port scanning. To obtain further information over TCP, slower stateful scanners must be used in conjunction, which cost more time and memory because of connection state maintenance. By simplifying the TCP finite state machine, this paper proposes a novel stateless scanning model, which can establish TCP connections and obtain further responses in a completely stateless manner. Based on this model, we implement ZBanner, an improved modular stateless scanner that utilizes user-defined probes for identifying services and versions, fingerprinting TLS servers, etc. We present the unique design of ZBanner and experimentally characterize its feasibility and performance. Experiments show that ZBanner performs better than current state-of-the-art solutions in terms of scan rate and memory usage. ZBanner is at least three times faster than current tools for generic ports and over 90 times faster for open ports while keeping a minimal and stable memory usage.
[362] arXiv:2405.07411 [pdf, other]: Title: MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks

Authors: Haijiang Tian, Jingkun Yue, Xiaohong Liu, Guoxing Yang, Zeyu Jiang, Guangyu Wang

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Medical images are often more difficult to acquire than natural images due to the specialized equipment and technology involved, which leads to fewer medical image datasets. It is therefore hard to train a strong pretrained medical vision model. How to make the best of natural pretrained vision models and adapt them to the medical domain remains an open question. For image classification, a popular method is linear probe (LP). However, LP only considers the output after feature extraction, while a gap exists between input medical images and natural pretrained vision models. We introduce visual prompting (VP) to fill this gap, and analyze the strategies of coupling between LP and VP. We design a joint learning loss function containing categorisation loss and discrepancy loss, which describe the variance of prompted and plain images, naming this joint training strategy MoVL (Mixture of Visual Prompting and Linear Probe). We experiment on 4 medical image classification datasets, with two mainstream architectures, ResNet and CLIP. Results show that, without changing the parameters and architecture of the backbone model and with fewer parameters, MoVL has the potential to reach full finetune (FF) accuracy (on four medical datasets, average 90.91% for MoVL and 91.13% for FF). On an out-of-distribution medical dataset, our method (90.33%) can outperform FF (85.15%) with an absolute lead of 5.18%.
[363] arXiv:2405.07412 [pdf, other]: Title: Non-intrusive optimal experimental design for large-scale nonlinear Bayesian inverse problems using a Bayesian approximation error approach

Authors: Karina Koval, Ruanui Nicholson

Subjects: Numerical Analysis (math.NA)

We consider optimal experimental design (OED) for nonlinear inverse problems within the Bayesian framework. Optimizing the data acquisition process for large-scale nonlinear Bayesian inverse problems is a computationally challenging task since the posterior is typically intractable and commonly-encountered optimality criteria depend on the observed data. Since these challenges are not present in OED for linear Bayesian inverse problems, we propose an approach based on first linearizing the associated forward problem and then optimizing the experimental design. Replacing an accurate but costly model with some linear surrogate, while justified for certain problems, can lead to incorrect posteriors and sub-optimal designs if model discrepancy is ignored. To avoid this, we use the Bayesian approximation error (BAE) approach to formulate an A-optimal design objective for sensor selection that is aware of the model error. In line with recent developments, we prove that this uncertainty-aware objective is independent of the exact choice of linearization. This key observation facilitates the formulation of an uncertainty-aware OED objective function using a completely trivial linear map, the zero map, as a surrogate to the forward dynamics. The base methodology is also extended to marginalized OED problems, accommodating uncertainties arising from both linear approximations and unknown auxiliary parameters. Our approach only requires parameter and data sample pairs, hence it is particularly well-suited for black box forward models. We demonstrate the effectiveness of our method for finding optimal designs in an idealized subsurface flow inverse problem and for tsunami detection.
[364] arXiv:2405.07414 [pdf, other]: Title: Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

Authors: Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

Comments: ICML 2024, 18 pages (including supplementary materials)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self-supervised learning framework, we propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. This pretext task provides the encoder with an inductive bias to capture the irregular dependencies, mapping from continuous inputs to discretized bins, and mitigates the feature heterogeneity by setting all features to have category-type targets. Our empirical investigations ascertain several advantages of binning: capturing the irregular function, compatibility with encoder architecture and additional modifications, standardizing all features into equal sets, grouping similar values within a feature, and providing ordering information. Comprehensive evaluations across diverse tabular datasets corroborate that our method consistently improves tabular representation learning performance for a wide range of downstream tasks. The code is available at https://github.com/kyungeun-lee/tabularbinning.
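As a rough illustration of the binning idea described in this abstract, the sketch below turns a numeric column into bin-index targets that a model could be asked to reconstruct instead of the raw values. It is not taken from the linked repository: the function name `bin_targets`, the choice of quantile (equal-frequency) bins, and the toy column are all assumptions made for the example.

```python
from bisect import bisect_right
from statistics import quantiles


def bin_targets(column, n_bins):
    """Return one bin index per value, in the range 0..n_bins-1.

    Assumption: equal-frequency (quantile) bins; the abstract only says
    "classical binning" and does not fix the binning rule.
    """
    # quantiles() sorts a copy of the data and returns the n_bins - 1
    # interior cut points.
    edges = quantiles(column, n=n_bins)
    # The number of cut points <= value is the index of the bin it falls in.
    return [bisect_right(edges, value) for value in column]


# Hypothetical numeric feature with a long right tail.
values = [0.1, 0.4, 0.5, 2.0, 3.5, 9.0, 10.0, 50.0]
targets = bin_targets(values, n_bins=4)
print(targets)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Every feature ends up with targets drawn from the same small set of integers, which is one way to read the abstract's point about standardizing heterogeneous features into category-type targets.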
[365] arXiv:2405.07415 [pdf, ps, other]: Title: Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization

Authors: Adit Jain, Vikram Krishnamurthy

Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measurement, depending on the oracle state and incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This paper formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.
[366] arXiv:2405.07417 [pdf, other]: Title: Identifying Hate Speech Peddlers in Online Platforms. A Bayesian Social Learning Approach for Large Language Model Driven Decision-Makers

Authors: Adit Jain, Vikram Krishnamurthy

Subjects: Social and Information Networks (cs.SI); Signal Processing (eess.SP)

This paper studies the problem of autonomous agents performing Bayesian social learning for sequential detection when the observations of the state belong to a high-dimensional space and are expensive to analyze. Specifically, when the observations are textual, the Bayesian agent can use a large language model (LLM) as a map to get a low-dimensional private observation. The agent performs Bayesian learning and takes an action that minimizes the expected cost and is visible to subsequent agents. We prove that a sequence of such Bayesian agents herd in finite time to the public belief and take the same action disregarding the private observations. We propose a stopping time formulation for quickest time herding in social learning and optimally balance privacy and herding. Structural results are shown on the threshold nature of the optimal policy to the stopping time problem. We illustrate the application of our framework when autonomous Bayesian detectors aim to sequentially identify if a user is a hate speech peddler on an online platform by parsing text observations using an LLM. We numerically validate our results on real-world hate speech datasets. We show that autonomous Bayesian agents designed to flag hate speech peddlers in online platforms herd and misclassify the users when the public prior is strong. We also numerically show the effect of a threshold policy in delaying herding.
[367] arXiv:2405.07418 [pdf, other]: Title: Exploring the Effects of User-Agent and User-Designer Similarity in Virtual Human Design to Promote Mental Health Intentions for College Students

Authors: Pedro Guillermo Feijóo-García, Chase Wrenn, Alexandre Gomes de Siqueira, Rashi Ghosh, Jacob Stuart, Heng Yao, Benjamin Lok

Comments: 43 pages, 12 figures, under review for publication at ACM Transactions on Applied Perception

Subjects: Human-Computer Interaction (cs.HC)

Virtual humans (i.e., embodied conversational agents) have the potential to support college students' mental health, particularly in Science, Technology, Engineering, and Mathematics (STEM) fields where students are at a heightened risk of mental disorders such as anxiety and depression. A comprehensive understanding of students, considering their cultural characteristics, experiences, and expectations, is crucial for creating timely and effective virtual human interventions. To this end, we conducted a user study with 481 computer science students from a major university in North America, exploring how they co-designed virtual humans to support mental health conversations for students similar to them. Our findings suggest that computer science students who engage in co-design processes of virtual humans tend to create agents that closely resemble them demographically--agent-designer demographic similarity. Key factors influencing virtual human design included age, gender, ethnicity, and the matching between appearance and voice. We also observed that the demographic characteristics of virtual human designers, especially ethnicity and gender, tend to be associated with those of the virtual humans they designed. Finally, we provide insights concerning the impact of user-designer demographic similarity in virtual humans' effectiveness in promoting mental health conversations when designers' characteristics are shared explicitly or implicitly. Understanding how virtual humans' characteristics serve users' experiences in mental wellness conversations and the similarity-attraction effects between agents, users, and designers may help tailor virtual humans' design to enhance their acceptance and increase their counseling effectiveness.
[368] arXiv:2405.07419 [pdf, other]: Title: Indoor and Outdoor Crowd Density Level Estimation with Video Analysis through Machine Learning Models

Authors: Mahira Arefin, Md. Anwar Hussen Wadud, Anichur Rahman

Subjects: Cryptography and Security (cs.CR)

Crowd density level estimation is an essential aspect of crowd safety since it helps to identify areas of probable overcrowding and required conditions. Nowadays, AI systems can help in various sectors. For safety purposes and for many public services, detecting and tracking crowds and estimating the crowd level is essential, so we built an AI system for this purpose. The system can detect crowds in images, videos, or webcam feeds, and from these inputs it can detect, track, and identify humans. It can also estimate the crowd level. Though this project is simple, it is very effective, user-friendly, and less costly. We also trained our system with a dataset, so it can predict the crowd as well. Though the AI system is not a hundred percent accurate, this project is more than 97 percent accurate. We also present the dataset graphically.
[369] arXiv:2405.07423 [pdf, other]: Title: RoboCAP: Robotic Classification and Precision Pouring of Diverse Liquids and Granular Media with Capacitive Sensing

Authors: Yexin Hu, Alexandra Gillespie, Akhil Padmanabha, Kavya Puthuveetil, Wesley Lewis, Karan Khokar, Zackory Erickson

Subjects: Robotics (cs.RO)

Liquids and granular media are pervasive throughout human environments, yet remain particularly challenging for robots to sense and manipulate precisely. In this work, we present a systematic approach to integrating capacitive sensing within robotic end effectors to enable robust sensing and precise manipulation of liquids and granular media. We introduce the parallel-jaw RoboCAP Gripper with embedded capacitive sensing arrays that enable a robot to directly sense the materials and dynamics of liquids inside of diverse containers, including some that are visually opaque. When coupled with model-based control, we demonstrate that the proposed system enables a robotic manipulator to achieve state-of-the-art precision pouring accuracy for a range of substances with varying dynamics properties. Code, designs, and build details are available on the project website.
[370] arXiv:2405.07425 [pdf, other]: Title: Sakuga-42M Dataset: Scaling Up Cartoon Research

Authors: Zhenglin Pan, Yu Zhu, Yuxuan Mu

Comments: Arxiv Pre-print. Work in Progress

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a notable bias in hand-drawn cartoons that diverges from the distribution of natural videos. Can we harness the success of the scaling paradigm to benefit cartoon research? Unfortunately, until now, there has not been a sizable cartoon dataset available for exploration. In this research, we propose the Sakuga-42M Dataset, the first large-scale cartoon animation dataset. Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, content taxonomies, etc. We pioneer the benefits of such a large-scale cartoon dataset on comprehension and generation tasks by finetuning contemporary foundation models like Video CLIP, Video Mamba, and SVD, achieving outstanding performance on cartoon-related tasks. Our motivation is to introduce large-scaling to cartoon research and foster generalization and robustness in future cartoon applications. Dataset, Code, and Pretrained Models will be publicly available.
[371] arXiv:2405.07429 [pdf, other]: Title: JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

Comments: 8 pages

Subjects: Robotics (cs.RO)

Visual localization of unmanned aerial vehicles (UAVs) in planetary environments aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning will be reduced. In order to accurately determine the position of the UAV in a planetary scene in the absence of the global navigation satellite system (GNSS), this paper proposes JointLoc, which estimates the real-time UAV position in the world coordinate system by adaptively fusing the absolute 2-degree-of-freedom (2-DoF) pose and the relative 6-degree-of-freedom (6-DoF) pose. Extensive comparative experiments were conducted on a proposed planetary UAV image cross-modal localization dataset, which contains three types of typical Martian topography generated via a simulation engine as well as real Martian UAV images from the Ingenuity helicopter. JointLoc achieved a root-mean-square error of 0.237m on trajectories of up to 1,000m, compared to 0.594m and 0.557m for ORB-SLAM2 and ORB-SLAM3 respectively. The source code will be available at https://github.com/LuoXubo/JointLoc.
[372] arXiv:2405.07430 [pdf, other]: Title: Don't Chase Your Tail! Missing Key Aspects Augmentation in Textual Vulnerability Descriptions of Long-tail Software through Feature Inference

Authors: Linyi Han, Shidong Pan, Zhenchang Xing, Jiamou Sun, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng

Subjects: Software Engineering (cs.SE); Cryptography and Security (cs.CR)

Augmenting missing key aspects in Textual Vulnerability Descriptions (TVDs) for software with a large user base (referred to as non-long-tail software) has greatly advanced vulnerability analysis and software security research. However, these methods often overlook software instances that have a limited user base (referred to as long-tail software) due to limited TVDs, variations in software features, and domain-specific jargon, which hinders vulnerability analysis and software repairs. In this paper, we introduce a novel software feature inference framework designed to augment the missing key aspects of TVDs for long-tail software. Firstly, we tackle the issue of non-standard software names found in community-maintained vulnerability databases by cross-referencing government databases with Common Vulnerabilities and Exposures (CVEs). Next, we employ Large Language Models (LLMs) to generate the missing key aspects. However, the limited availability of historical TVDs restricts the variety of examples. To overcome this limitation, we utilize the Common Weakness Enumeration (CWE) to classify all TVDs and select cluster centers as representative examples. To ensure accuracy, we present Natural Language Inference (NLI) models specifically designed for long-tail software. These models identify and eliminate incorrect responses. Additionally, we use a wiki repository to provide explanations for proprietary terms. Our evaluations demonstrate that our approach significantly improves the accuracy of augmenting missing key aspects of TVDs for long-tail software from 0.27 to 0.56 (+107%). Interestingly, the accuracy of non-long-tail software also increases from 64% to 71%. As a result, our approach can be useful in various downstream tasks that require complete TVD information.
[373] arXiv:2405.07434 [pdf, other]: Title: Concurrent aggregate queries

Authors: Gal Sela, Erez Petrank

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Concurrent data structures serve as fundamental building blocks for concurrent computing. Many concurrent counterparts have been designed for basic sequential mechanisms; however, one notable omission is a concurrent tree that supports aggregate queries. Aggregate queries essentially compile succinct information about a range of data items, for example, calculating the average salary of employees in their 30s. Such queries play an essential role in various applications and are commonly taught in undergraduate data structures courses. In this paper, we formalize a type of aggregate query that can be efficiently supported by concurrent trees and present a design for implementing these queries on concurrent trees. We present two algorithms implementing this design, where one optimizes for tree update time, while the other optimizes for aggregate query time. We analyze their correctness and complexity, demonstrating the trade-offs between query time and update time.
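To make the notion of an aggregate query concrete, here is a minimal sequential sketch of the abstract's own example (average salary of employees in their 30s). It is not the paper's concurrent design: it uses a static sorted array with prefix sums rather than a tree, supports no updates, and is not thread-safe. The class name `RangeAverage` and the sample records are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


class RangeAverage:
    """Answer 'average value over keys in [lo, hi]' on a static data set."""

    def __init__(self, records):
        # records: iterable of (key, value) pairs, e.g. (age, salary).
        pairs = sorted(records)
        self.keys = [key for key, _ in pairs]
        # prefix[i] holds the sum of the first i values in key order.
        self.prefix = [0] + list(accumulate(value for _, value in pairs))

    def average(self, lo, hi):
        """Return the mean value for keys in [lo, hi], or None if no key matches."""
        i = bisect_left(self.keys, lo)   # first position with key >= lo
        j = bisect_right(self.keys, hi)  # first position with key > hi
        if i == j:
            return None
        return (self.prefix[j] - self.prefix[i]) / (j - i)


# Hypothetical (age, salary) records.
staff = RangeAverage(
    [(28, 50_000), (31, 60_000), (35, 70_000), (39, 80_000), (44, 90_000)]
)
print(staff.average(30, 39))  # 70000.0
```

A query costs two binary searches here, but any insertion would force the prefix sums to be rebuilt, which hints at the kind of tension between query time and update time that the abstract refers to.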
[374] arXiv:2405.07435 [pdf, ps, other]: Title: An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention

Authors: Junichiro Niimi

Comments: This manuscript is under peer review

Subjects: Computational Engineering, Finance, and Science (cs.CE)

Today, the acquisition of various behavioral log data has enabled deeper understanding of customer preferences and future behaviors in the marketing field. In particular, multimodal deep learning has achieved highly accurate predictions by combining multiple types of data. Many of these studies use feature fusion, which combines the representations extracted from each modality, to construct multimodal models. However, since feature fusion treats information from each modality equally, it is difficult to perform flexible analysis such as the attention mechanism that has been used extensively in recent years. Therefore, this study proposes a context-aware multimodal deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) and a cross-attention Transformer, which dynamically changes the attention of deep-contextualized word representations based on background information such as consumer demographic and lifestyle variables. We conduct a comprehensive analysis and demonstrate the effectiveness of our model by comparing it with six reference models in three categories using behavioral logs stored on an online platform. In addition, we present an efficient multimodal learning method by comparing the learning efficiency depending on the optimizers and the prediction accuracy depending on the number of tokens in the text data.
[375] arXiv:2405.07436 [pdf, other]: Title: Can Language Models Explain Their Own Classification Behavior?

Authors: Dane Sherburn, Bilal Chughtai, Owain Evans

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge. This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes. To explore this, we introduce a dataset, ArticulateRules, of few-shot text-based classification tasks generated by simple rules. Each rule is associated with a simple natural-language explanation. We test whether models that have learned to classify inputs competently (both in- and out-of-distribution) are able to articulate freeform natural language explanations that match their classification behavior. Our dataset can be used for both in-context and finetuning evaluations. We evaluate a range of LLMs, demonstrating that articulation accuracy varies considerably between models, with a particularly sharp increase from GPT-3 to GPT-4. We then investigate whether we can improve GPT-3's articulation accuracy through a range of methods. GPT-3 completely fails to articulate 7/10 rules in our test, even after additional finetuning on correct explanations. We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
[376] arXiv:2405.07437 [pdf, other]: Title: Evaluation of Retrieval-Augmented Generation: A Survey

Authors: Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Retrieval-Augmented Generation (RAG) has emerged as a pivotal innovation in natural language processing, enhancing generative models by incorporating external information retrieval. Evaluating RAG systems, however, poses distinct challenges due to their hybrid structure and reliance on dynamic knowledge sources. We consequently enhanced an extensive survey and proposed an analysis framework for benchmarks of RAG systems, RGAR (Retrieval, Generation, Additional Requirement), designed to systematically analyze RAG benchmarks by focusing on measurable outputs and established truths. Specifically, we scrutinize and contrast multiple quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness, of the internal links within the current RAG evaluation methods, covering the possible output and ground truth pairs. We also analyze the integration of additional requirements of different works, discuss the limitations of current benchmarks, and propose potential directions for further research to address these shortcomings and advance the field of RAG evaluation. In conclusion, this paper collates the challenges associated with RAG evaluation. It presents a thorough analysis and examination of existing methodologies for RAG benchmark design based on the proposed RGAR framework.
[377] arXiv:2405.07438 [pdf, other]: Title: Towards improved software visualisation of parameterised REE patterns: Introducing REEkit for geological analysis

Authors: Jaxon Kneipp, Alex Potanin, Michael Anenburg

Subjects: Human-Computer Interaction (cs.HC)

Modern geological studies and mineral exploration techniques rely heavily on being able to digitally visualise and interpret data. Rare earth elements (REEs) are vital for renewable energy technologies. REE concentrations, when normalised to a standard material, show unique geometric curves (or patterns) in geological samples due to their similar chemical properties. The lambda technique can be used to describe these patterns and turn them into points - making it easier to visualise and interpret larger datasets. Lambdas have the potential to help industry understand intricate sample relationships and the geological and economic importance of their data.
This study explored the use of lambdas through the evaluation of various visualisation methods to determine their usefulness in mineral exploration. The 'REEkit' platform facilitated the evaluation of the different visualisation methods and gauged industry interest and acceptance of such a service. Qualitative data was gathered through contextual inquiry, utilising semi-structured interviews and an observational session with 10 participants. Conceptual thematic analysis was applied to extract key findings.
This study found that two critical factors for successful lambda data visualisation in the mineral exploration industry are familiarity and clarity: visualisations that were familiar and commonplace for users allowed for better analysis and clear communication to non-technical audiences. This included visualisations such as the 3D scatter plot and scatter plot matrix. Furthermore, visualisations that complemented each other and seamlessly integrated into the same workflow provided diverse perspectives on the data. Important aspects included understanding population grouping versus data distribution, achieved through combinations such as scatter plot and density contour plot, or 3D scatter plot and violin plot.
[378] arXiv:2405.07440 [pdf, other]: Title: Maximizing Information Gain in Privacy-Aware Active Learning of Email Anomalies

Authors: Mu-Huan Miles Chung, Sharon Li, Jaturong Kongmanee, Lu Wang, Yuhong Yang, Calvin Giang, Khilan Jerath, Abhay Raman, David Lie, Mark Chignell

Comments: arXiv admin note: substantial text overlap with arXiv:2303.00870

Subjects: Human-Computer Interaction (cs.HC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Redacted emails satisfy most privacy requirements but they make it more difficult to detect anomalous emails that may be indicative of data exfiltration. In this paper we develop an enhanced method of Active Learning using an information gain maximizing heuristic, and we evaluate its effectiveness in a real world setting where only redacted versions of email could be labeled by human analysts due to privacy concerns. In the first case study we examined how Active Learning should be carried out. We found that model performance was best when a single highly skilled (in terms of the labelling task) analyst provided the labels. In the second case study we used confidence ratings to estimate the labeling uncertainty of analysts and then prioritized instances for labeling based on the expected information gain (the difference between model uncertainty and analyst uncertainty) that would be provided by labelling each instance. We found that the information gain maximizing heuristic improved model performance over existing sampling methods for Active Learning. Based on the results obtained, we recommend that analysts should be screened, and possibly trained, prior to implementation of Active Learning in cybersecurity applications. We also recommend that the information gain maximizing sampling method (based on expert confidence) should be used in early stages of Active Learning, provided that well-calibrated confidence can be obtained. We also note that the expertise of analysts should be assessed prior to Active Learning, as we found that analysts with lower labelling skill had poorly calibrated (over-) confidence in their labels.
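The prioritization rule in this abstract (expected information gain as the difference between model uncertainty and analyst uncertainty) can be sketched as follows. The abstract does not say how either uncertainty is computed, so the use of binary Shannon entropy for both, the function names, and the candidate emails are assumptions made purely for illustration.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_gain(model_prob, analyst_confidence):
    """Model uncertainty minus analyst uncertainty.

    Assumption: both uncertainties are measured as binary entropy, the first
    from the model's predicted anomaly probability and the second from the
    analyst's stated confidence in their label.
    """
    return binary_entropy(model_prob) - binary_entropy(analyst_confidence)


# Hypothetical candidates: (model anomaly probability, analyst confidence).
candidates = {
    "email_a": (0.50, 0.95),  # model unsure, analyst confident
    "email_b": (0.50, 0.60),  # model unsure, analyst also unsure
    "email_c": (0.95, 0.95),  # model already confident
}

# Label the instances with the largest expected gain first.
ranked = sorted(candidates, key=lambda name: expected_gain(*candidates[name]), reverse=True)
print(ranked)  # ['email_a', 'email_b', 'email_c']
```

Under these assumptions, an instance is worth labeling when the model is uncertain and the analyst is not, which matches the abstract's caveat that the heuristic depends on well-calibrated analyst confidence.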
[379] arXiv:2405.07441 [pdf, other]: Title: Reducing Spatial Discretization Error on Coarse CFD Simulations Using an OpenFOAM-Embedded Deep Learning Framework

Authors: Jesus Gonzalez-Sieiro, David Pardo, Vincenzo Nava, Victor M. Calo, Markus Towara

Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)

We propose a method for reducing the spatial discretization error of coarse computational fluid dynamics (CFD) problems by enhancing the quality of low-resolution simulations using a deep learning model fed with high-quality data. We replace the default differencing scheme for the convection term with a feed-forward neural network that interpolates velocities from cell centers to face values to produce velocities that approximate the fine-mesh data well. The deep learning framework incorporates the open-source CFD code OpenFOAM, resulting in an end-to-end differentiable model. We automatically differentiate the CFD physics using a discrete adjoint code version. We present a fast communication method between TensorFlow (Python) and OpenFOAM (C++) that accelerates the training process. We applied the model to the flow past a square cylinder problem, reducing the error to about 50% for simulations outside the training distribution compared to the traditional solver in the x- and y-velocity components using an 8x coarser mesh. The training is affordable in terms of time and data samples since the architecture exploits the local features of the physics while generating stable predictions for mid-term simulations.
[380] arXiv:2405.07442 [pdf, ps, other]: Title: Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)

This study presents a novel methodology utilizing a pre-trained speech recognition model for processing respiratory sound data. By incorporating medical record information, we introduce an innovative multi-modal deep-learning architecture, named Rene, which addresses the challenges of poor interpretability and underperformance in real-time clinical diagnostic response observed in previous respiratory disease-focused models. The proposed Rene architecture demonstrated significant improvements of 10.24%, 16.15%, 15.29%, and 18.90% respectively, compared to the baseline across four tasks related to respiratory event detection and audio record classification on the SPRSound database. In patient disease prediction tests on the ICBHI database, the architecture exhibited improvements of 23% in the mean of average score and harmonic score compared to the baseline. Furthermore, we developed a real-time respiratory sound discrimination system based on the Rene architecture, featuring a dual-thread design and compressed model parameters for simultaneous microphone recording and real-time dynamic decoding. Employing state-of-the-art Edge AI technology, this system enables rapid and accurate responses for respiratory sound auscultation, facilitating deployment on wearable clinical detection devices to capture incremental data, which can be synergistically evolved with large-scale models deployed on cloud servers for downstream tasks.
[381] arXiv:2405.07443 [pdf, other]: Title: Minimum-Variance Recursive State Estimation for 2-D Systems: When Asynchronous Multi-Channel Delays meet Energy Harvesting Constraints

Authors: Yu Chen, Wei Wang, Juanjuan Xu, Chunyan Han

Subjects: Systems and Control (eess.SY)

This paper is concerned with the state estimation problem for two-dimensional systems with asynchronous multichannel delays and energy harvesting constraints. In the system, each smart sensor has a certain probability of harvesting energy from the external environment, and the authorized transmission between the sensor and the remote filter is contingent upon the current energy level of the sensor, which results in intermittent transmission of observation information. Addressing the issue of incomplete observation information due to asynchronous multi-channel delays, a novel approach for observation partition reconstruction is proposed to convert the delayed activated observation sequences into equivalent delay-free activated observation sequences. Through generating spatial equivalency validation, it is found that the reconstructed delay-free activated observation sequences contain the same information as the original delayed activated observation sequences. Based on the reconstructed activated observation sequence and activated probability, a novel unbiased h+1-step recursive estimator is constructed. Then, the evolution of the probability distribution of the energy level is discussed. The estimation gains are obtained by minimizing the filtering error covariance. Subsequently, through parameter assumptions, a uniform lower bound and a recursive upper bound for the filtering error covariance are presented. A monotonicity analysis of the effect of the activated probability on estimation performance is also given. Finally, the effectiveness of the proposed estimation scheme is verified through a numerical simulation example.
[382] arXiv:2405.07444 [pdf, other]: Title: Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction

Authors: Clinton Mo, Kun Hu, Chengjiang Long, Dong Yuan, Zhiyong Wang

Comments: 17 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the character animation field, modern supervised keyframe interpolation models have demonstrated exceptional performance in constructing natural human motions from sparse pose definitions. Because these models are supervised, large motion datasets are necessary to facilitate the learning process; however, since motion is represented with fixed hierarchical skeletons, such datasets are incompatible with skeletons outside the datasets' native configurations. Consequently, the expected availability of a motion dataset for desired skeletons severely hinders the feasibility of learned interpolation in practice. To combat this limitation, we propose Point Cloud-based Motion Representation Learning (PC-MRL), an unsupervised approach to enabling cross-compatibility between skeletons for motion interpolation learning. PC-MRL consists of a skeleton obfuscation strategy using temporal point cloud sampling, and an unsupervised skeleton reconstruction method from point clouds. We devise a temporal point-wise K-nearest neighbors loss for unsupervised learning. Moreover, we propose First-frame Offset Quaternion (FOQ) and Rest Pose Augmentation (RPA) strategies to overcome necessary limitations of our unsupervised point cloud-to-skeletal motion process. Comprehensive experiments demonstrate the effectiveness of PC-MRL in motion interpolation for desired skeletons without supervision from native datasets.
[383] arXiv:2405.07445 [pdf, other]: Title: Cybathlon -- Legged Mobile Assistance for Quadriplegics

Authors: Carmen Scheidemann, Andrei Cramariuc, Marco Hutter

Subjects: Robotics (cs.RO)

Assistance robots are the future for people who need daily care due to limited mobility or being wheelchair-bound. Current solutions of attaching robotic arms to motorized wheelchairs only provide limited additional mobility at the cost of increased size. We present a mouth joystick control interface, augmented with voice commands, for an independent quadrupedal assistance robot with an arm. We validate and showcase our system in the Cybathlon Challenges February 2024 Assistance Robot Race, where we solve four everyday tasks in record time, winning first place. Our system remains generic and sets the basis for a platform that could help and provide independence in the everyday lives of people in wheelchairs.
[384] arXiv:2405.07447 [pdf, ps, other]: Title: From traces to measures: Large language models as a tool for psychological measurement from text

Authors: Joseph J.P. Simons, Wong Liang Ze, Prasanta Bhattacharya, Brandon Siyuan Loh, Wei Gao

Comments: 12 pages, 2 figures, 1 table

Subjects: Human-Computer Interaction (cs.HC)

Digital trace data provide potentially valuable resources for understanding human behaviour, but their value has been limited by issues of unclear measurement. The growth of large language models provides an opportunity to address this limitation in the case of text data. Specifically, recognizing cases where their responses are a form of psychological measurement (the use of observable indicators to assess an underlying construct) allows existing measures and accuracy assessment frameworks from psychology to be re-purposed to use with large language models. Based on this, we offer four methodological recommendations for using these models to quantify text features: (1) identify the target of measurement, (2) use multiple prompts, (3) assess internal consistency, and (4) treat evaluation metrics (such as human annotations) as expected correlates rather than direct ground-truth measures. Additionally, we provide a workflow for implementing this approach.
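Recommendations (2) and (3) above can be illustrated with a small sketch. It assumes, hypothetically, that each text has been scored once per prompt and that internal consistency is summarised with Cronbach's alpha; the abstract does not say which consistency statistic the authors use, and the function name `cronbach_alpha` is illustrative only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per text; each row holds one numeric
    score per prompt (the prompts play the role of questionnaire items).
    """
    n_prompts = len(scores[0])
    if n_prompts < 2 or len(scores) < 2:
        raise ValueError("need at least two prompts and two texts")
    # Sample variance of each prompt's scores across texts.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_prompts)]
    # Sample variance of the per-text total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_prompts / (n_prompts - 1) * (1 - sum(item_vars) / total_var)


# Three prompts that rank three texts identically give alpha close to 1.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

Values near 1 suggest the prompts measure the same construct; low values suggest the prompts disagree and that the target of measurement may need to be revisited.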
[385] arXiv:2405.07448 [pdf, other]: Title: Evaluating the Language-Based Security for Plugin Development

Authors: Naisheng Liang, Alex Potanin

Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)

With the increasing popularity of plugin-based software systems, ensuring the security of plugins has become a critical concern. When users install plugins or browse websites with plugins from an untrusted source, how can we be sure that they do not implicitly have any undesirable functions? In this research, we present a comprehensive study on language-based security mechanisms for plugin development. We aim to enhance the understanding of access control vulnerabilities in plugins and explore effective security measures by introducing a capability-based system. We also developed and evaluated test plugins to assess the security mechanisms in popular development environments such as IntelliJ IDEA and Visual Studio Code by utilising Java, JavaScript, and associated APIs and frameworks. We also explore the concept of capability-based module systems as an alternative approach to plugin security. A comparative analysis is conducted to evaluate the effectiveness of capability-based systems in addressing access control vulnerabilities identified in earlier sections. Finally, recommendations for improving plugin security practices and tools are presented, emphasizing the importance of robust security measures in the ever-evolving landscape of software plugins.
[386] arXiv:2405.07450 [pdf, other]: Title: Locality-Preserving Free-Form Deformation

Authors: Tsukasa Fukusato, Akinobu Maejima, Takeo Igarashi

Subjects: Graphics (cs.GR)

This paper proposes a method to estimate the locations of grid handles in free-form deformation (FFD) while preserving the local shape characteristics of the 2D/3D input model embedded into the grid, named locality-preserving FFD (lp-FFD). Users first specify some vertex locations in the input model and grid handle locations. The system then optimizes all locations of grid handles by minimizing the distortion of the input model's mesh elements. The proposed method is fast and stable, allowing the user to shape the deformed mesh model and grid both directly and indirectly. This paper shows some examples of deformation results to demonstrate the robustness of our lp-FFD. In addition, we conducted a user study and confirmed that our lp-FFD's efficiency and effectiveness in shape deformation are higher than those of existing methods used in commercial software.
[387] arXiv:2405.07451 [pdf, other]: Title: CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering

Authors: Yuanyuan Jiang, Jianqin Yin

Comments: Submitted to the Journal on February 6, 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

While vision-language pretrained models (VLMs) excel in various multimodal understanding tasks, their potential in fine-grained audio-visual reasoning, particularly for audio-visual question answering (AVQA), remains largely unexplored. AVQA presents specific challenges for VLMs due to the requirement of visual understanding at the region level and seamless integration with audio modality. Previous VLM-based AVQA methods merely used CLIP as a feature encoder but underutilized its knowledge, and, like most AVQA methods, mistreated audio and video as separate entities in a dual-stream framework. This paper proposes a new CLIP-powered target-aware single-stream (TASS) network for AVQA using the image-text matching knowledge of the pretrained model through the audio-visual matching characteristic of nature. It consists of two key components: the target-aware spatial grounding module (TSG+) and the single-stream joint temporal grounding module (JTG). Specifically, we propose a TSG+ module to transfer the image-text matching knowledge from CLIP models to our region-text matching process without corresponding ground-truth labels. Moreover, unlike previous separate dual-stream networks that still required an additional audio-visual fusion module, JTG unifies audio-visual fusion and question-aware temporal grounding in a simplified single-stream architecture. It treats audio and video as a cohesive entity and further extends the pretrained image-text knowledge to audio-text matching by preserving their temporal correlation with our proposed cross-modal synchrony (CMS) loss. Extensive experiments conducted on the MUSIC-AVQA benchmark verified the effectiveness of our proposed method over existing state-of-the-art methods.
[388] arXiv:2405.07453 [pdf, other]: Title: An Effectiveness Study Across Baseline and Neural Network-based Force Estimation Methods on the da Vinci Research Kit Si System

Authors: Hao Yang, Ayberk Acar, Keshuai Xu, Anton Deguet, Peter Kazanzides, Jie Ying Wu

Comments: Accepted by the Hamlyn Symposium on Medical Robotics 2024

Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

In this study, we further investigate the robustness and generalization ability of a neural network (NN)-based force estimation method, using the da Vinci Research Kit Si (dVRK-Si). To evaluate our method's performance, we compare the force estimation accuracy with several baseline methods. We conduct comparative studies between the dVRK classic and dVRK-Si systems to benchmark the effectiveness of these approaches.
We conclude that the NN-based method provides comparable force estimation accuracy across the two systems, as the average root mean square error (RMSE) over the average range of force ratio is approximately 3.07% for the dVRK classic, and 5.27% for the dVRK-Si. On the dVRK-Si, the force estimation RMSEs for all the baseline methods are 2 to 4 times larger than the NN-based method in all directions. One possible reason is that the baseline methods assume that static forces remain the same or that the dynamics are time-invariant. These assumptions may hold for the dVRK classic, as it has pre-loaded weight and maintains horizontal self balance. Since the dVRK-Si configuration does not have this property, these assumptions no longer hold, and the NN-based method significantly outperforms the baselines.
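As a rough illustration of the reported metric, the sketch below computes an RMSE and expresses it as a percentage of the measured force range. The exact normalisation used by the authors is not given in the abstract, so dividing by `max(measured) - min(measured)` is an assumption, and both function names are hypothetical.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between two equal-length force sequences."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_over_range_percent(estimated, measured):
    """RMSE as a percentage of the measured force range (assumed normalisation)."""
    force_range = max(measured) - min(measured)
    if force_range == 0:
        raise ValueError("measured forces have zero range")
    return 100.0 * rmse(estimated, measured) / force_range


# Toy numbers only, not data from the paper.
print(rmse_over_range_percent([1.0, 9.0], [0.0, 10.0]))
```

Normalising by the force range makes errors comparable across systems whose forces differ in magnitude, which is presumably why a ratio rather than a raw RMSE is reported.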
[389] arXiv:2405.07454 [pdf, ps, other]: Title: On Securing Analog Lagrange Coded Computing from Colluding Adversaries

Authors: Rimpi Borah, J. Harshan

Comments: To appear in the proceedings of IEEE ISIT 2024

Subjects: Information Theory (cs.IT)

Analog Lagrange Coded Computing (ALCC) is a recently proposed coded computing paradigm wherein certain computations over analog datasets can be efficiently performed using distributed worker nodes through floating point implementation. While ALCC is known to preserve privacy of data from the workers, it is not resilient to adversarial workers that return erroneous computation results. Pointing at this security vulnerability, we focus on securing ALCC from a wide range of non-colluding and colluding adversarial workers. As a foundational step, we make use of error-correction algorithms for Discrete Fourier Transform (DFT) codes to build novel algorithms to nullify the erroneous computations returned from the adversaries. Furthermore, when such a robust ALCC is implemented in practical settings, we show that the presence of precision errors in the system can be exploited by the adversaries to propose novel colluding attacks to degrade the computation accuracy. As the main takeaway, we prove a counter-intuitive result that not all the adversaries should inject noise in their computations in order to optimally degrade the accuracy of the ALCC framework. This is the first work of its kind to address the vulnerability of ALCC against colluding adversaries.
[390] arXiv:2405.07456 [pdf, other]: Title: Boosting House Price Estimations with Multi-Head Gated Attention

Authors: Zakaria Abdellah Sellam, Cosimo Distante, Abdelmalik Taleb-Ahmed, Pier Luigi Mazzeo

Subjects: Machine Learning (cs.LG)

Evaluating house prices is crucial for various stakeholders, including homeowners, investors, and policymakers. However, traditional spatial interpolation methods have limitations in capturing the complex spatial relationships that affect property values. To address these challenges, we have developed a new method called Multi-Head Gated Attention for spatial interpolation. Our approach builds upon attention-based interpolation models and incorporates multiple attention heads and gating mechanisms to capture spatial dependencies and contextual information better. Importantly, our model produces embeddings that reduce the dimensionality of the data, enabling simpler models like linear regression to outperform complex ensembling models. We conducted extensive experiments to compare our model with baseline methods and the original attention-based interpolation model. The results show a significant improvement in the accuracy of house price predictions, validating the effectiveness of our approach. This research advances the field of spatial interpolation and provides a robust tool for more precise house price evaluation. Our GitHub repository contains the data and code for all datasets, which are available for researchers and practitioners interested in replicating or building upon our work.
[391] arXiv:2405.07458 [pdf, other]: Title: Examining Humanness as a Metaphor to Design Voice User Interfaces

Authors: Smit Desai, Mateusz Dubiel, Luis A. Leiva

Comments: Accepted to appear in the proceedings of CUI 2024

Subjects: Human-Computer Interaction (cs.HC)

Voice User Interfaces (VUIs) increasingly leverage 'humanness' as a foundational design metaphor, adopting roles like 'assistants,' 'teachers,' and 'secretaries' to foster natural interactions. Yet, this approach can sometimes misalign user trust and reinforce societal stereotypes, leading to socio-technical challenges that might impede long-term engagement. This paper explores an alternative approach to navigate these challenges: incorporating non-human metaphors in VUI design. We report on a study with 240 participants examining the effects of human versus non-human metaphors on user perceptions within health and finance domains. Results indicate a preference for the human metaphor (doctor) over the non-human (health encyclopedia) in health contexts for its perceived enjoyability and likeability. In finance, however, user perceptions do not significantly differ between human (financial advisor) and non-human (calculator) metaphors. Importantly, our research reveals that the explicit awareness of a metaphor's use influences adoption intentions, with a marked preference for non-human metaphors when their metaphorical nature is not disclosed. These findings highlight context-specific conversation design strategies required in integrating non-human metaphors into VUI design, suggesting tradeoffs and design considerations that could enhance user engagement and adoption.
[392] arXiv:2405.07459 [pdf, other]: Title: DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person Retrieval

Authors: Yuchuan Deng, Zhanpeng Hu, Jiakun Han, Chuang Deng, Qijun Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-based person retrieval (TPR) aims to retrieve images of a person from an extensive array of candidates based on a given textual description. The core challenge lies in mapping visual and textual data into a unified latent space. While existing TPR methods concentrate on recognizing explicit and positive characteristics, they often neglect the critical influence of negative descriptors, resulting in potential false positives that fulfill positive criteria but could be excluded by negative descriptors. To alleviate these issues, we introduce DualFocus, a unified framework for integrating positive and negative descriptors to enhance the interpretative accuracy of vision-language foundational models regarding textual queries. DualFocus employs Dual (Positive/Negative) Attribute Prompt Learning (DAPL), which integrates Dual Image-Attribute Contrastive (DIAC) Learning and Sensitive Image-Attributes Matching (SIAM) Learning. This way DualFocus enhances the detection of unseen attributes, thereby boosting retrieval precision. To further achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss, which refines the representation of both matching and non-matching descriptions, thereby enhancing the matching process through a detailed and adaptable similarity assessment. By focusing on token-level comparisons, DualFocus significantly outperforms existing techniques in both precision and robustness. The experiment results highlight DualFocus's superior performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid.
[393] arXiv:2405.07460 [pdf, other]: Title: HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models

Authors: Aakash Tripathi, Asim Waqas, Yasin Yilmaz, Ghulam Rasool

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB)

Developing accurate machine learning models for oncology requires large-scale, high-quality multimodal datasets. However, creating such datasets remains challenging due to the complexity and heterogeneity of medical data. To address this challenge, we introduce HoneyBee, a scalable modular framework for building multimodal oncology datasets that leverages foundational models to generate representative embeddings. HoneyBee integrates various data modalities, including clinical records, imaging data, and patient outcomes. It employs data preprocessing techniques and transformer-based architectures to generate embeddings that capture the essential features and relationships within the raw medical data. The generated embeddings are stored in a structured format using Hugging Face datasets and PyTorch dataloaders for accessibility. Vector databases enable efficient querying and retrieval for machine learning applications. We demonstrate the effectiveness of HoneyBee through experiments assessing the quality and representativeness of the embeddings. The framework is designed to be extensible to other medical domains and aims to accelerate oncology research by providing high-quality, machine learning-ready datasets. HoneyBee is an ongoing open-source effort, and the code, datasets, and models are available at the project repository.
[394] arXiv:2405.07465 [pdf, other]: Title: Deception in Differential Games: Information Limiting Strategy to Induce Dilemma

Authors: Daigo Shishika, Alexander Von Moll, Dipankar Maity, Michael Dorothy

Subjects: Computer Science and Game Theory (cs.GT); Systems and Control (eess.SY)

Can deception exist in differential games? We provide a case study for a Turret-Attacker differential game, where two Attackers seek to score points by reaching a target region while a Turret tries to minimize the score by aligning itself with the Attackers before they reach the target. In contrast to the original problem solved with complete information, we assume that the Turret only has partial information about the maximum speed of the Attackers. We investigate whether there is any incentive for the Attackers to move slower than their maximum speed in order to "deceive" the Turret into taking suboptimal actions. We first describe the existence of a dilemma that the Turret may face. Then we derive a set of initial conditions from which the Attackers can force the Turret into a situation where it must take a guess.
[395] arXiv:2405.07467 [pdf, other]: Title: MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation

Authors: Dongjun Lee, Choongwon Park, Jaehyuk Kim, Heesoo Park

Subjects: Computation and Language (cs.CL)

Recent advancements in large language models (LLMs) have enabled in-context learning (ICL)-based methods that significantly outperform fine-tuning approaches for text-to-SQL tasks. However, their performance is still considerably lower than that of human experts on benchmarks that include complex schemas and queries, such as BIRD. This study considers the sensitivity of LLMs to the prompts and introduces a novel approach that leverages multiple prompts to explore a broader search space for possible answers and effectively aggregate them. Specifically, we robustly refine the database schema through schema linking using multiple prompts. Thereafter, we generate various candidate SQL queries based on the refined schema and diverse prompts. Finally, the candidate queries are filtered based on their confidence scores, and the optimal query is obtained through a multiple-choice selection that is presented to the LLM. When evaluated on the BIRD and Spider benchmarks, the proposed method achieved execution accuracies of 65.5% and 89.6%, respectively, significantly outperforming previous ICL-based methods. Moreover, we established a new SOTA performance on the BIRD in terms of both the accuracy and efficiency of the generated queries.
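The final filtering and selection step can be sketched as follows. The abstract does not define the confidence score, so this sketch assumes (hypothetically) that a candidate's confidence is the fraction of all candidates that produce the same execution result; `filter_candidates`, `build_choice_prompt`, and the threshold value are illustrative names and settings, not the authors' implementation.

```python
from collections import Counter
from string import ascii_uppercase


def filter_candidates(candidates, threshold=0.2):
    """Keep one query per distinct execution result with enough support.

    candidates: list of (sql, execution_result) pairs; the result must be
    hashable (for example a tuple of rows).
    Returns (sql, confidence) pairs sorted by descending confidence.
    """
    counts = Counter(result for _, result in candidates)
    total = len(candidates)
    kept = {}
    for sql, result in candidates:
        confidence = counts[result] / total  # assumed confidence definition
        if confidence >= threshold and result not in kept:
            kept[result] = (sql, confidence)
    return sorted(kept.values(), key=lambda item: item[1], reverse=True)


def build_choice_prompt(question, ranked):
    """Format the surviving queries as a multiple-choice question for an LLM."""
    lines = [f"Question: {question}", "Which SQL query answers it?"]
    for label, (sql, _) in zip(ascii_uppercase, ranked):
        lines.append(f"{label}. {sql}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


# Toy candidates with made-up execution results.
candidates = [
    ("SELECT name FROM t WHERE id = 1", ("alice",)),
    ("SELECT name FROM t WHERE t.id = 1", ("alice",)),
    ("SELECT name FROM t", ("alice", "bob")),
]
ranked = filter_candidates(candidates, threshold=0.3)
print(build_choice_prompt("Who has id 1?", ranked))
```

Deduplicating by execution result keeps the multiple-choice list short, so the model only has to choose between queries that actually behave differently.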
[396] arXiv:2405.07468 [pdf, ps, other]: Title: Evaluating large language models in medical applications: a survey

Authors: Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

Comments: 4 figures, 1 table

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medical information. This paper provides a comprehensive overview of the landscape of medical LLM evaluation, synthesizing insights from existing studies and highlighting evaluation data sources, task scenarios, and evaluation methods. Additionally, it identifies key challenges and opportunities in medical LLM evaluation, emphasizing the need for continued research and innovation to ensure the responsible integration of LLMs into clinical practice.
[397] arXiv:2405.07472 [pdf, other]: Title: GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

Authors: Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao

Comments: On-going work

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.
[398] arXiv:2405.07473 [pdf, other]: Title: Intrinsic Rewards for Exploration without Harm from Observational Noise: A Simulation Study Based on the Free Energy Principle

Authors: Theodore Jerome Tinker, Kenji Doya, Jun Tani

Comments: 54 pages, 11 figures, to be published in Neural Computation

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In Reinforcement Learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well-established in literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the Free Energy Principle (FEP), this paper proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find entropy and curiosity result in efficient exploration, especially both employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
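The hidden state curiosity reward described above is a KL divergence between two distributions over latent variables. A minimal sketch follows, assuming diagonal Gaussian prior and posterior and the direction KL(posterior || prior); the abstract states neither, and the function name `gaussian_kl` is hypothetical.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2).

    All arguments are equal-length sequences of floats; sigmas must be positive.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


# Curiosity reward for one step: how far the observation moved the belief.
prior_mu, prior_sigma = [0.0, 0.0], [1.0, 1.0]
post_mu, post_sigma = [1.0, 0.0], [1.0, 1.0]
print(gaussian_kl(post_mu, post_sigma, prior_mu, prior_sigma))
```

Under these assumptions, pure observational noise that does not change the agent's belief about the latent state leaves the posterior close to the prior, so the reward stays small; a prediction-error reward would instead stay large for such noise.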
[399] arXiv:2405.07474 [pdf, other]: Title: Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions

Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Weixia Xu, Ji Wang

Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the BTs' success. This paper proposes a two-stage framework for BT generation, which first employs large language models (LLMs) to interpret goals from high-level instructions, then constructs an efficient goal-specific BT through the Optimal Behavior Tree Expansion Algorithm (OBTEA). We represent goals as well-formed formulas in first-order logic, effectively bridging intent understanding and optimal behavior planning. Experiments in the service robot validate the proficiency of LLMs in producing grammatically correct and accurately interpreted goals, demonstrate OBTEA's superiority over the baseline BT Expansion algorithm in various metrics, and finally confirm the practical deployability of our framework. The project website is https://dids-ei.github.io/Project/LLM-OBTEA/.
[400] arXiv:2405.07475 [pdf, other]: Title: How Non-native English Speakers Use, Assess, and Select AI-Generated Paraphrases with Information Aids

Authors: Yewon Kim, Thanh-Long V. Le, Donghwi Kim, Mina Lee, Sung-Ju Lee

Subjects: Human-Computer Interaction (cs.HC)

Non-native English speakers (NNESs) often face challenges in achieving fluency in their written English. AI paraphrasing tools have the potential to improve their writing by suggesting more fluent paraphrases to their original sentences. Yet, the effectiveness of these tools depends on the user's ability to accurately assess and select context-appropriate suggestions, which is a significant challenge for those with limited English proficiency. This paper explores how NNESs utilize a paraphrasing tool augmented with information aids designed to facilitate the assessment of paraphrased suggestions. Through a formative study with 15 NNESs, we identify their specific needs when paraphrasing with AI, leading to the design of a paraphrasing tool integrated with five types of information aids, termed "support features." A user study with 22 NNESs demonstrates their heavy reliance on the paraphrasing functionality throughout the writing process, where they leverage the support features to assess and select suggestions efficiently and comprehensively. When equipped with the support features, NNESs have an enhanced writing experience in terms of efficiency, confidence, and trust. Our findings contribute to the HCI community by (i) identifying the distinct needs of NNESs in AI paraphrasing tools, (ii) elucidating how NNESs use paraphrasing tools with support features, and (iii) offering design implications for the development of more effective AI paraphrasing tools tailored to NNESs' requirements.
[401] arXiv:2405.07478 [pdf, other]: Title: Coded Event-triggered Control for Nonlinear Systems

Authors: Ruihang Ji, Shuzhi Sam Ge, Kai Zhao

Subjects: Systems and Control (eess.SY)

This paper studies a Coded Event-triggered Control (CEC) for a class of nonlinear systems under any initial condition. To reduce the communication burden, the CEC is designed from the encoding-decoding viewpoint, by which only an $m$-length string is transmitted for each communication between the CEC and the actuator. If a more general Entry Capture Problem is encountered, where the performance constraints are satisfied some time after (rather than from the beginning of) system operation, such control design becomes complicated and challenging, rendering the normally employed prescribed performance control invalid because it may not be defined in the initial interval. By introducing auxiliary functions, we develop a Self-adjustable Prescribed Performance (SPP) mechanism which can flexibly adjust the symmetric or asymmetric performance boundaries to accommodate different initial conditions, providing an effective solution for the underlying tracking problem. In this way, the resulting CEC not only consumes fewer communication resources but also regulates the tracking error under any initial condition into an allowable set before a given time in a bounded and customizable manner. Simulation results verify and clarify the theoretical findings.
[402] arXiv:2405.07479 [pdf, other]: Title: Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding

Authors: Houze Liu, Chongqing Wang, Xiaoan Zhan, Haotian Zheng, Chang Che

Comments: This paper has been accepted by the CONF-SEML 2024

Subjects: Robotics (cs.RO)

Robust 3D object detection remains a pivotal concern in the domain of autonomous field robotics. Despite notable enhancements in detection accuracy across standard datasets, real-world urban environments, characterized by their unstructured and dynamic nature, frequently precipitate an elevated incidence of false positives, thereby undermining the reliability of existing detection paradigms. In this context, our study introduces an advanced post-processing algorithm that modulates detection thresholds dynamically relative to the distance from the ego object. Traditional perception systems typically utilize a uniform threshold, which often leads to decreased efficacy in detecting distant objects. In contrast, our proposed methodology employs a Neural Network with a self-adaptive thresholding mechanism that significantly attenuates false negatives while concurrently diminishing false positives, particularly in complex urban settings. Empirical results substantiate that our algorithm not only augments the performance of 3D object detection models in diverse urban and adverse weather scenarios but also establishes a new benchmark for adaptive thresholding techniques in field robotics.
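The idea of a distance-dependent detection threshold can be sketched as follows. The paper learns its thresholds with a neural network; this sketch swaps in a hand-written linear schedule purely for illustration, and the function names, the numeric values, and the choice to lower the threshold with distance are all assumptions.

```python
def distance_threshold(distance_m, near=0.6, far=0.3, max_range_m=80.0):
    """Score threshold that moves linearly from `near` at 0 m to `far` at max range."""
    fraction = min(max(distance_m / max_range_m, 0.0), 1.0)
    return near + (far - near) * fraction


def filter_detections(detections):
    """Keep detections whose score meets the threshold for their distance.

    detections: list of dicts with 'score' (0..1) and 'distance_m' keys,
    where distance is measured from the ego vehicle.
    """
    return [
        d for d in detections
        if d["score"] >= distance_threshold(d["distance_m"])
    ]


# The same score is rejected close to the ego vehicle but kept far away.
detections = [
    {"score": 0.4, "distance_m": 5.0},
    {"score": 0.4, "distance_m": 70.0},
]
print(filter_detections(detections))
```

With a single uniform threshold, both detections would be treated identically; a distance-aware schedule lets a post-processing step be strict nearby, where sensor returns are dense, and more permissive far away.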
[403] arXiv:2405.07481 [pdf, other]: Title: Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

Authors: Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treat text detection and grouping with separate models or train a unified model from scratch. None of them makes full use of the already well-trained text detectors and easily obtainable detection datasets. In this paper, we present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis, allowing us to adopt a well-trained text detector right off the shelf or just fine-tune it efficiently. Designed to be compatible with various text detector architectures, TGA takes detected text regions and image features as universal inputs to assemble text instance features. To capture broader contextual information for layout analysis, we propose to predict text group masks from text instance features by one-to-many assignment. Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance, simultaneously inheriting generalized text detection ability from pre-training. In the case of full parameter fine-tuning, we can further improve layout analysis performance.
[404] arXiv:2405.07487 [pdf, other]: Title: An Efficient Compression Method for Sign Information of DCT Coefficients via Sign Retrieval

Authors: Chihiro Tsutake, Keita Takahashi, Toshiaki Fujii

Journal-ref: 2021 IEEE International Conference on Image Processing

Subjects: Information Theory (cs.IT)

Compression of the sign information of discrete cosine transform coefficients is an intractable problem in image compression schemes due to the equiprobable occurrence of the sign bits. To overcome this difficulty, we propose an efficient compression method for such sign information based on phase retrieval, which is a classical signal restoration problem attempting to find the phase information of discrete Fourier transform coefficients from their magnitudes. In our compression strategy, the sign bits of all the AC components in the cosine domain are excluded from a bitstream at the encoder and are complemented at the decoder by solving a sign recovery problem, which we call sign retrieval. The experimental results demonstrate that the proposed method outperforms previous techniques for sign compression in terms of a rate-distortion criterion. Our method, implemented in Python, is available from https://github.com/ctsutake/sr.
[405] arXiv:2405.07488 [pdf, other]: Title: Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks

Authors: Yanhong Peng, Miao He, Fangchao Hu, Zebing Mao, Xia Huang, Jun Ding

Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Symbolic Computation (cs.SC)

We present a novel approach to predicting the pressure and flow rate of flexible electrohydrodynamic (EHD) pumps using the Kolmogorov-Arnold Network (KAN). Inspired by the Kolmogorov-Arnold representation theorem, KAN replaces fixed activation functions with learnable spline-based activation functions, enabling it to approximate complex nonlinear functions more effectively than traditional models like Multi-Layer Perceptron (MLP) and Random Forest (RF). We evaluated KAN on a dataset of flexible EHD pump parameters and compared its performance against RF and MLP models. KAN achieved superior predictive accuracy, with Mean Squared Errors of 12.186 and 0.001 for pressure and flow rate predictions, respectively. The symbolic formulas extracted from KAN provided insights into the nonlinear relationships between input parameters and pump performance. These findings demonstrate that KAN offers exceptional accuracy and interpretability, making it a promising alternative for predictive modeling in electrohydrodynamic pumping.
[406] arXiv:2405.07489 [pdf, other]: Title: Sparse Domain Transfer via Elastic Net Regularization

Authors: Jingwei Zhang, Farzan Farnia

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Transportation of samples across different domains is a central task in several machine learning problems. A sensible requirement for domain transfer tasks in computer vision and language domains is the sparsity of the transportation map, i.e., the transfer algorithm aims to modify the least number of input features while transporting samples across the source and target domains. In this work, we propose Elastic Net Optimal Transport (ENOT) to address the sparse distribution transfer problem. The ENOT framework utilizes the $L_1$-norm and $L_2$-norm regularization mechanisms to find a sparse and stable transportation map between the source and target domains. To compute the ENOT transport map, we consider the dual formulation of the ENOT optimization task and prove that the sparsified gradient of the optimal potential function in the ENOT's dual representation provides the ENOT transport map. Furthermore, we demonstrate the application of the ENOT framework to perform feature selection for sparse domain transfer. We present the numerical results of applying ENOT to several domain transfer problems for synthetic Gaussian mixtures and real image and text data. Our empirical results indicate the success of the ENOT framework in identifying a sparse domain transport map.
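To make the role of the combined $L_1$ and $L_2$ penalties concrete, the sketch below implements the standard elastic-net proximal operator: soft-thresholding by the $L_1$ weight (which zeroes small coordinates) followed by shrinkage from the $L_2$ weight. This is textbook material shown only to illustrate how such a penalty yields sparse displacements; it is not the ENOT dual formulation from the paper, and the function name `elastic_net_prox` and the sample values are hypothetical.

```python
import math


def elastic_net_prox(v, l1, l2):
    """Minimize 0.5*(x - v)^2 + l1*|x| + 0.5*l2*x^2 independently per coordinate.

    Closed form: sign(v) * max(|v| - l1, 0) / (1 + l2).
    """
    result = []
    for value in v:
        shrunk = max(abs(value) - l1, 0.0)  # soft-thresholding gives exact zeros
        result.append(math.copysign(shrunk, value) / (1.0 + l2))
    return result


# Hypothetical per-feature displacement between a source and a target sample.
displacement = [3.0, -0.5, 0.2, -2.0]
sparse_displacement = elastic_net_prox(displacement, l1=1.0, l2=1.0)
```

Coordinates whose magnitude is below the $L_1$ weight are set to zero, so only the larger feature changes survive, which is the kind of sparsity the abstract asks of a transport map.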
[407] arXiv:2405.07490 [pdf, other]: Title: Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum Learning

Authors: Jisu Kim, Juhwan Lee

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that begins with simpler tasks and progresses to more complex ones, using criteria such as prompt length, attention scores, and loss values to structure the training data. Experiments with Mistral-7B (Jiang et al., 2023) and Gemma-7B (Team et al., 2024) models demonstrate that curriculum learning slightly improves performance compared to traditional random data shuffling. Notably, we observed that sorting data based on our proposed attention criteria generally led to better performance. This approach offers a sustainable method to enhance LLM performance without increasing model size or dataset volume, addressing scalability challenges in LLM training.
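The ordering idea can be sketched in a few lines: score each training example with a difficulty criterion and present examples from easiest to hardest instead of shuffling. The example below uses prompt length in words as the criterion; the helper names, the toy examples, and the word-count proxy are assumptions for illustration, and the attention-score and loss criteria mentioned in the abstract would need a model to compute.

```python
import random


def curriculum_order(examples, difficulty):
    """Return examples sorted from easiest to hardest under `difficulty`."""
    return sorted(examples, key=difficulty)


def random_order(examples, seed=0):
    """Baseline: a reproducible random shuffle of the same examples."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


examples = [
    {"prompt": "Translate 'cat' to French."},
    {"prompt": "Hi"},
    {"prompt": "Summarize the following three paragraphs in one sentence each."},
]

# Difficulty proxy (assumption): number of whitespace-separated words.
by_length = curriculum_order(examples, difficulty=lambda ex: len(ex["prompt"].split()))
baseline = random_order(examples)
```

Any other criterion can be swapped in by passing a different `difficulty` function, for instance a precomputed per-example loss.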
[408] arXiv:2405.07493 [pdf, ps, other]: Title: Variable-Length Secret Key Agreement via Random Stopping Time

Authors: Junda Zhou, Cheuk Ting Li

Comments: 8 pages

Subjects: Information Theory (cs.IT)

We consider a key agreement setting where two parties observe correlated random sources, and want to agree on a secret key via public discussions. In order to allow the key length to adapt to the realizations of the random sources, we allow the key to be of variable length, subject to a novel variable-length version of the uniformity constraint based on random stopping time. We propose simple, computationally efficient key agreement schemes under the new constraint. The proposed scheme can be considered as the key agreement analogue of variable-length source coding via Huffman coding, and the Knuth-Yao random number generator.
[409] arXiv:2405.07495 [pdf, ps, other]: Title: MacBehaviour: An R package for behavioural experimentation on large language models

Authors: Xufeng Duan, Shixuan Li, Zhenguang G. Cai

Comments: 11 pages

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

There has been increasing interest in investigating the behaviours of large language models (LLMs) and LLM-powered chatbots by treating an LLM as a participant in a psychological experiment. We therefore developed an R package called "MacBehaviour" that aims to interact with more than 60 language models in one package (e.g., OpenAI's GPT family, the Claude family, Gemini, Llama family, and open-source models) and streamline the process of running LLM behaviour experiments. The package offers a comprehensive set of functions designed for LLM experiments, covering experiment design, stimuli presentation, model behaviour manipulation, and logging of responses and token probabilities. To demonstrate the utility and effectiveness of "MacBehaviour," we conducted three validation experiments on three LLMs (GPT-3.5, Llama-2 7B, and Vicuna-1.5 13B) to replicate sound-gender association in LLMs. The results consistently showed that they exhibit human-like tendencies to infer gender from novel personal names based on their phonology, as previously demonstrated (Cai et al., 2023). In summary, "MacBehaviour" is an R package for machine behaviour studies which offers a user-friendly interface and comprehensive features to simplify and standardize the experimental process.
[410] arXiv:2405.07496 [pdf, other]: Title: Oedipus: LLM-enchanced Reasoning CAPTCHA Solver

Authors: Gelei Deng, Haoran Ou, Yi Liu, Jie Zhang, Tianwei Zhang, Yang Liu

Subjects: Cryptography and Security (cs.CR)

CAPTCHAs have become a ubiquitous tool in safeguarding applications from automated bots. Over time, the arms race between CAPTCHA development and evasion techniques has led to increasingly sophisticated and diverse designs. The latest iteration, reasoning CAPTCHAs, exploits tasks that are intuitively simple for humans but challenging for conventional AI technologies, thereby enhancing security measures.
Driven by the evolving AI capabilities, particularly the advancements in Large Language Models (LLMs), we investigate the potential of multimodal LLMs to solve modern reasoning CAPTCHAs. Our empirical analysis reveals that, despite their advanced reasoning capabilities, LLMs struggle to solve these CAPTCHAs effectively. In response, we introduce Oedipus, an innovative end-to-end framework for automated reasoning CAPTCHA solving. Central to this framework is a novel strategy that dissects the complex and human-easy-AI-hard tasks into a sequence of simpler and AI-easy steps. This is achieved through the development of a Domain Specific Language (DSL) for CAPTCHAs that guides LLMs in generating actionable sub-steps for each CAPTCHA challenge. The DSL is customized to ensure that each unit operation is a highly solvable subtask revealed in our previous empirical study. These sub-steps are then tackled sequentially using the Chain-of-Thought (CoT) methodology.
Our evaluation shows that Oedipus effectively resolves the studied CAPTCHAs, achieving an average success rate of 63.5%. Remarkably, it also shows adaptability to the most recent CAPTCHA designs introduced in late 2023, which are not included in our initial study. This prompts a discussion on future strategies for designing reasoning CAPTCHAs that can effectively counter advanced AI solutions.
[411] arXiv:2405.07497 [pdf, other]: Title: Towards Subgraph Isomorphism Counting with Graph Kernels

Authors: Xin Liu, Weiqi Wang, Jiaxin Bai, Yangqiu Song

Subjects: Machine Learning (cs.LG)

Subgraph isomorphism counting is known to be #P-complete and requires exponential time to find the exact solution. Representation learning has been shown to be a promising direction for representing substructures and approximating the solution. Graph kernels that implicitly capture the correlations among substructures in diverse graphs have exhibited great discriminative power in graph classification, so we pioneer the investigation of their potential in counting subgraph isomorphisms and further explore the augmentation of kernel capability through several variants, including polynomial and Gaussian kernels. Through comprehensive analysis, we enhance the graph kernels by incorporating neighborhood information. Finally, we present the results of extensive experiments to demonstrate the effectiveness of the enhanced graph kernels and discuss promising directions for future research.
[412] arXiv:2405.07500 [pdf, other]: Title: PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept Linking

Authors: Yuzhang Xie, Jiaying Lu, Joyce Ho, Fadi Nahab, Xiao Hu, Carl Yang

Journal-ref: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Short-Paper Track), 2024

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Linking (aligning) biomedical concepts across diverse data sources enables various integrative analyses, but it is challenging due to the discrepancies in concept naming conventions. Various strategies have been developed to overcome this challenge, such as those based on string-matching rules, manually crafted thesauri, and machine learning models. However, these methods are constrained by limited prior biomedical knowledge and can hardly generalize beyond the limited amounts of rules, thesauri, or training samples. Recently, large language models (LLMs) have exhibited impressive results in diverse biomedical NLP tasks due to their unprecedentedly rich prior knowledge and strong zero-shot prediction abilities. However, LLMs suffer from issues including high costs, limited context length, and unreliable predictions. In this research, we propose PromptLink, a novel biomedical concept linking framework that leverages LLMs. It first employs a biomedical-specialized pre-trained language model to generate candidate concepts that can fit in the LLM context windows. Then it utilizes an LLM to link concepts through two-stage prompts, where the first-stage prompt aims to elicit the biomedical prior knowledge from the LLM for the concept linking task and the second-stage prompt enforces the LLM to reflect on its own predictions to further enhance their reliability. Empirical results on the concept linking task between two EHR datasets and an external biomedical KG demonstrate the effectiveness of PromptLink. Furthermore, PromptLink is a generic framework without reliance on additional prior knowledge, context, or training data, making it well-suited for concept linking across various types of data sources. The source code is available at https://github.com/constantjxyz/PromptLink.
[413] arXiv:2405.07503 [pdf, other]: Title: Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

Authors: Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg

Comments: this https URL

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low-latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across 6 simulation tasks as well as two real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Consistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practitioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps. Code and training details will be released publicly.
[414] arXiv:2405.07505 [pdf, ps, other]: Title: A cyclic proof system for Guarded Kleene Algebra with Tests (full version)

Authors: Jan Rooduijn, Dexter Kozen, Alexandra Silva

Comments: Full version of paper accepted at IJCAR 2024

Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL); Logic (math.LO)

Guarded Kleene Algebra with Tests (GKAT for short) is an efficient fragment of Kleene Algebra with Tests, suitable for reasoning about simple imperative while-programs. Following earlier work by Das and Pous on Kleene Algebra, we study GKAT from a proof-theoretical perspective. The deterministic nature of GKAT allows for a non-well-founded sequent system whose set of regular proofs is complete with respect to the guarded language model. This is unlike the situation with Kleene Algebra, where hypersequents are required. Moreover, the decision procedure induced by proof search runs in NLOGSPACE, whereas that of Kleene Algebra is in PSPACE.
[415] arXiv:2405.07506 [pdf, other]: Title: Chronoblox: Chronophotographic Sequential Graph Visualization

Authors: Quentin Lobbé (CMB), Camille Roth (CAMS, CMB), Lena Mangold (CAMS, CMB)

Subjects: Human-Computer Interaction (cs.HC)

We introduce Chronoblox, a system for visualizing dynamic graphs. Chronoblox consists of a chronophotography of a sequence of graph snapshots based on a single embedding space common to all time periods. The goal of Chronoblox is to project all snapshots onto a common visualization space so as to represent both local and global dynamics at a glance. In this short paper, we review both the embedding and spatialization strategies. We then explain the way in which Chronoblox translates micro to meso structural evolution visually. We finally evaluate our approach using a synthetic network before illustrating it on a real world retweet network.
[416] arXiv:2405.07508 [pdf, other]: Title: Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

Authors: Runzhi He, Hengzhi Ye, Minghui Zhou

Subjects: Software Engineering (cs.SE)

Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. This calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly rely on static features at a single point in time to make predictions, which limits their effectiveness. Goal: We propose a novel metric from the user-repository network, leverage the metric to fit project deprecation predictors, and demonstrate its real-life implications. Method: We establish a comprehensive dataset containing 103,354 non-fork GitHub OSS projects spanning from 2011 to 2023. We propose repository centrality, a family of HITS weights that captures shifts in the popularity of a repository in the repository-user star network. With this metric, we use advancements in gradient boosting and deep learning to fit survival analysis models that predict project lifespan or survival hazard. Results: Our study reveals a correlation between the HITS centrality metrics and the repository deprecation risk. A drop in the HITS weights of a repository indicates a decline in its centrality and prevalence, leading to an increase in its deprecation risk and a decrease in its expected lifespan. Our predictive models powered by repository centrality and other repository features achieve satisfactory accuracy on the test set, with repository centrality being the most significant feature among all. Implications: This research offers a novel perspective on understanding the effect of prevalence on the deprecation of OSS repositories. Our approach to predicting repository deprecation helps detect the health status of projects and enables action in advance, fostering a more resilient OSS ecosystem.
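For readers unfamiliar with HITS, the sketch below runs the standard hub/authority iteration on a small user-to-repository star graph, treating users as hubs and repositories as authorities. It shows plain HITS only; how the authors derive their family of repository centrality weights over time is not specified in the abstract, and the function name `hits`, the iteration count, and the toy edge list are assumptions.

```python
def hits(edges, iterations=50):
    """Standard HITS on a bipartite graph of (user, repo) star edges.

    Returns (hub, authority) dictionaries, each L2-normalized.
    """
    if not edges:
        return {}, {}
    users = sorted({user for user, _ in edges})
    repos = sorted({repo for _, repo in edges})
    hub = {user: 1.0 for user in users}
    auth = {repo: 1.0 for repo in repos}
    for _ in range(iterations):
        # Authority of a repository: sum of hub scores of users who starred it.
        auth = {repo: 0.0 for repo in repos}
        for user, repo in edges:
            auth[repo] += hub[user]
        norm = sum(value * value for value in auth.values()) ** 0.5
        auth = {repo: value / norm for repo, value in auth.items()}
        # Hub score of a user: sum of authority scores of repositories they starred.
        hub = {user: 0.0 for user in users}
        for user, repo in edges:
            hub[user] += auth[repo]
        norm = sum(value * value for value in hub.values()) ** 0.5
        hub = {user: value / norm for user, value in hub.items()}
    return hub, auth


# Hypothetical star events: three users star repoA, one of them also stars repoB.
stars = [("alice", "repoA"), ("bob", "repoA"), ("carol", "repoA"), ("carol", "repoB")]
hub_scores, authority_scores = hits(stars)
```

Recomputing the authority score of a repository on successive snapshots of the star graph would give a time series whose decline could then be fed to a survival model, which is the kind of signal the abstract describes.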
[417] arXiv:2405.07509 [pdf, other]: Title: RESTAD: REconstruction and Similarity based Transformer for time series Anomaly Detection

Authors: Ramin Ghorbani, Marcel J.T. Reinders, David M.J. Tax

Comments: Manuscript under review

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Anomaly detection in time series data is crucial across various domains. The scarcity of labeled data for such tasks has increased the attention towards unsupervised learning methods. These approaches, often relying solely on reconstruction error, typically fail to detect subtle anomalies in complex datasets. To address this, we introduce RESTAD, an adaptation of the Transformer model by incorporating a layer of Radial Basis Function (RBF) neurons within its architecture. This layer fits a non-parametric density in the latent representation, such that a high RBF output indicates similarity with predominantly normal training data. RESTAD integrates the RBF similarity scores with the reconstruction errors to increase sensitivity to anomalies. Our empirical evaluations demonstrate that RESTAD outperforms various established baselines across multiple benchmark datasets.
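The sketch below illustrates the general idea of combining a reconstruction error with an RBF-based dissimilarity in latent space. The abstract does not state how the two signals are fused, so the additive rule, the `weight` parameter, the fixed `gamma`, and all function names are assumptions; in RESTAD the RBF layer sits inside a Transformer and is learned, whereas the centers here are given by hand.

```python
import math


def rbf_similarity(z, centers, gamma=1.0):
    """Mean Gaussian RBF response of latent vector `z` to a set of centers."""
    total = 0.0
    for center in centers:
        sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, center))
        total += math.exp(-gamma * sq_dist)
    return total / len(centers)


def anomaly_score(x, x_hat, z, centers, gamma=1.0, weight=1.0):
    """Reconstruction error plus weighted RBF dissimilarity (hypothetical fusion)."""
    recon_error = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    dissimilarity = 1.0 - rbf_similarity(z, centers, gamma)
    return recon_error + weight * dissimilarity


centers = [[0.0, 0.0]]           # stand-in for centers fitted on normal data
x, x_hat = [1.0, 2.0], [1.0, 2.0]  # a perfectly reconstructed input

normal_score = anomaly_score(x, x_hat, z=[0.0, 0.0], centers=centers)
subtle_anomaly_score = anomaly_score(x, x_hat, z=[3.0, 3.0], centers=centers)
```

The second input is reconstructed perfectly, so a score based on reconstruction error alone would miss it, while the latent-space dissimilarity term still raises its score.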
[418] arXiv:2405.07510 [pdf, other]: Title: PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Authors: Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

Subjects: Machine Learning (cs.LG)

We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the obtained PeRFlow models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. The implementations of training and inference are fully open-sourced. https://github.com/magic-research/piecewise-rectified-flow
[419] arXiv:2405.07513 [pdf, other]: Title: Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents

Authors: Juri Grosjean, Jannis Vamvas

Comments: SwissText 2024

Subjects: Computation and Language (cs.CL)

Encoder models trained for the embedding of sentences or short documents have proven useful for tasks such as semantic search and topic modeling. In this paper, we present a version of the SwissBERT encoder model that we specifically fine-tuned for this purpose. SwissBERT contains language adapters for the four national languages of Switzerland -- German, French, Italian, and Romansh -- and has been pre-trained on a large number of news articles in those languages. Using contrastive learning based on a subset of these articles, we trained a fine-tuned version, which we call SentenceSwissBERT. Multilingual experiments on document retrieval and text classification in a Switzerland-specific setting show that SentenceSwissBERT surpasses the accuracy of the original SwissBERT model and of a comparable baseline. The model is openly available for research use.
[420] arXiv:2405.07515 [pdf, other]: Title: OpenBot-Fleet: A System for Collective Learning with Real Robots

Authors: Matthias Müller, Samarth Brahmbhatt, Ankur Deka, Quentin Leboutet, David Hafner, Vladlen Koltun

Comments: Accepted at ICRA'24

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce OpenBot-Fleet, a comprehensive open-source cloud robotics system for navigation. OpenBot-Fleet uses smartphones for sensing, local compute and communication, Google Firebase for secure cloud storage and off-board compute, and a robust yet low-cost wheeled robot to act in real-world environments. The robots collect task data and upload it to the cloud where navigation policies can be learned either offline or online and can then be sent back to the robot fleet. In our experiments we distribute 72 robots to a crowd of workers who operate them in homes, and show that OpenBot-Fleet can learn robust navigation policies that generalize to unseen homes with >80% success rate. OpenBot-Fleet represents a significant step forward in cloud robotics, making it possible to deploy large continually learning robot fleets in a cost-effective and scalable manner. All materials can be found at https://www.openbot.org. A video is available at https://youtu.be/wiv2oaDgDi8
[421] arXiv:2405.07516 [pdf, other]: Title: Support-Query Prototype Fusion Network for Few-shot Medical Image Segmentation

Authors: Xiaoxiao Wu, Zhenguo Gao, Xiaowei Chen, Yakai Wang, Shulei Qu, Na Li

Comments: 19 pages, 7 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, deep learning based on Convolutional Neural Networks (CNNs) has achieved remarkable success in many applications. However, their heavy reliance on extensive labeled data and limited generalization ability to unseen classes pose challenges to their suitability for medical image processing tasks. Few-shot learning, which utilizes a small amount of labeled data to generalize to unseen classes, has emerged as a critical research area, attracting substantial attention. Currently, most studies employ a prototype-based approach, in which prototypical networks are used to construct prototypes from the support set, guiding the processing of the query set to obtain the final results. While effective, this approach heavily relies on the support set while neglecting the query set, resulting in notable disparities within the model classes. To mitigate this drawback, we propose a novel Support-Query Prototype Fusion Network (SQPFNet). SQPFNet initially generates several support prototypes for the foreground areas of the support images, thus producing a coarse segmentation mask. Subsequently, a query prototype is constructed based on the coarse segmentation mask, additionally exploiting pattern information in the query set. Thus, SQPFNet constructs high-quality support-query fused prototypes, upon which the query image is segmented to obtain the final refined query mask. Evaluation results on two public datasets, SABS and CMR, show that SQPFNet achieves state-of-the-art performance.
[422] arXiv:2405.07518 [pdf, other]: Title: SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun

Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI)

Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in the compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them.
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.
[423] arXiv:2405.07520 [pdf, ps, other]: Title: Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches

Authors: Gao Yu Lee, Jinkuan Chen, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong

Comments: Submitted to journal and under review, once the paper is accepted, the copyright will be transferred to the corresponding journal

Subjects: Computer Vision and Pattern Recognition (cs.CV)

High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously, papers describing dehazing architectures applicable to various Remote Sensing (RS) domains are also being published. This review goes beyond the traditional focus on benchmarked haze datasets, as we also explore the application of dehazing techniques to remote sensing and UAV datasets, providing a comprehensive overview of both deep learning and prior-based approaches in these domains. We identify key challenges, including the lack of large-scale RS datasets and the need for more robust evaluation metrics, and outline potential solutions and future research directions to address them. This review is the first, to our knowledge, to provide comprehensive discussions on both existing and very recent dehazing approaches (as of 2024) on benchmarked and RS datasets, including UAV-based imagery.
[424] arXiv:2405.07523 [pdf, other]: Title: Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation

Authors: Quang Vinh Nguyen, Van Thong Huynh, Soo-Hyung Kim

Comments: 13 pages with 7 figures, British Machine Vision Conference 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Colonoscopy is a common and practical method for detecting and treating polyps. Segmenting polyps from colonoscopy images is useful for diagnosis and surgery progress. Nevertheless, achieving excellent segmentation performance is still difficult because of polyp characteristics such as shape, color, and condition, and the lack of clear distinction from the surrounding context. This work presents a novel architecture, Adaptation of Distinct Semantics for Uncertain Areas in Polyp Segmentation (ADSNet), which modifies misclassified details and recovers weak features that tend to vanish and go undetected at the final stage. The architecture consists of a complementary trilateral decoder to produce an early global map. A continuous attention module modifies the semantics of high-level features to analyze two separate semantics of the early global map. The proposed method is evaluated on polyp benchmarks for learning ability and generalization ability; experimental results demonstrate strong correction and recovery ability, leading to better segmentation performance than other state-of-the-art methods in the polyp image segmentation task. In particular, the proposed architecture can be flexibly combined with other CNN-based encoders, Transformer-based encoders, and decoder backbones.
[425] arXiv:2405.07524 [pdf, other]: Title: HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval

Authors: Chao He, Hongxi Wei

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep image hashing aims to map input images into simple binary hash codes via deep neural networks and thus enable effective large-scale image retrieval. Recently, hybrid networks that combine convolution and Transformer have achieved superior performance on various computer tasks and have attracted extensive attention from researchers. Nevertheless, the potential benefits of such hybrid networks in image retrieval still need to be verified. To this end, we propose a hybrid convolutional and self-attention deep hashing method known as HybridHash. Specifically, we propose a backbone network with stage-wise architecture in which the block aggregation function is introduced to achieve the effect of local self-attention and reduce the computational complexity. The interaction module has been elaborately designed to promote the communication of information between image blocks and to enhance the visual representations. We have conducted comprehensive experiments on three widely used datasets: CIFAR-10, NUS-WIDE and IMAGENET. The experimental results demonstrate that the method proposed in this paper has superior performance with respect to state-of-the-art deep hashing methods. Source code is available at https://github.com/shuaichaochao/HybridHash.
[426] arXiv:2405.07526 [pdf, other]: Title: MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri, Manik Varma, Yujing Wang, Linjun Yang, Mao Yang, Ce Zhang

Comments: 10 pages, 6 figures

Subjects: Information Retrieval (cs.IR)

Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modalities. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distributions, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next-generation information access systems with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demand innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. The MS MARCO Web Search dataset is available at: https://github.com/microsoft/MS-MARCO-Web-Search.
[427] arXiv:2405.07527 [pdf, other]: Title: Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

Comments: Accepted at NeurIPS 2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Despite their prevalence in deep-learning communities, over-parameterized models demand high computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$. A large $\lambda_{\max}$ indicates that the module learns features with better convergence, while a small one may impact generalization negatively. Inspired by this discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) that selectively updates those modules whose $\lambda_{\max}$ exceeds a dynamic threshold, concentrating the model on learning common features and ignoring inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT can significantly save computation through its partial-update strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and outperforms the baselines in accuracy.
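The selection rule described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it assumes each module's mNTK is the Gram matrix $J J^\top$ of a small, hand-written Jacobian, estimates $\lambda_{\max}$ by power iteration, and uses the mean of the eigenvalues as a stand-in for the dynamic threshold, which the abstract does not specify. All module names and numbers are hypothetical.

```python
import math


def gram(jacobian):
    """Gram matrix J J^T of an (n_samples x n_params) Jacobian, used here as a toy mNTK."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in jacobian]
            for row_i in jacobian]


def principal_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        eigenvalue = norm  # ||A v|| with unit v converges to lambda_max
    return eigenvalue


# Hypothetical per-module Jacobians (2 samples x 2 parameters each).
jacobians = {
    "head_0": [[2.0, 0.0], [0.0, 1.0]],
    "head_1": [[0.5, 0.0], [0.0, 0.25]],
    "mlp": [[1.0, 2.0], [0.0, 1.0]],
}
lambda_max = {name: principal_eigenvalue(gram(j)) for name, j in jacobians.items()}

# Assumption: the mean eigenvalue plays the role of the dynamic threshold.
threshold = sum(lambda_max.values()) / len(lambda_max)
modules_to_update = [name for name in jacobians if lambda_max[name] >= threshold]

print(lambda_max, modules_to_update)
```

In this toy setting only the modules in `modules_to_update` would receive a gradient step, which is where the computational saving of a partial update would come from.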
[428] arXiv:2405.07528 [pdf, other]: Title: Comparing Perceptions of Static and Adaptive Proactive Speech Agents

Authors: Justin Edwards, Philip R. Doyle, Holly Brannigan, Benjamin R. Cowan

Comments: Accepted to CUI 2024 - 6th Conference on Conversational User Interfaces

Subjects: Human-Computer Interaction (cs.HC)

A growing literature on speech interruptions describes how people interrupt one another with speech, but these behaviours have not yet been implemented in the design of artificial agents which interrupt. Perceptions of a prototype proactive speech agent which adapts its speech both to urgency and to the difficulty of the ongoing task it interrupts are compared against perceptions of a static proactive agent which does not. The study hypothesises that adaptive proactive speech modelled on human speech interruptions will lead to partner models which consider the proactive agent a stronger conversational partner than a static agent, and that interruptions initiated by an adaptive agent will be judged as better timed and more appropriately asked. These hypotheses are all rejected, however, as quantitative analysis reveals that participants view the adaptive agent as a poorer dialogue partner than the static agent and as less appropriate in the style in which it interrupts. Qualitative analysis sheds light on the source of this surprising finding, as participants see the adaptive agent as less socially appropriate and as less consistent in its interactions than the static agent.
[429] arXiv:2405.07530 [pdf, other]: Title: Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Authors: Hanzhuo Tan, Qi Luo, Ling Jiang, Zizheng Zhan, Jing Li, Haotian Zhang, Yuqun Zhang

Subjects: Software Engineering (cs.SE)

Automated code completion, which aims to generate subsequent tokens from unfinished code, has benefited significantly from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) techniques partially address these issues by retrieving relevant code with a separate encoding model, where the retrieved snippet serves as a contextual reference for code completion. However, their retrieval scope is subject to a singular perspective defined by the encoding model, which largely overlooks the complexity and diversity inherent in code semantics. To address this limitation, we propose ProCC, a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm to flexibly incorporate and adapt to multiple perspectives of code. ProCC first employs a prompt-based multi-retriever system which crafts prompt templates to elicit LLM knowledge to understand code semantics with multiple retrieval perspectives. Then, it adopts the adaptive retrieval selection algorithm to incorporate code similarity into the decision-making process to determine the most suitable retrieval perspective for the LLM to complete the code. Experimental results demonstrate that ProCC outperforms the state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite and 10.1% on the private-domain benchmark suite collected from a billion-user e-commerce company in terms of Exact Match. ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding a 5.6% improvement over our studied fine-tuned model.
[430] arXiv:2405.07533 [pdf, other]: Title: DID Connect: Authentication in TLS with Decentralized Identifiers and Verifiable Credentials

Authors: Sandro Rodriguez Garzon, Dennis Natusch, Artur Philipp, Axel Küpper, Hans Joachim Einsiedler, Daniela Schneider

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

Authentication in TLS is predominantly carried out with X.509 digital certificates issued by certificate authorities (CA). The centralized nature of current public key infrastructures, however, comes with severe risks, such as single points of failure and susceptibility to cyber-attacks, potentially undermining the security and trustworthiness of the entire system. With Decentralized Identifiers (DID) alongside distributed ledger technology, it becomes technically feasible to prove ownership of a unique identifier without requiring an attestation of the proof's public key by a centralized and therefore vulnerable CA. This article presents DID Connect, a novel authentication scheme for TLS 1.3 that empowers entities to authenticate in a TLS-compliant way with self-issued X.509 certificates that are equipped with ledger-anchored DIDs instead of CA-issued identifiers. It facilitates the exchange of tamper-proof and 3rd-party attested claims in the form of DID-bound Verifiable Credentials after the TLS handshake to complete the authentication with a full identification of the communication partner. A prototypical implementation shows comparable TLS handshake durations of DID Connect if verification material is cached and reasonable prolongations if it is obtained from a ledger. The significant speed improvement of the resulting TLS channel over a widely used, DID-based alternative transport protocol on the application layer demonstrates the potential of DID Connect to become a viable solution for the establishment of secure and trustful end-to-end communication links with decentrally managed digital identities.
[431] arXiv:2405.07536 [pdf, other]: Title: Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Authors: Xin Li, Wenyang Gan, Pang Wen, Daqi Zhu

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs and similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins path algorithm with an improved SOM neural network algorithm. First, the target tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When kinematic constraints or obstacles exist that may cause trajectory planning to fail, task re-assignment is carried out by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. The computation flow is designed so that the algorithm, implemented in MATLAB and Python, can realize path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for multi-AUV systems.
[432] arXiv:2405.07537 [pdf, other]: Title: Probabilistic Rounding Error Analysis From A Statistical Perspective

Authors: Yiming Fang, Li Chen

Comments: 24 pages, 7 figures. Submitted to SIAM for possible publication

Subjects: Numerical Analysis (math.NA)

The conventional probabilistic rounding error analysis in numerical linear algebra provides worst-case bounds with an associated failure probability, which can still be pessimistic. In this paper, we develop a new probabilistic rounding error analysis from a statistical perspective. By assuming both the data and the relative error are independent random variables, we derive the approximate closed-form expressions for the expectation and variance of the rounding errors in various key computational kernels. Our analytical expressions have three notable characteristics: they are statistical and do not involve a failure probability; they are sharper than other deterministic and probabilistic bounds, using mean square error as the metric; they are correct to all orders of unit roundoff and valid for any dimension. Furthermore, numerical experiments validate the accuracy of our derivations and demonstrate that our analytical expressions are generally at least two orders of magnitude tighter than alternative worst-case bounds, exemplified through the inner products. We also discuss a scenario involving inner products where the underlying assumptions are invalid, i.e., input data are dependent, rendering the analytical expressions inapplicable.
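To make the gap between worst-case bounds and observed rounding errors concrete, the following sketch compares the actual error of a double-precision inner product with the classical deterministic bound $\gamma_n |x|^\top |y|$, where $\gamma_n = nu/(1 - nu)$ and $u = 2^{-53}$. It does not implement the paper's statistical expressions, which the abstract does not give; the random data, the seed, and the function names are assumptions made for this illustration.

```python
import random
from fractions import Fraction

UNIT_ROUNDOFF = Fraction(1, 2 ** 53)  # unit roundoff u of IEEE double precision


def inner_product_error(x, y):
    """Exact absolute error of a recursively summed floating-point inner product."""
    computed = 0.0
    for a, b in zip(x, y):
        computed += a * b  # each multiply and add is rounded to double
    exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    return abs(Fraction(computed) - exact)


def worst_case_bound(x, y):
    """Classical deterministic bound gamma_n * sum(|x_i * y_i|), evaluated exactly."""
    n = len(x)
    gamma = n * UNIT_ROUNDOFF / (1 - n * UNIT_ROUNDOFF)
    return gamma * sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))


rng = random.Random(0)
n = 1000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]

error = inner_product_error(x, y)
bound = worst_case_bound(x, y)
print(float(error), float(bound))
```

Using `Fraction` keeps the reference value exact, so the printed error reflects only the rounding committed by the floating-point loop.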
[433] arXiv:2405.07538 [pdf, ps, other]: Title: Mirroring the Parking Target: An Optimal-Control-Based Parking Motion Planner with Strengthened Parking Reliability and Faster Parking Completion

Authors: Jia Hu, Yongwei Feng, Shuoyuan Li, Haoran Wang

Subjects: Robotics (cs.RO)

Automated Parking Assist (APA) systems currently face the challenge of low adoption, due to users' concerns about parking capability, reliability, and completion efficiency. To upgrade conventional APA planners and enhance user acceptance, this research proposes an optimal-control-based parking motion planner. Its highlight lies in its control logic: planning trajectories by mirroring the parking target. This method enables: i) parking capability in narrow spaces; ii) better parking reliability by expanding the Operation Design Domain (ODD); iii) faster completion of the parking process; iv) enhanced computational efficiency; v) applicability to all types of parking. A comprehensive evaluation is conducted. Results demonstrate that the proposed planner enhances the parking success rate by 40.6%, improves parking completion efficiency by 18.0%, and expands the ODD by 86.1%. It shows its superiority in difficult parking cases, such as the parallel parking scenario and narrow spaces. Moreover, the average computation time of the proposed planner is 74 milliseconds. Results indicate that the proposed planner is ready for real-time commercial applications.
[434] arXiv:2405.07541 [pdf, ps, other]: Title: Random walk model that universally generates inverse square Lévy walk by eliminating search cost minimization constraint

Authors: Shuji Shinohara, Daiki Morita, Hayato Hirai, Ryosuke Kuribayashi, Nobuhito Manome, Toru Moriyama, Hiroshi Okamoto, Yoshihiro Nakajima, Pegio-Yukio Gunji, Ung-il Chung

Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

The Lévy walk, a type of random walk characterized by linear step lengths that follow a power-law distribution, is observed in the migratory behaviors of various organisms, ranging from bacteria to humans. Notably, Lévy walks with power exponents close to two are frequently observed, though their underlying causes remain elusive. This study introduces a simplified, abstract random walk model designed to produce inverse square Lévy walks, also known as Cauchy walks, and explores the conditions that facilitate these phenomena. In our model, agents move toward a randomly selected destination in multi-dimensional space, and their movement strategy is parameterized by the extent to which they pursue the shortest path. When the search cost is proportional to the distance traveled, this parameter effectively reflects the emphasis on minimizing search costs. Our findings reveal that strict adherence to this cost minimization constraint results in a Brownian walk pattern. However, removing this constraint transitions the movement to an inverse square Lévy walk. Therefore, by modulating the prioritization of search costs, our model can seamlessly alternate between Brownian and Cauchy walk dynamics. This model has the potential to be utilized for exploring the parameter space of an optimization problem.
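For readers unfamiliar with the term, the sketch below draws step lengths from a power-law distribution with exponent two by inverse-transform sampling and takes each step in a uniformly random direction. It only illustrates what an inverse square step-length distribution looks like; it is not the destination-seeking model proposed in the paper, and the minimum step length, the seed, and the function names are assumptions made for this example.

```python
import math
import random


def levy_step_length(rng, exponent=2.0, min_length=1.0):
    """Sample l >= min_length with density proportional to l**(-exponent)."""
    if exponent <= 1.0:
        raise ValueError("exponent must exceed 1 for a normalizable distribution")
    u = rng.random()  # uniform in [0, 1), so 1 - u lies in (0, 1]
    return min_length * (1.0 - u) ** (-1.0 / (exponent - 1.0))


def levy_walk(n_steps, seed=0, exponent=2.0):
    """Return the visited 2-D positions and the step lengths of a simple Levy walk."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    lengths = []
    for _ in range(n_steps):
        length = levy_step_length(rng, exponent)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
        lengths.append(length)
    return path, lengths


path, lengths = levy_walk(10000)
median_length = sorted(lengths)[len(lengths) // 2]
print(median_length, max(lengths))
```

With exponent two and a minimum length of one, the theoretical median step length is 2, while the maximum is typically orders of magnitude larger; these rare long relocations are what distinguish a Lévy walk from a Brownian walk.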
[435] arXiv:2405.07542 [pdf, other]: Title: EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models

Authors: Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang

Subjects: Computation and Language (cs.CL)

Speculative decoding has emerged as a pivotal technique for enhancing the inference speed of Large Language Models (LLMs). Despite recent research aiming to improve prediction efficiency, multi-sample speculative decoding has been overlooked due to varying numbers of accepted tokens within a batch in the verification phase. The vanilla method adds padding tokens to ensure that the number of new tokens remains consistent across samples. However, this increases the computational and memory access overhead, thereby reducing the speedup ratio. We propose a novel method that can resolve the issue of inconsistent tokens accepted by different samples without necessitating an increase in memory or computing overhead. Furthermore, our proposed method can handle the situation where the prediction tokens of different samples are inconsistent without the need to add padding tokens. Extensive experiments demonstrate the efficacy of our method. Our code is available at https://github.com/niyunsheng/EMS-SD.
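The padding problem described above can be illustrated with a toy batch. The sketch assumes greedy verification, in which a draft token is accepted only while it matches the target model's token at the same position, and it counts the padding the vanilla approach would add to equalize the batch. The token ids and function names are hypothetical, and the paper's padding-free method is not reproduced here.

```python
def accepted_prefix_length(draft_tokens, target_tokens):
    """Number of leading draft tokens that match the target model's tokens."""
    count = 0
    for draft, target in zip(draft_tokens, target_tokens):
        if draft != target:
            break
        count += 1
    return count


def padding_overhead(accepted_counts):
    """Padding tokens per sample needed to align every sample with the longest one."""
    longest = max(accepted_counts)
    return [longest - count for count in accepted_counts]


# Hypothetical batch of three samples, each with four drafted tokens.
drafts = [[5, 8, 2, 9], [7, 7, 1, 3], [4, 6, 6, 0]]
targets = [[5, 8, 2, 9], [7, 3, 1, 3], [4, 6, 1, 0]]

accepted = [accepted_prefix_length(d, t) for d, t in zip(drafts, targets)]
padding = padding_overhead(accepted)
print(accepted, padding, sum(padding))
```

In this batch the samples accept 4, 1 and 2 tokens, so aligning them costs 5 padding tokens for 7 useful ones, which shows how uneven acceptance can erode the speedup.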
[436] arXiv:2405.07543 [pdf, ps, other]: Title: Accelerating the Evolution of Personalized Automated Lane Change through Lesson Learning

Authors: Jia Hu, Mingyue Lei, Duo Li, Zhenning Li, Jaehyun (Jason) So, Haoran Wang

Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Personalization is crucial for the widespread adoption of advanced driver assistance systems. To match each user's preference, an online evolution capability is a must. However, conventional evolution methods learn from naturalistic driving data, which requires a lot of computing power and cannot be applied online. To address this challenge, this paper proposes a lesson learning approach: learning from the driver's takeover interventions. By leveraging online takeover data, the driving zone is generated to ensure perceived safety using Gaussian discriminant analysis. Real-time corrections to trajectory planning rewards are enacted through apprenticeship learning. Guided by the objective of optimizing rewards within the constraints of the driving zone, this approach employs model predictive control for trajectory planning. This lesson learning framework is notable for its faster evolution capability, its ability to accumulate experience, its assurance of perceived safety, and its computational efficiency. Simulation results demonstrate that the proposed system consistently achieves successful customization without further takeover interventions. Accumulated experience yields a 24% enhancement in evolution efficiency. The average number of learning iterations is only 13.8. The average computation time is 0.08 seconds.
[437] arXiv:2405.07544 [pdf, other]: Title: Automatic Odometry-Less OpenDRIVE Generation From Sparse Point Clouds

Authors: Leon Eisemann, Johannes Maucher

Comments: 8 pages, 4 figures, 3 algorithms, 2 tables

Journal-ref: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

High-resolution road representations are a key factor for the success of (highly) automated driving functions. These representations, for example, high-definition (HD) maps, contain accurate information on a multitude of factors, including road geometry, lane information, and traffic signs. As automated driving functions grow in complexity and functionality, the requirements on testing and evaluation grow continuously as well. This leads to an increasing interest in virtual test drives for evaluation purposes. As roads play a crucial role in traffic flow, accurate real-world representations are needed, especially when deriving realistic driving behavior data. This paper proposes a novel approach to generate realistic road representations based solely on point cloud information, independent of the LiDAR sensor and its mounting position, and without the need for odometry data, multi-sensor fusion, machine learning, or highly accurate calibration. As the primary use case is simulation, we use the OpenDRIVE format for evaluation.
[438] arXiv:2405.07547 [pdf, other]: Title: Channel Coding Toward 6G: Technical Overview and Outlook

Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has increased. Furthermore, minimizing decoding latency is crucial for mission-critical applications, while optimizing energy efficiency is paramount for mobile and Internet of Things (IoT) communications. As the fifth generation (5G) of mobile communications is currently in operation and 5G-advanced is on the horizon, the objective of this paper is to assess prominent channel coding schemes in the context of recent advancements and the anticipated requirements for the sixth generation (6G). In this paper, after considering the potential impact of channel coding on key performance indicators (KPIs) of wireless networks, we review the evolution of mobile communication standards and the organizations involved in the standardization, from the first generation (1G) to the current 5G, highlighting the technologies integral to achieving targeted KPIs such as reliability, data rate, latency, energy efficiency, spectral efficiency, connection density, and traffic capacity. Following this, we delve into the anticipated requirements for potential use cases in 6G. The subsequent sections of the paper focus on a comprehensive review of three primary coding schemes utilized in past generations and their recent advancements: low-density parity-check (LDPC) codes, turbo codes (including convolutional codes), and polar codes (alongside Reed-Muller codes). Additionally, we examine alternative coding schemes like Fountain codes and sparse regression codes. Our evaluation includes a comparative analysis of error correction performance and the performance of hardware implementation for these coding schemes, providing insights into their potential and suitability for the upcoming 6G era.
[439] arXiv:2405.07550 [pdf, other]: Title: Wild Berry image dataset collected in Finnish forests and peatlands using drones

Authors: Luigi Riz, Sergio Povoli, Andrea Caraffa, Davide Boscaini, Mohamed Lamine Mekhalfi, Paul Chippendale, Marjut Turtiainen, Birgitta Partanen, Laura Smith Ballester, Francisco Blanes Noguera, Alessio Franchi, Elisa Castelli, Giacomo Piccinini, Luca Marchesotti, Micael Santos Couceiro, Fabio Poiesi

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Berry picking has long-standing traditions in Finland, yet it is challenging and can potentially be dangerous. The integration of drones equipped with advanced imaging techniques represents a transformative leap forward, optimising harvests and promising sustainable practices. We propose WildBe, the first image dataset of wild berries captured in peatlands and under the canopy of Finnish forests using drones. Unlike previous and related datasets, WildBe includes new varieties of berries, such as bilberries, cloudberries, lingonberries, and crowberries, captured under severe light variations and in cluttered environments. WildBe features 3,516 images, including a total of 18,468 annotated bounding boxes. We carry out a comprehensive analysis of WildBe using six popular object detectors, assessing their effectiveness in berry detection across different forest regions and camera types. We will release WildBe publicly.
[440] arXiv:2405.07551 [pdf, other]: Title: MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods have chosen another track: augmenting math reasoning data. However, an effective method to integrate these two research paths and combine their advantages remains to be explored. In this work, we first include new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them. The open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to get the resulting models, MuMath-Code ($\mu$-Math-Code). During the inference phase, our MuMath-Code generates code and interacts with the external Python interpreter to get the execution results. Therefore, MuMath-Code leverages the advantages of both the external tool and data augmentation. To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which is then trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. Our MuMath-Code-7B achieves 83.8 on GSM8K and 52.4 on MATH, while the MuMath-Code-70B model achieves new state-of-the-art performance among open methods -- achieving 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.
[441] arXiv:2405.07553 [pdf, ps, other]: Title: Space Domain based Ecological Cooperative and Adaptive Cruise Control on Rolling Terrain

Authors: Mingyue Lei, Haoran Wang, Duo Li, Zhenning Li, Ashish Dhamaniya, Jia Hu

Subjects: Robotics (cs.RO)

Ecological Cooperative and Adaptive Cruise Control (Eco-CACC) has attracted wide attention as a way to enhance the sustainability of CACC. However, state-of-the-art Eco-CACC studies still face challenges in adoption on rolling terrain. Furthermore, they cannot ensure both ecological optimality and computational efficiency. Hence, this paper proposes a nonlinear optimal control based Eco-CACC controller. It has the following features: i) enhancing performance across rolling terrains by modeling in the space domain; ii) enhancing fuel efficiency by globally optimizing all vehicles' fuel consumption; iii) ensuring computational efficiency by developing a differential dynamic programming-based solving method for the nonlinear optimal control problem; iv) ensuring string stability through theoretical proof and experimental validation. The performance of the proposed Eco-CACC controller was evaluated. Results showed that the proposed Eco-CACC controller can improve average fuel saving by 37.67% on collector roads and by about 17.30% on major arterials.
[442] arXiv:2405.07556 [pdf, ps, other]: Title: Safety-Aware Human-Lead Vehicle Platooning by Proactively Reacting to Uncertain Human Behaving

Authors: Jia Hu, Shuhan Wang, Yiming Zhang, Haoran Wang

Subjects: Robotics (cs.RO)

Human-Lead Cooperative Adaptive Cruise Control (HL-CACC) is regarded as a promising vehicle platooning technology for real-world implementation. By utilizing a Human-driven Vehicle (HV) as the platoon leader, HL-CACC reduces the cost and enhances the reliability of perception and decision-making. However, state-of-the-art HL-CACC technology still has a major limitation in driving safety because it does not account for the leading human driver's uncertain behavior. In this study, an HL-CACC controller is designed based on Stochastic Model Predictive Control (SMPC). It is able to predict the driving intention of the leading Connected Human-Driven Vehicle (CHV). The proposed controller has the following features: i) enhanced perceived safety in oscillating traffic; ii) guaranteed safety against hard brakes; iii) computational efficiency for real-time implementation. The proposed controller is evaluated on a PreScan&Simulink simulation platform. Real vehicle trajectory data is collected for the calibration of the simulation. Results reveal that the proposed controller: i) improves perceived safety by 19.17% in oscillating traffic; ii) enhances actual safety by 7.76% against hard brakes; iii) is confirmed to be string stable. The computation time is approximately 3 milliseconds when running on a laptop equipped with an Intel i5-13500H CPU. This indicates that the proposed controller is ready for real-time implementation.
[443] arXiv:2405.07557 [pdf, other]: Title: Towards Rational Consensus in Honest Majority

Authors: Varul Srivastava, Sujit Gujar

Subjects: Computer Science and Game Theory (cs.GT); Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed consensus protocols reach agreement among $n$ players in the presence of $f$ adversaries; different protocols support different values of $f$. Existing works study this problem for different adversary types (captured by threat models). There are three primary threat models: (i) Crash fault tolerance (CFT), (ii) Byzantine fault tolerance (BFT), and (iii) Rational fault tolerance (RFT), each more general than the previous. Agreement in repeated rounds on both (1) the proposed value in each round and (2) the ordering among agreed-upon values across multiple rounds is called Atomic BroadCast (ABC). ABC is more generalized than consensus and is employed in blockchains.
This work studies ABC under the RFT threat model. We consider $t$ Byzantine and $k$ rational adversaries among $n$ players. We also study different types of rational players based on their utility towards (1) liveness attack, (2) censorship or (3) disagreement (forking attack). We study the problem of ABC under this general threat model in partially-synchronous networks. We show (1) ABC is impossible for $n/3< (t+k) <n/2$ if rational players prefer liveness or censorship attacks and (2) the consensus protocol proposed by Ranchal-Pedrosa and Gramoli cannot be generalized to solve ABC due to insecure Nash equilibrium (resulting in disagreement). For ABC in partially synchronous network settings, we propose a novel protocol \textsf{pRFT} (practical Rational Fault Tolerance). We show \textsf{pRFT} achieves ABC if (a) rational players prefer only disagreement attacks and (b) $t < \frac{n}{4}$ and $(t + k) < \frac{n}{2}$. In \textsf{pRFT}, we incorporate accountability (capturing deviating players) within the protocol by leveraging honest players. We also show that the message complexity of \textsf{pRFT} is on par with the best consensus protocols that guarantee accountability.
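The numeric bounds stated above can be checked mechanically for a given deployment. The sketch below only restates the inequalities from the abstract in integer arithmetic; the function names are hypothetical, and the accompanying behavioral conditions (which attacks rational players prefer) must be established separately.

```python
def prft_bounds_hold(n, t, k):
    """Check t < n/4 and t + k < n/2 without floating-point division."""
    return 4 * t < n and 2 * (t + k) < n


def in_impossibility_range(n, t, k):
    """Check n/3 < t + k < n/2, the range where ABC is stated to be impossible
    if rational players prefer liveness or censorship attacks."""
    return n < 3 * (t + k) and 2 * (t + k) < n


# n players, t Byzantine and k rational adversaries (example values).
for n, t, k in [(20, 4, 5), (20, 5, 2), (20, 2, 3)]:
    print(n, t, k, prft_bounds_hold(n, t, k), in_impossibility_range(n, t, k))
```

The first example satisfies both checks at once, which is consistent with the abstract: the positive result additionally requires that rational players prefer only disagreement attacks, whereas the impossibility applies when they prefer liveness or censorship attacks.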
[444] arXiv:2405.07560 [pdf, ps, other]: Title: Coding historical causes of death data with Large Language Models

Authors: Bjørn Pedersen, Maisha Islam, Doris Tove Kristoffersen, Lars Ailo Bongo, Eilidh Garrett, Alice Reid, Hilde Sommerseth

Comments: 18 pages, 1 figure in main text, 3 figures in appendix

Subjects: Machine Learning (cs.LG)

This paper investigates the feasibility of using pre-trained generative Large Language Models (LLMs) to automate the assignment of ICD-10 codes to historical causes of death. Due to the complex narratives often found in historical causes of death, this task has traditionally been performed manually by coding experts. We evaluate the ability of GPT-3.5, GPT-4, and Llama 2 LLMs to accurately assign ICD-10 codes on the HiCaD dataset, which contains causes of death recorded in the civil death register entries of 19,361 individuals from Ipswich, Kilmarnock, and the Isle of Skye in the UK between 1861 and 1901. Our findings show that GPT-3.5, GPT-4, and Llama 2 assign the correct code for 69%, 83%, and 40% of causes, respectively. However, we achieve a maximum accuracy of 89% with standard machine learning techniques. All LLMs performed better for causes of death that contained terms still in use today than for archaic terms. They also performed better for short causes (1-2 words) than for longer causes. LLMs therefore do not currently perform well enough for historical ICD-10 code assignment tasks. We suggest further fine-tuning or alternative frameworks to achieve adequate performance.
[445] arXiv:2405.07562 [pdf, other]: Title: GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation

Authors: Andrey V. Galichin, Mikhail Pautov, Alexey Zhavoronkin, Oleg Y. Rogov, Ivan Oseledets

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

While Deep Neural Networks (DNNs) have demonstrated remarkable performance in tasks related to perception and control, there are still several unresolved concerns regarding the privacy of their training data, particularly in the context of vulnerability to Membership Inference Attacks (MIAs). In this paper, we explore a connection between the susceptibility to membership inference attacks and the vulnerability to distillation-based functionality stealing attacks. In particular, we propose GLiRA, a distillation-guided approach to membership inference attacks on black-box neural networks. We observe that knowledge distillation significantly improves the efficiency of the likelihood ratio membership inference attack, especially in the black-box setting, i.e., when the architecture of the target model is unknown to the attacker. We evaluate the proposed method across multiple image classification datasets and models and demonstrate that likelihood ratio attacks, when guided by knowledge distillation, outperform the current state-of-the-art membership inference attacks in the black-box setting.
[446] arXiv:2405.07570 [pdf, other]: Title: Gaze-Based Intention Recognition for Human-Robot Collaboration

Authors: Valerio Belcamino, Miwa Takase, Mariya Kilina, Alessandro Carfì, Akira Shimada, Sota Shimizu, Fulvio Mastrogiovanni

Comments: 5 pages, 4 figures, AVI2024 conference

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)

This work aims to tackle the intent recognition problem in Human-Robot Collaborative assembly scenarios. Precisely, we consider an interactive assembly of a wooden stool where the robot fetches the pieces in the correct order and the human builds the parts following the instruction manual. The intent recognition is limited to the idle state estimation and it is needed to ensure a better synchronization between the two agents. We carried out a comparison between two distinct solutions involving wearable sensors and eye tracking integrated into the perception pipeline of a flexible planning architecture based on Hierarchical Task Networks. At runtime, the wearable sensing module exploits the raw measurements from four 9-axis Inertial Measurement Units positioned on the wrists and hands of the user as an input for a Long Short-Term Memory Network. On the other hand, the eye tracking relies on a Head Mounted Display and Unreal Engine.
We tested the effectiveness of the two approaches with 10 participants, each of whom explored both options in alternate order. We collected explicit metrics about the attractiveness and efficiency of the two techniques through User Experience Questionnaires as well as implicit criteria regarding the classification time and the overall assembly time.
The results of our work show that the two methods can reach comparable performance both in terms of effectiveness and user preference. Future development could aim at joining the two approaches to allow the recognition of more complex activities and to anticipate the user's actions.
[447] arXiv:2405.07571 [pdf, other]: Title: TattTRN: Template Reconstruction Network for Tattoo Retrieval

Authors: Lazaro Janier Gonzalez-Soler, Maciej Salwowski, Christian Rathgeb, Daniel Fischer

Comments: Accepted at CVPR Workshop 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Tattoos have been used effectively as soft biometrics to assist law enforcement in the identification of offenders and victims, as they contain discriminative information, and are a useful indicator to locate members of a criminal gang or organisation. Due to various privacy issues in the acquisition of images containing tattoos, only a limited number of databases exists. This lack of databases has delayed the development of new methods to effectively retrieve a potential suspect's tattoo images from a candidate gallery. To mitigate this issue, in our work, we use an unsupervised generative approach to create a balanced database consisting of 28,550 semi-synthetic images with tattooed subjects from 571 tattoo categories. Further, we introduce a novel Tattoo Template Reconstruction Network (TattTRN), which learns to map the input tattoo sample to its respective tattoo template to enhance the distinguishing attributes of the final feature embedding. Experimental results with real data, i.e., WebTattoo and BIVTatt databases, demonstrate the soundness of the presented approach: an accuracy of up to 99% is achieved for checking at most the first 20 entries of the candidate list.
[448] arXiv:2405.07573 [pdf, other]: Title: MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving

Authors: Yiqun Duan, Xianda Guo, Zheng Zhu, Zhen Wang, Yu-Kai Wang, Chin-Teng Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current multi-modality driving frameworks normally fuse representation by utilizing attention between single-modality branches. However, the existing networks still suppress the driving performance as the Image and LiDAR branches are independent and lack a unified observation representation. Thus, this paper proposes MaskFuser, which tokenizes various modalities into a unified semantic feature space and provides a joint representation for further behavior cloning in driving contexts. Given the unified token representation, MaskFuser is the first work to introduce cross-modality masked auto-encoder training. The masked training enhances the fusion representation by reconstruction on masked tokens. Architecturally, a hybrid-fusion network is proposed to combine advantages from both early and late fusion: For the early fusion stage, modalities are fused by performing monotonic-to-BEV translation attention between branches; Late fusion is performed by tokenizing various modalities into a unified token space with shared encoding on it. MaskFuser respectively reaches a driving score of 49.05 and route completion of 92.85% on the CARLA LongSet6 benchmark evaluation, which improves the best of previous baselines by 1.74 and 3.21%. The introduced masked fusion increases driving stability under damaged sensory inputs. MaskFuser outperforms the best of previous baselines on driving score by 6.55 (27.8%), 1.53 (13.8%), 1.57 (30.9%), respectively given sensory masking ratios 25%, 50%, and 75%.
[449] arXiv:2405.07580 [pdf, other]: Title: DynLLM: When Large Language Models Meet Dynamic Graph Recommendation

Authors: Ziwei Zhao, Fake Lin, Xi Zhu, Zhi Zheng, Tong Xu, Shitian Shen, Xueying Li, Zikai Yin, Enhong Chen

Comments: 11 pages, 5 figures

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

The past year has witnessed considerable interest in Large Language Models (LLMs) for their potential applications in recommender systems, which may mitigate the persistent issue of data sparsity. Though large efforts have been made for user-item graph augmentation with better graph-based recommendation performance, they may fail to deal with the dynamic graph recommendation task, which involves both structural and temporal graph dynamics with inherent complexity in processing time-evolving data. To bridge this gap, in this paper, we propose a novel framework, called DynLLM, to deal with the dynamic graph recommendation task with LLMs. Specifically, DynLLM harnesses the power of LLMs to generate multi-faceted user profiles based on the rich textual features of historical purchase records, including crowd segments, personal interests, preferred categories, and favored brands, which in turn supplement and enrich the underlying relationships between users and items. Along this line, to fuse the multi-faceted profiles with temporal graph embedding, we engage LLMs to derive corresponding profile embeddings, and further employ a distilled attention mechanism to refine the LLM-generated profile embeddings for alleviating noisy signals, while also assessing and adjusting the relevance of each distilled facet embedding for seamless integration with temporal graph embedding from continuous time dynamic graphs (CTDGs). Extensive experiments on two real e-commerce datasets have validated the superior improvements of DynLLM over a wide range of state-of-the-art baseline methods.
[450] arXiv:2405.07582 [pdf, other]: Title: FRRffusion: Unveiling Authenticity with Diffusion-Based Face Retouching Reversal

Authors: Fengchuang Xing, Xiaowen Shi, Yuan-Gen Wang, Chunsheng Yang

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unveiling the real appearance of retouched faces to prevent malicious users from deceptive advertising and economic fraud has been an increasing concern in the era of digital economics. This article makes the first attempt to investigate the face retouching reversal (FRR) problem. We first collect an FRR dataset, named deepFRR, which contains 50,000 StyleGAN-generated high-resolution (1024*1024) facial images and their corresponding retouched ones by a commercial online API. To our best knowledge, deepFRR is the first FRR dataset tailored for training the deep FRR models. Then, we propose a novel diffusion-based FRR approach (FRRffusion) for the FRR task. Our FRRffusion consists of a coarse-to-fine two-stage network: A diffusion-based Facial Morpho-Architectonic Restorer (FMAR) is constructed to generate the basic contours of low-resolution faces in the first stage, while a Transformer-based Hyperrealistic Facial Detail Generator (HFDG) is designed to create high-resolution facial details in the second stage. Tested on deepFRR, our FRRffusion surpasses the GP-UNIT and Stable Diffusion methods by a large margin in four widespread quantitative metrics. Especially, the de-retouched images by our FRRffusion are visually much closer to the raw face images than both the retouched face images and those restored by the GP-UNIT and Stable Diffusion methods in terms of qualitative evaluation with 85 subjects. These results sufficiently validate the efficacy of our work, bridging the recently-standing gap between the FRR and generic image restoration tasks. The dataset and code are available at https://github.com/GZHU-DVL/FRRffusion.
[451] arXiv:2405.07584 [pdf, other]: Title: La ROUTOURNE va tourner

Authors: Quentin Bramas (ICube, UNISTRA), Jean-Romain Luttringer (ICube, UNISTRA), Pascal Mérindol (ICube, UNISTRA)

Comments: in French language. AlgoTel 2024 -- 26èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, May 2024, Saint-Briac-sur-Mer, France

Subjects: Networking and Internet Architecture (cs.NI)

Segment routing (SR) offers precise control over the paths taken: it specifies a list of detours, called segments, in IP packets. However, the number of detours that can be specified is limited by the hardware. When calculating segment lists, it is therefore necessary to limit their size. Although solutions have been proposed for calculating these lists, they lack generality and are not always optimal or efficient. We present ROUTOURNE, a method for diverting routing algorithms so that they calculate, not simply an optimal physical path to be translated into a list of segments a posteriori (with no guarantee of its size), but directly the optimal lists of segments deployable by the underlying hardware. ROUTOURNE thus facilitates the deployment of advanced traffic engineering strategies and policies, notably for load balancing from sources. Despite a route fraught with surprising challenges - in particular, the loss of isotonicity induced by SR - ROUTOURNE proves efficient, inducing at worst a linear overhead. Its accuracy and optimality have been proven, and its effectiveness evaluated by generalizing it to several more or less complex path calculation algorithms.
[452] arXiv:2405.07585 [pdf, ps, other]: Title: On the Coexistence of eMBB and URLLC in the Cell-Free Massive MIMO Downlink

Authors: Giovanni Interdonato, Stefano Buzzi

Comments: Paper submitted for presentation to an IEEE conference. © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

We investigate the non-orthogonal coexistence between the ultra-reliable low-latency communication (URLLC) and the enhanced mobile broadband (eMBB) in the downlink of a cell-free massive multiple-input multiple-output (MIMO) system. We provide a unified information-theoretic framework that combines a finite-blocklength analysis of the URLLC error probability based on the use of mismatched decoding with an infinite-blocklength analysis of the eMBB spectral efficiency. Superposition coding and three levels of puncturing are considered as alternative downlink coexistence strategies to cope with the inter-service interference and the URLLC random activation pattern, under the assumption of imperfect pilot-based channel state information acquisition at the access points and statistical channel knowledge at the users. Numerical results shed light into the trade-off between eMBB and URLLC performances considering different precoding and power control strategies.
[453] arXiv:2405.07586 [pdf, other]: Title: Thai Universal Dependency Treebank

Authors: Panyur Sriwirote, Wei Qi Leong, Charin Polpanumas, Santhawat Thanyawong, William Chandra Tjhi, Wirote Aroonmanakun, Attapol T. Rutherford

Subjects: Computation and Language (cs.CL)

Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these problems by introducing Thai Universal Dependency Treebank (TUD), a new largest Thai treebank consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework. We then benchmark dependency parsing models that incorporate pretrained transformers as encoders and train them on Thai-PUD and our TUD. The evaluation results show that most of our models can outperform other models reported in previous papers and provide insight into the optimal choices of components to include in Thai dependency parsers. The new treebank and every model's full prediction generated in our experiment are made available on a GitHub repository for further study.
[454] arXiv:2405.07587 [pdf, other]: Title: Structure-Preserving Model Order Reduction for Nonlinear DAE Models of Power Networks

Authors: Muhammad Nadeem, Ahmad F. Taha

Subjects: Systems and Control (eess.SY)

This paper deals with the joint reduction of dynamic states (internal states of generators, solar plants, loads, etc.) and algebraic variables (states of the network, e.g., voltages and phase angles) of a nonlinear differential-algebraic equation (NDAE) model of power networks. Traditionally, in the current literature on power system model order reduction (MOR), the algebraic constraints are usually neglected and the power network is commonly modeled via a set of ordinary differential equations (ODEs) instead of NDAEs. Thus, reduction is usually carried out for the dynamic states only and the algebraic variables are kept intact. This leaves a significant part of the system's size and complexity unreduced. This paper addresses this limitation by jointly reducing both dynamic and algebraic variables. Compared to the literature, the proposed MOR techniques are endowed with the following features: (i) they require no system linearization, (ii) they require no transformation to an equivalent or approximate ODE representation, (iii) they guarantee that the reduced order model is an NDAE and thus preserves the differential-algebraic structure of the original power system model, and (iv) they can seamlessly reduce both dynamic and algebraic variables while maintaining high accuracy. Case studies performed on a 2000-bus power system reveal that the proposed MOR techniques are able to reduce system order while maintaining accuracy.
[455] arXiv:2405.07588 [pdf, other]: Title: Practical Computation of Graph VC-Dimension

Authors: David Coudert (COATI), Mónika Csikós (IRIF (UMR_8243)), Guillaume Ducoffe (UniBuc, ICI), Laurent Viennot (DI-ENS, ARGO)

Journal-ref: Symposium on Experimental Algorithms (SEA) 2024, Jul 2024, Vienna, Austria

Subjects: Data Structures and Algorithms (cs.DS)

For any set system $H=(V,R), \ R \subseteq 2^V$, a subset $S \subseteq V$ is called \emph{shattered} if every $S' \subseteq S$ results from the intersection of $S$ with some set in $R$. The \emph{VC-dimension} of $H$ is the size of a largest shattered set in $V$. In this paper, we focus on the problem of computing the VC-dimension of graphs. In particular, given a graph $G=(V,E)$, the VC-dimension of $G$ is defined as the VC-dimension of $(V, \mathcal N)$, where $\mathcal N$ contains each subset of $V$ that can be obtained as the closed neighborhood of some vertex $v \in V$ in $G$. Our main contribution is an algorithm for computing the VC-dimension of any graph, whose effectiveness is shown through experiments on various types of practical graphs, including graphs with millions of vertices. A key aspect of its efficiency resides in the fact that practical graphs have small VC-dimension, up to 8 in our experiments. As a side-product, we present several new bounds relating the graph VC-dimension to other classical graph theoretical notions. We also establish the $W[1]$-hardness of the graph VC-dimension problem by extending a previous result for arbitrary set systems.
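The definition of graph VC-dimension given in this abstract can be illustrated with a naive brute-force search. This is only a sketch of the definition, not the algorithm contributed by the paper: the function names and the adjacency-dictionary input format are assumptions, and the running time is exponential, so it is suitable for tiny graphs only.

```python
from itertools import combinations


def closed_neighborhoods(adj):
    """Return the set system N: the closed neighborhood N[v] of every vertex.

    adj is assumed to map each vertex to the set of its neighbors.
    """
    return {frozenset(set(nbrs) | {v}) for v, nbrs in adj.items()}


def is_shattered(subset, ranges):
    """A subset S is shattered if the traces S & r realize all 2^|S| subsets."""
    s = frozenset(subset)
    traces = {s & r for r in ranges}
    return len(traces) == 2 ** len(s)


def vc_dimension(adj):
    """Size of a largest vertex set shattered by the closed neighborhoods."""
    ranges = closed_neighborhoods(adj)
    vertices = list(adj)
    best = 0
    for size in range(1, len(vertices) + 1):
        # Shattering a set of this size needs at least 2^size distinct ranges.
        if 2 ** size > len(ranges):
            break
        if any(is_shattered(s, ranges) for s in combinations(vertices, size)):
            best = size
        else:
            # Every subset of a shattered set is shattered, so no larger
            # shattered set can exist once one size fails.
            break
    return best


if __name__ == "__main__":
    # Path u - w - v plus an isolated vertex x (illustrative input).
    graph = {"u": {"w"}, "v": {"w"}, "w": {"u", "v"}, "x": set()}
    print(vc_dimension(graph))
```

In the illustrative graph, the set {u, v} is shattered: the closed neighborhoods of x, u, v, and w intersect it in the empty set, {u}, {v}, and {u, v} respectively.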
[456] arXiv:2405.07590 [pdf, other]: Title: Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data

Authors: Camelia Oprea, Mike Grüne, Mateusz Buglowski, Lena Olivier, Thorsten Orlikowsky, Stefan Kowalewski, Mark Schoberer, André Stollenwerk

Comments: © 2024 The authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND. Accepted for the 12th IFAC Symposium on Biological and Medical Systems. 6 pages, 7 figures

Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

With the digitalization of health care systems, artificial intelligence is becoming more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of trust by humans and thus hinders its active usage. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; the actual usefulness of its different methods is, however, unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM with application to a neural network for the classification of breaths in time series neonatal ventilation data. We present the perceived usefulness of the explainability method by different stakeholders, exposing the difficulty of achieving actual transparency and the wish for more in-depth explanations by many of the participants.
[457] arXiv:2405.07591 [pdf, ps, other]: Title: A Partially Defined Game with Payments

Authors: Satoshi Masuya

Subjects: Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)

We investigate a new problem that can be solved by using the theory of partially defined games. We consider the situation described below: first, we assume that only the worth of the grand and singleton coalitions is known. It takes some cost to obtain the worth of larger coalitions. If such an examination is performed, then players make a payment from the worth of the grand coalition. That is, the worth of the grand coalition is reduced by examinations of coalitional worth. The problem of a partially defined game with payments is finding the solution of partially defined games at each point and the best exiting rule of examinations of coalitional worth.
[458] arXiv:2405.07594 [pdf, other]: Title: RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration

Authors: Congjia Chen, Xiaoyu Jia, Yanhong Zheng, Yufu Qu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Point cloud registration is a fundamental task for estimating rigid transformations between point clouds. Previous studies have used geometric information for extracting features, matching and estimating transformation. Recently, owing to the advancement of RGB-D sensors, researchers have attempted to utilize visual information to improve registration performance. However, these studies focused on extracting distinctive features by deep feature fusion, which cannot effectively solve the negative effects of each feature's weakness, and cannot sufficiently leverage the valid information. In this paper, we propose a new feature combination framework, which applies a looser but more effective fusion and can achieve better performance. An explicit filter based on transformation consistency is designed for the combination framework, which can overcome each feature's weakness. And an adaptive threshold determined by the error distribution is proposed to extract more valid information from the two types of features. Owing to the distinctive design, our proposed framework can estimate more accurate correspondences and is applicable to both hand-crafted and learning-based feature descriptors. Experiments on ScanNet show that our method achieves a state-of-the-art performance and the rotation accuracy of 99.1%.
[459] arXiv:2405.07595 [pdf, other]: Title: Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection

Authors: Dehong Kong, Siyuan Liang, Wenqi Ren

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Object detection techniques for Unmanned Aerial Vehicles (UAVs) rely on Deep Neural Networks (DNNs), which are vulnerable to adversarial attacks. Nonetheless, existing algorithms for generating adversarial patches in the UAV domain pay very little attention to the naturalness of the patches. Moreover, imposing constraints directly on adversarial patches makes it difficult to generate patches that appear natural to the human eye while ensuring a high attack success rate. We notice that patches are natural looking when their overall color is consistent with the environment. Therefore, we propose a new method named Environmental Matching Attack (EMA) to address the issue of optimizing the adversarial patch under the constraints of color. To the best of our knowledge, this paper is the first to consider natural patches in the domain of UAVs. The EMA method exploits strong prior knowledge of a pretrained stable diffusion to guide the optimization direction of the adversarial patch, where the text guidance can restrict the color of the patch. To better match the environment, the contrast and brightness of the patch are appropriately adjusted. Instead of optimizing the adversarial patch itself, we optimize an adversarial perturbation patch which initializes to zero so that the model can better trade off attacking performance and naturalness. Experiments conducted on the DroneVehicle and Carpk datasets have shown that our work can reach nearly the same attack performance in the digital attack (no greater than 2 in mAP$\%$), surpass the baseline method in the physical specific scenarios, and exhibit a significant advantage in terms of naturalness in visualization and color difference with the environment.
[460] arXiv:2405.07596 [pdf, ps, other]: Title: Local Mutual-Information Differential Privacy

Authors: Khac-Hoang Ngo, Johan Östman, Alexandre Graell i Amat

Comments: submitted to the IEEE Information Theory Workshop (ITW) 2024

Subjects: Information Theory (cs.IT)

Local mutual-information differential privacy (LMIDP) is a privacy notion that aims to quantify the reduction of uncertainty about the input data when the output of a privacy-preserving mechanism is revealed. We study the relation of LMIDP with local differential privacy (LDP), the de facto standard notion of privacy in context-independent (CI) scenarios, and with local information privacy (LIP), the state-of-the-art notion for context-dependent settings. We establish explicit conversion rules, i.e., bounds on the privacy parameters for a LMIDP mechanism to also satisfy LDP/LIP, and vice versa. We use our bounds to formally verify that LMIDP is a weak privacy notion. We also show that uncorrelated Gaussian noise is the best-case noise in terms of CI-LMIDP if both the input data and the noise are subject to an average power constraint.
[461] arXiv:2405.07597 [pdf, ps, other]: Title: Using Model-Theoretic Approaches to Uncover Linguistic Organization

Authors: Olivia Griffin, Jerry Sun

Subjects: Computation and Language (cs.CL)

In this paper, we consider pluractional markers in Kaqchikel, Karuk, and Yurok. Like Balinese, each of these languages marks one type of pluractionality via reduplication, and a different type of pluractionality via non-reduplicative affixation. This paper serves as a proof-of-concept for applying model-theoretic approaches to language as a lens that can help us to recognize linguistic organization that is not apparent on the surface.
[462] arXiv:2405.07600 [pdf, other]: Title: Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering

Authors: Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos, Roger Woodman

Comments: Submitted to ITSC 2024. arXiv admin note: text overlap with arXiv:2404.07685

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The deep neural network (DNN) models are widely used for object detection in automated driving systems (ADS). Yet, such models are prone to errors which can have serious safety implications. Introspection and self-assessment models that aim to detect such errors are therefore of paramount importance for the safe deployment of ADS. Current research on this topic has focused on techniques to monitor the integrity of the perception mechanism in ADS. Existing introspection models in the literature, however, largely concentrate on detecting perception errors by assigning equal importance to all parts of the input data frame to the perception module. This generic approach overlooks the varying safety significance of different objects within a scene, which obscures the recognition of safety-critical errors, posing challenges in assessing the reliability of perception in specific, crucial instances. Motivated by this shortcoming of state of the art, this paper proposes a novel method integrating raw activation patterns of the underlying DNNs, employed by the perception module, analysis with spatial filtering techniques. This novel approach enhances the accuracy of runtime introspection of the DNN-based 3D object detections by selectively focusing on an area of interest in the data, thereby contributing to the safety and efficacy of ADS perception self-assessment processes.
[463] arXiv:2405.07601 [pdf, other]: Title: On-device Online Learning and Semantic Management of TinyML Systems

Authors: Haoyu Ren, Xue Li, Darko Anicic, Thomas A. Runkler

Comments: Accepted by Journal Transactions on Embedded Computing Systems (TECS)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Recent advances in Tiny Machine Learning (TinyML) empower low-footprint embedded devices for real-time on-device Machine Learning. While many acknowledge the potential benefits of TinyML, its practical implementation presents unique challenges. This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production: (1) Embedded devices operate in dynamically changing conditions. Existing TinyML solutions primarily focus on inference, with models trained offline on powerful machines and deployed as static objects. However, static models may underperform in the real world due to evolving input data distributions. We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions. (2) Nevertheless, current on-device learning methods struggle with heterogeneous deployment conditions and the scarcity of labeled data when applied across numerous devices. We introduce federated meta-learning incorporating online learning to enhance model generalization, facilitating rapid learning. This approach ensures optimal performance among distributed devices by knowledge sharing. (3) Moreover, TinyML's pivotal advantage is widespread adoption. Embedded devices and TinyML models prioritize extreme efficiency, leading to diverse characteristics ranging from memory and sensors to model architectures. Given their diversity and non-standardized representations, managing these resources becomes challenging as TinyML systems scale up. We present semantic management for the joint management of models and devices at scale. We demonstrate our methods through a basic regression example and then assess them in three real-world TinyML applications: handwritten character image classification, keyword audio classification, and smart building presence detection, confirming our approaches' effectiveness.
[464] arXiv:2405.07603 [pdf, ps, other]: Title: Reducing Risk for Assistive Reinforcement Learning Policies with Diffusion Models

Authors: Andrii Tytarenko

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Care-giving and assistive robotics, driven by advancements in AI, offer promising solutions to meet the growing demand for care, particularly in the context of increasing numbers of individuals requiring assistance. This creates a pressing need for efficient and safe assistive devices, particularly in light of heightened demand due to war-related injuries. While cost has been a barrier to accessibility, technological progress is able to democratize these solutions. Safety remains a paramount concern, especially given the intricate interactions between assistive robots and humans. This study explores the application of reinforcement learning (RL) and imitation learning, in improving policy design for assistive robots. The proposed approach makes the risky policies safer without additional environmental interactions. Through experimentation using simulated environments, the enhancement of the conventional RL approaches in tasks related to assistive robotics is demonstrated.
[465] arXiv:2405.07604 [pdf, other]: Title: Improving classifier-based effort-aware software defect prediction by reducing ranking errors

Authors: Yuchen Guo, Martin Shepperd, Ning Li

Comments: 10 pages with 12 figures. Accepted by International Conference on Evaluation and Assessment in Software Engineering (EASE) 2024

Subjects: Software Engineering (cs.SE)

Context: Software defect prediction utilizes historical data to direct software quality assurance resources to potentially problematic components. Effort-aware (EA) defect prediction prioritizes more bug-like components by taking cost-effectiveness into account. In other words, it is a ranking problem. However, existing ranking strategies based on classification give limited consideration to ranking errors. Objective: Improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z which sets a lower bound to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners, and conduct the experiments for EA-Z and the other four existing strategies. Results: Experimental results from 72 data sets show EA-Z is the best ranking score calculation strategy in terms of Recall@20% and Popt when considering all 16 learners. For particular learners, the imbalanced ensemble learners UBag-svm and UBst-rf achieve top performance with EA-Z. Conclusion: Our study indicates the effectiveness of reducing ranking errors for classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
[466] arXiv:2405.07605 [pdf, other]: Title: Empirical Application Insights on Industrial Data and Service Aspects of Digital Twin Networks

Authors: Marco Becattini, Davide Borsatti, Armir Bujari, Laura Carnevali, Andrea Garbugli, Hrant Khachatrian, Theofanis P. Raptis, Daniele Tarchi

Comments: Funding: (i) European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on "Telecommunications of the Future" PE00000001 - program "RESTART", (ii) RA Science Committee grant No. 22rl-052

Journal-ref: IEEE MeditCom 2024

Subjects: Networking and Internet Architecture (cs.NI); Emerging Technologies (cs.ET)

Digital twin networks (DTNs) serve as an emerging facilitator in the industrial networking sector, enabling the management of new classes of services, which require tailored support for improved resource utilization, low latencies and accurate data fidelity. In this paper, we explore the intersection between theoretical recommendations and practical implications of applying DTNs to industrial networked environments, sharing empirical findings and lessons learned from our ongoing work. To this end, we first provide experimental examples from selected aspects of data representations and fidelity, mixed-criticality workload support, and application-driven services. Then, we introduce an architectural framework for DTNs, exposing a more practical extension of existing standards, notably the ITU-T Y.3090 (2022) recommendation. Specifically, we explore and discuss the dual nature of DTNs, meant as a digital twin of the network and a network of digital twins, allowing the co-existence of both paradigms.
[467] arXiv:2405.07606 [pdf, other]: Title: AIris: An AI-powered Wearable Assistive Device for the Visually Impaired

Authors: Dionysia Danai Brilli, Evangelos Georgaras, Stefania Tsilivaki, Nikos Melanitis, Konstantina Nikita

Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)

Assistive technologies for the visually impaired have evolved to facilitate interaction with a complex and dynamic world. In this paper, we introduce AIris, an AI-powered wearable device that provides environmental awareness and interaction capabilities to visually impaired users. AIris combines a sophisticated camera mounted on eyewear with a natural language processing interface, enabling users to receive real-time auditory descriptions of their surroundings. We have created a functional prototype system that operates effectively in real-world conditions. AIris demonstrates the ability to accurately identify objects and interpret scenes, providing users with a sense of spatial awareness previously unattainable with traditional assistive devices. The system is designed to be cost-effective and user-friendly, supporting general and specialized tasks: face recognition, scene description, text reading, object recognition, money counting, note-taking, and barcode scanning. AIris marks a transformative step, bringing AI enhancements to assistive technology, enabling rich interactions with a human-like feel.
[468] arXiv:2405.07608 [pdf, other]: Title: FNCC: Fast Notification Congestion Control in Data Center Networks

Authors: Jing Xu, Zhan Wang, Fan Yang, Ning Kang, Zhenlong Ma, Xiaoyi Lu, Rui Miao, Guojun Yuan, Guangming Tan, Ninghui Sun

Subjects: Networking and Internet Architecture (cs.NI)

Congestion control plays a pivotal role in large-scale data centers, facilitating ultra-low latency, high bandwidth, and optimal utilization. Even with the deployment of data center congestion control mechanisms such as DCQCN and HPCC, these algorithms often respond to congestion sluggishly. This sluggishness is primarily due to the slow notification of congestion. It takes almost one round-trip time (RTT) for the congestion information to reach the sender. In this paper, we introduce the Fast Notification Congestion Control (FNCC) mechanism, which achieves sub-RTT notification. FNCC leverages the acknowledgment packet (ACK) from the return path to carry in-network telemetry (INT) information of the request path, offering the sender more timely and accurate INT. To further accelerate the responsiveness of last-hop congestion control, we propose that the receiver notifies the sender of the number of concurrent congested flows, which can be used to adjust the congested flows to a fair rate quickly. Our experimental results demonstrate that FNCC reduces flow completion time by 27.4% and 88.9% compared to HPCC and DCQCN, respectively. Moreover, FNCC triggers minimal pause frames and maintains high utilization even at 400Gbps.
[469] arXiv:2405.07609 [pdf, other]: Title: NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition

Authors: Elena Merdjanovska, Ansar Aynetdinov, Alan Akbik

Comments: data available at this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Available training data for named entity recognition (NER) often contains a significant percentage of incorrect labels for entity types and entity boundaries. Such label noise poses challenges for supervised learning and may significantly deteriorate model quality. To address this, prior work proposed various noise-robust learning approaches capable of learning from data with partially incorrect labels. These approaches are typically evaluated using simulated noise where the labels in a clean dataset are automatically corrupted. However, as we show in this paper, this leads to unrealistic noise that is far easier to handle than real noise caused by human error or semi-automatic annotation. To enable the study of the impact of various types of real noise, we introduce NoiseBench, an NER benchmark consisting of clean training data corrupted with 6 types of real noise, including expert errors, crowdsourcing errors, automatic annotation errors and LLM errors. We present an analysis that shows that real noise is significantly more challenging than simulated noise, and show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound. We release NoiseBench to the research community.
[470] arXiv:2405.07611 [pdf, other]: Title: Uncovering GNSS Interference with Aerial Mapping UAV

Authors: Marco Spanghero, Filip Geib, Ronny Panier, Panos Papadimitratos

Comments: In proceedings of the 2024 IEEE Aerospace Conference (AeroConf)

Subjects: Cryptography and Security (cs.CR)

Global Navigation Satellite System (GNSS) receivers provide ubiquitous and precise position, navigation, and time (PNT) to a wide gamut of civilian and tactical infrastructures and devices. Due to the low GNSS received signal power, even low-power radiofrequency interference (RFI) sources are a serious threat to GNSS integrity and availability. RFI source localization is therefore paramount, yet hard, especially over large areas. Methods based on multi-rotor unmanned aerial vehicles (UAVs) exist but are often limited by hovering time and require specific antennas and detectors. In comparison, fixed-wing planes allow longer missions but are more complex to operate and deploy. A vertical take-off and landing (VTOL) UAV combines the positive aspects of both platforms: high maneuverability, long mission time, and, jointly with highly integrated control systems, simple operation and deployment. Building upon the flexibility allowed by such a platform, we propose a method that combines advanced flight dynamics with high-performance consumer receivers to detect interference over large areas, with minimal interaction with the operator. The proposed system can detect multiple interference sources and map their area of influence, gaining situational awareness of poor GNSS quality or denied environments. Furthermore, it can estimate the relative heading and position of the interference source within tens of meters. The proposed method is validated with real-life measurements, successfully mapping two interference-affected areas and exposing radio equipment causing involuntary in-band interference.
[471] arXiv:2405.07615 [pdf, other]: Title: ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source

Authors: Hung Tuan Le, Long Truong To, Manh Trong Nguyen, Kiet Van Nguyen

Subjects: Computation and Language (cs.CL)

Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research on the problem has concentrated on languages with huge communities, such as English and Chinese. For low-resource languages like Vietnamese, corpora and models for fact verification still need to be explored. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, with more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through several linguistic aspects, including the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in the two tasks, with BM25 achieving an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label in the evidence retrieval task, and InfoXLM (Large) achieving an F1 score of 86.51%. Furthermore, we also evaluated a pipeline approach, which achieved a strict accuracy of only 67.00% when using InfoXLM (Large) and BM25. These results demonstrate that our dataset is challenging for Vietnamese language models in fact-checking tasks.
[472] arXiv:2405.07616 [pdf, ps, other]: Title: Conditional well-posedness and data-driven method for identifying the dynamic source in a coupled diffusion system from one single boundary measurement

Authors: Chunlong Sun, Mengmeng Zhang, Zhidong Zhang

Comments: arXiv admin note: text overlap with arXiv:2307.14348

Subjects: Numerical Analysis (math.NA); Mathematical Physics (math-ph); Analysis of PDEs (math.AP)

This work considers the inverse dynamic source problem arising from time-domain fluorescence diffuse optical tomography (FDOT). We recover the dynamic distributions of fluorophores in biological tissue from a single boundary measurement in a finite time domain. We establish a uniqueness theorem for this inverse problem. After that, we introduce a weighted norm and establish conditional stability of Lipschitz type for the inverse problem in this weighted norm. The numerical inversions are considered under the framework of deep neural networks (DNNs). We establish generalization error estimates rigorously derived from the Lipschitz conditional stability of the inverse problem. Finally, we propose the reconstruction algorithms and give several numerical examples illustrating the performance of the proposed inversion schemes.
[473] arXiv:2405.07620 [pdf, other]: Title: New Low-Dissipation Central-Upwind Schemes. Part II

Authors: Shaoshuai Chu, Alexander Kurganov, Ruixiao Xin

Subjects: Numerical Analysis (math.NA)

The low-dissipation central-upwind (LDCU) schemes have been recently introduced in [A. Kurganov and R. Xin, J. Sci. Comput., 96 (2023), Paper No. 56] as a modification of the central-upwind (CU) schemes from [A. Kurganov and C. T. Lin, Commun. Comput. Phys., 2 (2007), pp. 141-163]. The LDCU schemes achieve much higher resolution of contact waves and many (two-dimensional) structures resulting from complicated wave interaction. However, the LDCU schemes sometimes produce more oscillatory results compared with the CU schemes, especially near the computational domain boundaries.
In this paper, we propose a very simple -- yet systematic -- modification of the LDCU schemes, which completely eliminates the aforementioned oscillations almost without affecting the quality of the computed solution.
[474] arXiv:2405.07621 [pdf, other]: Title: Towards Adaptive IMFs -- Generalization of utility functions in Multi-Agent Frameworks

Authors: Kaushik Dey, Satheesh K. Perepu, Abir Das

Comments: Accepted in Netsoft-2024 conference

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Intent Management Function (IMF) is an integral part of future-generation networks. In recent years, there has been some work on AI-based IMFs that can handle conflicting intents and prioritize the global objective based on an a priori definition of the utility function and accorded priorities for competing intents. Some of the earlier works use Multi-Agent Reinforcement Learning (MARL) techniques with Ad Hoc Teaming (AHT) approaches for efficient conflict handling in IMF. However, the success of such frameworks in real-life scenarios requires them to be flexible to business situations. The intent priorities can change and the utility function, which measures the extent of intent fulfilment, may also vary in definition. This paper proposes a novel mechanism whereby the IMF can generalize to different forms of utility functions and changes of intent priorities at run-time without additional training. Such generalization ability, without additional training requirements, would help to deploy IMF in live networks where customer intents and priorities change frequently. Results on the network emulator demonstrate the efficacy of the approach and its scalability to new intents; it outperforms existing techniques that require additional training to achieve the same degree of flexibility, thereby saving cost and increasing efficiency and adaptability.
[475] arXiv:2405.07623 [pdf, other]: Title: COBias and Debias: Minimizing Language Model Pairwise Accuracy Bias via Nonlinear Integer Programming

Authors: Ruixi Lin, Yang You

Subjects: Computation and Language (cs.CL)

For language model classification, would you prefer having only one workable class or having every class working? The latter is more useful in practice. Especially for large language models (LLMs), the fact that they achieve a fair overall accuracy by in-context learning (ICL) obscures a large difference in individual class accuracies. In this work, we uncover and tackle language models' imbalance in per-class prediction accuracy by reconceptualizing it as the Contextual Oddity Bias (COBias), and we are the first to engage nonlinear integer programming (NIP) to debias it. Briefly, COBias refers to the difference in accuracy between a class A and its "odd" class, the class that receives the majority of class A's wrong predictions. With the COBias metric, we reveal that LLMs of varied scales and families exhibit large per-class accuracy differences. Then we propose Debiasing as Nonlinear Integer Programming (DNIP) to correct ICL per-class probabilities for lower bias and higher overall accuracy. Our optimization objective is directly based on the evaluation scores by COBias and accuracy metrics, solved by simulated annealing. Evaluations on three LLMs across seven NLP classification tasks show that DNIP simultaneously achieves significant COBias reduction ($-27\%$) and accuracy improvement ($+12\%$) over the conventional ICL approach, suggesting that modeling pairwise class accuracy differences is a direction for advancing more accurate, more reliable LLM predictions.
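To make the verbal definition of COBias concrete, here is a minimal sketch that works from a confusion matrix. It follows only the one-sentence description in the abstract: for each class, find the "odd" class that receives most of its wrong predictions, then compare the two per-class accuracies. Averaging the absolute gaps over classes is an assumption made for this example and may differ from the exact metric in the paper; all function names are hypothetical.

```python
def per_class_accuracy(conf):
    """conf[i][j] = number of class-i examples predicted as class j."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(conf)]


def odd_class(conf, i):
    """Class (other than i) that receives most of class i's wrong predictions."""
    candidates = [j for j in range(len(conf)) if j != i]
    return max(candidates, key=lambda j: conf[i][j])


def cobias(conf):
    """Mean absolute accuracy gap between each class and its odd class.

    The mean-of-absolute-gaps aggregation is an assumption for illustration.
    """
    acc = per_class_accuracy(conf)
    gaps = [abs(acc[i] - acc[odd_class(conf, i)]) for i in range(len(conf))]
    return sum(gaps) / len(gaps)
```

For a three-class confusion matrix such as `[[8, 2, 0], [1, 9, 0], [3, 2, 5]]`, class 2 is most often mistaken for class 0, so its gap compares an accuracy of 0.5 against 0.8, even though the overall accuracy looks fair.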
[476] arXiv:2405.07626 [pdf, other]: Title: AnomalyLLM: Few-shot Anomaly Edge Detection for Dynamic Graphs using Large Language Models

Authors: Shuo Liu, Di Yao, Lanting Fang, Zhetao Li, Wenbin Li, Kaiyu Feng, XiaoWen Ji, Jingping Bi

Comments: 13pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Detecting anomaly edges in dynamic graphs aims to identify edges that deviate significantly from the normal pattern and can be applied in various domains, such as cybersecurity, financial transactions and AIOps. Over time, new types of anomaly edges emerge, and labeled anomaly samples are few for each type. Current methods are either designed to detect randomly inserted edges or require sufficient labeled data for model training, which limits their applicability in real-world settings. In this paper, we study this problem by leveraging the rich knowledge encoded in large language models (LLMs) and propose a method, namely AnomalyLLM. To align the dynamic graph with LLMs, AnomalyLLM pre-trains a dynamic-aware encoder to generate the representations of edges and reprograms the edges using the prototypes of word embeddings. Along with the encoder, we design an in-context learning framework that integrates the information of a few labeled samples to achieve few-shot anomaly detection. Experiments on four datasets reveal that AnomalyLLM can not only significantly improve the performance of few-shot anomaly detection, but also achieve superior results on new anomalies without any update of model parameters.
[477] arXiv:2405.07627 [pdf, other]: Title: End-to-End Delivery in LEO Mega-constellations and the Reordering Problem

Authors: Rasmus Sibbern Frederiksen, Thomas Gundgaard Mulvad, Israel Leyva-Mayorga, Tatiana Kozlova Madsen, Federico Chiariotti

Comments: Submitted to IEEE PIMRC Workshops 2024

Subjects: Networking and Internet Architecture (cs.NI)

Low Earth orbit (LEO) satellite mega-constellations with hundreds or thousands of satellites and inter-satellite links (ISLs) have the potential to provide global end-to-end connectivity. Furthermore, if the physical distance between source and destination is sufficiently long, end-to-end routing over the LEO constellation can provide lower latency than the terrestrial infrastructure, because electromagnetic waves propagate faster in space than in optical fiber. However, the frequent route changes due to the movement of the satellites result in the out-of-order delivery of packets, causing sudden changes to the Round-Trip Time (RTT) that can be misinterpreted as congestion by congestion control algorithms. In this paper, the performance of three widely used congestion control algorithms, Cubic, Reno, and BBR, is evaluated in an emulated LEO satellite constellation with Free-Space Optical (FSO) ISLs. Furthermore, we perform a sensitivity analysis for Cubic by changing the satellite constellation parameters, length of the routes, and the positions of the source and destination to identify problematic routing scenarios. The results show that route changes can have profound transient effects on the goodput of the connection, posing problems for typical broadband applications.
[478] arXiv:2405.07637 [pdf, ps, other]: Title: Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

Authors: Asaf Cassel, Haipeng Luo, Aviv Rosenberg, Dmitry Sotnikov

Subjects: Machine Learning (cs.LG)

In many real-world applications, it is hard to provide a reward signal in each step of a Reinforcement Learning (RL) process and more natural to give feedback when an episode ends. To this end, we study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually. Prior work studied RL-ABF only in tabular settings, where the number of states is assumed to be small. In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees: a value-based optimistic algorithm built on a new randomization technique with a Q-functions ensemble, and a policy optimization algorithm that uses a novel hedging scheme over the ensemble.
[479] arXiv:2405.07638 [pdf, other]: Title: DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS

Authors: Qingyang Li, Yihang Zhang, Zhidong Jia, Yannan Hu, Lei Zhang, Jianrong Zhang, Yongming Xu, Yong Cui, Zongming Guo, Xinggong Zhang

Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

It is an interesting question whether and how Large Language Models (LLMs) can understand non-language network data and help us detect unknown malicious flows. This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' powerful capability in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low rates and multiple vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that utilizes open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLMs' semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in the overall network context. The representations are used to improve the DDoS detection performance. We evaluate DoLLM on the public dataset CIC-DDoS2019 and a real NetFlow trace from a top-3 countrywide ISP. The tests show that DoLLM possesses strong detection capabilities. Its F1 score increased by up to 33.3% in zero-shot scenarios and by at least 20.6% in real ISP traces.
[480] arXiv:2405.07640 [pdf, other]: Title: Hyperparameter Importance Analysis for Multi-Objective AutoML

Authors: Daphne Theodorakopoulos (1 and 2), Frederic Stahl (1), Marius Lindauer (2 and 3) ((1) Marine Perception Research Department, German Research Center for Artificial Intelligence (DFKI), (2) Institute of Artificial Intelligence (LUH|AI), L3S Research Center, Leibniz University Hannover, (3) L3S Research Center)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Hyperparameter optimization plays a pivotal role in enhancing the predictive performance and generalization capabilities of ML models. However, in many applications, we do not only care about predictive performance but also about objectives such as inference time, memory, or energy consumption. In such multi-objective optimization (MOO) scenarios, determining the importance of hyperparameters poses a significant challenge due to the complex interplay between the conflicting objectives. In this paper, we propose the first method for assessing the importance of hyperparameters in the context of multi-objective hyperparameter optimization. Our approach leverages surrogate-based hyperparameter importance (HPI) measures, i.e. fANOVA and ablation paths, to provide insights into the impact of hyperparameters on the optimization objectives. Specifically, we compute the a-priori scalarization of the objectives and determine the importance of the hyperparameters for different objective tradeoffs. Through extensive empirical evaluations on diverse benchmark datasets with three different objectives paired with accuracy, namely time, demographic parity, and energy consumption, we demonstrate the effectiveness and robustness of our proposed method. Our findings not only offer valuable guidance for hyperparameter tuning in MOO tasks but also contribute to advancing the understanding of HPI in complex optimization scenarios.
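The a-priori scalarization step mentioned above can be pictured with a small sketch. It assumes a weighted sum of two min-max normalized objectives, both to be minimized, which is one common scalarization; the abstract does not state which form the authors use, and the function names and the weight grid are hypothetical. Sweeping the weight yields one scalar target per tradeoff, on which a standard importance measure could then be computed.

```python
def normalize(values):
    """Min-max normalize a list of objective values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def scalarize(objective_a, objective_b, weight):
    """Weighted-sum scalarization of two objectives (lower is better).

    weight = 1.0 considers only objective_a, weight = 0.0 only objective_b.
    The weighted-sum form is an assumption made for illustration.
    """
    a = normalize(objective_a)
    b = normalize(objective_b)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# One scalarized target per tradeoff; each row could be fed to an
# importance measure in place of a single-objective performance value.
errors = [0.1, 0.2, 0.3]     # hypothetical validation errors per configuration
times = [30.0, 20.0, 10.0]   # hypothetical training times per configuration
targets = {w: scalarize(errors, times, w) for w in (0.0, 0.5, 1.0)}
```

Normalizing before weighting keeps objectives with different units, such as error rate and seconds, from dominating the sum purely through their scale.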
[481] arXiv:2405.07644 [pdf, other]: Title: A Hessian-Based Field Deformer for Real-Time Topology-Aware Shape Editing

Authors: Yunxiao Zhang, Zixiong Wang, Zihan Zhao, Rui Xu, Shuangmin Chen, Shiqing Xin, Wenping Wang, Changhe Tu

Comments: 10 pages, 18 figures

Subjects: Graphics (cs.GR)

Shape manipulation is a central research topic in computer graphics. Topology editing, such as breaking apart connections, joining disconnected ends, and filling/opening a topological hole, is generally more challenging than geometry editing. In this paper, we observe that the saddle points of the signed distance function (SDF) provide useful hints for altering surface topology deliberately. Based on this key observation, we parameterize the SDF into a cubic trivariate tensor-product B-spline function $F$ whose saddle points $\{\boldsymbol{s}_i\}$ can be quickly enumerated using a subdivision-based root-finding technique coupled with Newton's method. Users can select one of the candidate points, say $\boldsymbol{s}_i$, to edit the topology in real time. In our implementation, we add to $F$ a compactly supported B-spline function rooted at $\boldsymbol{s}_i$, which we call a deformer in this paper, with its local coordinate system aligned with the three eigenvectors of the Hessian. Combined with a ray marching technique, our interactive system operates at 30 FPS. Additionally, our system empowers users to create desired bulges or concavities on the surface. An extensive user study indicates that our system is user-friendly and intuitive to operate. We demonstrate the effectiveness and usefulness of our system in a range of applications, including fixing surface reconstruction errors, artistic work design, 3D medical imaging and simulation, and antiquity restoration. Please refer to the attached video for a demonstration.
[482] arXiv:2405.07647 [pdf, ps, other]: Title: Fuzzy Logic Weight based Coordination Scheme for Utilizing Electric Vehicle Charging Stations

Authors: Shahid Hussain, Young-Chon Kim

Subjects: Emerging Technologies (cs.ET)

The larger battery capacities and the longer waiting and charging times of electric vehicles (EVs) result in low utilization of charging stations (CSs). This paper proposes a fuzzy logic weight based coordination (FLWC) scheme to enhance the utilization of CSs. Each EV has associated uncertain information, including its stay time and current state-of-charge (SoC). The fuzzy logic controller (FLC) analyzes these inputs and determines a weight value. The proposed FLWC scheme then allocates CSs to the EVs according to the weight values. The proposed scheme is simulated for 100 EVs and 5 CSs using Matlab. The simulation results show about 30% improvement in the average utilization of CSs compared to a first-come-first-serve (FCFS) based scheme.
[483] arXiv:2405.07648 [pdf, other]: Title: CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

Authors: Qingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details, and thus we introduce a diffusion-based module $CDFormer_{diff}$ to first learn Content Degradation Prior (CDP) in both low- and high-resolution images, and then approximate the real distribution given only low-resolution information. Moreover, we apply an adaptive SR network $CDFormer_{SR}$ that effectively utilizes CDP to refine features. Compared to previous diffusion-based SR methods, we treat the diffusion model as an estimator that can overcome the limitations of expensive sampling time and excessive diversity. Experiments show that CDFormer can outperform existing methods, establishing a new state-of-the-art performance on various benchmarks under blind settings. Code and models will be available at https://github.com/I2-Multimedia-Lab/CDFormer.
[484] arXiv:2405.07652 [pdf, other]: Title: G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

Authors: Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, Yuhan Wang, Lei Ji, Xuhai Xu, Chun Yu

Comments: 25 pages, 12 figures

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio. However, the integration of gaze -- a modality deeply linked to user intent and increasingly accessible via gaze-tracking wearables -- remains underexplored. This paper introduces a novel gaze-facilitated information querying paradigm, named G-VOILA, which synergizes users' gaze, visual field, and voice-based natural language queries to facilitate a more intuitive querying process. In a user-enactment study involving 21 participants in 3 daily scenarios (p = 21, scene = 3), we revealed the ambiguity in users' query language and a gaze-voice coordination pattern in users' natural query behaviors with G-VOILA. Based on the quantitative and qualitative findings, we developed a design framework for the G-VOILA paradigm, which effectively integrates the gaze data with the in-situ querying context. Then we implemented a G-VOILA proof-of-concept using cutting-edge deep learning techniques. A follow-up user study (p = 16, scene = 2) demonstrates its effectiveness by achieving higher objective and subjective scores than a baseline without gaze data. We further conducted interviews and provided insights for future gaze-facilitated information querying systems.
[485] arXiv:2405.07653 [pdf, other]: Title: Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying

Authors: Thomas Pöllabauer, Volker Knauthe, André Boller, Arjan Kuijper, Dieter Fellner

Comments: 32. International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision'2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Deep Neural Networks (DNNs) require large amounts of annotated training data for good performance. Often this data is generated using manual labeling (error-prone and time-consuming) or rendering (requiring geometry and material information). Both approaches are difficult or uneconomic to apply to many small-scale applications. A fast and straightforward approach to acquiring the necessary training data would allow the adoption of deep learning in even the smallest of applications. Chroma keying is the process of replacing a color (usually blue or green) with another background. Instead of chroma keying, we propose luminance keying for fast and straightforward training image acquisition. We deploy a black screen with high light absorption (99.99%) to record roughly 1-minute long videos of our target objects, circumventing typical problems of chroma keying, such as color bleeding or color overlap between background color and object color. Next we automatically mask our objects using simple brightness thresholding, removing the need for manual annotation. Finally, we automatically place the objects on random backgrounds and train a 2D object detector. We extensively evaluate the performance on the widely used YCB-V object set and compare favourably to other conventional techniques such as rendering, without needing 3D meshes, materials or any other information about our target objects, and in a fraction of the time needed for other approaches. Our work demonstrates highly accurate training data acquisition, allowing training of state-of-the-art networks to start within minutes.
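The masking and compositing steps described above can be sketched in a few lines. This is an illustrative toy, not the authors' pipeline: images are nested lists of RGB tuples rather than video frames, luminance uses the Rec. 601 luma weights, and the threshold value of 20 is an arbitrary assumption that would need tuning for a real black screen and camera. All function names are hypothetical.

```python
def luminance(pixel):
    """Rec. 601 luma of an (R, G, B) pixel with channels in 0..255."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_mask(image, threshold=20.0):
    """True where a pixel is brighter than the (assumed) black-screen level."""
    return [[luminance(p) > threshold for p in row] for row in image]


def composite(image, mask, background):
    """Keep foreground pixels where the mask is True, else use the background."""
    return [
        [fg if keep else bg for fg, keep, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, mask, background)
    ]


# Tiny 2x2 example: dark pixels are treated as screen, bright ones as object.
frame = [[(0, 0, 0), (200, 180, 50)],
         [(5, 5, 5), (255, 255, 255)]]
new_background = [[(9, 9, 9), (9, 9, 9)],
                  [(9, 9, 9), (9, 9, 9)]]
mask = luminance_mask(frame)
training_image = composite(frame, mask, new_background)
```

The same mask doubles as the segmentation label for the object, which is what removes the manual annotation step; very dark regions of the object itself would be lost by a plain threshold like this one.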
[486] arXiv:2405.07655 [pdf, other]: Title: Quality-aware Selective Fusion Network for V-D-T Salient Object Detection

Authors: Liuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang Yan

Comments: Accepted by IEEE Transactions on Image Processing (TIP)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Depth images and thermal images contain spatial geometry information and surface temperature information, which can act as complementary information for the RGB modality. However, the quality of the depth and thermal images is often unreliable in some challenging scenarios, which degrades the performance of two-modal salient object detection (SOD). Meanwhile, some researchers pay attention to the triple-modal SOD task, where they attempt to explore the complementarity of the RGB image, the depth image, and the thermal image. However, existing triple-modal SOD methods fail to perceive the quality of depth maps and thermal images, which leads to performance degradation when dealing with scenes with low-quality depth and thermal images. Therefore, we propose a quality-aware selective fusion network (QSF-Net) to conduct VDT salient object detection, which contains three subnets: the initial feature extraction subnet, the quality-aware region selection subnet, and the region-guided selective fusion subnet. First, besides extracting features, the initial feature extraction subnet generates a preliminary prediction map from each modality via a shrinkage pyramid architecture. Then, we design the weakly-supervised quality-aware region selection subnet to generate the quality-aware maps. Concretely, we first find the high-quality and low-quality regions by using the preliminary predictions, which further constitute the pseudo label that can be used to train this subnet. Finally, the region-guided selective fusion subnet purifies the initial features under the guidance of the quality-aware maps, and then fuses the triple-modal features and refines the edge details of prediction maps through the intra-modality and inter-modality attention (IIA) module and the edge refinement (ER) module, respectively. Extensive experiments are performed on VDT-2048
[487] arXiv:2405.07656 [pdf, ps, other]: Title: Non-Rigid Designators in Modal and Temporal Free Description Logics (Extended Version)

Authors: Alessandro Artale, Roman Kontchakov, Andrea Mazzullo, Frank Wolter

Subjects: Logic in Computer Science (cs.LO)

Definite descriptions, such as 'the General Chair of KR 2024', are a semantically transparent device for object identification in knowledge representation. In first-order modal logic, definite descriptions have been widely investigated for their non-rigidity, which allows them to designate different objects (or none at all) at different states. We propose expressive modal description logics with non-rigid definite descriptions and names, and investigate decidability and complexity of the satisfaction problem. We first systematically link satisfiability for the one-variable fragment of first-order modal logic with counting to our modal description logics. Then, we prove a promising NEXPTIME-completeness result for concept satisfiability for the fundamental epistemic multi-agent logic $\mathbf{S5}^{n}$ and its neighbours, and show that some expressive logics that are undecidable with constant domain become decidable (but Ackermann-hard) with expanding domains. Finally, we conduct a fine-grained analysis of decidability of temporal logics.
[488] arXiv:2405.07658 [pdf, ps, other]: Title: Understanding Data Understanding: A Framework to Navigate the Intricacies of Data Analytics

Authors: Joshua Holstein, Philipp Spitzer, Marieke Hoell, Michael Vössing, Niklas Kühl

Comments: Accepted at 32nd European Conference on Information Systems (2024)

Subjects: Human-Computer Interaction (cs.HC)

As organizations face the challenges of processing exponentially growing data volumes, their reliance on analytics to unlock value from this data has intensified. However, the intricacies of big data, such as its extensive feature sets, pose significant challenges. A crucial step in leveraging this data for insightful analysis is an in-depth understanding of both the data and its domain. Yet, existing literature presents a fragmented picture of what comprises an effective understanding of data and domain, varying significantly in depth and focus. To address this research gap, we conduct a systematic literature review, aiming to delineate the dimensions of data understanding. We identify five dimensions: Foundations, Collection & Selection, Contextualization & Integration, Exploration & Discovery, and Insights. These dimensions collectively form a comprehensive framework for data understanding, providing guidance for organizations seeking meaningful insights from complex datasets. This study synthesizes the current state of knowledge and lays the groundwork for further exploration.
[489] arXiv:2405.07662 [pdf, other]: Title: Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications

Authors: Ricardo Knauer, Erik Rodner

Comments: ICLR 2024 Workshop on Practical ML for Low Resource Settings

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines, or more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs similarly to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend considering logistic regression as the first choice for data-scarce applications with tabular data and provide practitioners with best practices for further method selection.
[490] arXiv:2405.07663 [pdf, other]: Title: Sign Stitching: A Novel Approach to Sign Language Production

Authors: Harry Walsh, Ben Saunders, Richard Bowden

Comments: 18 pages, 3 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

Sign Language Production (SLP) is a challenging task, given the limited resources available and the inherent diversity within sign data. As a result, previous works have suffered from the problem of regression to the mean, leading to under-articulated and incomprehensible signing. In this paper, we propose using dictionary examples and a learnt codebook of facial expressions to create expressive sign language sequences. However, simply concatenating signs and adding the face creates robotic and unnatural sequences. To address this we present a 7-step approach to effectively stitch sequences together. First, by normalizing each sign into a canonical pose, cropping, and stitching we create a continuous sequence. Then, by applying filtering in the frequency domain and resampling each sign, we create cohesive natural sequences that mimic the prosody found in the original data. We leverage a SignGAN model to map the output to a photo-realistic signer and present a complete Text-to-Sign (T2S) SLP pipeline. Our evaluation demonstrates the effectiveness of the approach, showcasing state-of-the-art performance across all datasets. Finally, a user evaluation shows our approach outperforms the baseline model and is capable of producing realistic sign language sequences.
[491] arXiv:2405.07664 [pdf, ps, other]: Title: Geospatial Knowledge Graphs

Authors: Rui Zhu

Subjects: Artificial Intelligence (cs.AI)

Geospatial knowledge graphs have emerged as a novel paradigm for representing and reasoning over geospatial information. In this framework, entities such as places, people, events, and observations are depicted as nodes, while their relationships are represented as edges. This graph-based data format lays the foundation for creating a "FAIR" (Findable, Accessible, Interoperable, and Reusable) environment, facilitating the management and analysis of geographic information. This entry first introduces key concepts in knowledge graphs along with their associated standardization and tools. It then delves into the application of knowledge graphs in geography and environmental sciences, emphasizing their role in bridging symbolic and subsymbolic GeoAI to address cross-disciplinary geospatial challenges. At the end, new research directions related to geospatial knowledge graphs are outlined.
[492] arXiv:2405.07665 [pdf, other]: Title: Partial information decomposition as information bottleneck

Authors: Artemy Kolchinsky

Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)

The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provide about a target. Here we show that this goal can be formulated as a type of information bottleneck (IB) problem, which we term the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that predicts the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction/compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.
[493] arXiv:2405.07666 [pdf, other]: Title: New Solutions to Delsarte's Dual Linear Programs

Authors: André Chailloux, Thomas Debris-Alazard

Subjects: Information Theory (cs.IT); Discrete Mathematics (cs.DM)

Understanding the maximum size of a code with a given minimum distance is a major question in computer science and discrete mathematics. The most fruitful approach for finding asymptotic bounds on such codes is by using Delsarte's theory of association schemes. With this approach, Delsarte constructs a linear program such that its maximum value is an upper bound on the maximum size of a code with a given minimum distance. Bounding this value can be done by finding solutions to the corresponding dual linear program. Delsarte's theory is very general and goes way beyond binary codes.
In this work, we provide universal bounds in the framework of association schemes that generalize the Hamming bound and the Elias-Bassalygo bound, which can be applied to any association scheme constructed from a distance function. These bounds are obtained by constructing new solutions to Delsarte's dual linear program. We instantiate these results and we recover known bounds for $q$-ary codes and for constant-weight binary codes but which didn't come from the linear program method. Our other contribution is to recover, for essentially any $Q$-polynomial scheme, MRRW-type solutions to Delsarte's dual linear program which are inspired by the Laplacian approach of Friedman and Tillich instead of using the Christoffel-Darboux formulas. We show in particular how the second linear programming bound can be interpreted in this framework.
[494] arXiv:2405.07667 [pdf, other]: Title: Backdoor Removal for Generative Large Language Models

Authors: Haoran Li, Yulin Chen, Zihao Zheng, Qi Hu, Chunkit Chan, Heshan Liu, Yangqiu Song

Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL)

With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive textual data from the Internet. A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts paid to LLMs' safety issues, LLMs are still struggling against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle the scenarios where the trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike previous works that center on the identification of backdoors, our safety-enhanced LLMs are able to behave normally even when the exact triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capability without any additional access to unbackdoored clean models. We will release the reproducible code.
[495] arXiv:2405.07668 [pdf, other]: Title: CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

Authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W.K. Chan

Comments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)

Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, existing certified detection defenders suffer from protecting labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with a slightly lower performance than ViP and comparable performance with PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.
[496] arXiv:2405.07670 [pdf, other]: Title: Impact of white Gaussian internal noise on analog echo-state neural networks

Authors: Nadezhda Semenova

Comments: 10 pages, 8 figures

Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

In recent years, a growing number of works have been devoted to the analog (hardware) implementation of artificial neural networks, in which neurons and the connections between them are based not on computer calculations but on physical principles. Such networks offer improved energy efficiency and, in some cases, scalability, but may be susceptible to internal noise. This paper studies the influence of noise on the functioning of recurrent networks using the example of trained echo state networks (ESNs). The most common reservoir connection matrices were chosen as the ESN topologies: random uniform and band matrices with different connectivity. White Gaussian noise was chosen as the influence; depending on how it is introduced, it is additive or multiplicative, and correlated or uncorrelated. In the paper, we show that the propagation of noise in the reservoir is mainly controlled by the statistical properties of the output connection matrix, namely the mean and the mean square. Depending on these values, more correlated or uncorrelated noise accumulates in the network. We also show that there are conditions under which even noise with an intensity of $10^{-20}$ is enough to completely lose the useful signal. Finally, we show which types of noise are most critical for networks with different activation functions (hyperbolic tangent, sigmoid and linear) and when the network is self-closed.
[497] arXiv:2405.07671 [pdf, other]: Title: Constructing a BPE Tokenization DFA

Authors: Martin Berglund, Willeke Martens, Brink van der Merwe

Subjects: Formal Languages and Automata Theory (cs.FL); Computation and Language (cs.CL); Machine Learning (cs.LG)

Many natural language processing systems operate over tokenizations of text to address the open-vocabulary problem. In this paper, we give and analyze an algorithm for the efficient construction of deterministic finite automata designed to operate directly on tokenizations produced by the popular byte pair encoding technique. This makes it possible to apply many existing techniques and algorithms to the tokenized case, such as pattern matching, equivalence checking of tokenization dictionaries, and composing tokenized languages in various ways.
[498] arXiv:2405.07673 [pdf, other]: Title: An Empirical Study on the Robustness of Massively Multilingual Neural Machine Translation

Authors: Supryadi, Leiyu Pan, Deyi Xiong

Comments: 12 pages, 6 figures

Subjects: Computation and Language (cs.CL)

Massively multilingual neural machine translation (MMNMT) has been proven to enhance the translation quality of low-resource languages. In this paper, we empirically investigate the translation robustness of Indonesian-Chinese translation in the face of various types of naturally occurring noise. To assess this, we create a robustness evaluation benchmark dataset for Indonesian-Chinese translation. This dataset is automatically translated into Chinese using four NLLB-200 models of different sizes. We conduct both automatic and human evaluations. Our in-depth analysis reveals the correlations between translation error types and the types of noise present, how these correlations change across different model sizes, and the relationships between automatic evaluation indicators and human evaluation indicators. The dataset is publicly available at https://github.com/tjunlp-lab/ID-ZH-MTRobustEval.
[499] arXiv:2405.07679 [pdf, other]: Title: Class-wise Activation Unravelling the Engima of Deep Double Descent

Authors: Yufei Gu

Comments: arXiv admin note: text overlap with arXiv:2310.13572

Subjects: Machine Learning (cs.LG)

Double descent presents a counter-intuitive aspect within the machine learning domain, and researchers have observed its manifestation in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, an accepted theory for its occurring mechanism in deep learning remains yet to be established. In this study, we revisited the phenomenon of double descent and discussed the conditions of its occurrence. This paper introduces the concept of class-activation matrices and a methodology for estimating the effective complexity of functions, on which we unveil that over-parameterized models exhibit more distinct and simpler class patterns in hidden activations compared to under-parameterized ones. We further looked into the interpolation of noisy labelled data among clean representations and demonstrated overfitting w.r.t. expressive capacity. By comprehensively analysing hypotheses and presenting corresponding empirical evidence that either validates or contradicts these hypotheses, we aim to provide fresh insights into the phenomenon of double descent and benign over-parameterization and facilitate future explorations. The source code is available at https://github.com/Yufei-Gu-451/sparse-generalization.git.
[500] arXiv:2405.07680 [pdf, other]: Title: Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics

Authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The development of generative artificial intelligence for human motion generation has expanded rapidly, necessitating a unified evaluation framework. This paper presents a detailed review of eight evaluation metrics for human motion generation, highlighting their unique features and shortcomings. We propose standardized practices through a unified evaluation setup to facilitate consistent model comparisons. Additionally, we introduce a novel metric that assesses diversity in temporal distortion by analyzing warping diversity, thereby enhancing the evaluation of temporal data. We also conduct experimental analyses of three generative models using a publicly available dataset, offering insights into the interpretation of each metric in specific case scenarios. Our goal is to offer a clear, user-friendly evaluation framework for newcomers, complemented by publicly accessible code.
