Electrical Engineering and Systems Science

New submissions

New submissions for Fri, 26 Apr 24

[1]  arXiv:2404.16080 [pdf, other]
Title: Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Reflectance Confocal Microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides virtual high-resolution images of the skin and superficial tissues, reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue, capturing the reflected light to generate detailed images of microscopic structures at various depths. Recent studies explored AI and machine learning, particularly CNNs, for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, empowering dermatologists in effective image interpretation and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.

[2]  arXiv:2404.16104 [pdf, other]
Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective
Comments: 5 pages, 2 figures; keywords: Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech
Journal-ref: Radek Skarnitzl & Jan Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908 114-2-3
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.
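
The abstract does not say how vocal tract length was derived from the formants. One common textbook approximation, shown here only as an illustration and not as the authors' procedure, treats the vocal tract as a uniform tube closed at the glottis, so that the n-th resonance is $F_n = (2n-1)c/(4L)$. The speed of sound value and the function name below are assumptions made for this sketch.

```python
# Assumed speed of sound in warm, humid air, in cm/s (illustrative value).
SPEED_OF_SOUND_CM_S = 35000.0


def vocal_tract_length_cm(formants_hz):
    """Average the tube length implied by each formant.

    Uses the uniform-tube relation F_n = (2n - 1) * c / (4 * L),
    solved for L and averaged over the formants supplied (F1 first).
    """
    if not formants_hz:
        raise ValueError("need at least one formant")
    lengths = [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * f)
        for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(lengths) / len(lengths)


# Formants of an idealized neutral vowel: every formant implies the same length.
print(vocal_tract_length_cm([500.0, 1500.0, 2500.0, 3500.0]))
```

Averaging over several formants reduces the influence of any single badly tracked formant, which matters on heterogeneous archive recordings.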

[3]  arXiv:2404.16113 [pdf, other]
Title: Joint operation of a fast-charging EV hub with a stand-alone independent battery storage system under fairness considerations
Subjects: Systems and Control (eess.SY)

The need for larger-scale fast-charging electric vehicle (EV) hubs is on the rise due to the growth in EV adoption. Another area of power infrastructure growth is the proliferation of independently operated stand-alone battery storage systems (BSS), which is fueled by improvements and cost reductions in battery technology. Many possible uses of the stand-alone BSS are being explored, including participation in the energy and ancillary markets, load balancing for renewable generation, and supporting large-scale load-consuming entities like hospitals. In this paper, we study a novel usage of the stand-alone BSS whereby, in addition to participating in the electricity reserve market, it allows an EV hub to use a part of its storage capacity when profitable. The hub uses the BSS storage capacity for arbitrage, consequently reducing its operating cost. We formulate this joint operation as a bi-objective optimization model. We then reformulate it into a second-order cone Nash bargaining problem, the solution of which guarantees fairness to both the hub and the BSS. A sample numerical case study is formulated using actual prices of electricity and simulated data for the reserve market and EV charging demand. The Nash bargaining solution shows that both participants can benefit from the joint operation.
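
The fairness notion behind Nash bargaining can be illustrated with a toy case far simpler than the paper's second-order cone model. Assume, purely for illustration, that the joint operation yields a fixed total gain that can be divided freely between the hub and the BSS, and that each party gains nothing if they do not cooperate. The Nash bargaining solution then maximizes the product of the two gains. The function name and the grid search are choices made for this sketch.

```python
def nash_bargaining_split(total_gain, steps=1000):
    """Grid-search the hub's share s in [0, 1] that maximizes the Nash product.

    The hub gains s * total_gain and the BSS gains (1 - s) * total_gain,
    both measured relative to a disagreement point of zero gain.
    """
    best_share, best_product = 0.0, -1.0
    for k in range(steps + 1):
        s = k / steps
        product = (s * total_gain) * ((1.0 - s) * total_gain)
        if product > best_product:
            best_share, best_product = s, product
    return best_share


# With freely transferable gain, the product is maximized by an even split.
print(nash_bargaining_split(100.0))
```

In this toy setting the outcome is always an equal division of the surplus. The paper's formulation has operating constraints and non-trivial disagreement points, so its solution need not be symmetric.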

[4]  arXiv:2404.16134 [pdf, ps, other]
Title: Power Failure Cascade Prediction using Graph Neural Networks
Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). Oct. 31, 2023. See implementations at this https URL
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.

[5]  arXiv:2404.16165 [pdf, other]
Title: Comparative Analysis of Information Theoretic and Statistical Methods for Line Parameter Estimation
Comments: 6 pages
Subjects: Signal Processing (eess.SP); Applications (stat.AP)

Recent studies indicate that the noise characteristics of phasor measurement units (PMUs) can be more accurately described by non-Gaussian distributions. Consequently, estimation techniques based on Gaussian noise assumptions may produce poor results with PMU data. This paper considers the PMU based line parameter estimation (LPE) problem, and investigates the performance of four state-of-the-art techniques in solving this problem in presence of non-Gaussian measurement noise. The rigorous comparative analysis highlights the merits and demerits of each technique w.r.t. the LPE problem, and identifies conditions under which they are expected to give good results.

[6]  arXiv:2404.16253 [pdf, other]
Title: Mitigating Automotive Radar Interference using Onboard Intelligent Reflective Surface
Comments: 7 pages, 9 Figures
Subjects: Signal Processing (eess.SP)

The use of automotive radars is gaining popularity as a means to enhance a vehicle's sensing capabilities. However, these radars can suffer from interference caused by transmissions from other radars mounted on nearby vehicles. To address this issue, we investigate the use of an onboard intelligent reflective surface (IRS) to artificially increase a vehicle's effective radar cross section (RCS), or its "electromagnetic visibility." Our proposed method utilizes the IRS's ability to form a coherent reflection of the incident radar waveform back towards the source radar, thereby improving radar performance under interference. We evaluated both passive and active IRS options. Passive IRS, which does not support reflection amplification, was found to be counter-productive and actually decreased the vehicle's effective RCS instead of enhancing it. In contrast, active IRS, which can amplify the reflection power of individual elements, effectively combats all types of automotive radar interference when the reflective elements are configured with a 15-35 dB reflection gain.

[7]  arXiv:2404.16282 [pdf, other]
Title: Adaptive tracking control for non-periodic reference signals under quantized observations
Subjects: Systems and Control (eess.SY)

This paper considers an adaptive tracking control problem for stochastic regression systems with multi-threshold quantized observations. Unlike existing studies on periodic reference signals, the reference signal in this paper is non-periodic. The main difficulty is ensuring that the designed controller satisfies the uniform boundedness and excitation conditions that guarantee the convergence of the estimation in the controller under non-periodic signal conditions. This paper designs two backward-shifted polynomials with time-varying parameters and a special projection structure, which break through periodic limitations and establish the convergence and tracking properties. Specifically, the adaptive tracking control law can achieve asymptotically optimal tracking for the non-periodic reference signal. In addition, the proposed estimation algorithm is proved to converge to the true values in both the almost sure and mean square senses, and the convergence speed can reach $O\left(\frac{1}{k}\right)$ under suitable conditions. Finally, the effectiveness of the proposed adaptive tracking control scheme is verified through a simulation.

[8]  arXiv:2404.16312 [pdf, other]
Title: 3D Guidance Law for Maximal Coverage and Target Enclosing with Inherent Safety
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Robotics (cs.RO)

In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer, which is an unmanned aerial vehicle (UAV), for maximum coverage while also ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding the target, allowing it to maintain a certain distance from the latter while offering greater flexibility in positioning and converging to any orbit within this safe zone. Our approach is distinguished by the use of nonholonomic constraints to model vehicles with accelerations serving as control inputs and coupled engagement kinematics to craft the pursuer's guidance law meticulously. Furthermore, we leverage the concept of the Lyapunov Barrier Function as a powerful tool to constrain the distance between the pursuer and the target within asymmetric bounds, thereby ensuring the pursuer's safety within the predefined region. To validate the efficacy and robustness of our algorithm, we conduct experimental tests by implementing a high-fidelity quadrotor model within Software-in-the-loop (SITL) simulations, encompassing various challenging target maneuver scenarios. The results obtained showcase the resilience of the proposed guidance law, effectively handling arbitrarily maneuvering targets, vehicle/autopilot dynamics, and external disturbances. Our method consistently delivers stable global enclosing behaviors, even in response to aggressive target maneuvers, and requires only relative information for successful execution.

[9]  arXiv:2404.16318 [pdf, other]
Title: The Continuous-Time Weighted-Median Opinion Dynamics
Comments: 13 pages, 1 figure
Subjects: Systems and Control (eess.SY)

Opinion dynamics models are important in understanding and predicting opinion formation processes within social groups. Although the weighted-averaging opinion-update mechanism is widely adopted as the micro-foundation of opinion dynamics, it bears a non-negligibly unrealistic implication: opinion attractiveness increases with opinion distance. Recently, the weighted-median mechanism has been proposed as a new microscopic mechanism of opinion exchange. Numerous advancements have been achieved regarding this new micro-foundation, from theoretical analysis to empirical validation, in a discrete-time asynchronous setup. However, the original discrete-time weighted-median model does not allow for "compromise behavior" in opinion exchanges, i.e., no intermediate opinions are created between disagreeing agents. To resolve this problem, this paper proposes a novel continuous-time weighted-median opinion dynamics model, in which agents' opinions move towards the weighted-medians of their out-neighbors' opinions. It turns out that the proof methods for the original discrete-time asynchronous model are no longer applicable to the analysis of the continuous-time model. In this paper, we first establish the existence and uniqueness of the solution to the continuous-time weighted-median opinion dynamics by showing that the weighted-median mapping is contractive on any graph. We also characterize the set of all the equilibria. Then, by leveraging a new LaSalle invariance principle argument, we prove the convergence of the continuous-time weighted-median model for any initial condition and derive a necessary and sufficient condition for the convergence to consensus.
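
The mechanism described above, opinions drifting towards a weighted median, can be sketched numerically. The code below assumes the dynamics take the form $\dot{x}_i = \mathrm{Med}_i(x) - x_i$ and integrates them with a forward-Euler step. That form, the tie-breaking rule (the lower weighted median), the step size, and all names are assumptions made for this sketch, so it may differ from the paper's exact model.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches half of the total weight. Weights need not be normalized."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for v, w in pairs:
        running += w
        if running >= half:
            return v
    raise ValueError("weights must contain a positive entry")


def simulate(x0, influence, dt=0.01, steps=2000):
    """Forward-Euler integration of dx_i/dt = Med_i(x) - x_i.

    influence[i][j] is the weight agent i places on agent j's opinion.
    """
    x = list(x0)
    for _ in range(steps):
        medians = [weighted_median(x, row) for row in influence]
        x = [xi + dt * (mi - xi) for xi, mi in zip(x, medians)]
    return x


# Three agents who weight everyone equally drift to the middle opinion.
equal = [[1.0, 1.0, 1.0] for _ in range(3)]
print(simulate([0.0, 0.5, 1.0], equal))
```

Unlike the discrete-time update, where an agent jumps straight to a neighbor's opinion, the trajectories here pass through intermediate values, which is the "compromise behavior" the abstract refers to.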

[10]  arXiv:2404.16346 [pdf, other]
Title: Light-weight Retinal Layer Segmentation with Global Reasoning
Comments: IEEE Transactions on Instrumentation & Measurement
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noise present in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications. Therefore, it is desirable to design a light-weight network with high performance for retinal layer segmentation. In this paper, we propose LightReSeg for retinal layer segmentation, which can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure, where the encoder part employs multi-scale feature extraction and a Transformer block to fully exploit the semantic information of feature maps at all scales and give the features better global reasoning capabilities, while for the decoder part, we design a multi-scale asymmetric attention (MAA) module for preserving the semantic information at each encoder scale. The experiments show that our approach, with only 3.3M parameters, achieves better segmentation performance than the current state-of-the-art method TransUnet, which has 105.7M parameters, on both our collected dataset and two other public datasets.

[11]  arXiv:2404.16391 [pdf, other]
Title: Stability-Oriented Prediction Horizons Design of Generalized Predictive Control for DC/DC Boost Converter
Subjects: Systems and Control (eess.SY)

This paper introduces a novel approach to designing prediction horizons for generalized predictive control of a DC/DC boost converter. This method involves constructing a closed-loop system model and assessing the impact of different prediction horizons on system stability. In contrast to conventional design approaches that often rely on empirical prediction horizon selection or incorporate non-linear observers, the proposed method establishes a rigorous boundary for the prediction horizon to ensure system stability. This approach facilitates the selection of an appropriate prediction horizon while avoiding excessively short horizons that can lead to instability and preventing the adoption of unnecessarily long horizons that would burden the controller with high computational demands. Finally, the accuracy of the design method has been confirmed through experimental testing. Moreover, it has been demonstrated that the prediction horizon determined by this method reduces the computational burden by 10%-20% compared to the empirically selected prediction horizon.

[12]  arXiv:2404.16397 [pdf, other]
Title: Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology
Comments: Paper accepted at the First Workshop on Imageomics (Imageomics-AAAI-24) - Discovering Biological Knowledge from Images using AI (this https URL), held as part of the 38th Annual AAAI Conference on Artificial Intelligence (this https URL)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

The interactions between tumor cells and the tumor microenvironment (TME) dictate therapeutic efficacy of radiation and many systemic therapies in breast cancer. However, to date, there is not a widely available method to reproducibly measure tumor and immune phenotypes for each patient's tumor. Given this unmet clinical need, we applied multiple instance learning (MIL) algorithms to assess activity of ten biologically relevant pathways from the hematoxylin and eosin (H&E) slide of primary breast tumors. We employed different feature extraction approaches and state-of-the-art model architectures. Using binary classification, our models attained area under the receiver operating characteristic (AUROC) scores above 0.70 for nearly all gene expression pathways and, in some cases, exceeded 0.80. Attention maps suggest that our trained models recognize biologically relevant spatial patterns of cell sub-populations from H&E. These efforts represent a first step towards developing computational H&E biomarkers that reflect facets of the TME and hold promise for augmenting precision oncology.

[13]  arXiv:2404.16412 [pdf, ps, other]
Title: Distributed Matrix Pencil Formulations for Prescribed-Time Leader-Following Consensus of MASs with Unknown Sensor Sensitivity
Comments: 10 pages, 1 figure
Subjects: Systems and Control (eess.SY)

In this paper, we address the problem of prescribed-time leader-following consensus of heterogeneous multi-agent systems (MASs) in the presence of unknown sensor sensitivity. Under a connected undirected topology, we propose a time-varying dual observer/controller design framework that makes use of regular local and inaccurate feedback to achieve consensus tracking within a prescribed time. In particular, the developed analysis framework is applicable to MASs equipped with sensors of different sensitivities. One of the design innovations involves constructing a distributed matrix pencil formulation based on worst-case sensors, yielding control parameters with sufficient robustness yet relatively low conservatism. Another novelty is the construction of the control gains, which consists of the product of a proportional coefficient obtained from the matrix pencil formulation and a classic time-varying function that grows to infinity or a novel bounded time-varying function. Furthermore, it is possible to extend the prescribed-time distributed protocol to infinite time domain by introducing the bounded time-varying gain technique without sacrificing the ultimate control accuracy, and the corresponding technical proof is comprehensive. The effectiveness of the method is demonstrated through a group of 5 single-link robot manipulators.

[14]  arXiv:2404.16476 [pdf, ps, other]
Title: A Novel Channel Coding Scheme for Digital Multiple Access Computing
Comments: accepted version to the IEEE 2024 ICC conference
Subjects: Signal Processing (eess.SP)

In this paper, we consider the ChannelComp framework, which facilitates the computation of desired functions by multiple transmitters over a common receiver using digital modulations across a multiple access channel. While ChannelComp currently offers a broad framework for computation by designing digital constellations for over-the-air computation and employing symbol-level encoding, encoding the repeated transmissions of the same symbol and using the corresponding received sequence may significantly improve the computation performance and reduce the encoding complexity. In this paper, we propose an enhancement involving the encoding of the repetitive transmission of the same symbol at each transmitter over multiple time slots and the design of constellation diagrams, with the aim of minimizing computational errors. We frame this enhancement as an optimization problem, which jointly identifies the constellation diagram and the channel code for repetition, which we call ReChCompCode. To manage the computational complexity of the optimization, we divide it into two tractable subproblems. Through numerical experiments, we evaluate the performance of ReChCompCode. The simulation results reveal that ReChCompCode can reduce the computation error by approximately up to 30 dB compared to standard ChannelComp, particularly for product functions.

[15]  arXiv:2404.16481 [pdf, other]
Title: Secret Key Generation Rates for Line of Sight Multipath Channels in the Presence of Eavesdroppers
Subjects: Signal Processing (eess.SP)

In this paper, the feasibility of implementing a lightweight key distribution scheme using physical layer security for secret key generation (SKG) is explored. Specifically, we focus on examining SKG with the received signal strength (RSS) serving as the primary source of shared randomness. Our investigation centers on a frequency-selective line-of-sight (LoS) multipath channel, with a particular emphasis on assessing SKG rates derived from the distributions of RSS. We derive the received signal distributions based on how the multipath components resolve at the receiver. The mutual information (MI) is evaluated based on LoS 3GPP channel models using a numerical estimator. We study how the bandwidth, delay spread, and Rician K-factor impact the estimated MI. This MI then serves as a benchmark setting bounds for the SKG rates in our exploration.

[16]  arXiv:2404.16502 [pdf, other]
Title: A Prototypical Expert-Driven Approach Towards Capability-Based Monitoring of Automated Driving Systems
Comments: submitted for publication
Subjects: Systems and Control (eess.SY)

Supervising the safe operation of automated vehicles is a key requirement in order to unleash their full potential in future transportation systems. In particular, previous publications have argued that SAE Level 4 vehicles should be aware of their capabilities at runtime to make appropriate behavioral decisions. In this paper, we present a framework that enables the implementation of an online capability monitor. We derive a graphical system model that captures the relationships between the quality of system elements across different architectural views. In an expert-driven approach, we parameterize Bayesian Networks based on this structure using Fuzzy Logic. Using the online monitor, we infer the quality of the system's capabilities based on technical measurements acquired at runtime. Our approach is demonstrated in the context of the UNICAR.agil research project in an urban example scenario.

[17]  arXiv:2404.16514 [pdf, ps, other]
Title: Adaptive Learning-based Model Predictive Control for Uncertain Interconnected Systems: A Set Membership Identification Approach
Subjects: Systems and Control (eess.SY)

We propose a novel adaptive learning-based model predictive control (MPC) scheme for interconnected systems which can be decomposed into several smaller dynamically coupled subsystems with uncertain coupling. The proposed scheme is divided into two main online phases: a learning phase and an adaptation phase. Set membership identification is used in the learning phase to learn an uncertainty set that contains the coupling strength using online data. In the adaptation phase, rigid tube-based robust MPC is used to compute the optimal predicted states and inputs. Besides computing the optimal trajectories, the MPC ingredients are adapted in the adaptation phase, taking the learnt uncertainty set into account. These MPC ingredients include the prestabilizing controller, the rigid tube, the tightened constraints and the terminal ingredients. The recursive feasibility of the proposed scheme as well as the stability of the corresponding closed-loop system are discussed. The developed scheme is compared in simulations to existing schemes including robust, adaptive and learning-based MPC.

[18]  arXiv:2404.16522 [pdf, other]
Title: A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classifying 2D echocardiography data into five distinct echocardiographic views: apical 4-chamber, parasternal long axis of left ventricle, and parasternal short axis at the levels of the mitral valve, papillary muscle, and apex. It then extracts features from each view separately and combines the five features for disease classification. A total of 212 patients diagnosed with HCM and 30 patients diagnosed with CA, along with 200 individuals with normal cardiac function (Normal), were enrolled in this study from 2018 to 2022. This approach achieved a precision, recall of 0.905, and micro-F1 score of 0.904, demonstrating its effectiveness in accurately identifying HCM and CA using a multi-view analysis.

[19]  arXiv:2404.16544 [pdf, other]
Title: Image registration based automated lesion correspondence pipeline for longitudinal CT data
Subjects: Image and Video Processing (eess.IV)

Patients diagnosed with metastatic breast cancer (mBC) typically undergo several radiographic assessments during their treatment. Because mBC often involves multiple metastatic lesions in different organs, it is imperative to accurately track and assess these lesions to gain a comprehensive understanding of the disease's response to treatment. Computerized analysis methods that rely on lesion-level tracking have often used manual matching of corresponding lesions, a time-consuming process that is prone to errors. This paper introduces an automated lesion correspondence algorithm designed to precisely track both target and non-target lesions in longitudinal data. Here we demonstrate the applicability of our algorithm on the anonymized data from two Phase III trials. The dataset contains imaging data of patients for different follow-up timepoints and the radiologist annotations for the patients enrolled in the trials. Target and non-target lesions are annotated by either one or two groups of radiologists. To facilitate accurate tracking, we have developed a registration-assisted lesion correspondence algorithm. The algorithm employs a sequential two-step pipeline: (a) First, an adaptive Hungarian algorithm is used to establish correspondence among lesions within a single volumetric image series which have been annotated by multiple radiologists at a specific timepoint. (b) Second, after establishing correspondence and assigning unique names to the lesions, three-dimensional rigid registration is applied to various image series at the same timepoint. Registration is followed by ongoing lesion correspondence based on the adaptive Hungarian algorithm and updating lesion names for accurate tracking. Validation of our automated lesion correspondence algorithm is performed through triaxial plots based on axial, sagittal, and coronal views, confirming its efficacy in matching lesions.
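
The correspondence step in (a) is an assignment problem: pair each lesion from one annotation set with at most one lesion from another so that the total distance is minimal. The abstract does not describe what makes the authors' Hungarian algorithm "adaptive", so the sketch below is only a stand-in. It finds the same minimum-cost assignment by brute force over permutations, which is practical only for a handful of lesions, and then discards pairs whose centroids are farther apart than a threshold. The centroid representation, the `max_distance` gate, and the function name are all assumptions made for this sketch.

```python
from itertools import permutations
from math import dist, inf


def match_lesions(centroids_a, centroids_b, max_distance):
    """Return sorted (index_in_a, index_in_b) pairs with minimal total distance.

    Brute-force stand-in for the Hungarian algorithm. Pairs whose centroids
    are more than max_distance apart are treated as unmatched.
    """
    # Enumerate assignments of the smaller set into the larger one.
    swapped = len(centroids_a) > len(centroids_b)
    small, large = (centroids_b, centroids_a) if swapped else (centroids_a, centroids_b)

    best_cost, best_perm = inf, ()
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    pairs = []
    for i, j in enumerate(best_perm):
        if dist(small[i], large[j]) <= max_distance:
            pairs.append((j, i) if swapped else (i, j))
    return sorted(pairs)


# Two annotators mark the same two lesions; the second also marks a third.
reader_1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
reader_2 = [(10.0, 1.0, 0.0), (0.0, 1.0, 0.0), (50.0, 50.0, 50.0)]
print(match_lesions(reader_1, reader_2, max_distance=5.0))
```

For realistic lesion counts a polynomial-time solver such as the Hungarian algorithm replaces the permutation loop, and the distance gate is what lets new or resolved lesions remain unmatched.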

[20]  arXiv:2404.16547 [pdf, other]
Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish
Authors: Giampiero Salvi
Comments: 16 pages, 7 figures
Journal-ref: European Student Journal of Language and Speech, 1999
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)

This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done by employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digits and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context independent models and two variations of context dependent models. Furthermore, many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age and dialect has also been examined. Results are compared to previous similar studies, showing a remarkable improvement.

[21]  arXiv:2404.16564 [pdf, other]
Title: Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation
Comments: 17 pages, 13 figures. The code of this paper is available in github: this https URL
Journal-ref: Computer Vision and Image Understanding, Volume 233, 2023, 103718
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Blind single image super-resolution (SISR) is a challenging task in image processing due to the ill-posed nature of the inverse problem. Complex degradations present in real life images make it difficult to solve this problem using naïve deep learning approaches, where models are often trained on synthetically generated image pairs. Most of the effort so far has been focused on solving the inverse problem under some constraints, such as for a limited space of blur kernels and/or assuming noise-free input images. Yet, there is a gap in the literature to provide a well-generalized deep learning-based solution that performs well on images with unknown and highly complex degradations. In this paper, we propose IKR-Net (Iterative Kernel Reconstruction Network) for blind SISR. In the proposed approach, kernel and noise estimation and high-resolution image reconstruction are carried out iteratively using dedicated deep models. The iterative refinement provides significant improvement in both the reconstructed image and the estimated blur kernel even for noisy inputs. IKR-Net provides a generalized solution that can handle any type of blur and level of noise in the input low-resolution image. IKR-Net achieves state-of-the-art results in blind SISR, especially for noisy images with motion blur.

[22]  arXiv:2404.16592 [pdf, ps, other]
Title: Uninterrupted Maximum Flow on Signalized Traffic Networks
Comments: 31 pages, 16 figures, 3 tables
Subjects: Systems and Control (eess.SY)

This paper describes a traffic signal control procedure that allows motorists who travel at a recommended speed on suburban arterial two-way roads with a common cycle-time to make every traffic signal. A road-to-traveler-feedback-device advises motorists how fast they should travel to do this. Signalized arterial roads where vehicles that travel at the recommended speed make every traffic signal are termed Ride-the-Green-Wave (RGW) roads. Left-turn-arounds allow vehicles to turn left from one two-way RGW-road to an intersecting/orthogonal two-way RGW-road while allowing maximum flow on the intersecting RGW-roads. In addition to introducing novel traffic signal control strategies, the methods presented in this paper have implications for: road network design, public transport control, connected and automated vehicles and environmental impacts.

[23]  arXiv:2404.16607 [pdf, other]
Title: A Comprehensive Design Framework for UE-side and BS-Side RIS Deployments
Comments: Submitted in IEEE
Subjects: Signal Processing (eess.SP)

Integrating reconfigurable intelligent surfaces (RISs) in emerging communication systems is a fast-growing research field that has recently earned much attention. While implementing RISs near the base station (BS), i.e., BS-side RIS, or user equipment (UE), i.e., UE-side RIS, exhibits optimum performance, understanding the differences between these two deployments in terms of the system design perspective needs to be clarified. Critical design parameters, such as RIS size, phase shift adjustment, control link, and element type (passive/active), require greater clarity across these scenarios. Overlooking the intricacies of such critical design parameters in light of 6G demands endangers practical implementation, widening the gap between theoretical insights and practical applications. In this regard, our study investigates the impact of each RIS deployment strategy on the anticipated 6G requirements and offers tailored RIS design recommendations to fulfill these forward-looking requirements. Through this, we clarify the practical distinctions and propose a comprehensive framework for differentiating between BS-side and UE-side RIS scenarios in terms of their design parameters. Highlighting the unique needs of each and the potential challenges ahead, we aim to fuse the theoretical underpinnings of RIS with tangible implementation considerations, propelling progress in both the academic sphere and the industry.

[24]  arXiv:2404.16646 [pdf, other]
Title: Improving TAS Adaptability with a Variable Temperature Threshold
Subjects: Systems and Control (eess.SY)

Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution. These methods aim to avoid issues such as accelerated aging of the device, premature failure and degraded chip performance. In this work, we implement a new TAS algorithm, VTF-TAS, which makes use of a variable temperature threshold to control task execution and thermal dissipation. To enable adequate execution of the tasks to reach their deadlines, this threshold is managed based on the theory of fluid scheduling. Using an evaluation methodology as described in POD-TAS, we evaluate VTF-TAS using a set of 4 benchmarks from the COMBS benchmark suite to examine its ability to minimize chip temperature throughout schedule execution. Through our evaluation, we demonstrate that this new algorithm is able to adaptively manage the temperature threshold such that the peak temperature during schedule execution is lower than POD-TAS, with no requirement for an expensive search procedure to obtain an optimal threshold for scheduling.

[25]  arXiv:2404.16708 [pdf, other]
Title: Multi-view Cardiac Image Segmentation via Trans-Dimensional Priors
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We propose a novel multi-stage trans-dimensional architecture for multi-view cardiac image segmentation. Our method exploits the relationship between long-axis (2D) and short-axis (3D) magnetic resonance (MR) images to perform a sequential 3D-to-2D-to-3D segmentation, segmenting the long-axis and short-axis images. In the first stage, 3D segmentation is performed using the short-axis image, and the prediction is transformed to the long-axis view and used as a segmentation prior in the next stage. In the second step, the heart region is localized and cropped around the segmentation prior using a Heart Localization and Cropping (HLC) module, focusing the subsequent model on the heart region of the image, where a 2D segmentation is performed. Similarly, we transform the long-axis prediction to the short-axis view, localize and crop the heart region and again perform a 3D segmentation to refine the initial short-axis segmentation. We evaluate our proposed method on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) dataset, where our method outperforms state-of-the-art methods in segmenting cardiac regions of interest in both short-axis and long-axis images. The pre-trained models, source code, and implementation details will be publicly available.

[26]  arXiv:2404.16718 [pdf, other]
Title: Features Fusion for Dual-View Mammography Mass Detection
Comments: Accepted at ISBI 2024 (21st IEEE International Symposium on Biomedical Imaging)
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Detection of malignant lesions on mammography images is extremely important for early breast cancer diagnosis. In clinical practice, images are acquired from two different angles, and radiologists can fully utilize information from both views, simultaneously locating the same lesion. However, for automatic detection approaches such information fusion remains a challenge. In this paper, we propose a new model called MAMM-Net, which allows the processing of both mammography views simultaneously by sharing information not only on an object level, as seen in existing works, but also on a feature level. MAMM-Net's key component is the Fusion Layer, based on deformable attention and designed to increase detection precision while keeping high recall. Our experiments show superior performance on the public DDSM dataset compared to the previous state-of-the-art model, while introducing new helpful features such as pixel-level lesion annotation and classification of lesion malignancy.

[27]  arXiv:2404.16727 [pdf, other]
Title: Learning-Based Efficient Approximation of Data-enabled Predictive Control
Subjects: Systems and Control (eess.SY)

Data-Enabled Predictive Control (DeePC) bypasses the need for system identification by directly leveraging raw data to formulate optimal control policies. However, the size of the optimization problem in DeePC grows linearly with respect to the data size, which prohibits its application due to high computational costs. In this paper, we propose an efficient approximation of DeePC, whose size is invariant with respect to the amount of data collected, via differentiable convex programming. Specifically, the optimization problem in DeePC is decomposed into two parts: a control objective and a scoring function that evaluates the likelihood of a guessed I/O sequence, the latter of which is approximated with a size-invariant learned optimization problem. The proposed method is validated through numerical simulations on a quadruple tank system, illustrating that the learned controller can reduce the computational time of DeePC by 5x while maintaining its control performance.
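
The linear growth mentioned in this abstract can be illustrated with a small sketch. It assumes the standard DeePC formulation, in which recorded I/O data are stacked into a Hankel matrix and the optimizer picks one weight per column; the function names `hankel` and `decision_variables` are hypothetical and are not taken from the paper.

```python
def hankel(signal, depth):
    """Stack a 1-D sequence into a Hankel matrix with `depth` rows.

    Row i holds the samples shifted by i, so every column is one
    length-`depth` window of the recorded trajectory.
    """
    cols = len(signal) - depth + 1
    if cols < 1:
        raise ValueError("need at least `depth` samples")
    return [[signal[i + j] for j in range(cols)] for i in range(depth)]


def decision_variables(num_samples, depth):
    """Number of Hankel columns, i.e. one optimization weight per column
    (assumption: standard DeePC, not the learned approximation)."""
    return num_samples - depth + 1


H = hankel([1, 2, 3, 4, 5], depth=3)
# Doubling the recorded data roughly doubles the number of columns.
small = decision_variables(1000, 20)
large = decision_variables(2000, 20)
```

With a fixed window depth, the column count tracks the number of recorded samples, which is the scaling the proposed size-invariant approximation is meant to avoid.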

[28]  arXiv:2404.16802 [pdf, other]
Title: Transformer-Based Local Feature Matching for Multimodal Image Registration
Comments: Accepted to SPIE Medical Imaging 2024
Subjects: Image and Video Processing (eess.IV)

Ultrasound imaging is a cost-effective and radiation-free modality for visualizing anatomical structures in real-time, making it ideal for guiding surgical interventions. However, its limited field-of-view, speckle noise, and imaging artifacts make the images difficult for inexperienced users to interpret. In this paper, we propose a new 2D ultrasound to 3D CT registration method to improve surgical guidance during ultrasound-guided interventions. Our approach adapts a dense feature matching method called LoFTR to our multimodal registration problem. We learn to predict dense coarse-to-fine correspondences using a Transformer-based architecture to estimate a robust rigid transformation between a 2D ultrasound frame and a CT scan. Additionally, a fully differentiable pose estimation method is introduced, optimizing LoFTR on pose estimation error during training. Experiments conducted on a multimodal dataset of ex vivo porcine kidneys demonstrate the method's promising results for intraoperative, trackerless ultrasound pose estimation. By mapping 2D ultrasound frames into the 3D CT volume space, the method provides intraoperative guidance, potentially improving surgical workflows and image interpretation.

Cross-lists for Fri, 26 Apr 24

[29]  arXiv:2404.16049 (cross-list from physics.med-ph) [pdf, other]
Title: Exploring the limitations of blood pressure estimation using the photoplethysmography signal
Comments: 17 pages, 7 figures, 3 tables
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

Hypertension, a leading contributor to cardiovascular morbidity, underscores the need for accurate and continuous blood pressure (BP) monitoring. Photoplethysmography (PPG) presents a promising approach to this end. However, the precision of BP estimates derived from PPG signals has been the subject of ongoing debate, necessitating a comprehensive evaluation of their effectiveness and constraints. We developed a calibration-based Siamese ResNet model for BP estimation, using a signal input paired with a reference BP reading. We compared the use of normalized PPG (N-PPG) against the normalized Invasive Arterial Blood Pressure (N-IABP) signals as input. The N-IABP signals do not directly present systolic and diastolic values but theoretically provide a more accurate BP measure than PPG signals since they come from a direct pressure sensor inside the body. Our strategy establishes a critical benchmark for PPG performance, realistically calibrating expectations for PPG's BP estimation capabilities. In addition, we compared the performance of our models using different signal-filtering conditions to evaluate the impact of filtering on the results. We evaluated our method using the AAMI and the BHS standards employing the VitalDB dataset. The N-IABP signals meet AAMI standards for both Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), with errors of 1.29±6.33 mmHg and 1.17±5.78 mmHg for systolic and diastolic pressure respectively for the raw N-IABP signal. In contrast, N-PPG signals, in their best setup, performed worse than N-IABP, presenting 1.49±11.82 mmHg and 0.89±7.27 mmHg for systolic and diastolic pressure respectively. Our findings highlight the potential and limitations of employing PPG for BP estimation, showing that these signals contain information correlated to BP but may not be sufficient for predicting it accurately.
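
As a rough illustration of how the reported errors relate to the AAMI criterion, the sketch below checks each figure against the commonly cited limits of a mean error within 5 mmHg and a standard deviation within 8 mmHg. It assumes the abstract's numbers are mean error ± standard deviation, reads the two N-IABP figures as systolic and diastolic respectively, and ignores the standard's minimum-subject requirement; `meets_aami` is a hypothetical helper, not code from the paper.

```python
AAMI_MAX_MEAN_ERROR = 5.0  # mmHg, commonly cited AAMI limit on |mean error|
AAMI_MAX_STD = 8.0         # mmHg, commonly cited AAMI limit on the error SD


def meets_aami(mean_error, std_error):
    """Return True if a (mean, SD) error pair is within both AAMI limits."""
    return abs(mean_error) <= AAMI_MAX_MEAN_ERROR and std_error <= AAMI_MAX_STD


# Figures copied from the abstract (assumed to be mean error, SD in mmHg).
reported = {
    ("N-IABP", "SBP"): (1.29, 6.33),
    ("N-IABP", "DBP"): (1.17, 5.78),
    ("N-PPG", "SBP"): (1.49, 11.82),
    ("N-PPG", "DBP"): (0.89, 7.27),
}

results = {key: meets_aami(*value) for key, value in reported.items()}
for (signal, target), ok in results.items():
    print(f"{signal} {target}: {'within' if ok else 'outside'} AAMI limits")
```

Under these assumptions the N-PPG systolic estimate is the only one that falls outside the limits, because its standard deviation exceeds 8 mmHg even though its mean error is small.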

[30]  arXiv:2404.16065 (cross-list from cs.HC) [pdf, other]
Title: mmWave Wearable Antenna for Interaction with VR Devices
Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior regardless of the lighting conditions. In this study, a millimeter-wave wearable antenna was developed, opening up the possibility for more immersive interaction with VR devices. The antenna features a low loss tangent polyester fabric to minimize dielectric losses and a smooth coating to reduce losses due to rough surfaces. The antenna operates in the 24GHz ISM band, with an S11 value of -29dB at 24.15GHz.
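
For readers less familiar with S-parameters, the reported S11 of -29 dB can be converted into a reflected power fraction with the usual decibel relations. This is a minimal sketch; the function names are hypothetical, and it treats S11 as a simple one-port return-loss figure.

```python
def reflected_power_fraction(s11_db):
    """Fraction of incident power reflected at the port: |S11|^2 = 10^(dB/10)."""
    return 10 ** (s11_db / 10)


def reflection_coefficient_magnitude(s11_db):
    """Magnitude of the voltage reflection coefficient: |S11| = 10^(dB/20)."""
    return 10 ** (s11_db / 20)


fraction = reflected_power_fraction(-29.0)
magnitude = reflection_coefficient_magnitude(-29.0)
print(f"reflected power: {fraction * 100:.3f}% of incident power")
print(f"|S11| = {magnitude:.4f}")
```

At -29 dB, a little over 0.1% of the incident power is reflected, which indicates a good impedance match at 24.15GHz.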

[31]  arXiv:2404.16112 (cross-list from cs.LG) [pdf, other]
Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short Term Memory Networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues which use spectral networks or convolutions and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, Glue, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for the Mamba-360 work is available at https://github.com/badripatro/mamba360.

[32]  arXiv:2404.16137 (cross-list from cs.IT) [pdf, ps, other]
Title: Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM
Comments: 5 pages, under review
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this work, we propose a machine learning-based framework to determine the FDSS filter, optimizing a tradeoff between the symbol error rate (SER), the PAPR, and the spectral flatness requirements. Our end-to-end optimization framework considers multiple important design constraints, including the Nyquist zero-ISI (inter-symbol interference) condition. The numerical results show that learned FDSS filters lower the PAPR compared to conventional baselines, with minimal SER degradation. Tuning the parameters of the optimization also helps us understand the fundamental limitations and characteristics of the FDSS filters for PAPR reduction.
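
The quantity being reduced here can be made concrete with a short sketch of the PAPR definition: peak instantaneous power divided by mean power, expressed in dB. It operates on an arbitrary list of complex baseband samples and does not model DFT-s-OFDM or FDSS filtering; `papr_db` is a hypothetical helper name.

```python
import cmath
import math


def papr_db(samples):
    """Peak-to-average power ratio of a sampled waveform, in dB."""
    powers = [abs(x) ** 2 for x in samples]
    mean_power = sum(powers) / len(powers)
    if mean_power == 0:
        raise ValueError("PAPR is undefined for an all-zero signal")
    return 10 * math.log10(max(powers) / mean_power)


# A constant-envelope tone has the lowest possible PAPR (0 dB).
tone = [cmath.exp(2j * cmath.pi * 0.1 * n) for n in range(64)]
# One large sample among small ones raises the peak relative to the mean.
peaky = [1, 1, 1, 3]

print(f"tone:  {papr_db(tone):.3f} dB")
print(f"peaky: {papr_db(peaky):.3f} dB")
```

A lower PAPR lets the power amplifier operate closer to saturation, which is why it matters for uplink coverage.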

[33]  arXiv:2404.16152 (cross-list from cs.IT) [pdf, ps, other]
Title: Rethinking Grant-Free Protocol in mMTC
Comments: Submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper revisits the identity detection problem under the current grant-free protocol in massive machine-type communications (mMTC) by asking the following question: for stable identity detection performance, is it enough to permit active devices to transmit preambles without any handshaking with the base station (BS)? Specifically, in the current grant-free protocol, the BS blindly allocates a fixed length of preamble to devices for identity detection as it lacks the prior information on the number of active devices $K$. However, in practice, $K$ varies dynamically over time, resulting in degraded identity detection performance especially when $K$ is large. Consequently, the current grant-free protocol fails to ensure stable identity detection performance. To address this issue, we propose a two-stage communication protocol which consists of estimation of $K$ in Phase I and detection of identities of active devices in Phase II. The preamble length for identity detection in Phase II is dynamically allocated based on the estimated $K$ in Phase I through a table lookup manner such that the identity detection performance could always be better than a predefined threshold. In addition, we design an algorithm for estimating $K$ in Phase I, and exploit the estimated $K$ to reduce the computational complexity of the identity detector in Phase II. Numerical results demonstrate the effectiveness of the proposed two-stage communication protocol and algorithms.

[34]  arXiv:2404.16193 (cross-list from cs.CV) [pdf, other]
Title: Improving Multi-label Recognition using Class Co-Occurrence Probabilities
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
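
The co-occurrence statistics described in this abstract can be sketched as conditional probabilities estimated from a binary label matrix. This is only an illustration of the counting step, with a toy label set invented for the example; it does not reproduce the paper's GCN refinement, and `cooccurrence_probabilities` is a hypothetical name.

```python
def cooccurrence_probabilities(labels):
    """Estimate P[i][j] = P(class j present | class i present).

    `labels` is a list of per-image binary vectors, one entry per class.
    Classes that never occur get a row of zeros.
    """
    num_classes = len(labels[0])
    counts = [[0] * num_classes for _ in range(num_classes)]
    for row in labels:
        present = [c for c, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                counts[i][j] += 1  # counts[i][i] is the occurrence count of i
    probs = []
    for i in range(num_classes):
        total = counts[i][i]
        probs.append([counts[i][j] / total if total else 0.0
                      for j in range(num_classes)])
    return probs


# Toy training labels for three classes (made up for illustration).
toy_labels = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
P = cooccurrence_probabilities(toy_labels)
```

Note that the resulting matrix is generally asymmetric: in the toy data, class 2 implies class 1 every time, while class 1 implies class 2 only two times out of three.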

[35]  arXiv:2404.16216 (cross-list from cs.CV) [pdf, other]
Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)

An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on-the-fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods--both traditional navigation agents based on spatial novelty and visual exploration as well as existing state-of-the-art methods.

[36]  arXiv:2404.16223 (cross-list from cs.CV) [pdf, other]
Title: Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
Comments: CVPR 2024 - NTIRE Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.

[37]  arXiv:2404.16240 (cross-list from cs.SI) [pdf, other]
Title: A communication protocol based on NK boolean networks for coordinating collective action
Authors: Yori Ong
Comments: 13 pages, 4 figures
Subjects: Social and Information Networks (cs.SI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

In this paper, I describe a digital social communication protocol (Gridt) based on Kauffman's NK boolean networks. The main assertion is that a communication network with this topology supports infinitely scalable self-organization of collective action without requiring hierarchy or central control. The paper presents the functionality of this protocol and substantiates the following propositions about its function and implications: (1) Communication via NK boolean networks facilitates coordination on collective action games for any variable number of users, and justifies the assumption that the game's payoff structure is common knowledge; (2) Use of this protocol increases its users' transfer empowerment, a form of intrinsic motivation that motivates coordinated action independent of the task or outcome; (3) Communication via this network can be considered 'cheap talk' and benefits the strategy of players with aligned interests, but not of players with conflicting interests; (4) Absence of significant barriers for its realization warrants a timely and continuing discussion on the ethics and implications of this technology; (5) Full realization of the technology's potential calls for a free-to-use service with maximal transparency of design and associated economic incentives.

[38]  arXiv:2404.16259 (cross-list from cs.SD) [pdf, other]
Title: An Experiment with Electric Guitar Signals for Exploring the Virtuosity based on the Entropy of Music
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

We analyze the concept of virtuosity as a collective attribute in music and its relationship with the entropy based on an experiment that compares two sets of digital signals played by composer-performer electric guitarists. Based on an interdisciplinary approach related to the complex systems, we computed the spectrum of signals, identified statistical distributions that best describe them, and measured the Shannon entropy to establish their diversity. Findings suggested that virtuosity might be related to a range of entropy values that identify levels of diversity of the frequency components of audio signals. Despite the presence of different values of entropy in the two sets of signals, they are statistically similar. Therefore, entropy values can be interpreted as levels of virtuosity in music.
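
One way to read "Shannon entropy of the frequency components" is to normalize a signal's power spectrum into a probability distribution and take its entropy. The sketch below does this with a plain DFT; it is an assumption about the general technique, not the authors' exact procedure (windowing, distribution fitting, and recording details are not given in the abstract), and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum of a real signal via a direct DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def shannon_entropy_bits(weights):
    """Shannon entropy, in bits, of non-negative weights normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)


n = 64
one_tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
two_tones = [math.sin(2 * math.pi * 4 * t / n) + math.sin(2 * math.pi * 9 * t / n)
             for t in range(n)]

h_one = shannon_entropy_bits(power_spectrum(one_tone))
h_two = shannon_entropy_bits(power_spectrum(two_tones))
```

A single tone concentrates its power in one bin and yields an entropy near zero, while two equally strong tones yield about one bit, so a spectrally more diverse signal scores higher.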

[39]  arXiv:2404.16269 (cross-list from math.OC) [pdf, other]
Title: Expected Time-Optimal Control: a Particle MPC-based Approach via Sequential Convex Programming
Comments: submitted to CDC 2024
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper, we consider the problem of minimum-time optimal control for a dynamical system with initial state uncertainties and propose a sequential convex programming (SCP) solution framework. We seek to minimize the expected terminal (mission) time, which is an essential capability for planetary exploration missions where ground rovers have to carry out scientific tasks efficiently within the mission timelines in uncertain environments. Our main contribution is to convert the underlying stochastic optimal control problem into a deterministic, numerically tractable, optimal control problem. To this end, the proposed solution framework combines two strategies from previous methods: i) a partial model predictive control with consensus horizon approach and ii) a sum-of-norm cost, a temporally strictly increasing weighted-norm, promoting minimum-time trajectories. Our contribution is to adopt these formulations into an SCP solution framework and obtain a numerically tractable stochastic control algorithm. We then demonstrate the resulting control method in multiple applications: i) a closed-loop linear system as a representative result (a spacecraft double integrator model), ii) an open-loop linear system (the same model), and then iii) a nonlinear system (Dubins car).

[40]  arXiv:2404.16275 (cross-list from cs.NI) [pdf, ps, other]
Title: Spectrum Sharing Policy in the Asia-Pacific Region
Comments: 33 pages, 17 figures
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Signal Processing (eess.SP)

In this chapter, we investigate the spectrum measurement results in the Asia-Pacific region. Then the spectrum sharing policy in the Asia-Pacific region is reviewed in detail, where the national projects and strategies on spectrum refarming and spectrum sharing in China, Japan, Singapore, India, Korea and Australia are investigated. Then we introduce the spectrum sharing test-bed that is developed in China, which is a cognitive radio enabled TD-LTE test-bed utilizing TVWS. This chapter provides a brief introduction to the spectrum sharing mechanism and policy of the Asia-Pacific region.

[41]  arXiv:2404.16289 (cross-list from cs.IT) [pdf, other]
Title: Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an end-to-end manner. Specifically, the eigenvectors of the CSI matrix are compressed using deep joint source-channel coding techniques. This compression method enhances the resilience of the feedback CSI information against degradation in the feedback channel. A joint multiuser precoding module and a power allocation module are designed to adjust the precoding direction and the precoding power for users based on the feedback CSI information. Experimental results demonstrate that the downlink sum-rate can be significantly improved by using the proposed method, especially in scenarios with low signal-to-noise ratio and low feedback overhead.

[42]  arXiv:2404.16302 (cross-list from cs.CV) [pdf, other]
Title: CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
Comments: The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)

Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods severely degrade in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to augment detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW is able to mine more essential information of pedestrian features in cross-modality fusion, and can therefore transfer to other, rarer scenarios with high efficiency and remain usable on platforms with low computing power. To the best of our knowledge, this is the first study to target this improvement and to integrate both Diffusion and Mamba modules in cross-modality object detection, successfully expanding the practical application of this type of model with its higher accuracy and more advanced architecture. Extensive experiments on both well-recognized and self-created datasets conclusively demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.

[43]  arXiv:2404.16305 (cross-list from cs.MM) [pdf, other]
Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Existing works have made strides in video generation, but the lack of sound effects (SFX) and background music (BGM) hinders a complete and immersive viewer experience. We introduce a novel semantically consistent video-to-audio generation framework, namely SVA, which automatically generates audio semantically consistent with the given video content. The framework harnesses the power of a multimodal large language model (MLLM) to understand video semantics from a key frame and generate creative audio schemes, which are then utilized as prompts for text-to-audio models, resulting in video-to-audio generation with natural language as an interface. We show the satisfactory performance of SVA through a case study and discuss the limitations along with the future research direction. The project page is available at https://huiz-a.github.io/audio4video.github.io/.

[44]  arXiv:2404.16324 (cross-list from math.NA) [pdf, other]
Title: Improved impedance inversion by deep learning and iterated graph Laplacian
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP)

Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results.
We present a hybrid method that combines deep learning with iterated graph Laplacian and show its application in acoustic impedance inversion which is a routine procedure in seismic explorations. A neural network is used to obtain a first approximation of the underlying acoustic impedance and construct a graph Laplacian matrix from this approximation. Afterwards, we use a Tikhonov-like variational method to solve the impedance inversion problem where the regularizer is based on the constructed graph Laplacian. The obtained solution can be shown to be more accurate and stable with respect to noise than the initial guess obtained by the neural network. This process can be iterated several times, each time constructing a new graph Laplacian matrix from the most recent reconstruction. The method converges after only a few iterations returning a much more accurate reconstruction.
We demonstrate the potential of our method on two different datasets and under various levels of noise. We use two different neural networks that have been introduced in previous works. The experiments show that our approach improves the reconstruction quality in the presence of noise.

[45]  arXiv:2404.16327 (cross-list from cs.IT) [pdf, other]
Title: Generalized Step-Chirp Sequences With Flexible Bandwidth
Authors: Cheng Du, Yi Jiang
Comments: Accepted by 2024 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Sequences with low aperiodic autocorrelation sidelobes have been extensively researched in the literature. With sufficiently low integrated sidelobe level (ISL), their power spectra are asymptotically flat over the whole frequency domain. However, for the beam sweeping in the massive multi-input multi-output (MIMO) broadcast channels, the flat spectrum should be constrained in a passband with tunable bandwidth to achieve the flexible tradeoffs between the beamforming gain and the beam sweeping time. Motivated by this application, we construct a family of sequences termed the generalized step-chirp (GSC) sequence with a closed-form expression, where some parameters can be tuned to adjust the bandwidth flexibly. In addition to the application in beam sweeping, some GSC sequences are closely connected with Mow's unified construction of sequences with perfect periodic autocorrelations, and may have a coarser phase resolution than the Mow sequence while their ISLs are comparable.
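
The two quantities this abstract relies on, aperiodic autocorrelation and integrated sidelobe level, can be sketched in a few lines. The example uses the well-known length-13 Barker code as a stand-in test sequence, since the GSC construction itself is not given in the abstract; the ISL here is taken as the sum of squared sidelobe magnitudes over all nonzero lags, and the function names are hypothetical.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation r[k] = sum_i seq[i + k] * conj(seq[i]), k >= 0."""
    n = len(seq)
    return [sum(seq[i + k] * seq[i].conjugate() for i in range(n - k))
            for k in range(n)]


def integrated_sidelobe_level(seq):
    """Sum of |r[k]|^2 over all nonzero lags (negative lags mirror positive ones)."""
    r = aperiodic_autocorrelation(seq)
    return 2 * sum(abs(value) ** 2 for value in r[1:])


# Length-13 Barker code, used only as a familiar low-sidelobe example.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

r = aperiodic_autocorrelation(barker13)
isl = integrated_sidelobe_level(barker13)
merit_factor = abs(r[0]) ** 2 / isl
```

For the Barker code every sidelobe has magnitude at most 1, which gives a small ISL relative to the squared mainlobe; sequences designed for a flat spectrum aim for the same property.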

[46]  arXiv:2404.16357 (cross-list from q-bio.NC) [pdf, other]
Title: Reverse engineering the brain input: Network control theory to identify cognitive task-related control nodes
Subjects: Neurons and Cognition (q-bio.NC); Systems and Control (eess.SY)

The human brain receives complex inputs when performing cognitive tasks, which range from external inputs via the senses to internal inputs from other brain regions. However, the explicit inputs to the brain during a cognitive task remain unclear. Here, we present an input identification framework for reverse engineering the control nodes and the corresponding inputs to the brain. The framework is verified with synthetic data generated by a predefined linear system, indicating it can robustly reconstruct data and recover the inputs. Then we apply the framework to the real motor-task fMRI data from 200 human subjects. Our results show that the model with sparse inputs can reconstruct neural dynamics in motor tasks ($EV=0.779$) and the identified 28 control nodes largely overlap with the motor system. Underpinned by network control theory, our framework offers a general tool for understanding brain inputs.

[47]  arXiv:2404.16376 (cross-list from cs.IT) [pdf, ps, other]
Title: A Hypergraph Approach to Distributed Broadcast
Subjects: Information Theory (cs.IT); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

This paper explores the distributed broadcast problem within the context of network communications, a critical challenge in decentralized information dissemination. We put forth a novel hypergraph-based approach to address this issue, focusing on minimizing the number of broadcasts to ensure comprehensive data sharing among all network users. A key contribution of our work is the establishment of a general lower bound for the problem using the min-cut capacity of hypergraphs. Additionally, we present the distributed broadcast for quasi-trees (DBQT) algorithm tailored for the unique structure of quasi-trees, which is proven to be optimal. This paper advances both network communication strategies and hypergraph theory, with implications for a wide range of real-world applications, from vehicular and sensor networks to distributed storage systems.

[48]  arXiv:2404.16394 (cross-list from cs.IT) [pdf, ps, other]
Title: STAR-RIS-Assisted Communication Radar Coexistence: Analysis and Optimization
Comments: accepted in IEEE TVT, 14 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Integrated sensing and communication (ISAC) is expected to play a prominent role among emerging technologies in future wireless communications. In particular, a communication radar coexistence system is degraded significantly by mutual interference. In this work, given the advantages of the promising reconfigurable intelligent surface (RIS), we propose a simultaneously transmitting and reflecting RIS (STAR-RIS)-assisted radar coexistence system where a STAR-RIS is introduced to improve the communication performance while suppressing the mutual interference and providing full space coverage. Based on the realistic conditions of correlated fading and the presence of multiple user equipments (UEs) at both sides of the RIS, we derive the achievable rates at the radar and the communication receiver side in closed form in terms of statistical channel state information (CSI). Next, we perform alternating optimization (AO) for optimizing the STAR-RIS and the radar beamforming. Regarding the former, we optimize the amplitudes and phase shifts of the STAR-RIS simultaneously through a projected gradient ascent algorithm (PGAM) for both energy splitting (ES) and mode switching (MS) operation protocols. The proposed optimization saves significant overhead since it can be performed once every several coherence intervals. This property is particularly beneficial compared to reflecting-only RIS because a STAR-RIS has twice the number of variables, which requires increased overhead. Finally, simulation results illustrate how the proposed architecture outperforms the conventional RIS counterpart, and show how the various parameters affect the performance. Moreover, a benchmark full instantaneous CSI (I-CSI) based design is provided and shown to result in a higher sum-rate but also in large overhead associated with complexity.

[49]  arXiv:2404.16407 (cross-list from cs.CL) [pdf, other]
Title: U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Scale has opened new frontiers in natural language processing, but at a high cost. In response, Mixture-of-Experts (MoE) models, which learn to activate only a subset of parameters in training and inference, have been proposed as an energy-efficient path to even larger and more capable language models. This shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into ASR models have complex designs, such as routing frames via a supplementary embedding network, improving multilingual ability for the experts, and utilizing dedicated auxiliary losses for either expert load balancing or specific language handling. We found that such delicate designs are not necessary: an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for the ASR task. To be more specific, we benchmark our proposed model on a large-scale inner-source dataset (160k hours). The results show that we can scale our baseline Conformer (Dense-225M) to its MoE counterpart (MoE-1B) and achieve Dense-1B level Word Error Rate (WER) while maintaining a Dense-225M level Real Time Factor (RTF). Furthermore, by applying the Unified 2-pass framework with bidirectional attention decoders (U2++), we achieve the streaming and non-streaming decoding modes in a single MoE-based model, which we call U2++ MoE. We hope that our study can facilitate the research on scaling speech foundation models without sacrificing deployment efficiency.

[50]  arXiv:2404.16408 (cross-list from cs.IT) [pdf, other]
Title: Event-Triggered Resilient Filtering for 2-D Systems with Asynchronous-Delay: Handling Binary Encoding Decoding with Probabilistic Bit Flips
Authors: Yu Chen, Wei Wang
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

In this paper, the event-triggered resilient filtering problem is investigated for a class of two-dimensional systems with asynchronous-delay under binary encoding-decoding schemes with probabilistic bit flips. To reduce unnecessary communications and computations in complex network systems, alleviate network energy consumption, and optimize the use of network resources, a new event-triggered mechanism is proposed, which broadcasts the measurement information needed to update the innovation only when the event generator function is satisfied. A binary encoding-decoding scheme is used in the communication process to quantize the measurement information into a bit stream, transmit it through a memoryless binary symmetric channel with a certain probability of bit flipping, and restore it at the receiver. In order to utilize the delayed decoded measurement information, a measurement reconstruction approach is proposed. Through generating space equivalence verification, it is found that the reconstructed delay-free decoded measurement sequence contains the same information as the original delayed decoded measurement sequence. In addition, a resilient filter is utilized to accommodate possible estimation gain perturbations. Then, a recursive estimator framework is presented based on the reconstructed decoded measurement sequence. By means of the mathematical induction technique, the unbiased property of the proposed estimator is proved. The estimation gain is obtained by minimizing an upper bound on the filtering error covariance. Subsequently, through rigorous mathematical analysis, the monotonicity of filtering performance with respect to triggering parameters is discussed.
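
The communication chain described in this abstract (quantize to bits, send through a memoryless binary symmetric channel, restore at the receiver) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: a uniform scalar quantizer over a known range `[lower, upper]` and a fixed word length are assumed here, and the function names are hypothetical; the paper's actual encoding rule is not given in the abstract.

```python
import random


def encode(value, lower, upper, bits):
    """Uniformly quantize value in [lower, upper]; return its bits, MSB first."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lower), upper)
    index = round((clipped - lower) / (upper - lower) * levels)
    return [(index >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def binary_symmetric_channel(bit_stream, flip_probability, rng):
    """Flip each bit independently with the given probability."""
    return [bit ^ (rng.random() < flip_probability) for bit in bit_stream]


def decode(bit_stream, lower, upper):
    """Map a received bit list back to a value in [lower, upper]."""
    levels = (1 << len(bit_stream)) - 1
    index = 0
    for bit in bit_stream:
        index = (index << 1) | bit
    return lower + index / levels * (upper - lower)


rng = random.Random(0)  # seeded so the run is repeatable
sent = encode(0.3, lower=-1.0, upper=1.0, bits=8)
received = binary_symmetric_channel(sent, flip_probability=0.05, rng=rng)
restored = decode(received, lower=-1.0, upper=1.0)
```

With a plain binary word, a flip in a high-order bit causes a much larger decoding error than a flip in a low-order bit, which is one reason a filter operating on such data has to account for the flip probability explicitly.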

[51]  arXiv:2404.16409 (cross-list from cs.CV) [pdf, other]
Title: Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series
Authors: Aimi Okabayashi (IRISA, OBELIX), Nicolas Audebert (CEDRIC - VERTIGO, CNAM, LaSTIG, IGN), Simon Donike (IPL), Charlotte Pelletier (OBELIX, IRISA)
Journal-ref: EARTHVISION 2024 IEEE/CVF CVPR Workshop. Large Scale Computer Vision for Remote Sensing Imagery, Jun 2024, Seattle, United States
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Satellite imaging generally presents a trade-off between the frequency of acquisitions and the spatial resolution of the images. Super-resolution is often advanced as a way to get the best of both worlds. In this work, we investigate multi-image super-resolution of satellite image time series, i.e. how multiple images of the same area acquired at different dates can help reconstruct a higher resolution observation. In particular, we extend state-of-the-art deep single and multi-image super-resolution algorithms, such as SRDiff and HighRes-net, to deal with irregularly sampled Sentinel-2 time series. We introduce BreizhSR, a new dataset for 4x super-resolution of Sentinel-2 time series using very high-resolution SPOT-6 imagery of Brittany, a French region. We show that using multiple images significantly improves super-resolution performance, and that a well-designed temporal positional encoding allows us to perform super-resolution for different times of the series. In addition, we observe a trade-off between spectral fidelity and perceptual quality of the reconstructed HR images, questioning future directions for super-resolution of Earth Observation data.

[52]  arXiv:2404.16424 (cross-list from physics.ins-det) [pdf, ps, other]
Title: Reciprocity in laser ultrasound revisited: Is wavefield characterisation by scanning laser excitation strictly reciprocal to that by scanning laser detection?
Comments: 20 pages, 15 figures
Subjects: Instrumentation and Detectors (physics.ins-det); Systems and Control (eess.SY); Optics (physics.optics)

The common belief in strict measurement reciprocity between scanning laser detection and scanning laser excitation is disproved by a simple experiment. Nevertheless, a deeper study based on the reciprocity relation reveals correct reciprocal measurement set-ups for both the probe-excitation / laser-detection and the laser-excitation / probe-detection case. Similarly, the all-laser measurement, that is, thermoelastic laser excitation with laser vibrometer detection, is not in general reciprocal with respect to the exchange of excitation and detection positions. Again, a substitute for the laser Doppler vibrometer out-of-plane displacement measurement was found which ensures measurement reciprocity together with laser excitation. The apparent confusion in the literature about strict validity/non-validity of measurement reciprocity is mitigated by classifying the measurement situations systematically.

[53]  arXiv:2404.16436 (cross-list from cs.SD) [pdf, ps, other]
Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics
Comments: 18 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.

[54]  arXiv:2404.16468 (cross-list from cs.LG) [pdf, other]
Title: A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we try to unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is revealed. Furthermore, using this framework, we are able to introduce some novel types of constraints, making it possible to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training using trainable reward modifications. The resulting $\texttt{DualCRL}$ method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints.

[55]  arXiv:2404.16484 (cross-list from cs.CV) [pdf, other]
Title: Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Comments: CVPR 2024, AI for Streaming (AIS) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images in under 10 ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

[56]  arXiv:2404.16500 (cross-list from cs.RO) [pdf, other]
Title: Conformal Prediction of Motion Control Performance for an Automated Vehicle in Presence of Actuator Degradations and Failures
Comments: submitted for publication
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Automated driving systems require monitoring mechanisms to ensure safe operation, especially if system components degrade or fail. Their runtime self-representation plays a key role as it provides a-priori knowledge about the system's capabilities and limitations. In this paper, we propose a data-driven approach for deriving such a self-representation model for the motion controller of an automated vehicle. A conformalized prediction model is learned and allows estimating how operational conditions as well as potential degradations and failures of the vehicle's actuators impact motion control performance. During runtime behavior generation, our predictor can provide a heuristic for determining the admissible action space.
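
For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split-conformal recipe, not the authors' model: absolute errors of any point predictor on a held-out calibration set are turned into an interval half-width via a finite-sample-corrected quantile. The error values, the function names and the choice `alpha=0.5` (picked only to keep the numbers easy to check) are illustrative assumptions.

```python
import math


def conformal_radius(calibration_errors, alpha):
    """Split-conformal interval half-width from held-out absolute errors.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest error, or infinity
    when the calibration set is too small for the requested coverage.
    """
    n = len(calibration_errors)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_errors)[rank - 1]


def predict_interval(point_prediction, radius):
    """Symmetric prediction interval around a point prediction."""
    return point_prediction - radius, point_prediction + radius


# Hypothetical absolute control errors |y - y_hat| on a calibration set.
errors = [0.12, 0.05, 0.30, 0.08, 0.21, 0.15, 0.02, 0.18, 0.10]
radius = conformal_radius(errors, alpha=0.5)
low, high = predict_interval(1.0, radius)
```

Under the usual exchangeability assumption, an interval built this way covers a new observation with probability at least `1 - alpha`, which is what makes such a bound usable as a runtime monitoring heuristic.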

[57]  arXiv:2404.16504 (cross-list from cs.CR) [pdf, ps, other]
Title: Hardware Implementation of Double Pendulum Pseudo Random Number Generator
Comments: 15 pages, 12 figures
Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)

The objective of this project is to use a CMOD A7 35t FPGA board to obtain a pseudo random number which can be used for encryption. We aim to achieve this by leveraging the inherent randomness present in environmental data captured by sensors. This data will be used as a seed to initialize an algorithm implemented on the CMOD A7 35t FPGA board. The project will focus on interfacing the sensors with the FPGA and developing suitable algorithms to ensure the generated numbers exhibit strong randomness properties.

[58]  arXiv:2404.16527 (cross-list from cs.NI) [pdf, ps, other]
Title: Energy Efficient Service Placement for IoT Networks
Comments: 5 pages, 5 Figures, Conference
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

In recent years, there has been a significant expansion in the Internet of Things (IoT), with a growing number of devices being connected to the internet. This has led to an increase in data collection and analysis as well as the development of new technologies and applications. The rise of IoT has also brought about new challenges, such as security concerns and energy efficiency. This study investigates a layered IoT architecture that combines fog and cloud computing, aiming to assess the impact of service placement on energy efficiency. Through simulations, we analyse energy use across Access Fog, Metro Fog, and Cloud Data Centre layers for different IoT request volumes. Findings indicate that Access Fog is optimal for single requests, while Metro Fog efficiently manages higher demands from multiple devices. The study emphasizes the need for adaptive service deployment, responsive to network load variations, to improve energy efficiency. Hence, we propose the implementation of dynamic service placement strategies within Internet of Things (IoT) environments.

[59]  arXiv:2404.16611 (cross-list from cs.IT) [pdf, ps, other]
Title: Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on both the overall system performance and individual benefits of operators. Motivated by emerging symbiotic communication facilitating mutual benefits across different radio systems, we investigate the resource and service sharing in SAGIN from a symbiotic communication perspective in this paper. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). Specifically, we aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. Besides, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when only focusing on maximizing the WSR. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. Then, we develop a centralized algorithm based on the successive convex approximation (SCA) method. Considering that the centralized algorithm is difficult to implement, we propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM). Finally, we provide extensive numerical simulations to demonstrate the effectiveness of the two proposed algorithms, and the distributed optimization algorithm can approach the performance of the centralized one.

[60]  arXiv:2404.16619 (cross-list from cs.SD) [pdf, other]
Title: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge
Comments: Accepted in Grand Challenge of ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by the THU-HCSI team for the LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. To further improve speaker similarity and speech quality, we introduce a speaker-aware text encoder and a flow-based decoder with Transformer blocks. In addition, we denoise the few-shot data, mix them with pre-training data, and adopt a speaker-balanced sampling strategy to guarantee effective fine-tuning for target speakers. The official evaluations in track 1 show that our system achieves the best speaker similarity MOS of 4.25 and obtains a considerable naturalness MOS of 3.97.

[61]  arXiv:2404.16712 (cross-list from math.OC) [pdf, other]
Title: Distributed MPC for PWA Systems Based on Switching ADMM
Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Automatic Control, code available at this https URL and this https URL
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a novel approach for distributed model predictive control (MPC) for piecewise affine (PWA) systems. Existing approaches rely on solving mixed-integer optimization problems, requiring significant computation power or time. We propose a distributed MPC scheme that requires solving only convex optimization problems. The key contribution is a novel method, based on the alternating direction method of multipliers, for solving the non-convex optimal control problem that arises due to the PWA dynamics. We present a distributed MPC scheme, leveraging this method, that explicitly accounts for the coupling between subsystems by reaching agreement on the values of coupled states. Stability and recursive feasibility are shown under additional assumptions on the underlying system. Two numerical examples are provided, in which the proposed controller is shown to significantly improve the CPU time and closed-loop performance over existing state-of-the-art approaches.
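
The idea of subsystems "reaching agreement" through ADMM can be illustrated with the textbook convex consensus form. The sketch below is not the paper's switching ADMM for PWA dynamics; it assumes each agent holds a simple quadratic cost with a hypothetical target value, and shows only the three-step local/coordination/dual pattern in scaled form.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Minimize sum_i 0.5 * (x_i - targets[i])**2 subject to x_i = z for all i.

    Each agent i updates its local copy x_i using only its own target and the
    shared value z; agreement is enforced through scaled dual variables u_i.
    """
    count = len(targets)
    z = 0.0
    u = [0.0] * count
    x = [0.0] * count
    for _ in range(iterations):
        # Local step: closed-form minimizer of each agent's augmented Lagrangian.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(targets, u)]
        # Coordination step: average the local copies plus their duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / count
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z, x


agreed_value, local_copies = consensus_admm([1.0, 2.0, 6.0])
```

For this toy problem the agreed value approaches the mean of the targets. In an MPC setting the local step would instead be a convex optimal control problem per subsystem, with the shared variable playing the role of the coupled states.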

[62]  arXiv:2404.16743 (cross-list from cs.CL) [pdf, other]
Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation
Comments: Accepted to LREC-COLING 2024 (long)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.
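
As background for the metric being estimated, WER is conventionally the word-level edit distance between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below computes that reference-based quantity directly; it is not the paper's estimator, which predicts WER without a reference. The function name and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.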

[63]  arXiv:2404.16825 (cross-list from cs.CV) [pdf, other]
Title: ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content viewed on head mounted displays (HMDs) is actually a rendered viewport instead of an ERP image. In this work, we emphasize that focusing solely on ERP quality results in inferior viewport visual experiences for users. Thus, we propose ResVR, which is the first comprehensive framework for the joint Rescaling and Viewport Rendering of ODIs. ResVR allows obtaining LR ERP images for transmission while rendering high-quality viewports for users to watch on HMDs. In ResVR, a novel discrete pixel sampling strategy is developed to tackle the complex mapping between the viewport and ERP, enabling end-to-end training of the ResVR pipeline. Furthermore, a spherical pixel shape representation technique is derived from spherical differentiation to significantly improve the visual quality of rendered viewports. Extensive experiments demonstrate that ResVR outperforms existing methods in viewport rendering tasks across different fields of view, resolutions, and view directions while keeping a low transmission overhead.

Replacements for Fri, 26 Apr 24

[64]  arXiv:2201.10017 (replaced) [pdf, ps, other]
Title: Online Convex Optimization Using Coordinate Descent Algorithms
Comments: Accepted for publication in Automatica
Journal-ref: Automatica, vol. 165, Article 111681, 2024
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[65]  arXiv:2205.06891 (replaced) [pdf, ps, other]
Title: Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation
Comments: Accepted by IEEE Transactions on Artificial Intelligence
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[66]  arXiv:2306.11214 (replaced) [pdf, ps, other]
Title: A Sample-Deficient Analysis of the Leading Generalized Eigenvalue for the Detection of Signals in Colored Gaussian Noise
Comments: 38 pages, 6 figures
Subjects: Signal Processing (eess.SP)
[67]  arXiv:2307.03812 (replaced) [pdf, ps, other]
Title: Coordinate-based neural representations for computational adaptive optics in widefield microscopy
Comments: 36 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Systems and Control (eess.SY); Optics (physics.optics)
[68]  arXiv:2307.15388 (replaced) [pdf, other]
Title: An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Geophysics (physics.geo-ph)
[69]  arXiv:2309.02961 (replaced) [pdf, other]
Title: LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
Comments: 10 pages, 11 figures
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70]  arXiv:2310.16139 (replaced) [pdf, other]
Title: Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos
Comments: 17 pages, 18 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[71]  arXiv:2310.17101 (replaced) [pdf, other]
Title: Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Comments: 6 pages, 4 figures; Accepted by ICME 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72]  arXiv:2311.04791 (replaced) [pdf, other]
Title: Integrated Distributed Semantic Communication and Over-the-air Computation for Cooperative Spectrum Sensing
Comments: 13 pages,10 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP)
[73]  arXiv:2312.09040 (replaced) [pdf, other]
Title: STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
Comments: ICASSP 2024 Best Student Paper Awarded. Code URL: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[74]  arXiv:2401.06966 (replaced) [pdf, other]
Title: Channel Estimation for RIS-Aided mmWave MU-MIMO Systems with Hybrid Beamforming Structures
Comments: submitted to IEEE Transactions on Communications
Subjects: Signal Processing (eess.SP)
[75]  arXiv:2401.09643 (replaced) [pdf, ps, other]
Title: OFDM Reference Signal Pattern Design Criteria for Integrated Communication and Sensing
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[76]  arXiv:2401.09648 (replaced) [pdf, ps, other]
Title: Staggered Comb Reference Signal Design for Integrated Communication and Sensing
Comments: accepted by IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. arXiv admin note: substantial text overlap with arXiv:2401.09643
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[77]  arXiv:2402.16003 (replaced) [pdf, other]
Title: Exploring the Power of Pure Attention Mechanisms in Blind Room Parameter Estimation
Comments: 28 pages, 9 figures, accepted for publishing to EURASIP Journal On Audio Speech And Music Processing
Subjects: Audio and Speech Processing (eess.AS)
[78]  arXiv:2403.01150 (replaced) [pdf, ps, other]
Title: Sequential Rotations and Error Analysis of a Simple Quaternion Estimator
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[79]  arXiv:2403.05187 (replaced) [pdf, other]
Title: Robust Semantic Communications for Speech Transmission
Subjects: Audio and Speech Processing (eess.AS)
[80]  arXiv:2403.05743 (replaced) [pdf, ps, other]
Title: Forecasting Electricity Market Signals via Generative AI
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); General Economics (econ.GN)
[81]  arXiv:2403.09083 (replaced) [pdf, other]
Title: Asymptotically Near-Optimal Hybrid Beamforming for mmWave IRS-Aided MIMO Systems
Comments: Submitted to IEEE Transactions on Vehicular Technology
Subjects: Signal Processing (eess.SP)
[82]  arXiv:2403.15360 (replaced) [pdf, other]
Title: SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
[83]  arXiv:2404.14700 (replaced) [pdf, other]
Title: FlashSpeech: Efficient Zero-Shot Speech Synthesis
Comments: Efficient zero-shot speech synthesis
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[84]  arXiv:2404.15009 (replaced) [pdf, other]