Electrical Engineering and Systems Science
New submissions
New submissions for Thu, 28 Mar 24
- [1] arXiv:2403.18015 [pdf, other]
Title: A Constructive Method for Designing Safe Multirate Controllers for Differentially-Flat Systems
Authors: Devansh R. Agrawal, Hardik Parwana, Ryan K. Cosner, Ugo Rosolia, Aaron D. Ames, Dimitra Panagou
Comments: 6 pages, 3 figures, accepted at IEEE Control Systems Letters 2021
Journal-ref: IEEE Control Systems Letters, Vol 6, Page 2138--2143, 2021
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
We present a multi-rate control architecture that leverages fundamental properties of differential flatness to synthesize controllers for safety-critical nonlinear dynamical systems. We propose a two-layer architecture, where the high-level generates reference trajectories using a linear Model Predictive Controller, and the low-level tracks this reference using a feedback controller. The novelty lies in how we couple these layers, to achieve formal guarantees on recursive feasibility of the MPC problem, and safety of the nonlinear system. Furthermore, using differential flatness, we provide a constructive means to synthesize the multi-rate controller, thereby removing the need to search for suitable Lyapunov or barrier functions, or to approximately linearize/discretize nonlinear dynamics. We show the synthesized controller is a convex optimization problem, making it amenable to real-time implementations. The method is demonstrated experimentally on a ground rover and a quadruped robotic system.
- [2] arXiv:2403.18026 [pdf, ps, other]
Title: Cross-system biological image quality enhancement based on the generative adversarial network as a foundation for establishing a multi-institute microscopy cooperative network
Authors: Dominik Panek, Carina Rząca, Maksymilian Szczypior, Joanna Sorysz, Krzysztof Misztal, Zbigniew Baster, Zenon Rajfur
Comments: 15 Pages, 5 Figures, 1 Table, 3 pages Supplementary Materials
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
High-quality fluorescence imaging of biological systems is limited by processes like photobleaching and phototoxicity and, in many cases, by limited access to the latest generations of microscopes. Moreover, low temporal resolution can lead to a motion blur effect in living systems. Our work presents a deep learning (DL) generative-adversarial approach to the problem of obtaining high-quality (HQ) images based on their low-quality (LQ) equivalents. We propose a generative-adversarial network (GAN) for contrast transfer between two separate microscopy systems: a confocal microscope (producing HQ images) and a wide-field fluorescence microscope (producing LQ images). Our model shows that such transfer is possible, allowing us to obtain HQ-generated images characterized by low mean squared error (MSE) values, high structural similarity index (SSIM), and high peak signal-to-noise ratio (PSNR) values. For our best model, when comparing HQ-generated images with HQ ground-truth images, the median values of the metrics are $6\times10^{-4}$, 0.9413, and 31.87 for MSE, SSIM, and PSNR, respectively. In contrast, when comparing LQ images with HQ ground-truth images, the median values of the metrics are 0.0071, 0.8304, and 21.48 for MSE, SSIM, and PSNR, respectively. Therefore, we observe a significant increase of 14% for SSIM and 49% for PSNR. These results, together with other single-system cross-modality studies, provide proof of concept for further implementation of a cross-system biological image quality enhancement.
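As a rough illustration of two of the metrics quoted above, the following sketch computes MSE and PSNR for a pair of images given as flat lists of pixel intensities. It assumes intensities normalized to [0, 1] (so the peak value is 1.0), which the abstract does not state, and the pixel values are hypothetical. SSIM is omitted because it needs windowed local statistics.

```python
import math


def mse(reference, generated):
    """Mean squared error between two equally sized pixel lists."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)


def psnr(reference, generated, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    err = mse(reference, generated)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


# Hypothetical 4-pixel "images", not data from the paper.
reference = [0.0, 0.5, 1.0, 0.25]
generated = [0.1, 0.5, 0.9, 0.25]

example_mse = mse(reference, generated)
example_psnr = psnr(reference, generated)
```

Note that medians of per-image MSE and PSNR, as reported in the abstract, need not satisfy this per-image relation exactly.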
- [3] arXiv:2403.18041 [pdf, other]
Title: Learning Piecewise Residuals of Control Barrier Functions for Safety of Switching Systems using Multi-Output Gaussian Processes
Comments: arXiv admin note: text overlap with arXiv:2403.09573
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Control barrier functions (CBFs) have recently been introduced as a systematic tool to ensure safety by establishing set invariance. When combined with a control Lyapunov function (CLF), they form a safety-critical control mechanism. However, the effectiveness of CBFs and CLFs is closely tied to the system model. In practice, model uncertainty can jeopardize safety and stability guarantees and may lead to undesirable performance. In this paper, we develop a safe learning-based control strategy for switching systems in the face of uncertainty. We focus on the case in which a nominal model is available for a true underlying switching system. This uncertainty results in piecewise residuals for each switching surface, impacting the CLF and CBF constraints. We introduce a batch multi-output Gaussian process (MOGP) framework to approximate these piecewise residuals, thereby mitigating the adverse effects of uncertainty. A particular structure of the covariance function enables us to convert the MOGP-based CLF and CBF chance constraints into second-order cone constraints, which leads to a convex optimization. We analyze the feasibility of the resulting optimization and provide the necessary and sufficient conditions for feasibility. The effectiveness of the proposed strategy is validated through a simulation of a switching adaptive cruise control system.
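To make the set-invariance idea concrete, here is a minimal sketch of a nominal CBF safety filter. It is not the paper's method: it assumes a known single-integrator model x' = u, a hypothetical barrier h(x) = x_max - x, and no learned residuals or switching. For this scalar case the CBF condition h' + alpha*h >= 0 reduces to u <= alpha*(x_max - x), so the usual quadratic program has a closed-form solution.

```python
def cbf_filter(u_nominal, x, x_max, alpha):
    """Input closest to u_nominal that satisfies u <= alpha * (x_max - x)."""
    return min(u_nominal, alpha * (x_max - x))


def simulate(x0, u_nominal, x_max=1.0, alpha=2.0, dt=0.01, steps=500):
    """Forward-Euler rollout of x' = u under the filtered input."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_filter(u_nominal, x, x_max, alpha)
        x += dt * u
        trajectory.append(x)
    return trajectory


# The nominal input keeps pushing toward the boundary; the filter slows it down.
trajectory = simulate(x0=0.0, u_nominal=1.0)
```

With alpha*dt <= 1, each Euler step keeps the state at or below x_max, so the discretized rollout stays in the safe set in this toy setting.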
- [4] arXiv:2403.18055 [pdf, other]
Title: Adaptive Boundary Control of the Kuramoto-Sivashinsky Equation Under Intermittent Sensing
Comments: Submitted to Automatica
Subjects: Systems and Control (eess.SY); Analysis of PDEs (math.AP)
In this paper, we study boundary stabilization, in the $L^2$ sense, of the one-dimensional Kuramoto-Sivashinsky equation subject to intermittent sensing. We assume that we measure the state of this spatio-temporal equation on a given spatial subdomain during certain intervals of time, while we measure the state on the remaining spatial subdomain during the remaining intervals of time. As a result, we assign a feedback law at the boundary of the spatial domain and force to zero the value of the state at the junction of the two subdomains. Throughout the study, the destabilizing coefficient is assumed to be space-dependent and bounded but unknown. Adaptive boundary controllers are designed under different assumptions on the forcing term. In particular, when the forcing term is null, we guarantee global exponential stability of the origin. Furthermore, when the forcing term is bounded and admits a known upper bound, we guarantee input-to-state stability, and only global uniform ultimate boundedness is guaranteed when the upper bound is unknown. Numerical simulations are performed to illustrate our results.
- [5] arXiv:2403.18066 [pdf, ps, other]
Title: Path Integral Control with Rollout Clustering and Dynamic Obstacles
Comments: 8 pages, 5 figures, extended version of ACC 2024 submission
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Model Predictive Path Integral (MPPI) control has proven to be a powerful tool for the control of uncertain systems (such as systems subject to disturbances and systems with unmodeled dynamics). One important limitation of the baseline MPPI algorithm is that it does not utilize simulated trajectories to their fullest extent. For one, it assumes that the average of all trajectories weighted by their performance index will be a safe trajectory. In this paper, multiple examples are shown where this assumption does not hold, and a trajectory clustering technique is presented that reduces the chances of the weighted average crossing an unsafe region. Secondly, MPPI does not account for dynamic obstacles, so the authors put forward a novel cost function that accounts for dynamic obstacles without adding significant computation time to the overall algorithm. The novel contributions proposed in this paper were evaluated with extensive simulations to demonstrate improvements upon the state-of-the-art MPPI techniques.
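The failure mode described above can be reproduced with a toy calculation. The sketch below uses the standard MPPI exponential weighting of rollout costs; the rollouts, costs, obstacle width, and temperature are hypothetical, and splitting rollouts by the sign of their lateral offset is only a stand-in for the paper's clustering technique, which the abstract does not specify.

```python
import math


def mppi_weights(costs, temperature):
    """Normalized weights exp(-(S_i - min S) / temperature)."""
    c_min = min(costs)  # subtracting the minimum improves numerical stability
    unnormalized = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Hypothetical rollouts summarized by their lateral offset when passing an
# obstacle that occupies |y| < 0.5. Every individual rollout is safe.
lateral_offsets = [-1.0, -1.1, 1.0, 1.1]
costs = [1.0, 1.2, 1.0, 1.2]

weights = mppi_weights(costs, temperature=1.0)

# Baseline MPPI: one weighted average over all rollouts.
average_offset = sum(y * w for y, w in zip(lateral_offsets, weights))

# Stand-in clustering: average only the rollouts that pass on the left.
left = [(y, w) for y, w in zip(lateral_offsets, weights) if y < 0]
left_average = sum(y * w for y, w in left) / sum(w for _, w in left)
```

Here the two homotopy classes are symmetric, so the global weighted average lands near zero, inside the obstacle, while the per-cluster average stays on one side.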
- [6] arXiv:2403.18071 [pdf, other]
Title: From Sontag's to Cardano-Lyapunov Formula for Systems Not Affine in the Control: Convection-Enabled PDE Stabilization
Comments: To be presented at the 2024 American Control Conference
Subjects: Systems and Control (eess.SY); Analysis of PDEs (math.AP)
We propose the first generalization of Sontag's universal controller to systems not affine in the control, particularly, to PDEs with boundary actuation. We assume that the system admits a control Lyapunov function (CLF) whose derivative, rather than being affine in the control, has either a depressed cubic, quadratic, or depressed quartic dependence on the control. For each case, a continuous universal controller that vanishes at the origin and achieves global exponential stability is derived. We prove our result in the context of convection-reaction-diffusion PDEs with Dirichlet actuation. We show that if the convection has a certain structure, then the $L^2$ norm of the state is a CLF. In addition to generalizing Sontag's formula to some non-affine systems, we present the first general Lyapunov approach for boundary control of nonlinear PDEs. We illustrate our results via a numerical example.
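For context, the classical control-affine case that this work generalizes can be sketched in a few lines. When the CLF derivative has the form V' = a(x) + b(x) u with scalar input u, Sontag's universal formula gives u = -(a + sqrt(a^2 + b^4)) / b for b != 0 and u = 0 otherwise, which yields V' = -sqrt(a^2 + b^4). The example system x' = x + u with V = x^2/2 is a hypothetical illustration; the cubic, quadratic, and quartic cases treated in the paper are not reproduced here.

```python
import math


def sontag_control(a, b):
    """Sontag's universal formula for V' = a + b * u (scalar input)."""
    if b == 0.0:
        return 0.0
    return -(a + math.sqrt(a * a + b ** 4)) / b


def lyapunov_rate(a, b, u):
    """Derivative of the CLF along trajectories for the affine case."""
    return a + b * u


# Hypothetical example: x' = x + u with V = x^2 / 2, so a = x^2 and b = x.
x = 2.0
a, b = x * x, x
u = sontag_control(a, b)
rate = lyapunov_rate(a, b, u)
```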
- [7] arXiv:2403.18085 [pdf, other]
Title: ANOCA: AC Network-aware Optimal Curtailment Approach for Dynamic Hosting Capacity
Subjects: Systems and Control (eess.SY)
With exponential growth in distributed energy resources (DERs) coupled with at-capacity distribution grid infrastructure, prosumers cannot always export all extra power to the grid without violating technical limits. Consequently, a slew of dynamic hosting capacity (DHC) algorithms have emerged for optimal utilization of grid infrastructure while maximizing export from DERs. Most of these DHC algorithms utilize the concept of operating envelopes (OE), where the utility gives prosumers technical power export limits, and they are free to export power within these limits. Recent studies have shown that OE-based frameworks have drawbacks, as most develop power export limits based on convex or linear grid models. As OEs must capture extreme operating conditions, both convex and linear models can violate technical limits in practice because they approximate grid physics. However, AC models are unsuitable because they may not be feasible within the whole region of OE. We propose a new two-stage optimization framework for DHC built on three-phase AC models to address the current gaps. In this approach, the prosumers first run a receding horizon multi-period optimization to identify optimal export power setpoints to communicate with the utility. The utility then performs an infeasibility-based optimization to either accept the prosumer's request or dispatch an optimal curtail signal such that overall system technical constraints are not violated. To explore various curtailment strategies, we develop an $L_1$, $L_2$, and $L_\infty$ norm-based dispatch algorithm with an exact three-phase AC model. We test our framework on a 1420 three-phase node meshed distribution network and show that the proposed algorithm optimally curtails DERs while guaranteeing the AC feasibility of the network.
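The choice of norm changes which curtailment pattern is considered cheapest. The sketch below only compares the three norms on two hypothetical curtailment vectors that remove the same total power; it contains no network model and is not the paper's dispatch algorithm.

```python
import math


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def linf_norm(v):
    return max(abs(x) for x in v)


# Hypothetical curtailments (kW) for three prosumers; both remove 3 kW in total.
concentrated = [3.0, 0.0, 0.0]  # one prosumer absorbs everything
spread = [1.0, 1.0, 1.0]        # curtailment shared equally

norms = {
    name: (l1_norm(v), l2_norm(v), linf_norm(v))
    for name, v in {"concentrated": concentrated, "spread": spread}.items()
}
```

The L1 norm is indifferent between the two patterns, while the L2 and L-infinity norms both favor sharing the curtailment, which hints at why the different norms lead to different dispatch strategies.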
- [8] arXiv:2403.18087 [pdf, other]
Title: Channel Estimation and Beamforming for Beyond Diagonal Reconfigurable Intelligent Surfaces
Comments: 12 pages, 10 figures, submitted to IEEE journal
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Beyond diagonal reconfigurable intelligent surface (BD-RIS) is a new advance and generalization of the RIS technique. BD-RIS breaks through the isolation between RIS elements by introducing inter-element connections, thereby enabling smarter wave manipulation and enlarging coverage. However, finding channel estimation schemes suitable for BD-RIS aided communication systems remains an open problem. In this paper, we study channel estimation and beamforming design for BD-RIS aided multi-antenna systems. We first describe the channel estimation strategy based on the least squares (LS) method, derive the mean square error (MSE) of the LS estimation, and formulate the joint pilot sequence and BD-RIS design problem with unique constraints induced by BD-RIS architectures. Specifically, we propose an efficient pilot sequence and BD-RIS design which is theoretically guaranteed to achieve the minimum MSE. With the estimated channel, we then consider two BD-RIS scenarios and propose beamforming design algorithms. Finally, we provide simulation results to verify the effectiveness of the proposed channel estimation scheme and beamforming design algorithms. We also show that more inter-element connections in BD-RIS improve the performance while increasing the training overhead for channel estimation.
- [9] arXiv:2403.18119 [pdf, other]
Title: Multiple Model Reference Adaptive Control with Blending for Non-Square Multivariable Systems
Comments: 10 pages, 7 figures, IEEE Journal Submission
Subjects: Systems and Control (eess.SY)
In this paper we develop a multiple model reference adaptive controller (MMRAC) with blending. The systems under consideration are non-square, i.e., the number of inputs is not equal to the number of states; multi-input, linear, time-invariant with uncertain parameters that lie inside of a known, compact, and convex set. Moreover, the full state of the plant is available for feedback. A multiple model online identification scheme for the plant's state and input matrices is developed that guarantees the estimated parameters converge to the underlying plant model under the assumption of persistence of excitation. Using an exact matching condition, the parameter estimates are used in a control law such that the plant's states asymptotically track the reference signal generated by a state-space model reference. The control architecture is proven to provide boundedness of all closed-loop signals and to asymptotically drive the state tracking error to zero. Numerical simulations illustrate the stability and efficacy of the proposed MMRAC scheme.
- [10] arXiv:2403.18129 [pdf, other]
Title: On the Statistical Analysis of the Multipath Propagation Model Parameters for Power Line Communications
Comments: 9 pages, 7 figures
Subjects: Signal Processing (eess.SP)
This paper proposes a fitting procedure that aims to identify the statistical properties of the parameters that describe the most widely known multipath propagation model (MPM) used in power line communication (PLC). Firstly, the MPM parameters are computed by fitting the theoretical model to a large database of single-input-single-output (SISO) experimental measurements, carried out in typical home premises. Secondly, the determined parameters are substituted back into the MPM formulation to verify their faithfulness, thus validating the proposed computation procedure. Then, the properties of the MPM parameters are evaluated. In particular, the statistical behavior is established by identifying the best-fitting distribution, comparing the most common distributions through the use of the likelihood function. Moreover, the relationship among the different paths is highlighted in terms of statistical correlation. The identified statistical behavior of the MPM parameters confirms the assumptions of previous works, which, however, were mostly established in a heuristic way.
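Selecting a best-fitting distribution by likelihood can be sketched as follows. The two candidate families (normal and exponential), their closed-form maximum-likelihood fits, and the sample values are illustrative assumptions; the abstract does not list the candidate distributions or the data.

```python
import math


def normal_log_likelihood(data):
    """Log-likelihood of a normal fit at its maximum-likelihood parameters."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n  # MLE uses 1/n
    return -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)


def exponential_log_likelihood(data):
    """Log-likelihood of an exponential fit (data must be positive)."""
    n = len(data)
    rate = n / sum(data)  # MLE of the rate parameter
    return n * math.log(rate) - rate * sum(data)


# Hypothetical positive, right-skewed samples standing in for a fitted parameter.
samples = [0.1, 0.2, 0.3, 0.5, 0.9, 1.6, 2.8]

scores = {
    "normal": normal_log_likelihood(samples),
    "exponential": exponential_log_likelihood(samples),
}
best = max(scores, key=scores.get)
```

When candidate families have different numbers of parameters, a penalized criterion such as AIC is often used instead of the raw likelihood; whether the paper does so is not stated in the abstract.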
- [11] arXiv:2403.18134 [pdf, other]
Title: Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task where giga-pixel WSIs are only labeled at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited power to recognize the long-range dependencies. In this paper, we introduce the integrative graph-transformer framework that simultaneously captures the context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer modeling neighboring relations at the local instance level and an efficient global attention model capturing comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC and BRIGHT, demonstrate the superiority of our approach over current state-of-the-art MIL methods, achieving an improvement of 1.0% to 2.6% in accuracy and 0.7%-1.6% in AUROC.
- [12] arXiv:2403.18139 [pdf, other]
Title: Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model
Authors: Weijie Gan, Huidong Xie, Carl von Gall, Günther Platsch, Michael T. Jurkiewicz, Andrea Andrade, Udunna C. Anazodo, Ulugbek S. Kamilov, Hongyu An, Jorge Cabello
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared with the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that PET images reconstructed using either the acquired or the deep-MRI images had improved image quality compared to OSEM. The same conclusions were reached when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.
- [13] arXiv:2403.18151 [pdf, ps, other]
Title: Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study
Authors: Atsushi Teramoto, Ayano Michiba, Yuka Kiriyama, Tetsuya Tsukamoto, Kazuyoshi Imaizumi, Hiroshi Fujita
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Cytology plays a crucial role in lung cancer diagnosis. Pulmonary cytology involves cell morphological characterization in the specimen and reporting the corresponding findings, which are extremely burdensome tasks. In this study, we propose a report-generation technique for lung cytology images. In total, 71 benign and 135 malignant pulmonary cytology specimens were collected. Patch images were extracted from the captured specimen images, and the findings were assigned to each image as a dataset for report generation. The proposed method consists of a vision model and a text decoder. In the former, a convolutional neural network (CNN) is used to classify a given image as benign or malignant, and the features related to the image are extracted from the intermediate layer. Independent text decoders for benign and malignant cells are prepared for text generation, and the text decoder switches according to the CNN classification results. The text decoder is configured using a Transformer that uses the features obtained from the CNN for report generation. Based on the evaluation results, the sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification, and the saliency map indicated characteristic benign and malignant areas. The grammar and style of the generated texts were confirmed as correct and in better agreement with the gold standard than those of existing LLM-based image-captioning methods and a single-text-decoder ablation model. These results indicate that the proposed method is useful for pulmonary cytology classification and reporting.
- [14] arXiv:2403.18164 [pdf, other]
Title: Incentive Designs for Learning Agents to Stabilize Coupled Exogenous Systems
Comments: 8 pages, 3 figures
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We consider a large population of learning agents noncooperatively selecting strategies from a common set, influencing the dynamics of an exogenous system (ES) we seek to stabilize at a desired equilibrium. Our approach is to design a dynamic payoff mechanism capable of shaping the population's strategy profile, thus affecting the ES's state, by offering incentives for specific strategies within budget limits. Employing system-theoretic passivity concepts, we establish conditions under which a payoff mechanism can be systematically constructed to ensure the global asymptotic stabilization of the ES's equilibrium. In comparison to previous approaches originally studied in the context of the so-called epidemic population games, the method proposed here allows for more realistic epidemic models and other types of ES, such as predator-prey dynamics. Stabilization is established with the support of a Lyapunov function, which provides useful bounds on the transients.
- [15] arXiv:2403.18166 [pdf, other]
Title: Incentive-Compatible Vertiport Reservation in Advanced Air Mobility: An Auction-Based Approach
Comments: 26 pages, 2 figures, 1 table
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
Advanced air mobility (AAM) is expected to become a multibillion-dollar industry in the near future. Market-based mechanisms are touted to be an integral part of AAM operations, which comprise heterogeneous operators with private valuations. In this work, we study the problem of designing a mechanism to coordinate the movement of electric vertical take-off and landing (eVTOL) aircraft, operated by multiple operators each having heterogeneous valuations associated with their fleet, between vertiports, while enforcing the arrival, departure, and parking constraints at vertiports. In particular, we propose an incentive-compatible and individually rational vertiport reservation mechanism that maximizes a social welfare metric, which encapsulates the objective of maximizing the overall valuations of all operators while minimizing the congestion at vertiports. Additionally, we improve the computational tractability of designing the reservation mechanism by proposing a mixed binary linear programming approach that is based on constructing a network flow graph corresponding to the underlying problem.
- [16] arXiv:2403.18198 [pdf, other]
Title: Generative Medical Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). However, these models introduce high computational demands and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medical Segmentation (GMS), a novel approach leveraging a generative model for image segmentation. Concretely, GMS employs a robust pre-trained Variational Autoencoder (VAE) to derive latent representations of both images and masks, followed by a mapping model that learns the transition from image to mask in the latent space. This process culminates in generating a precise segmentation mask within the image space using the pre-trained VAE decoder. The design of GMS leads to fewer learnable parameters in the model, resulting in a reduced computational burden and enhanced generalization capability. Our extensive experimental analysis across five public datasets in different medical imaging domains demonstrates GMS outperforms existing discriminative segmentation models and has remarkable domain generalization. Our experiments suggest GMS could set a new benchmark for medical image segmentation, offering a scalable and effective solution. GMS implementation and model weights are available at https://github.com/King-HAW/GMS.
- [17] arXiv:2403.18200 [pdf, other]
Title: Fault-tolerant properties of scale-free linear protocols for synchronization of homogeneous multi-agent systems
Comments: The article was submitted to IEEE Transactions on Automatic Control for review at March 27th, 2024
Subjects: Systems and Control (eess.SY)
Originally, protocols were designed for multi-agent systems (MAS) using information about the network. However, in many cases there is no or only limited information available about the network. Recently, there has been a focus on scale-free synchronization of MAS. In this case, the protocol is designed without any prior information about the network. As long as the network contains a directed spanning tree, the scale-free protocol guarantees that the network achieves synchronization.
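Whether a network contains a directed spanning tree can be checked by testing if some node reaches every other node along directed edges. The sketch below does this with a breadth-first search; the four-node chain, the edge convention (source, destination), and the removed link are hypothetical examples.

```python
from collections import deque


def reachable_from(root, nodes, edges):
    """Set of nodes reachable from root along directed (source, destination) edges."""
    adjacency = {node: [] for node in nodes}
    for source, destination in edges:
        adjacency[source].append(destination)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def has_directed_spanning_tree(nodes, edges):
    """True if at least one node can serve as the root of a spanning tree."""
    return any(reachable_from(root, nodes, edges) == set(nodes) for root in nodes)


nodes = [1, 2, 3, 4]
healthy_edges = [(1, 2), (2, 3), (3, 4)]  # chain rooted at node 1
faulty_edges = [(1, 2), (3, 4)]           # link 2 -> 3 has failed

healthy = has_directed_spanning_tree(nodes, healthy_edges)
faulty = has_directed_spanning_tree(nodes, faulty_edges)
```

In this example a single link failure is enough to destroy the spanning tree.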
If there is no directed spanning tree for the network, then synchronization cannot be achieved. But what happens when these scale-free protocols are applied to a network in which the directed spanning tree no longer exists? This might arise if, for instance, a fault occurs in one or more crucial links. This paper establishes that the network decomposes into a number of basic bicomponents, each of which achieves synchronization among all of its nodes. On the other hand, nodes which are not part of any basic bicomponent converge to a weighted average of the synchronized trajectories of the basic bicomponents. The weights are independent of the initial conditions and are independent of the designed protocol.
- [18] arXiv:2403.18233 [pdf, other]
Title: Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data
Authors: Mohamed Harmanani, Paul F. R. Wilson, Fahimeh Fooladgar, Amoon Jamzad, Mahdi Gilany, Minh Nguyen Nhat To, Brian Wodlinger, Purang Abolmaesumi, Parvin Mousavi
Comments: early draft, 7 pages; Accepted to SPIE Medical Imaging 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Tissues and Organs (q-bio.TO)
PURPOSE: Deep learning methods for classifying prostate cancer (PCa) in ultrasound images typically employ convolutional networks (CNNs) to detect cancer in small regions of interest (ROI) along a needle trace region. However, this approach suffers from weak labelling, since the ground-truth histopathology labels do not describe the properties of individual ROIs. Recently, multi-scale approaches have sought to mitigate this issue by combining the context awareness of transformers with a CNN feature extractor to detect cancer from multiple ROIs using multiple-instance learning (MIL). In this work, we present a detailed study of several image transformer architectures for both ROI-scale and multi-scale classification, and a comparison of the performance of CNNs and transformers for ultrasound-based prostate cancer classification. We also design a novel multi-objective learning strategy that combines both ROI and core predictions to further mitigate label noise. METHODS: We evaluate 3 image transformers on ROI-scale cancer classification, then use the strongest model to tune a multi-scale classifier with MIL. We train our MIL models using our novel multi-objective learning strategy and compare our results to existing baselines. RESULTS: We find that for both ROI-scale and multi-scale PCa detection, image transformer backbones lag behind their CNN counterparts. This deficit in performance is even more noticeable for larger models. When using multi-objective learning, we can improve performance of MIL, with a 77.9% AUROC, a sensitivity of 75.9%, and a specificity of 66.3%. CONCLUSION: Convolutional networks are better suited for modelling sparse datasets of prostate ultrasounds, producing more robust features than transformers in PCa detection. Multi-scale methods remain the best architecture for this task, with multi-objective learning presenting an effective way to improve performance.
- [19] arXiv:2403.18235 [pdf, other]
Title: An Execution-time-certified QP Algorithm for $\ell_1$ penalty-based Soft-constrained MPC
Comments: 6 pages
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Providing an execution time certificate and handling possible infeasibility in closed loop are two pressing requirements of Model Predictive Control (MPC). To simultaneously meet these two requirements, this paper uses an $\ell_1$ penalty-based soft-constrained MPC formulation and transforms the resulting non-smooth QP into a box-constrained QP, which is solved by our previously proposed direct and execution-time-certified algorithm, whose exact number of iterations depends only on the problem dimension and not on the data [1]. This approach not only overcomes the limitation of our previously proposed algorithm [1], which is only applicable to input-constrained MPC, but also enjoys the exact recovery feature (exactly recovering the same solution when the original problem is feasible) of the $\ell_1$ penalty-based soft-constrained MPC formulation without suffering the numerical difficulty of the resulting non-smoothness. Various other real-time QP applications, not limited to MPC, will also benefit from our QP algorithm with execution-time certificate and global feasibility.
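The exact recovery property can be seen on a scalar toy problem, which is an illustrative assumption and not the paper's box-constrained QP transformation. Consider minimizing 0.5*(x - c)^2 subject to x <= b, and its soft-constrained version 0.5*(x - c)^2 + rho*max(0, x - b). Both have closed-form minimizers, and they coincide once the penalty weight rho is at least the Lagrange multiplier of the hard constraint, which is max(0, c - b) here.

```python
def hard_constrained(c, b):
    """Minimizer of 0.5 * (x - c)**2 subject to x <= b."""
    return min(c, b)


def l1_soft_constrained(c, b, rho):
    """Minimizer of 0.5 * (x - c)**2 + rho * max(0, x - b), with rho > 0."""
    if c <= b:
        return c        # constraint inactive, penalty plays no role
    if c - rho > b:
        return c - rho  # penalty too weak: the constraint is violated
    return b            # penalty large enough: hard solution recovered


c, b = 2.0, 1.0  # hypothetical data; the hard-constraint multiplier is c - b = 1

recovered = l1_soft_constrained(c, b, rho=5.0)  # rho above the multiplier
violated = l1_soft_constrained(c, b, rho=0.5)   # rho below the multiplier
```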
- [20] arXiv:2403.18250 [pdf, other]
Title: Linear Hybrid Asymmetrical Load-Modulated Balanced Amplifier with Multi-Band Reconfigurability and Antenna-VSWR Resilience
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Systems and Control (eess.SY)
This paper presents the first-ever highly linear and load-insensitive three-way load-modulation power amplifier (PA) based on a reconfigurable hybrid asymmetrical load modulated balanced amplifier (H-ALMBA). Through proper amplitude and phase controls, the carrier, control amplifier (CA), and two peaking balanced amplifiers (BA1 and BA2) can form a linear high-order load modulation over wide bandwidth. Moreover, it is theoretically shown that the load modulation behavior of H-ALMBA can be insensitive to load mismatch by leveraging bias reconfiguration and the intrinsic load-insensitivity of the balanced topology. Specifically, the PA's linearity and efficiency profiles can be maintained against arbitrary load mismatch through $Z_\mathrm{L}$-dependent reconfiguration of the CA supply voltage ($V_\mathrm{DD,CA}$) and the turning-on sequence of BA1 and BA2. Based on the proposed theory, an RF-input linear H-ALMBA is developed with GaN transistors and wideband quadrature hybrids. Over the design bandwidth from $1.7$ to $2.9$ GHz, an efficiency of $56.8\%$-$72.9\%$ at peak power and $49.8\%$-$61.2\%$ at $10$-dB PBO are measured together with linear AMAM and AMPM responses. In modulated evaluation with a 4G LTE signal, an EVM of $3.1\%$, ACPR of $-39$ dB, and average efficiency of up to $52\%$ are measured. Moreover, the reconfigurable H-ALMBA experimentally maintains an excellent average efficiency and linearity against arbitrary load mismatch at $2:1$ VSWR, and this mismatch-resilient operation can be achieved at any in-band frequency. The overall measured performance outperforms the state-of-the-art.
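For readers less familiar with the mismatch figure quoted above, the sketch below converts a VSWR value into the reflection-coefficient magnitude, return loss, and mismatch loss using the standard transmission-line relations. The function names are illustrative, and the calculation is general background rather than part of the paper's analysis.

```python
import math


def reflection_magnitude(vswr):
    """|Gamma| = (VSWR - 1) / (VSWR + 1), valid for VSWR >= 1."""
    if vswr < 1.0:
        raise ValueError("VSWR must be at least 1")
    return (vswr - 1.0) / (vswr + 1.0)


def return_loss_db(vswr):
    """Return loss in dB; infinite for a perfect match."""
    gamma = reflection_magnitude(vswr)
    return math.inf if gamma == 0.0 else -20.0 * math.log10(gamma)


def mismatch_loss_db(vswr):
    """Power lost to reflection, in dB, for a lossless line."""
    gamma = reflection_magnitude(vswr)
    return -10.0 * math.log10(1.0 - gamma ** 2)


gamma_2to1 = reflection_magnitude(2.0)
return_loss_2to1 = return_loss_db(2.0)
mismatch_loss_2to1 = mismatch_loss_db(2.0)
```

A 2:1 VSWR therefore corresponds to a reflection-coefficient magnitude of one third at any phase, which is the mismatch circle the measurements above are swept around.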
- [21] arXiv:2403.18254 [pdf, other]
-
Title: Differentially Private Distributed Nonconvex Stochastic Optimization with Quantized Communications
Subjects: Systems and Control (eess.SY)
This paper proposes a new distributed nonconvex stochastic optimization algorithm that can achieve privacy protection, communication efficiency and convergence simultaneously. Specifically, each node adds time-varying privacy noises to its local state to avoid information leakage, and then quantizes its noise-perturbed state before transmitting to improve communication efficiency. By employing the subsampling method controlled through the sample-size parameter, the proposed algorithm reduces the impact of privacy noises, and enhances the differential privacy level. When the global cost function satisfies the Polyak-Lojasiewicz condition, the mean and high-probability convergence rate and the oracle complexity of the proposed algorithm are given. Importantly, the proposed algorithm achieves both the mean convergence and a finite cumulative differential privacy budget over infinite iterations as the sample-size goes to infinity. A numerical example of the distributed training on the "MNIST" dataset is given to show the effectiveness of the algorithm.
- [22] arXiv:2403.18257 [pdf, other]
-
Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Transformers have been the most successful architecture for various speech modeling tasks, including speech separation. However, the self-attention mechanism in transformers with quadratic complexity is inefficient in computation and memory. Recent models incorporate new layers and modules along with transformers for better performance but also introduce extra model complexity. In this work, we replace transformers with Mamba, a selective state space model, for speech separation. We propose dual-path Mamba, which models short-term and long-term forward and backward dependency of speech signals using selective state spaces. Our experimental results on the WSJ0-2mix data show that our dual-path Mamba models match or outperform dual-path transformer models Sepformer with only 60% of its parameters, and the QDPN with only 30% of its parameters. Our large model also reaches a new state-of-the-art SI-SNRi of 24.4 dB.
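For readers unfamiliar with the SI-SNRi metric quoted above, the following is a minimal sketch of the commonly used scale-invariant SNR and its improvement over the unprocessed mixture. The function names `si_snr` and `si_snr_improvement` are illustrative and are not taken from the paper or from any specific toolkit.

```python
import math

def si_snr(estimate, target):
    """Scale-invariant SNR (in dB) of `estimate` with respect to `target`."""
    n = len(target)
    # Remove the mean of both signals.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to isolate the "signal" part.
    scale = sum(e * t for e, t in zip(est, tgt)) / sum(t * t for t in tgt)
    projection = [scale * t for t in tgt]
    # Everything not explained by the target counts as noise.
    noise = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10(signal_energy / noise_energy)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the separated estimate over the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)

# Toy signals: the "separated" estimate keeps 10% of the interference.
target = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0]
interference = [0.5, 0.5, -0.5, 0.2, -0.3, 0.1]
mixture = [t + i for t, i in zip(target, interference)]
estimate = [t + 0.1 * i for t, i in zip(target, interference)]
improvement_db = si_snr_improvement(estimate, mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by any nonzero constant leaves the score unchanged.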
- [23] arXiv:2403.18275 [pdf, other]
-
Title: Differentially Private Dual Gradient Tracking for Distributed Resource Allocation
Subjects: Systems and Control (eess.SY)
This paper investigates privacy issues in distributed resource allocation over directed networks, where each agent holds a private cost function and optimizes its decision subject to a global coupling constraint through local interaction with other agents. Conventional methods for resource allocation over directed networks require all agents to transmit their original data to neighbors, which poses the risk of disclosing sensitive and private information. To address this issue, we propose an algorithm called differentially private dual gradient tracking (DP-DGT) for distributed resource allocation, which obfuscates the exchanged messages using independent Laplacian noise. Our algorithm ensures that the agents' decisions converge to a neighborhood of the optimal solution almost surely. Furthermore, without the assumption of bounded gradients, we prove that the cumulative differential privacy loss under the proposed algorithm is finite even when the number of iterations goes to infinity. To the best of our knowledge, we are the first to simultaneously achieve these two goals in distributed resource allocation problems over directed networks. Finally, numerical simulations on economic dispatch problems within the IEEE 14-bus system illustrate the effectiveness of our proposed algorithm.
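The message obfuscation step described above can be sketched as follows. This shows only the generic idea of adding independent Laplace noise to an outgoing message; the names `laplace_noise` and `obfuscate` and the fixed noise scale are assumptions made for illustration, and the actual DP-DGT update rules and noise schedule are not given in the abstract.

```python
import random

def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale parameter.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def obfuscate(message, scale, rng):
    """Add independent Laplace noise to every entry of an outgoing message."""
    return [x + laplace_noise(scale, rng) for x in message]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
local_state = [1.0, 2.0, 3.0]   # hypothetical data an agent would share
noisy_message = obfuscate(local_state, scale=0.5, rng=rng)
```

A larger scale hides the local data better but perturbs the exchanged values more, which is consistent with convergence to a neighborhood of the optimum rather than to the optimum itself.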
- [24] arXiv:2403.18311 [pdf, other]
-
Title: UAV Corridor Coverage Analysis with Base Station Antenna Uptilt and Strongest Signal Association
Subjects: Signal Processing (eess.SP)
Unmanned aerial vehicle (UAV) corridors are sky lanes through which UAVs fly safely between their origin and destination. To ensure the successful operation of UAV corridors, beyond visual line of sight (BVLOS) wireless connectivity within the corridor is crucial. One promising solution to support this is the use of cellular-connected UAV (C-UAV) networks, which offer long-range and seamless wireless coverage. However, conventional terrestrial base stations (BSs) that typically employ down-tilted sector antennas to serve ground users are not ideally suited to serve the aerial vehicles positioned above the BSs. In our previous work, we focused on studying the optimal uptilt angle of BS antennas to maximize the wireless coverage probability in UAV corridors. However, the association of BSs with UAVs was restricted to the nearest BS association, which limits the potential coverage benefits. In this paper, we address this limitation by considering the strongest BS signal association in UAV corridors, which enables enhanced coverage within the corridor compared to the nearest BS association. The strongest BS association allows UAVs to connect with the second nearest BSs while also accounting for interference from the third nearest BSs. Closed-form expression analysis and simulation results show that the strongest BS association in UAV corridors yields a superior coverage probability when compared to the nearest BS association.
- [25] arXiv:2403.18339 [pdf, other]
-
Title: H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images
Comments: 10 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/JinPLu/H2ASeg.
- [26] arXiv:2403.18371 [pdf, other]
-
Title: Multivariable control of modular multilevel converters with convergence and safety guarantees
Comments: Submitted to IEEE Open Journal of the Industrial Electronics
Subjects: Systems and Control (eess.SY)
Well-designed current control is a key factor in ensuring the efficient and safe operation of modular multilevel converters (MMCs). Even though this control problem involves multiple control objectives, conventional current control schemes are comprised of independently designed decoupled controllers, e.g., proportional-integral (PI) or proportional-resonant (PR). Due to the bilinearity of the MMC dynamics, tuning PI and PR controllers so that good performance and constraint satisfaction are guaranteed is quite challenging. This challenge becomes more relevant in an AC/AC MMC configuration due to the complexity of tracking the single-phase sinusoidal components of the MMC output. In this paper, we propose a method to design a multivariable controller, i.e., a static feedback gain, to regulate the MMC currents. We use a physics-informed transformation to model the MMC dynamics linearly and synthesise the proposed controller. We use this linear model to formulate a linear matrix inequality that computes a feedback gain that guarantees safe and effective operation, including (i) limited tracking error, (ii) stability, and (iii) meeting all constraints. To test the efficacy of our method, we examine its performance in a direct AC/AC MMC simulated in Simulink/PLECS and in a scaled-down AC/AC MMC prototype to investigate the ultra-fast charging of electric vehicles.
- [27] arXiv:2403.18398 [pdf, ps, other]
-
Title: Adaptive Economic Model Predictive Control for linear systems with performance guarantees
Comments: 8 pages, 3 figures, submitted to IEEE CDC 2024
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
We present a model predictive control (MPC) formulation to directly optimize economic criteria for linear constrained systems subject to disturbances and uncertain model parameters. The proposed formulation combines a certainty equivalent economic MPC with a simple least-squares parameter adaptation. For the resulting adaptive economic MPC scheme, we derive strong asymptotic and transient performance guarantees. We provide a numerical example involving building temperature control and demonstrate performance benefits of online parameter adaptation.
- [28] arXiv:2403.18422 [pdf, ps, other]
-
Title: Feedback Linearizable Discretizations of Second Order Mechanical Systems using Retraction Maps
Subjects: Systems and Control (eess.SY)
Mechanical systems, in nature, are often described by a set of continuous-time, nonlinear, second-order differential equations (SODEs). This has motivated designs of various control laws implemented on digital controllers, consequently requiring numerical discretization schemes. Feedback linearizability of such sampled systems depends on the discretization scheme or map choice. In this article, we utilize retraction maps and their lifts to construct feedback linearizable discretizations for SODEs, which can be applied to various mechanical systems.
- [29] arXiv:2403.18445 [pdf, other]
-
Title: Asymptotic Analysis of Synchronous Signal Processing
Comments: 14 pages, 7 figures, submitted to the IEEE for possible publication
Subjects: Signal Processing (eess.SP)
This paper extends various theoretical results from stationary data processing to cyclostationary (CS) processes under a unified framework. We first derive their asymptotic eigenbasis, which provides a link between their Fourier and Karhunen-Loève (KL) expansions, through a unitary transformation dictated by the cyclic spectrum. By exploiting this connection and the optimalities offered by the KL representation, we study the asymptotic performance of smoothing, filtering and prediction of CS processes, without the need for deriving explicit implementations. We obtain minimum mean squared error expressions that depend on the cyclic spectrum and include classical limits based on the power spectral density as particular cases. We conclude this work by applying the results to a practical scenario, in order to quantify the achievable gains of synchronous signal processing.
- [30] arXiv:2403.18468 [pdf, ps, other]
-
Title: Deep Learning Segmentation and Classification of Red Blood Cells Using a Large Multi-Scanner Dataset
Comments: 15 pages, 12 figures, 8 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Digital pathology has recently been revolutionized by advancements in artificial intelligence, deep learning, and high-performance computing. With its advanced tools, digital pathology can help improve and speed up the diagnostic process, reduce human errors, and streamline the reporting step. In this paper, we report a new large red blood cell (RBC) image dataset and propose a two-stage deep learning framework for RBC image segmentation and classification. The dataset is a highly diverse dataset of more than 100K RBCs containing eight different classes. The dataset, which is considerably larger than any publicly available hematopathology dataset, was labeled independently by two hematopathologists who also manually created masks for RBC cell segmentation. Subsequently, in the proposed framework, first, a U-Net model was trained to achieve automatic RBC image segmentation. Second, an EfficientNetB0 model was trained to classify RBC images into one of the eight classes using a transfer learning approach with a 5X2 cross-validation scheme. An IoU of 98.03% and an average classification accuracy of 96.5% were attained on the test set. Moreover, we have performed experimental comparisons against several prominent CNN models. These comparisons show the superiority of the proposed model with a good balance between performance and computational cost.
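The IoU figure reported above measures the overlap between a predicted segmentation mask and a reference mask. A minimal sketch of that computation on binary masks follows; the function name `iou` and the toy masks are illustrative and do not come from the paper.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0

predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = iou(predicted, reference)  # 2 shared pixels out of 4 covered -> 0.5
```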
- [31] arXiv:2403.18501 [pdf, other]
-
Title: HEMIT: H&E to Multiplex-immunohistochemistry Image Translation with Dual-Branch Pix2pix Generator
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Computational analysis of multiplexed immunofluorescence histology data is emerging as an important method for understanding the tumour micro-environment in cancer. This work presents HEMIT, a dataset designed for translating Hematoxylin and Eosin (H&E) sections to multiplex-immunohistochemistry (mIHC) images, featuring DAPI, CD3, and panCK markers. Distinctively, HEMIT's mIHC images are multi-component and cellular-level aligned with H&E, enriching supervised stain translation tasks. To our knowledge, HEMIT is the first publicly available cellular-level aligned dataset that enables H&E to multi-target mIHC image translation. This dataset provides the computer vision community with a valuable resource to develop novel computational methods which have the potential to gain new insights from H&E slide archives.
We also propose a new dual-branch generator architecture, using residual Convolutional Neural Networks (CNNs) and Swin Transformers which achieves better translation outcomes than other popular algorithms. When evaluated on HEMIT, it outperforms pix2pixHD, pix2pix, U-Net, and ResNet, achieving the highest overall score on key metrics including the Structural Similarity Index Measure (SSIM), Pearson correlation score (R), and Peak signal-to-noise Ratio (PSNR). Additionally, downstream analysis has been used to further validate the quality of the generated mIHC images. These results set a new benchmark in the field of stain translation tasks.
- [32] arXiv:2403.18514 [pdf, other]
-
Title: CT-3DFlow : Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scans
Authors: Aissam Djahnine, Alexandre Popoff, Emilien Jupin-Delevaux, Vincent Cottin, Olivier Nempont, Loic Boussel
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Unsupervised pathology detection can be implemented by training a model on healthy data only and measuring the deviation from the training set upon inference, for example with CNN-based feature extraction and one-class classifiers, or reconstruction-score-based methods such as AEs, GANs and Diffusion models. Normalizing Flows (NF) have the ability to directly learn the probability distribution of training examples through an invertible architecture. We leverage this property in a novel 3D NF-based model named CT-3DFlow, specifically tailored for patient-level pulmonary pathology detection in chest CT data. Our model is trained unsupervised on healthy 3D pulmonary CT patches, and detects deviations from its log-likelihood distribution as anomalies. We aggregate patches-level likelihood values from a patient's CT scan to provide a patient-level 'normal'/'abnormal' prediction. Out-of-distribution detection performance is evaluated using expert annotations on a separate chest CT test dataset, outperforming other state-of-the-art methods.
- [33] arXiv:2403.18535 [pdf, other]
-
Title: Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs
Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
- [34] arXiv:2403.18560 [pdf, other]
-
Title: Noise-Robust Keyword Spotting through Self-supervised Pretraining
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Voice assistants are now widely available, and to activate them a keyword spotting (KWS) algorithm is used. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve a good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase the accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, which is under-explored.
Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions, and superior to supervised MTR for testing conditions of SNR above 5 dB. This indicates that pretraining alone can increase the model's robustness. Finally, it is found that using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions.
- [35] arXiv:2403.18561 [pdf, other]
-
Title: A Dynamic Programming Approach for Road Traffic Estimation
Subjects: Systems and Control (eess.SY)
We consider a road network represented by a directed graph. We assume that many measurements of traffic flows are collected on all the network arcs, or on a subset of them. We assume that the users are divided into different groups. Each group follows a different path. The flows of all user groups are modeled as a set of independent Poisson processes. Our focus is estimating the paths followed by each user group, and the means of the associated Poisson processes. We present a possible solution based on a Dynamic Programming algorithm. The method relies on the knowledge of high-order cumulants. We discuss the theoretical properties of the introduced method. Finally, we present some numerical tests on well-known benchmark networks, using synthetic data.
- [36] arXiv:2403.18564 [pdf, ps, other]
-
Title: Formal Verification with Constrained Polynomial Logical Zonotope
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)
In this paper, we propose using constrained polynomial logical zonotopes for formal verification of logical systems. We perform reachability analysis to compute the set of states that could be reached. To do this, we utilize a recently introduced set representation called polynomial logical zonotopes for performing computationally efficient and exact reachability analysis on logical systems. Notably, polynomial logical zonotopes address the "curse of dimensionality" when analyzing the reachability of logical systems since the set representation can represent 2^n binary vectors using n generators. After finishing the reachability analysis, the formal verification involves verifying whether the intersection of the calculated reachable set and the unsafe set is empty or not. However, polynomial logical zonotopes are not closed under intersections. To address this, we formulate constrained polynomial logical zonotopes, which maintain the computational efficiency and exactness of polynomial logical zonotopes for reachability analysis while supporting exact intersections. Furthermore, we present an extensive empirical study illustrating and verifying the benefits of using constrained polynomial logical zonotopes for the formal verification of logical systems.
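The remark that n generators can represent 2^n binary vectors can be illustrated with a plain (non-polynomial, unconstrained) logical zonotope, taken here to be the set of points obtained by XOR-ing a center with any subset of the generators. This is a simplified sketch under that assumption: the function names are hypothetical, and the polynomial and constrained variants proposed in the paper are not implemented.

```python
from itertools import product

def xor_vec(a, b):
    """Element-wise XOR of two equal-length binary tuples."""
    return tuple(x ^ y for x, y in zip(a, b))

def enumerate_logical_zonotope(center, generators):
    """List every point c XOR (b1*g1) XOR ... XOR (bk*gk) with bi in {0, 1}.

    Enumeration is exponential in the number of generators and is done here
    only to show how many points the compact representation encodes.
    """
    points = set()
    for betas in product((0, 1), repeat=len(generators)):
        point = tuple(center)
        for beta, generator in zip(betas, generators):
            if beta:
                point = xor_vec(point, generator)
        points.add(point)
    return points

center = (0, 0, 0)
generators = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
reachable = enumerate_logical_zonotope(center, generators)  # 2**3 = 8 points
```

Three stored generators stand in for all eight vectors in this example. A set operation carried out on the generators avoids touching each vector individually.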
- [37] arXiv:2403.18571 [pdf, ps, other]
-
Title: Bootstrapping Guarantees: Stability and Performance Analysis for Dynamic Encrypted Control
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Encrypted dynamic controllers that operate for an unlimited time have been a challenging subject of research. The fundamental difficulty is the accumulation of errors and scaling factors in the internal state during operation. Bootstrapping, a technique commonly employed in fully homomorphic cryptosystems, can be used to avoid overflows in the controller state but can potentially introduce significant numerical errors. In this paper, we analyze dynamic encrypted control with explicit consideration of bootstrapping. By recognizing the bootstrapping errors occurring in the controller's state as an uncertainty in the robust control framework, we can provide stability and performance guarantees for the whole encrypted control system. Further, the conservatism of the stability and performance test is reduced by using a lifted version of the control system.
- [38] arXiv:2403.18589 [pdf, ps, other]
-
Title: Users prefer Jpegli over same-sized libjpeg-turbo or MozJPEG
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We performed pairwise comparisons by human raters of JPEG images from MozJPEG, libjpeg-turbo and our new Jpegli encoder. When compressing images at a quality similar to libjpeg-turbo quality 95, the Jpegli images were 54% likely to be preferred over both libjpeg-turbo and MozJPEG images, but used only 2.8 bits per pixel compared to libjpeg-turbo and MozJPEG that used 3.8 and 3.5 bits per pixel respectively. The raw ratings and source images are publicly available for further analysis and study.
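The bits-per-pixel figures quoted above imply a relative size reduction of roughly 26% against libjpeg-turbo and 20% against MozJPEG. The short calculation below shows the arithmetic; the helper name `relative_saving` is introduced here for illustration only.

```python
def relative_saving(new_bpp, old_bpp):
    """Fraction of bits saved when going from old_bpp to new_bpp."""
    return 1.0 - new_bpp / old_bpp

jpegli_bpp = 2.8
saving_vs_libjpeg_turbo = relative_saving(jpegli_bpp, 3.8)  # about 0.263
saving_vs_mozjpeg = relative_saving(jpegli_bpp, 3.5)        # about 0.200
```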
- [39] arXiv:2403.18632 [pdf, other]
-
Title: Optimal Control Synthesis of Markov Decision Processes for Efficiency with Surveillance Tasks
Subjects: Systems and Control (eess.SY)
We investigate the problem of optimal control synthesis for Markov Decision Processes (MDPs), addressing both qualitative and quantitative objectives. Specifically, we require the system to fulfill a qualitative surveillance task in the sense that a specific region of interest can be visited infinitely often with probability one. Furthermore, to quantify the performance of the system, we consider the concept of efficiency, which is defined as the ratio between rewards and costs. This measure is more general than the standard long-run average reward metric as it aims to maximize the reward obtained per unit cost. Our objective is to synthesize a control policy that ensures the surveillance task while maximizing the efficiency. We provide an effective approach to synthesize a stationary control policy achieving $\epsilon$-optimality by integrating state classifications of MDPs and perturbation analysis in a novel manner. Our results generalize existing works on efficiency-optimal control synthesis for MDPs by incorporating qualitative surveillance tasks. A robot motion planning case study is provided to illustrate the proposed algorithm.
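To make the efficiency measure concrete, the sketch below evaluates it for a fixed stationary policy, assuming that the induced Markov chain is irreducible and aperiodic and that efficiency is the long-run average reward divided by the long-run average cost. Both the interpretation and the helper names are assumptions made for illustration; the paper's synthesis algorithm and its handling of the surveillance task are not shown.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying it to a uniform initial distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def efficiency(P, reward, cost):
    """Long-run average reward per unit of long-run average cost."""
    pi = stationary_distribution(P)
    average_reward = sum(p * r for p, r in zip(pi, reward))
    average_cost = sum(p * c for p, c in zip(pi, cost))
    return average_reward / average_cost

# Two-state chain induced by some fixed policy (hypothetical numbers).
P = [[0.9, 0.1],
     [0.5, 0.5]]
reward = [1.0, 0.0]   # reward collected per step in each state
cost = [1.0, 1.0]     # cost paid per step in each state
value = efficiency(P, reward, cost)
```

With unit cost in every state, the ratio reduces to the ordinary long-run average reward. This matches the abstract's statement that efficiency generalizes that metric.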
- [40] arXiv:2403.18636 [pdf, other]
-
Title: A Diffusion-Based Generative Equalizer for Music Restoration
Comments: Submitted to DAFx24. Historical music restoration examples are available at: this http URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of significant improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing a marked enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.
- [41] arXiv:2403.18637 [pdf, other]
-
Title: Transformers-based architectures for stroke segmentation: A review
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Stroke remains a significant global health concern, necessitating precise and efficient diagnostic tools for timely intervention and improved patient outcomes. The emergence of deep learning methodologies has transformed the landscape of medical image analysis. Recently, Transformers, initially designed for natural language processing, have exhibited remarkable capabilities in various computer vision applications, including medical image analysis. This comprehensive review aims to provide an in-depth exploration of the cutting-edge Transformer-based architectures applied in the context of stroke segmentation. It commences with an exploration of stroke pathology, imaging modalities, and the challenges associated with accurate diagnosis and segmentation. Subsequently, the review delves into the fundamental ideas of Transformers, offering detailed insights into their architectural intricacies and the underlying mechanisms that empower them to effectively capture complex spatial information within medical images. The existing literature is systematically categorized and analyzed, discussing various approaches that leverage Transformers for stroke segmentation. A critical assessment is provided, highlighting the strengths and limitations of these methods, including considerations of performance and computational efficiency. Additionally, this review explores potential avenues for future research and development.
- [42] arXiv:2403.18638 [pdf, other]
-
Title: Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection
Subjects: Audio and Speech Processing (eess.AS)
Detecting the presence of animal vocalisations in nature is essential to study animal populations and their behaviors. A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples. Previous efforts in this area have utilized different architectures and data augmentation techniques to enhance model performance. However, these approaches have not fully bridged the domain gap between source and target distributions, limiting their applicability in real-world scenarios. In this work, we introduce a new dataset designed to augment the diversity and breadth of classes available for few-shot bioacoustic event detection, building on the foundations of our previous datasets. To establish a robust baseline system tailored for the DCASE 2024 Task 5 challenge, we delve into an array of acoustic features and adopt negative hard sampling as our primary domain adaptation strategy. This approach, chosen in alignment with the challenge's guidelines that necessitate the independent treatment of each audio file, sidesteps the use of transductive learning to ensure compliance while aiming to enhance the system's adaptability to domain shifts. Our experiments show that the proposed baseline system achieves a better performance compared with the vanilla prototypical network. The findings also confirm the effectiveness of each domain adaptation method by ablating different components within the networks. This highlights the potential to improve few-shot bioacoustic sound event detection by further reducing the impact of domain shift.
- [43] arXiv:2403.18650 [pdf, other]
-
Title: MPC-CBF with Adaptive Safety Margins for Safety-critical Teleoperation over Imperfect Network Connections
Comments: Accepted for publication in the 2024 European Control Conference (ECC)
Subjects: Systems and Control (eess.SY)
The paper focuses on the design of a control strategy for safety-critical remote teleoperation. The main goal is to make the controlled system track the desired velocity specified by an operator while avoiding obstacles despite communication delays. Control Barrier Functions (CBFs) are used to define the safety constraints that the system has to respect to avoid obstacles, while Model Predictive Control (MPC) provides the framework for adjusting the desired input, taking the constraints into account. The resulting input is sent to the remote system, where appropriate low-level velocity controllers translate it into system-specific commands. The main novelty of the paper is a method to make the CBFs robust against the uncertainties caused by the network delays affecting the system's state and do so in a less conservative manner. The results show how the proposed method successfully solves the safety-critical teleoperation problem, making the controlled systems avoid obstacles with different types of network delay. The controller has also been tested in simulation and on a real manipulator, demonstrating its general applicability when reliable low-level velocity controllers are available.
- [44] arXiv:2403.18651 [pdf, ps, other]
-
Title: Do High-Performance Image-to-Image Translation Networks Enable the Discovery of Radiomic Features? Application to MRI Synthesis from Ultrasound in Prostate Cancer
Comments: Submitted to MICCAI 2024
Subjects: Image and Video Processing (eess.IV)
This study investigates the foundational characteristics of image-to-image translation networks, specifically examining their suitability and transferability within the context of routine clinical environments, despite achieving high levels of performance, as indicated by a Structural Similarity Index (SSIM) exceeding 0.95. The evaluation study was conducted using data from 794 patients diagnosed with prostate cancer. To synthesize MRI from ultrasound images, we employed five widely recognized image-to-image translation networks in medical imaging: 2DPix2Pix, 2DCycleGAN, 3DCycleGAN, 3DUNET, and 3DAutoEncoder. For quantitative assessment, we report four prevalent evaluation metrics: Mean Absolute Error, Mean Square Error, Structural Similarity Index (SSIM), and Peak Signal to Noise Ratio. Moreover, a complementary analysis employing Radiomic features (RF) via Spearman correlation coefficient was conducted to investigate, for the first time, whether networks achieving high performance, SSIM greater than 0.9, could identify low-level RFs. The RF analysis showed that 76 features out of 186 RFs were discovered via just the 2DPix2Pix algorithm, while half of the RFs were lost in the translation process. Finally, a detailed qualitative assessment by five medical doctors indicated a lack of low-level feature discovery in image-to-image translation tasks.
- [45] arXiv:2403.18695 [pdf, other]
-
Title: An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle Behaviors
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
One of the critical challenges in automated driving is ensuring safety of automated vehicles despite the unknown behavior of the other vehicles. Although motion prediction modules are able to generate a probability distribution associated with various behavior modes, their probabilistic estimates are often inaccurate, thus leading to a possibly unsafe trajectory. To overcome this challenge, we propose a risk-aware motion planning framework that appropriately accounts for the ambiguity in the estimated probability distribution. We formulate the risk-aware motion planning problem as a min-max optimization problem and develop an efficient iterative method by incorporating a regularization term in the probability update step. Via extensive numerical studies, we validate the convergence of our method and demonstrate its advantages compared to the state-of-the-art approaches.
- [46] arXiv:2403.18703 [pdf, other]
-
Title: FPGA-Based Neural Thrust Controller for UAVs
Authors: Sharif Azem, David Scheunert, Mengguang Li, Jonas Gehrunger, Kai Cui, Christian Hochberger, Heinz Koeppl
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The advent of unmanned aerial vehicles (UAVs) has improved a variety of fields by providing a versatile, cost-effective and accessible platform for implementing state-of-the-art algorithms. To accomplish a broader range of tasks, there is a growing need for enhanced on-board computing to cope with increasing complexity and dynamic environmental conditions. Recent advances have seen the application of Deep Neural Networks (DNNs), particularly in combination with Reinforcement Learning (RL), to improve the adaptability and performance of UAVs, especially in unknown environments. However, the computational requirements of DNNs pose a challenge to the limited computing resources available on many UAVs. This work explores the use of Field Programmable Gate Arrays (FPGAs) as a viable solution to this challenge, offering flexibility, high performance, energy and time efficiency. We propose a novel hardware board equipped with an Artix-7 FPGA for a popular open-source micro-UAV platform. We successfully validate its functionality by implementing an RL-based low-level controller using real-world experiments.
- [47] arXiv:2403.18734 [pdf, other]
-
Title: A vascular synthetic model for improved aneurysm segmentation and detection via Deep Neural Networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We hereby present a full synthetic model, able to mimic the various constituents of the cerebral vascular tree: the cerebral arteries, the bifurcations and the intracranial aneurysms. By building this model, our goal was to provide a substantial dataset of brain arteries which could be used by a 3D Convolutional Neural Network (CNN) to either segment or detect/recognize various vascular diseases (such as artery dissection/thrombosis) or even some portions of the cerebral vasculature, such as the bifurcations or aneurysms. In this study, we will particularly focus on Intra-Cranial Aneurysm (ICA) detection and segmentation. Cerebral aneurysms most often occur on a particular structure of the vascular tree named the Circle of Willis. Various studies have been conducted to detect and monitor ICAs, and those based on Deep Learning (DL) achieve the best performance. Specifically, in this work, we propose a full synthetic 3D model able to mimic the brain vasculature as acquired by Magnetic Resonance Angiography (MRA), and more particularly the Time Of Flight (TOF) principle. Among the various MRI modalities, MRA-TOF gives a relatively good rendering of the blood vessels and is non-invasive (no contrast liquid injection). Our model has been designed to simultaneously mimic the arteries' geometry, the ICA shape and the background noise. The geometry of the vascular tree is modeled by interpolation with 3D spline functions, and the statistical properties of the background MRI noise are collected from MRA acquisitions and reproduced within the model. In this work, we thoroughly describe the synthetic vasculature model, we build up a neural network designed for ICA segmentation and detection, and finally, we carry out an in-depth evaluation of the performance gap gained thanks to the synthetic model data augmentation.
Cross-lists for Thu, 28 Mar 24
- [48] arXiv:2403.17938 (cross-list from cs.NE) [pdf, other]
-
Title: Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization
Comments: 15 pages, 6 figures, submission to Circuits, Systems and Signal Processing
Subjects: Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize performance parameters of a radio-frequency (RF) receiver. The design target includes a reduction of power consumption and noise figure and an increase in conversion gain. This study investigates the use of an artificial algorithm for the optimization of a receiver, illustrating how to fulfill the performance parameters with diverse circuit parameters. To overcome issues observed in the traditional Genetic Algorithm (GA), the concept of the Circuit-centric Genetic Algorithm (CGA) is proposed as a viable approach. The new method adopts an inference process that is simpler and computationally more efficient than the existing deep learning models. In addition, CGA offers significant advantages over manual design of finding optimal points and the conventional GA, mitigating the designer's workload while searching for superior optimum points.
- [49] arXiv:2403.17947 (cross-list from physics.class-ph) [pdf, other]
-
Title: RLC resonator with diode nonlinearity: Bifurcation comparison of numerical predictions and circuit measurements
Authors: Edward H. Hellen
Comments: 10 pages, 11 figures, submitted for publication
Subjects: Classical Physics (physics.class-ph); Signal Processing (eess.SP); Chaotic Dynamics (nlin.CD)
A nonlinear RLC resonator is investigated experimentally and numerically using bifurcation analysis. The nonlinearity is due to the parallel combination of a semiconductor rectifier diode and a fixed capacitor. The diode's junction capacitance, diffusion capacitance, and DC current-voltage relation each contribute to the nonlinearity. The closely related RL-diode resonator has been of interest for many years since its demonstration of period-doubling cascades to chaos. In this study a direct comparison is made of dynamical regime maps produced from simulations and circuit measurements. The maps show the variety of limit cycles, their bifurcations, and regions of chaos over the 2-d parameter space of the source voltage's frequency and amplitude. The similar structures of the simulated and experimental maps suggest that the diode models commonly used in circuit simulators (e.g., SPICE) work well in bifurcation analyses, successfully predicting complex and chaotic dynamics detected in the circuit. These results may be useful for applications of varactor-loaded split ring resonators.
- [50] arXiv:2403.17992 (cross-list from q-bio.QM) [pdf, other]
-
Title: Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Advances in artificial intelligence (AI) show great potential in revealing underlying information from phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect' that comes from unavoidable technical variations between each experiment, creating confounding variables that the AI model may inadvertently learn. We therefore present a multi-task conditional neural network framework to simultaneously achieve inter-batch calibration, by removing confounding variables, and accurate cell classification of time-resolved phonon-derived signals. We validate our approach by training and validating on different experimental batches, achieving a balanced precision of 89.22% and an average cross-validated precision of 89.07% for classifying background, healthy and cancerous regions. Classification can be performed in 0.5 seconds with only simple prior batch information required for multiple batch corrections. Further, we extend our model to reconstruct denoised signals, enabling physical interpretation of salient features indicating disease state including sound velocity, sound attenuation and cell-adhesion to substrate.
- [51] arXiv:2403.18052 (cross-list from astro-ph.IM) [pdf, other]
-
Title: R2D2 image reconstruction with model uncertainty quantification in radio astronomy
Comments: submitted to IEEE EUSIPCO 2024
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
The "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process, by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate "R2D2 samples", from which empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only a few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimension; (iii) it is characterized by a very low model uncertainty.
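The ensemble step described above, taking an empirical mean and standard deviation over several reconstructions, can be sketched as follows. This is only an illustration of per-pixel ensemble statistics under stated assumptions: the function name and toy samples are hypothetical, images are flattened to lists, and the population standard deviation is used because the abstract does not say which estimator the authors apply.

```python
from statistics import mean, pstdev

def ensemble_mean_and_std(samples):
    """Per-pixel mean and population standard deviation across samples.

    samples: list of reconstructions, each a flat list of pixel values
    of the same length (one list per independently trained model series).
    """
    pixels = list(zip(*samples))  # group the values of each pixel together
    mean_image = [mean(p) for p in pixels]
    std_image = [pstdev(p) for p in pixels]
    return mean_image, std_image

# Three hypothetical 3-pixel reconstructions (illustration only).
samples = [
    [1.0, 2.0, 3.0],
    [1.0, 4.0, 3.0],
    [1.0, 6.0, 3.0],
]
mean_image, std_image = ensemble_mean_and_std(samples)
```

Here the mean image serves as the estimate and the standard-deviation image as the uncertainty map: pixels where the samples agree get zero spread, and only the middle pixel is flagged as uncertain.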
- [52] arXiv:2403.18074 (cross-list from cs.CV) [pdf, other]
-
Title: Every Shot Counts: Using Exemplars for Repetition Counting in Videos
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. In training, ESCounts regresses locations of high correspondence to the exemplars within the video. In tandem, our method learns a latent that encodes representations of general repetitive motions, which we use for exemplar-free, zero-shot inference. Extensive experiments over commonly used datasets (RepCount, Countix, and UCFRep) showcase ESCounts obtaining state-of-the-art performance across all three datasets. On RepCount, ESCounts increases the off-by-one from 0.39 to 0.56 and decreases the mean absolute error from 0.38 to 0.21. Detailed ablations further demonstrate the effectiveness of our method.
- [53] arXiv:2403.18146 (cross-list from cs.IT) [pdf, ps, other]
-
Title: Adaptive TTD Configurations for Near-Field Communications: An Unsupervised Transformer Approach
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
True-time delayers (TTDs) are popular analog devices for facilitating near-field wideband beamforming subject to the spatial-wideband effect. In this paper, an adaptive TTD configuration is proposed for short-range TTDs. Compared to the existing TTD configurations, the proposed one can effectively combat the spatial-wideband effect for arbitrary user locations and array shapes with the aid of a switch network. A novel end-to-end deep neural network is proposed to optimize the hybrid beamforming with adaptive TTDs for maximizing spectral efficiency. 1) First, based on the U-Net architecture, a near-field channel learning module (NFC-LM) is proposed for adaptive beamformer design through extracting the latent channel response features of various users across different frequencies. In the NFC-LM, an improved cross attention (CA) is introduced to further optimize beamformer design by enhancing the latent feature connection between near-field channel and different beamformers. 2) Second, a switch multi-user transformer (S-MT) is proposed to adaptively control the connection between TTDs and phase shifters (PSs). In the S-MT, an improved multi-head attention, namely multi-user attention (MSA), is introduced to optimize the switch network through exploring the latent channel relations among various users. 3) Third, a multi feature cross attention (MCA) is introduced to simultaneously optimize the NFC-LM and S-MT by enhancing the latent feature correlation between beamformers and switch network. Numerical simulation results show that 1) the proposed adaptive TTD configuration effectively eliminates the spatial-wideband effect under uniform linear array (ULA) and uniform circular array (UCA) architectures, and 2) the proposed deep neural network can provide near-optimal spectral efficiency, and solve the multi-user beamformer design and dynamical connection problem in real time.
- [54] arXiv:2403.18149 (cross-list from cs.RO) [pdf, other]
-
Title: Code Generation for Conic Model-Predictive Control on Microcontrollers with TinyMPC
Comments: Submitted to CDC, 2024. First two authors contributed equally
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Conic constraints appear in many important control applications like legged locomotion, robotic manipulation, and autonomous rocket landing. However, current solvers for conic optimization problems have relatively heavy computational demands in terms of both floating-point operations and memory footprint, making them impractical for use on small embedded devices. We extend TinyMPC, an open-source, high-speed solver targeting low-power embedded control applications, to handle second-order cone constraints. We also present code-generation software to enable deployment of TinyMPC on a variety of microcontrollers. We benchmark our generated code against state-of-the-art embedded QP and SOCP solvers, demonstrating a two-order-of-magnitude speed increase over ECOS while consuming less memory. Finally, we demonstrate TinyMPC's efficacy on the Crazyflie, a lightweight, resource-constrained quadrotor with fast dynamics. TinyMPC and its code-generation tools are publicly available at https://tinympc.org.
- [55] arXiv:2403.18163 (cross-list from cs.SI) [pdf, other]
-
Title: A Study of Three Influencer Archetypes for the Control of Opinion Spread in Time-Varying Social Networks
Comments: Submission to IEEE 2024 Conference on Decision and Control. 8 pages, 7 figures, 1 table
Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
In this work we consider the impact of information spread in time-varying social networks, where agents request to follow other agents with aligned opinions while dropping ties to neighbors whose posts are too dissimilar to their own views. Opinion control and rhetorical influence have a very long history, employing various methods including education, persuasion, propaganda, marketing, and manipulation through mis-, dis-, and mal-information. The automation of opinion controllers, however, has only recently become easily deployable at a wide scale, with the advent of large language models (LLMs) and generative AI that can translate the quantified commands from opinion controllers into actual content with the appropriate nuance. Automated agents in social networks can be deployed for various purposes, such as breaking up echo chambers, bridging valuable new connections between agents, or shaping the opinions of a target population -- and all of these raise important ethical concerns that deserve serious attention and thoughtful discussion and debate. This paper attempts to contribute to this discussion by considering three archetypal influencing styles observed by human drivers in these settings, comparing and contrasting the impact of these different control methods on the opinions of agents in the network. We will demonstrate the efficacy of current generative AI for generating nuanced content consistent with the command signal from automatic opinion controllers like these, and we will report on frameworks for approaching the relevant ethical considerations.
- [56] arXiv:2403.18270 (cross-list from cs.CV) [pdf, other]
-
Title: Image Deraining via Self-supervised Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
The quality of images captured outdoors is often affected by the weather. One factor that interferes with sight is rain, which can obstruct the view of observers and computer vision applications that rely on those images. The work aims to recover rain images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain). We locate rain streak pixels from the input rain image via dictionary learning and use pixel-wise RL agents to take multiple inpainting actions to remove rain progressively. To our knowledge, this work is the first attempt where self-supervised RL is applied to image deraining. Experimental results on several benchmark image-deraining datasets show that the proposed SRL-Derain performs favorably against state-of-the-art few-shot and self-supervised deraining and denoising methods.
- [57] arXiv:2403.18296 (cross-list from cs.LG) [pdf, other]
-
Title: GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Traditional approaches to semantic communication tasks rely on the knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. However, these methods necessitate training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). We propose a novel approach where we first transform the input data image into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder is utilized to reconstruct the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
- [58] arXiv:2403.18307 (cross-list from cs.IT) [pdf, ps, other]
-
Title: Mutual Information Optimization for SIM-Based Holographic MIMO Systems
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In the context of emerging stacked intelligent metasurface (SIM)-based holographic MIMO (HMIMO) systems, a fundamental problem is to study the mutual information (MI) between transmitted and received signals to establish their capacity. However, direct optimization or analytical evaluation of the MI, particularly for discrete signaling, is often intractable. To address this challenge, we adopt the channel cutoff rate (CR) as an alternative optimization metric for the MI maximization. In this regard, we propose an alternating projected gradient method (APGM), which optimizes the CR of a SIM-based HMIMO system by adjusting signal precoding and the phase shifts across the transmit and receive SIMs on a layer-by-layer basis. Simulation results indicate that the proposed algorithm significantly enhances the CR, achieving substantial gains proportional to those observed for the corresponding MI. This justifies the effectiveness of using the channel CR for the MI optimization. Moreover, we demonstrate that the integration of digital precoding, even on a modest scale, has a significant impact on the ultimate performance of SIM-aided systems.
- [59] arXiv:2403.18326 (cross-list from cs.CR) [pdf, ps, other]
-
Title: Privacy-Preserving Distributed Nonnegative Matrix Factorization
Comments: 5 pages, 1 figure, submitted to EUSIPCO 2024 conference
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Nonnegative matrix factorization (NMF) is an effective data representation tool with numerous applications in signal processing and machine learning. However, deploying NMF in a decentralized manner over ad-hoc networks introduces privacy concerns due to the conventional approach of sharing raw data among network agents. To address this, we propose a privacy-preserving algorithm for fully-distributed NMF that decomposes a distributed large data matrix into left and right matrix factors while safeguarding each agent's local data privacy. It facilitates collaborative estimation of the left matrix factor among agents and enables them to estimate their respective right factors without exposing raw data. To ensure data privacy, we secure information exchanges between neighboring agents utilizing the Paillier cryptosystem, a probabilistic asymmetric algorithm for public-key cryptography that allows computations on encrypted data without decryption. Simulation results conducted on synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in achieving privacy-preserving distributed NMF over ad-hoc networks.
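The property that makes Paillier useful here, computing on encrypted data without decryption, is its additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The sketch below is a toy illustration of that property only, not the paper's protocol. The function names are hypothetical, the primes are far too small to be secure, and plaintexts are restricted to non-negative integers below `n`.

```python
import math
import random

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut is valid for the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) with fresh randomness r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    """Recover the plaintext using L(u) = (u - 1) // n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

# Toy primes chosen for illustration; real deployments use much larger ones.
n, private_key = keygen(1009, 1013)
c1 = encrypt(n, 1234)
c2 = encrypt(n, 4321)
total = decrypt(n, private_key, add_encrypted(n, c1, c2))  # equals 1234 + 4321
```

In a distributed setting, an agent could combine its neighbors' encrypted contributions this way without ever seeing an individual value. Real-valued data such as matrix factors would first need a fixed-point integer encoding, which this sketch does not cover.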
- [60] arXiv:2403.18375 (cross-list from cs.LG) [pdf, other]
-
Title: Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.
- [61] arXiv:2403.18413 (cross-list from cs.RO) [pdf, ps, other]
-
Title: HyRRT-Connect: A Bidirectional Rapidly-Exploring Random Trees Motion Planning Algorithm for Hybrid Systems
Comments: Accepted by the 8th IFAC International Conference on Analysis and Design of Hybrid Systems (ADHS 2024)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper proposes a bidirectional rapidly-exploring random trees (RRT) algorithm to solve the motion planning problem for hybrid systems. The proposed algorithm, called HyRRT-Connect, propagates in both forward and backward directions in hybrid time until an overlap between the forward and backward propagation results is detected. Then, HyRRT-Connect constructs a motion plan through the reversal and concatenation of functions defined on hybrid time domains, ensuring the motion plan thoroughly satisfies the given hybrid dynamics. To address the potential discontinuity along the flow caused by tolerating some distance between the forward and backward partial motion plans, we reconstruct the backward partial motion plan by a forward-in-hybrid-time simulation from the final state of the forward partial motion plan. By applying the reversed input of the backward partial motion plan, the reconstruction process effectively eliminates the discontinuity and ensures that as the tolerance distance decreases to zero, the distance between the endpoint of the reconstructed motion plan and the final state set approaches zero. The proposed algorithm is applied to an actuated bouncing ball example and a walking robot example so as to highlight its generality and computational improvement.
- [62] arXiv:2403.18486 (cross-list from cs.LG) [pdf, other]
-
Title: Synthesizing EEG Signals from Event-Related Potential Paradigms with Conditional Diffusion Models
Comments: submitted to 9th Graz BCI conference, 6 pages, 3 figures, first figure is split into two subfigures, 1 table
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Data scarcity in the brain-computer interface field can be alleviated through the use of generative models, specifically diffusion models. While diffusion models have previously been successfully applied to electroencephalogram (EEG) data, existing models lack flexibility w.r.t. sampling or require alternative representations of the EEG data. To overcome these limitations, we introduce a novel approach to conditional diffusion models that utilizes classifier-free guidance to directly generate subject-, session-, and class-specific EEG data. In addition to commonly used metrics, domain-specific metrics are employed to evaluate the specificity of the generated samples. The results indicate that the proposed model can generate EEG data that resembles real data for each subject, session, and class.
- [63] arXiv:2403.18495 (cross-list from cs.CV) [pdf, other]
-
Title: Direct mineral content prediction from drill core images via transfer learning
Authors: Romana Boiger, Sergey V. Churakov, Ignacio Ballester Llagaria, Georg Kosakowski, Raphael Wüst, Nikolaos I. Prasianakis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Deep subsurface exploration is important for mining, oil and gas industries, as well as in the assessment of geological units for the disposal of chemical or nuclear waste, or the viability of geothermal energy systems. Typically, detailed examinations of subsurface formations or units are performed on cuttings or core materials extracted during drilling campaigns, as well as on geophysical borehole data, which provide detailed information about the petrophysical properties of the rocks. Depending on the volume of rock samples and the analytical program, the laboratory analysis and diagnostics can be very time-consuming. This study investigates the potential of utilizing machine learning, specifically convolutional neural networks (CNN), to assess the lithology and mineral content solely from analysis of drill core images, aiming to support and expedite the subsurface geological exploration. The paper outlines a comprehensive methodology, encompassing data preprocessing, machine learning methods, and transfer learning techniques. The outcome reveals a remarkable 96.7% accuracy in the classification of drill core segments into distinct formation classes. Furthermore, a CNN model was trained for the evaluation of mineral content using a learning data set from multidimensional log analysis data (silicate, total clay, carbonate). When benchmarked against laboratory XRD measurements on samples from the cores, both the advanced multidimensional log analysis model and the neural network approach developed here provide equally good performance. This work demonstrates that deep learning and particularly transfer learning can support extracting petrophysical properties, including mineral content and formation classification, from drill core images, thus offering a road map for enhancing model performance and data set quality in image-based analysis of drill cores.
- [64] arXiv:2403.18509 (cross-list from cs.DC) [pdf, ps, other]
-
Title: Distributed Maximum Consensus over Noisy Links
Comments: 5 pages, 7 figures, submitted to EUSIPCO 2024 conference
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.
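For context, the classical noise-free maximum-consensus iteration that such work builds on can be sketched as below. This is explicitly not RD-MC (the ADMM-based reformulation in the abstract is not reproduced here); the function name, the path graph, and the initial values are assumptions chosen for illustration.

```python
def max_consensus(values, neighbors, rounds):
    """Classical max-consensus: each agent repeatedly replaces its estimate
    with the largest value among itself and its neighbors.

    values: initial local values, one per agent.
    neighbors: dict mapping agent index to a list of neighbor indices.
    rounds: number of synchronous exchange rounds.
    """
    x = list(values)
    for _ in range(rounds):
        # Build the new estimates from the previous round's values only.
        x = [max([x[i]] + [x[j] for j in neighbors[i]]) for i in range(len(x))]
    return x

# Hypothetical 4-agent path graph 0 - 1 - 2 - 3 (diameter 3).
values = [3.0, 7.0, 1.0, 5.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = max_consensus(values, neighbors, rounds=3)
```

With perfect links, every agent holds the true maximum after a number of rounds equal to the graph diameter. If each received value carries additive noise, repeatedly taking the maximum of noisy values tends to push the estimates upward over time, which is one way to see why noisy links call for a different formulation.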
- [65] arXiv:2403.18539 (cross-list from cs.LG) [pdf, other]
-
Title: Safe and Robust Reinforcement-Learning: Principles and Practice
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Reinforcement Learning (RL) has shown remarkable success in solving relatively complex tasks, yet the deployment of RL systems in real-world scenarios poses significant challenges related to safety and robustness. This paper aims to identify and further understand those challenges through the exploration of the main dimensions of the safe and robust RL landscape, encompassing algorithmic, ethical, and practical considerations. We conduct a comprehensive review of methodologies and open problems that summarizes the efforts in recent years to address the inherent risks associated with RL applications.
After discussing and proposing definitions for both safe and robust RL, the paper categorizes existing research works into different algorithmic approaches that enhance the safety and robustness of RL agents. We examine techniques such as uncertainty estimation, optimisation methodologies, exploration-exploitation trade-offs, and adversarial training. Environmental factors, including sim-to-real transfer and domain adaptation, are also scrutinized to understand how RL systems can adapt to diverse and dynamic surroundings. Moreover, human involvement is an integral ingredient of the analysis, acknowledging the broad set of roles that humans can take in this context.
Importantly, to aid practitioners in navigating the complexities of safe and robust RL implementation, this paper introduces a practical checklist derived from the synthesized literature. The checklist encompasses critical aspects of algorithm design, training environment considerations, and ethical guidelines. It will serve as a resource for developers and policymakers alike to ensure the responsible deployment of RL systems in many application domains.
- [66] arXiv:2403.18557 (cross-list from math.OC) [pdf, other]
-
Title: Stability Properties of the Impulsive Goodwin's Oscillator in 1-cycleComments: submitted to IEEE CDC 2024Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The Impulsive Goodwin's Oscillator (IGO) is a mathematical model of a hybrid closed-loop system. It arises by closing a special kind of continuous linear positive time-invariant system with impulsive feedback, which employs both amplitude and frequency pulse modulation. The structure of IGO precludes the existence of equilibria, and all its solutions are oscillatory. With its origin in mathematical biology, the IGO also presents a control paradigm useful in a wide range of applications, in particular dosing of chemicals and medicines. Since the pulse modulation feedback mechanism introduces significant nonlinearity and non-smoothness in the closed-loop dynamics, conventional controller design methods fail to apply. However, the hybrid dynamics of IGO reduce to a nonlinear, time-invariant discrete-time system, exhibiting a one-to-one correspondence between periodic solutions of the original IGO and those of the discrete-time system. The paper proposes a design approach that leverages the linearization of the equivalent discrete-time dynamics in the vicinity of a fixed point. A simple and efficient local stability condition of the 1-cycle in terms of the characteristics of the amplitude and frequency modulation functions is obtained.
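The linearization idea can be illustrated on a scalar map. The sketch below checks local stability of a fixed point of x_{k+1} = f(x_k) by testing |f'(x*)| < 1. For a higher-order map the analogous condition is that all eigenvalues of the Jacobian at the fixed point lie strictly inside the unit circle. The logistic map is used purely as a stand-in; it is not the IGO's discrete-time map, and all function names are assumptions for the example.

```python
def derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def is_locally_stable(f, x_star, tol=1e-9):
    """True if x_star is a fixed point of f and |f'(x_star)| < 1."""
    if abs(f(x_star) - x_star) > tol:
        raise ValueError("x_star is not a fixed point of f")
    return abs(derivative(f, x_star)) < 1.0

def logistic(r):
    """Stand-in map x -> r*x*(1-x); NOT the IGO dynamics."""
    return lambda x: r * x * (1.0 - x)

# The nonzero fixed point of the logistic map is x* = 1 - 1/r,
# where the derivative equals 2 - r.
stable = is_locally_stable(logistic(2.5), 1.0 - 1.0 / 2.5)
unstable = is_locally_stable(logistic(3.5), 1.0 - 1.0 / 3.5)
```

Here `stable` corresponds to a derivative of -0.5 at the fixed point and `unstable` to a derivative of -1.5, so only the first fixed point attracts nearby trajectories.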
- [67] arXiv:2403.18572 (cross-list from cs.SD) [pdf, ps, other]
-
Title: ACES: Evaluating Automated Audio Captioning Models on the Semantics of SoundsSubjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have employed metrics derived from machine translation and image captioning to evaluate the quality of generated audio captions. Drawing inspiration from auditory cognitive neuroscience research, we introduce a novel metric approach -- Audio Captioning Evaluation on Semantics of Sound (ACES). ACES takes into account how human listeners parse semantic information from sounds, providing a novel and comprehensive evaluation perspective for automated audio captioning systems. ACES combines semantic similarities and semantic entity labeling. ACES outperforms similar automated audio captioning metrics on the Clotho-Eval FENSE benchmark in two evaluation categories.
- [68] arXiv:2403.18588 (cross-list from cs.HC) [pdf, other]
-
Title: From Virtual Reality to the Emerging Discipline of Perception EngineeringAuthors: Steven M. LaValle, Evan G. Center, Timo Ojala, Matti Pouke, Nicoletta Prencipe, Basak Sakcak, Markku Suomalainen, Kalle G. Timperi, Vadim K. WeinsteinComments: 30 pages, 5 figuresJournal-ref: Annu. Rev. Control Robot. Auton. Syst. v. 7, 2023Subjects: Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)
This paper makes the case that a powerful new discipline, which we term perception engineering, is steadily emerging. It follows from a progression of ideas that involve creating illusions, from historical paintings and film, to video games and virtual reality in modern times. Rather than creating physical artifacts such as bridges, airplanes, or computers, perception engineers create illusory perceptual experiences. The scope is defined over any agent that interacts with the physical world, including both biological organisms (humans, animals) and engineered systems (robots, autonomous systems). The key idea is that an agent, called a producer, alters the environment with the intent to alter the perceptual experience of another agent, called a receiver. Most importantly, the paper introduces a precise mathematical formulation of this process, based on the von Neumann-Morgenstern notion of information, to help scope and define the discipline. It is then applied to the cases of engineered and biological agents with discussion of its implications on existing fields such as virtual reality, robotics, and even social media. Finally, open challenges and opportunities for involvement are identified.
- [69] arXiv:2403.18607 (cross-list from cs.CR) [pdf, other]
-
Title: Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power DevicesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks across many distributed low-power devices, but it is also vulnerable to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, unknown threats may be hidden in time-varying spike signals. In this paper, we explore a novel vulnerability of FedNL-based systems using the concept of time division multiplexing. The resulting attack, termed Spikewhisper, allows attackers to evade detection as much as possible, as multiple malicious clients can imperceptibly poison with different triggers at different timeslices. In particular, the stealthiness of Spikewhisper derives from the time-domain divisibility of global triggers: each malicious client pastes only one local trigger to a certain timeslice in the neuromorphic sample, and the polarity and motion of each local trigger can be configured by attackers. Extensive experiments on two different neuromorphic datasets demonstrate that the attack success rate of Spikewhisper is higher than that of temporally centralized attacks. Moreover, the effect of Spikewhisper is shown to be sensitive to the trigger duration.
- [70] arXiv:2403.18621 (cross-list from cs.IT) [pdf, other]
-
Title: Performance Analysis of Integrated Sensing and Communication Networks with Blockage EffectsComments: Submitted to IEEE Transactions on Vehicular TechnologySubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle, this paper constructs a comprehensive framework considering building blockage and employs a distance-correlated blockage model to analyze interference from line of sight (LoS), non-line of sight (NLoS), and target reflection cascading (TRC) links. Using stochastic geometric theory, expressions for signal-to-interference-plus-noise ratio (SINR) and coverage probability for communication and sensing in the presence of blockage are derived, allowing for a comprehensive comparison under the same parameters. The research findings indicate that blockage can positively impact coverage, especially in enhancing communication performance. The analysis also suggests that there exists an optimal base station (BS) density when blockage is of the same order of magnitude as the BS density, maximizing communication or sensing coverage probability.
- [71] arXiv:2403.18635 (cross-list from cs.LG) [pdf, other]
-
Title: Fusion approaches for emotion recognition from speech using acoustic and text-based featuresComments: 5 pages. Accepted in ICASSP 2020Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using GloVe embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on the IEMOCAP and MSP-PODCAST datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimation of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.
- [72] arXiv:2403.18649 (cross-list from cs.CV) [pdf, other]
-
Title: Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected DatasetsAuthors: Ajinkya Khoche, Aron Asefaw, Alejandro Gonzalez, Bogdan Timus, Sina Sharif Mansouri, Patric JensfeltComments: Accepted to European Control Conference 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Data annotation in autonomous vehicles is a critical step in the development of Deep Neural Network (DNN) based models or the performance evaluation of the perception system. This often takes the form of adding 3D bounding boxes on time-sequential and registered series of point-sets captured from active sensors like Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating multiple active sensors, there is a need to motion compensate and translate the points to a consistent coordinate frame and timestamp respectively. However, highly dynamic objects pose a unique challenge, as they can appear at different timestamps in each sensor's data. Without knowing the speed of the objects, their position appears to be different in different sensor outputs. Thus, even after motion compensation, highly dynamic objects are not matched from multiple sensors in the same frame, and human annotators struggle to add unique bounding boxes that capture all objects. This article focuses on addressing this challenge, primarily within the context of Scania collected datasets. The proposed solution takes a track of an annotated object as input and uses the Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is utilized to correct the position of the annotated box and add boxes to object clusters missed by the original annotation.
- [73] arXiv:2403.18664 (cross-list from stat.ML) [pdf, other]
-
Title: Neural Network-Based Piecewise Survival ModelsComments: 7 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
In this paper, a family of neural network-based survival models is presented. The models are specified based on piecewise definitions of the hazard function and the density function on a partitioning of the time; both constant and linear piecewise definitions are presented, resulting in a family of four models. The models can be seen as an extension of the commonly used discrete-time and piecewise exponential models and thereby add flexibility to this set of standard models. Using a simulated dataset the models are shown to perform well compared to the highly expressive, state-of-the-art energy-based model, while only requiring a fraction of the computation time.
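As a point of reference, the piecewise-constant hazard case reduces to the classical piecewise exponential model, where the survival function is S(t) = exp(-H(t)) and H(t) is the hazard integrated from 0 to t. The sketch below evaluates that formula for a fixed set of hazard levels. In the setting described above a neural network would supply the hazards, which is not modelled here. The function name, breakpoints, and hazard values are illustrative assumptions.

```python
import math

def survival_piecewise_constant(t, breakpoints, hazards):
    """S(t) = exp(-cumulative hazard) for a piecewise-constant hazard.

    breakpoints: increasing interval start times, beginning at 0.0
    hazards: hazard level on [breakpoints[k], breakpoints[k+1]);
             the last level extends to infinity
    """
    if len(breakpoints) != len(hazards) or breakpoints[0] != 0.0:
        raise ValueError("need one hazard per interval, starting at t = 0")
    cumulative = 0.0
    for k, (start, h) in enumerate(zip(breakpoints, hazards)):
        if t <= start:
            break
        end = breakpoints[k + 1] if k + 1 < len(breakpoints) else math.inf
        # Add the hazard accumulated over the part of this interval before t.
        cumulative += h * (min(t, end) - start)
    return math.exp(-cumulative)

# Hypothetical partition of time and hazard levels.
breaks = [0.0, 1.0, 3.0]
levels = [0.1, 0.5, 0.2]
s_at_2 = survival_piecewise_constant(2.0, breaks, levels)
```

At t = 2 the cumulative hazard is 0.1 over the first interval plus 0.5 over one unit of the second, so `s_at_2` equals exp(-0.6).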
- [74] arXiv:2403.18707 (cross-list from math.OC) [pdf, other]
-
Title: Connections between Reachability and Time OptimalityComments: Submitted to AutomaticaSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper presents the concept of an equivalence relation on the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed from the solutions of time optimal problems. A more general equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of optimal control problems to address problems in corresponding equivalent classes. As a byproduct, we state and prove the construction methods of the reachability sets of three-dimensional curves with prescribed curvature bound. The findings are twofold: Firstly, we prove that any boundary point of the reachability set, with the terminal direction taken into account, can be accessed via curves of H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Secondly, we show that any boundary point of the reachability set, without considering the terminal direction, can be accessed by curves of CC, CS, or their respective subsegments. These findings extend the developments presented in the literature regarding planar curves, or Dubins car dynamics, to spatial curves in $\mathbb{R}^3$. For higher dimensions, we confirm that the problem of identifying the reachability set of curvature bounded paths subsumes the well-known Markov-Dubins problem. These advancements in understanding the reachability of curvature bounded paths in $\mathbb{R}^3$ hold significant practical implications, particularly in the contexts of mission planning problems and time optimal guidance.
- [75] arXiv:2403.18713 (cross-list from cs.IT) [pdf, other]
-
Title: Characterization of Spatial-Temporal Channel Statistics from Indoor Measurement Data at D BandAuthors: Chathuri Weragama, Joonas Kokkoniemi, Mar Francis De Guzman, Katsuyuki Haneda, Pekka Kyosti, Markku JunttiComments: 6 pages, 22 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Millimeter-wave (mmWave) and D Band (110--170~GHz) frequencies are poised to play a pivotal role in the advancement of sixth-generation (6G) systems and beyond, owing to their ability to enhance performance metrics such as capacity, ultra-low latency, and spectral efficiency. This paper concentrates on deriving statistical insights into power, delay, and the number of paths based on measurements conducted across four distinct locations at a center frequency of 143.1 GHz. The findings underscore the suitability of various distributions in characterizing power behavior in line-of-sight (LOS) scenarios, including lognormal, Nakagami, gamma, and beta distributions, whereas the loglogistic distribution gives the optimal fit for power distribution in non-line-of-sight (NLOS) scenarios. Moreover, the exponential distribution proves to be the most appropriate model for the delay distribution in both LOS and NLOS scenarios. In terms of the number of paths, observations indicate a tendency for the highest concentration within the 10 m to 30 m distance range between the transmitter (Tx) and receiver (Rx). These insights shed light on the statistical nature of D band propagation characteristics, which are vital for informing the design and optimization of future 6G communication systems.
- [76] arXiv:2403.18739 (cross-list from cs.LG) [pdf, other]
-
Title: Usage-Specific Survival Modeling Based on Operational Data and Neural NetworksComments: 7 pagesSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Accurate predictions of when a component will fail are crucial when planning maintenance, and by modeling the distribution of these failure times, survival models have been shown to be particularly useful in this context. The presented methodology is based on conventional neural network-based survival models that are trained using data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from a specific individual, so standard maximum likelihood training cannot be directly applied because the data is not independent. However, the paper shows that if the data is in a specific format where all snapshot times are the same for all individuals, called homogeneously sampled, maximum likelihood training can be applied and produces desirable results. In many cases, the data is not homogeneously sampled, and in this case it is proposed to resample the data to make it homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter; it should be chosen large enough to produce good results, but this also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes a technique to randomly resample the dataset at the start of each epoch during training, instead of resampling it once before training starts. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter battery failures. The results show that if the data is homogeneously sampled the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset on each epoch is an effective way to reduce the size of the training data.
- [77] arXiv:2403.18776 (cross-list from physics.optics) [pdf, other]
-
Title: Breaking the Limitations with Sparse Inputs by Variational Frameworks (BLIss) in Terahertz Super-Resolution 3D ReconstructionComments: 15 pages, 7 figures. Supplemental Document: this https URLJournal-ref: Optics Express (OE) 2024Subjects: Optics (physics.optics); Image and Video Processing (eess.IV)
Data acquisition, image processing, and image quality are long-standing issues for terahertz (THz) 3D reconstructed imaging. Existing methods are primarily designed for 2D scenarios, given the challenges associated with obtaining super-resolution (SR) data and the absence of an efficient SR 3D reconstruction framework in conventional computed tomography (CT). Here, we demonstrate BLIss, a new approach for THz SR 3D reconstruction with sparse 2D data input. BLIss seamlessly integrates conventional CT techniques and a variational framework with an adapted Euler-Elastica-based model at its core. The quantitative 3D image evaluation metrics, including the standard deviation of Gaussian, mean curvatures, and the multi-scale structural similarity index measure (MS-SSIM), validate the superior smoothness and fidelity achieved with our variational framework approach compared with conventional THz CT modal. Beyond its contributions to advancing THz SR 3D reconstruction, BLIss demonstrates potential applicability in other imaging modalities, such as X-ray and MRI. This suggests extensive impacts on the broader field of imaging applications.
- [78] arXiv:2403.18811 (cross-list from cs.CV) [pdf, other]
-
Title: Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance AccompanimentAuthors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change LoyComments: ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
- [79] arXiv:2403.18821 (cross-list from cs.SD) [pdf, other]
-
Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and BenchmarkAuthors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander RichardComments: Accepted to CVPR 2024. Project site: this https URLSubjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation which previously relied on synthetic data. In our evaluation, we thoroughly assessed existing audio and audio-visual models against multiple criteria and proposed settings to enhance their performance on real-world data. We also conducted experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrated the effectiveness of a simple sim2real approach, where a model is pre-trained with simulated data and fine-tuned with sparse real-world data, resulting in significant improvements in the few-shot learning approach. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling techniques. Demos and datasets are available on our project page: https://facebookresearch.github.io/real-acoustic-fields/
Replacements for Thu, 28 Mar 24
- [80] arXiv:2201.06180 (replaced) [pdf, other]
-
Title: Nonlinear Control Allocation: A Learning Based ApproachComments: submitted to IEEE Conference on Decision and Control (CDC), 2024Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
- [81] arXiv:2302.13483 (replaced) [pdf, other]
-
Title: CrystalBox: Future-Based Explanations for Input-Driven Deep RL SystemsSubjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
- [82] arXiv:2304.06427 (replaced) [pdf, other]
-
Title: In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia DetectionComments: This paper has been published in the IEEE Journal of Biomedical and Health Informatics (JBHI). Copyright IEEE. Please cite as: S. Soltanieh, J. Hashemi and A. Etemad, "In-Distribution and Out-of-Distribution Self-Supervised ECG Representation Learning for Arrhythmia Detection," in IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 2, pp. 789-800, Feb. 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- [83] arXiv:2305.12523 (replaced) [pdf, ps, other]
-
Title: Multi-Static Target Detection and Power Allocation for Integrated Sensing and Communication in Cell-Free Massive MIMOComments: 16 pages, 7 figuresSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- [84] arXiv:2306.09014 (replaced) [pdf, other]
-
Title: Geometric Wide-Angle Camera Calibration: A Review and Comparative StudyAuthors: Jianzhu Huai, Yuan Zhuang, Yuxin Shao, Grzegorz Jozkow, Binliang Wang, Yijia He, Alper YilmazComments: 18 pages, 12 figuresSubjects: Image and Video Processing (eess.IV)
- [85] arXiv:2307.07572 (replaced) [pdf, other]
-
Title: High-Rate Phase Association with Travel Time Neural FieldsSubjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [86] arXiv:2307.11016 (replaced) [pdf, other]
-
Title: Treatment And Follow-Up Guidelines For Multiple Brain Metastases: A Systematic ReviewSubjects: Image and Video Processing (eess.IV)
- [87] arXiv:2307.16071 (replaced) [pdf, other]
-
Title: ÌròyìnSpeech: A multi-purpose Yorùbá Speech CorpusComments: Accepted to LREC-COLING 2024Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [88] arXiv:2307.16075 (replaced) [pdf, ps, other]
-
Title: Redesigning Large-Scale Multimodal Transit Networks with Shared Autonomous Mobility ServicesComments: 48 pages, 18 figures, accepted for publication in Transportation Research Part C: Emerging Technologies, and presentation in the 25th International Symposium on Transportation and Traffic Theory (ISTTT25)Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
- [89] arXiv:2308.02396 (replaced) [pdf, other]
-
Title: HOOD: Real-Time Human Presence and Out-of-Distribution Detection Using FMCW RadarComments: 10 pages, 2 figures, project page: this https URLSubjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [90] arXiv:2308.10483 (replaced) [pdf, ps, other]
-
Title: Aggregate Model of District Heating Network for Integrated Energy Dispatch: A Physically Informed Data-Driven ApproachSubjects: Systems and Control (eess.SY)
- [91] arXiv:2308.12882 (replaced) [pdf, other]
-
Title: LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral CompetitionComments: Accepted at 2024 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW)Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [92] arXiv:2308.13356 (replaced) [pdf, ps, other]
-
Title: CEIMVEN: An Approach of Cutting Edge Implementation of Modified Versions of EfficientNet (V1-V2) Architecture for Breast Cancer Detection and Classification from Ultrasound ImagesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [93] arXiv:2309.06075 (replaced) [pdf, other]
-
Title: A2V: A Semi-Supervised Domain Adaptation Framework for Brain Vessel Segmentation via Two-Phase Training Angiography-to-Venography TranslationAuthors: Francesco Galati, Daniele Falcetta, Rosa Cortese, Barbara Casolla, Ferran Prados, Ninon Burgos, Maria A. ZuluagaComments: Accepted at the 34th British Machine Vision Conference (BMVC)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [94] arXiv:2309.07798 (replaced) [pdf, other]
-
Title: Enhancing Performance, Calibration Time and Efficiency in Brain-Machine Interfaces through Transfer Learning and Wearable EEG TechnologySubjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
- [95] arXiv:2311.08787 (replaced) [pdf, other]
-
Title: Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environmentsComments: 6 Pages, 6 Figures. Accepted at European Control Conference (ECC) 2024. arXiv admin note: text overlap with arXiv:2303.15871Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [96] arXiv:2311.10551 (replaced) [pdf, other]
-
Title: A Tutorial on 5G PositioningAuthors: Lorenzo Italiano, Bernardo Camajori Tedeschini, Mattia Brambilla, Huiping Huang, Monica Nicoli, Henk WymeerschComments: This work has been submitted to the IEEE Communications Surveys & Tutorials for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Signal Processing (eess.SP)
- [97] arXiv:2311.13967 (replaced) [pdf, other]
-
Title: Unconstrained learning of networked nonlinear systems via free parametrization of stable interconnected operatorsComments: Full version of the paper to appear at ECC 2024Subjects: Systems and Control (eess.SY)
- [98] arXiv:2312.03620 (replaced) [pdf, other]
-
Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker VerificationComments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [99] arXiv:2312.10287 (replaced) [pdf, other]
-
Title: Towards 6G Digital Twin Channel Using Radio Environment Knowledge PoolAuthors: Jialin Wang, Jianhua Zhang, Yuxiang Zhang, Yutong Sun, Gaofeng Nie, Lianzheng Shi, Ping Zhang, Guangyi LiuSubjects: Signal Processing (eess.SP)
- [100] arXiv:2312.10842 (replaced) [pdf, other]
-
Title: Compositional Inductive Invariant Based Verification of Neural Network Controlled SystemsSubjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG); Systems and Control (eess.SY)
- [101] arXiv:2401.07494 (replaced) [pdf, other]
-
Title: Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering TasksSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
- [102] arXiv:2401.10345 (replaced) [pdf, other]
-
Title: Attack and Defense Analysis of Learned Image CompressionSubjects: Image and Video Processing (eess.IV)
- [103] arXiv:2401.11542 (replaced) [pdf, other]
-
Title: Nigel -- Mechatronic Design and Robust Sim2Real Control of an Over-Actuated Autonomous VehicleSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [104] arXiv:2402.01216 (replaced) [pdf, other]
-
Title: Robust Commutation Design: Applied to Switched Reluctance MotorsComments: 6 pages, 7 figures. Final versionSubjects: Systems and Control (eess.SY)
- [105] arXiv:2402.11800 (replaced) [pdf, other]
-
Title: Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian SamplingAuthors: Arman Adibi, Nicolo Dal Fabbro, Luca Schenato, Sanjeev Kulkarni, H. Vincent Poor, George J. Pappas, Hamed Hassani, Aritra MitraComments: Accepted to the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024!Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
- [106] arXiv:2403.03100 (replaced) [pdf, other]
-
Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion ModelsAuthors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng ZhaoComments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot waySubjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [107] arXiv:2403.03271 (replaced) [pdf, ps, other]
-
Title: Low-Complexity Linear Decoupling of Users for Uplink Massive MU-MIMO DetectionSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
- [108] arXiv:2403.06054 (replaced) [pdf, other]
-
Title: Decoupled Data Consistency with Diffusion Purification for Image RestorationSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [109] arXiv:2403.09258 (replaced) [pdf, other]
-
Title: Near-Field EM-Based Multistatic Radar Range EstimationAuthors: François De Saint Moulin, Guillaume Thiran, Christophe Craeye, Luc Vandendorpe, Claude OestgesComments: This paper has been submitted to EUSIPCO 2024Subjects: Signal Processing (eess.SP)
- [110] arXiv:2403.13680 (replaced) [pdf, other]
-
Title: Step-Calibrated Diffusion for Biomedical Optical Image RestorationAuthors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. HollonSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [111] arXiv:2403.16335 (replaced) [pdf, other]
-
Title: MEDDAP: Medical Dataset Enhancement via Diversified Augmentation PipelineComments: Submitted to MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [112] arXiv:2403.16488 (replaced) [pdf, ps, other]
-
Title: Ensuring Disturbance Rejection Performance by Synthesizing Grid-Following and Grid-Forming Inverters in Power SystemsComments: 6 pagesSubjects: Systems and Control (eess.SY)
- [113] arXiv:2403.17392 (replaced) [pdf, other]
-
Title: Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrainAuthors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki OguraSubjects: Robotics (cs.RO); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)
- [114] arXiv:2403.17905 (replaced) [pdf, other]
-
Title: Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2Comments: submitted to IEEE EUSIPCO 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)